
How to: schedule snapshot for datasource on Pydio Cells

February 14, 2024 • Other

Foreword

In Pydio Cells Home Edition (the free version), you can't add or modify jobs (schedules). This includes the automatically generated datasource snapshot jobs. If you use a flat-structure datasource configuration, be aware that the index snapshot is what lets you recover the full file structure when all you have left is the datasource. Without the index, a flat-structure datasource is practically useless: it's just a huge pile of files with random UUIDs as filenames, all sitting inside a single folder.

The problem is that, due to this limitation of Home Edition, you MUST trigger the snapshot job manually. Discussion on the official forum seems to suggest that you could use the cells binary to trigger the snapshot job, but after several hours of testing and head-banging I decided that the magic simply won't work. There's no supported way to trigger the snapshot job automatically. That left me with the very last resort: the hack-hack way.

The Hack-Hack Way

When I wanted to use Siyuan with my S3, I dug through the code, modified it to bypass all the VIP checks, and recompiled it. That took me almost 2 days, including setting up the GitHub Actions to compile the app. Luckily, this case is a lot easier.

First, we find out where the jobs data is kept. After several rounds of cd and ls, we can see an interesting file in PYDIO_HOME_DIR/services/pydio.grpc.jobs:
2024-02-13T18:43:16.png
In the picture above, notice the jobs.db file. Per the official docs at https://pydio.com/en/docs/developer-guide/scheduler, it is a BoltDB file. Some googling later, we got ourselves a free CLI-based BoltDB editor: boltbrowser (https://www.reddit.com/r/golang/comments/4kw70s/excelent_cli_for_bolt_db_couldnt_find_it_googling/?rdt=60622)

Make a backup of your original jobs.db. Then stop Pydio Cells (otherwise the file is locked, as Pydio is still using it) and open jobs.db with boltbrowser.
2024-02-13T18:47:58.png
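If you'd rather script the backup step than copy the file by hand, here is a minimal Python sketch. The function name `backup_jobs_db` and the `.bak` naming scheme are my own assumptions, not anything Pydio provides. Run it only after Pydio Cells has been stopped.

```python
import shutil
import time
from pathlib import Path


def backup_jobs_db(db_path: str) -> Path:
    """Copy jobs.db to a timestamped sibling file and return the backup path.

    Hypothetical helper: the name and the ".bak" suffix are illustrative only.
    """
    src = Path(db_path)
    if not src.is_file():
        raise FileNotFoundError(f"no such file: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dst)  # copy2 also preserves timestamps and permissions
    return dst


# Example call (replace PYDIO_HOME_DIR with your actual Pydio home directory):
# backup_jobs_db("PYDIO_HOME_DIR/services/pydio.grpc.jobs/jobs.db")
```

If the edit goes wrong, stop Pydio Cells again and copy the backup over jobs.db.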

Now, use the arrow keys (up/down/left/right) to navigate and open entries. Go to jobs and find your datasource job. It should have the label "Snapshot DB index", which you can confirm from your schedule list (you need to start Pydio again and log in to check; remember to exit boltbrowser first!)
2024-02-13T18:50:41.png
2024-02-13T18:50:55.png

So you have identified the entry for your snapshot job. Next, we prepare the data we are going to insert. We can refer to other jobs that have a daily/hourly schedule:
2024-02-13T18:52:26.png
2024-02-13T18:54:29.png
For example, this job is scheduled to run every 24 hours at 01:30 UTC (the time is stored in UTC, so convert it to your local timezone when reading it). Its schedule entry looks like this:

"Schedule": {"Iso8601Schedule": "R/2012-01-01T01:30:00.828Z/PT24H"}
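To make the format less magical: the value is an ISO 8601 repeating interval of the form R/start/duration. The Python sketch below splits such a string and works out the next run time. It assumes the scheduler fires at start + k × duration, which is the usual reading of the format but not something I verified in the Pydio source. The function names are mine, and only simple PTnH / PTnM / PTnS durations are handled.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_schedule(spec: str):
    """Split 'R/<start>/<duration>' into (start datetime, period timedelta).

    Assumption: only simple time durations such as PT24H or PT30M are handled.
    """
    repeat, start_s, dur_s = spec.split("/")
    if not repeat.startswith("R"):
        raise ValueError("expected a repeating interval starting with 'R'")
    # Older Python versions do not accept a trailing 'Z', so spell out the offset.
    start = datetime.fromisoformat(start_s.replace("Z", "+00:00"))
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", dur_s)
    if not m or not any(m.groups()):
        raise ValueError(f"unsupported duration: {dur_s}")
    h, mi, s = (int(g) if g else 0 for g in m.groups())
    return start, timedelta(hours=h, minutes=mi, seconds=s)


def next_run(spec: str, now: datetime) -> datetime:
    """First occurrence of start + k * period (k >= 0) that is not before `now`."""
    start, period = parse_schedule(spec)
    if now <= start:
        return start
    k = -(-(now - start) // period)  # ceiling division on timedeltas
    return start + k * period


spec = "R/2012-01-01T01:30:00.828Z/PT24H"
print(next_run(spec, datetime.now(timezone.utc)))
```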

The string holds a start date and time in UTC, followed by the repeat interval at the end (PT24H means every 24 hours). Leave the date intact and change the time to your liking. In my case, I want it to run every day at 6 am local time, which is 22:00 UTC in my UTC+8 timezone, so it becomes:

"Schedule":{"Iso8601Schedule":"R/2012-01-01T22:00:00.828Z/PT24H"}
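If you don't trust your own timezone arithmetic, a small Python sketch like the one below can produce the string for you. `build_schedule` is a hypothetical helper of mine. It keeps the 2012-01-01 date and the .828 milliseconds from the example job untouched and only converts the time of day to UTC. The UTC+8 offset used here is inferred from the 6 am to 22:00 example above.

```python
from datetime import datetime, timedelta, timezone


def build_schedule(hour: int, minute: int, utc_offset_hours: float,
                   every_hours: int = 24) -> str:
    """Build an Iso8601Schedule value for a local time of day.

    Hypothetical helper: the anchor date and milliseconds are copied from
    the example job; only the time of day is converted to UTC.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_dt = datetime(2012, 1, 1, hour, minute, tzinfo=local_tz)
    utc_dt = local_dt.astimezone(timezone.utc)
    # The date part is deliberately left at 2012-01-01, as in the post.
    return f"R/2012-01-01T{utc_dt:%H:%M:%S}.828Z/PT{every_hours}H"


print(build_schedule(6, 0, 8))  # 6 am at UTC+8
```

Swap in your own hour, minute and UTC offset, then paste the result into the "Iso8601Schedule" value.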

Now, head over to your snapshot job and press ENTER to edit it:
2024-02-13T18:59:36.png
Add the schedule after the last entry but still inside the closing curly bracket, and remember to put a comma before it. In this case, add a comma after "MaxConcurrency":1 and paste your line:
2024-02-13T19:01:18.png
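A missing comma or bracket here leaves you with a broken job entry, so it can be worth checking the edited JSON before pasting it. A minimal Python sketch, assuming you have copied the job's JSON value out as text: the sample record below is made up (only "MaxConcurrency" and the label text come from the steps above; the "Label" key name is my guess), and `add_schedule` is a hypothetical helper.

```python
import json


def add_schedule(job_json: str, iso_schedule: str) -> str:
    """Return the job JSON with a "Schedule" entry appended as the last key."""
    job = json.loads(job_json)  # raises ValueError if the JSON is malformed
    job["Schedule"] = {"Iso8601Schedule": iso_schedule}
    # Compact separators keep the result on one line for pasting.
    return json.dumps(job, separators=(",", ":"))


# Made-up sample record; a real job entry has more fields than this.
sample = '{"Label":"Snapshot DB index","MaxConcurrency":1}'
print(add_schedule(sample, "R/2012-01-01T22:00:00.828Z/PT24H"))
```

The same `json.loads` call also works as a quick sanity check on a line you edited by hand.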
When you're done, press ENTER, then ESC to exit.
2024-02-13T19:02:33.png
Now that your changes are saved, start Pydio Cells and log in to check!

Before:
2024-02-13T19:03:34.png
After:
2024-02-13T19:03:53.png
2024-02-13T19:06:29.png
Snapshot run automatically by the job schedule:
2024-02-13T19:11:01.png
2024-02-13T19:11:22.png
2024-02-13T19:12:23.png

Profit :sunglasses:
Now stay low and keep this a secret! XD