Fakesync
Fakesync is a decentralized file synchronizer based on the Vector Time Pair algorithm.
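The comparison at the heart of vector-time-based sync can be sketched as follows. This is a minimal, simplified (vector-valued) reading of the Vector Time Pair idea, not Fakesync's code. The names `VectorTime`, `VersionInfo` and `compare`, and the representation of a vector as a map from machine ID to counter, are assumptions for illustration.

```scala
object VectorTimeSketch {
  // Hypothetical representation: machine ID -> logical counter.
  type VectorTime = Map[String, Long]

  // a <= b when every entry of a is <= the matching entry of b
  // (entries missing from b count as 0).
  def leq(a: VectorTime, b: VectorTime): Boolean =
    a.forall { case (machine, t) => t <= b.getOrElse(machine, 0L) }

  // One replica's view of a file: when the file was last modified (mod)
  // and how much history this replica has synchronized (sync).
  final case class VersionInfo(mod: VectorTime, sync: VectorTime)

  sealed trait Decision
  case object NothingToDo extends Decision
  case object CopyAToB extends Decision
  case object CopyBToA extends Decision
  case object Conflict extends Decision

  def compare(a: VersionInfo, b: VersionInfo): Decision = {
    val bKnowsA = leq(a.mod, b.sync) // B has already seen A's version
    val aKnowsB = leq(b.mod, a.sync) // A has already seen B's version
    if (bKnowsA && aKnowsB) NothingToDo
    else if (bKnowsA) CopyBToA
    else if (aKnowsB) CopyAToB
    else Conflict
  }
}
```

The rule is that one replica's changes are already known to the other when its modification vector is less than or equal to the other's synchronization vector; when neither side's changes are known to the other, the two versions conflict.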
Features
- File syncing: synchronizes a file tree directly between machines;
- Decentralized: no need for a central server;
- Sync anytime anywhere: you can run a sync anytime between any set of machines, on any subtree;
- Bandwidth saving (not really): uses the rsync protocol to send only the modified portions of files, and compresses all communication with gzip;
- History tracking: records the version history of each file, to determine how different versions of the same file relate to each other;
- Conflict reporting: when the machines in a sync have both modified the same file, or have otherwise produced a merge conflict, reports the conflict and asks the user to resolve it;
- Merge tracking: the same conflict is reported only once, provided its resolution is propagated;
- Tracking file moves/renames: recognizes renamed files and does not treat a rename as a deletion plus a creation;
- File ignoring: paths matching a given regex can be excluded from the sync;
- Client-Server architecture: uses an intermediary to coordinate a sync.
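File ignoring could look roughly like the sketch below. The `IgnoreRules` type, and the choice to require a regex to match the whole relative path rather than any substring of it, are hypothetical; Fakesync's actual ignore configuration is not described here.

```scala
import scala.util.matching.Regex

object IgnoreSketch {
  // Hypothetical ignore rules: a path is skipped if any regex matches the
  // *whole* relative path (e.g. "build/.*" or ".*\\.tmp").
  final case class IgnoreRules(patterns: Seq[Regex]) {
    def isIgnored(relativePath: String): Boolean =
      patterns.exists(_.pattern.matcher(relativePath).matches())
  }

  def fromStrings(patterns: String*): IgnoreRules =
    IgnoreRules(patterns.map(_.r))
}
```

Requiring a full match keeps a pattern like `build/.*` from accidentally excluding `src/build/...`; a substring-matching design would need anchors instead.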
Implementation
Fakesync is a one-shot program, not a daemon: it is run once for each sync, and its metadata is persisted between runs in external databases.
The main program is a coordinator for the sync process. Typically it connects to an RPC server on the remote machine and uses a "fake" RPC mechanism on the local side; it can also connect to two RPC servers, or use two local instances, which is useful for maintaining locally duplicated files.
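The coordinator design can be sketched as sync logic that talks only to a common `Replica` interface, so that a local in-process object (the "fake" RPC side) and a remote RPC client are interchangeable. All names here are hypothetical, only the in-process side is shown, and the copying policy is a toy for illustration, not Fakesync's triage logic.

```scala
import scala.collection.mutable

object CoordinatorSketch {
  // Hypothetical RPC-facing interface. A remote implementation would
  // serialize each call over the network; the local one just calls in-process.
  trait Replica {
    def list(): Set[String]
    def read(path: String): Option[Array[Byte]]
    def write(path: String, data: Array[Byte]): Unit
  }

  // In-process replica backed by a map, standing in for the local side.
  final class LocalReplica extends Replica {
    private val files = mutable.Map.empty[String, Array[Byte]]
    def list(): Set[String] = files.keySet.toSet
    def read(path: String): Option[Array[Byte]] = files.get(path)
    def write(path: String, data: Array[Byte]): Unit = files.update(path, data)
  }

  // The coordinator only sees the Replica interface, so it cannot tell
  // whether either side is local or remote.
  final class Coordinator(a: Replica, b: Replica) {
    // Toy policy for illustration: copy files that exist on only one side.
    def syncMissing(): Unit = {
      for (p <- a.list() -- b.list(); data <- a.read(p)) b.write(p, data)
      for (p <- b.list() -- a.list(); data <- b.read(p)) a.write(p, data)
    }
  }
}
```

Passing two `LocalReplica` instances to the coordinator corresponds to the two-local-instances setup; a remote client implementing `Replica` would slot in without changing `Coordinator`.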
The virtual filesystem (VFS) layer makes the filesystem logic easy to customize, so support could be added for network storage protocols such as S3 or WebDAV. The database layer is likewise abstracted, so different database implementations can be used.
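A VFS layer of this kind might be a small trait that the sync code depends on, with one implementation per storage backend. The sketch below assumes a hypothetical `Vfs` trait and shows only a local-disk backend built on `java.nio.file.Files`; an S3 or WebDAV backend would implement the same four methods, and the database layer could be abstracted the same way.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object VfsSketch {
  // Hypothetical VFS interface: sync logic is written against this trait,
  // so another storage backend only needs to implement these methods.
  trait Vfs {
    def isDirectory(path: String): Boolean
    def list(path: String): List[String]
    def read(path: String): Array[Byte]
    def write(path: String, data: Array[Byte]): Unit
  }

  // Backend for the local disk, rooted at `root`; paths are relative to it.
  final class LocalVfs(root: Path) extends Vfs {
    private def resolve(path: String): Path = root.resolve(path)

    def isDirectory(path: String): Boolean = Files.isDirectory(resolve(path))

    // Sorted child names; the directory stream is closed after use.
    def list(path: String): List[String] = {
      val stream = Files.list(resolve(path))
      try stream.iterator().asScala.map(_.getFileName.toString).toList.sorted
      finally stream.close()
    }

    def read(path: String): Array[Byte] = Files.readAllBytes(resolve(path))

    // Creates missing parent directories, then writes the file.
    def write(path: String, data: Array[Byte]): Unit = {
      val target = resolve(path)
      Option(target.getParent).foreach(p => Files.createDirectories(p))
      Files.write(target, data)
    }
  }
}
```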
A file is represented as a Stat structure, indexed by inode numbers. An inode number consists of an authority machine ID and a machine-local numerical identifier. A file may have more than one inode number if two files were initially numbered by different machines and only later found to be the same file. Typically this happens when some external program created the same file on both machines. In such cases the smallest inode number in the set is used as the primary key.
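The inode numbering scheme can be sketched as below. The field names, and the ordering used to find the "smallest" number (by authority ID, then by local counter), are assumptions; the text above does not specify how inode numbers are compared.

```scala
object InodeSketch {
  // Hypothetical encoding: the machine that assigned the number (authority),
  // plus a counter local to that machine.
  final case class InodeNumber(authority: String, local: Long)

  object InodeNumber {
    // Assumed total order for picking the primary key: authority, then local.
    implicit val ordering: Ordering[InodeNumber] =
      Ordering.by((i: InodeNumber) => (i.authority, i.local))
  }

  // All numbers known to denote one file; the smallest is the primary key.
  final case class FileIdentity(numbers: Set[InodeNumber]) {
    require(numbers.nonEmpty, "a file needs at least one inode number")

    def primaryKey: InodeNumber = numbers.min

    // Used once two independently numbered files turn out to be the same.
    def mergeWith(other: FileIdentity): FileIdentity =
      FileIdentity(numbers ++ other.numbers)
  }
}
```

Keeping every number in the set means references made under any of the old numbers still resolve, while the minimum gives all machines the same primary key without further coordination.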
The sync process has 4 stages:
- Starts a sync on a given path (well, actually an inode);
- Updates the file stat (metadata) on both machines;
- Calculates the action to perform, possibly reporting a conflict;
- Copies file data for regular files, or recurses into directories.
As a graph:
```
sync root -> sync queue <-------------------+
                  | stat                    |
       +----------+----------+              |
       v                     v              |
     stat A                stat B           |
       +----------+----------+              |
                  | triage                  |
                  v                         |
+ one of: ------------------------------+   |
| no action needed                      |   |
| conflict                              |   |
| copy A to B / copy B to A             |   |
| create on A / delete on B             |   |
| create on B / delete on A             |   |
| send directory children to sync queue |   |
+---------------------------------------+   |
                  | take action             |
                  +-------------------------+
```
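The loop in the graph can be sketched as a breadth-first traversal driven by the sync queue. This is a toy model under stated assumptions: each side's stat is a hypothetical `Stat` with a `changed` flag standing in for the real version-history comparison, paths stand in for inodes, a file/directory type mismatch is treated as a conflict, and actions are only recorded rather than carried out.

```scala
import scala.collection.mutable

object SyncLoopSketch {
  // Hypothetical, simplified stat: node kind, whether the node changed since
  // the last sync (a stand-in for the real version comparison), and children.
  final case class Stat(isDir: Boolean, changed: Boolean, children: Set[String] = Set.empty)

  sealed trait Action
  case object NoAction extends Action
  case object Conflict extends Action
  case object CopyAToB extends Action
  case object CopyBToA extends Action
  case object CreateOnA extends Action
  case object DeleteOnB extends Action
  case object CreateOnB extends Action
  case object DeleteOnA extends Action
  case object SendChildren extends Action

  // The "triage" step: pick exactly one action from the two stats.
  def triage(a: Option[Stat], b: Option[Stat]): Action = (a, b) match {
    case (None, None) => NoAction
    case (Some(sa), None) => if (sa.changed) CreateOnB else DeleteOnA
    case (None, Some(sb)) => if (sb.changed) CreateOnA else DeleteOnB
    case (Some(sa), Some(sb)) if sa.isDir && sb.isDir => SendChildren
    case (Some(sa), Some(sb)) if sa.isDir != sb.isDir => Conflict
    case (Some(sa), Some(sb)) =>
      if (sa.changed && sb.changed) Conflict
      else if (sa.changed) CopyAToB
      else if (sb.changed) CopyBToA
      else NoAction
  }

  // Drain the sync queue breadth-first, starting from the sync root.
  // statA/statB play the "stat" step on each machine; actions are only
  // recorded here, not executed.
  def run(root: String,
          statA: String => Option[Stat],
          statB: String => Option[Stat]): List[(String, Action)] = {
    val queue = mutable.Queue(root)
    val log = List.newBuilder[(String, Action)]
    while (queue.nonEmpty) {
      val path = queue.dequeue()
      val a = statA(path)
      val b = statB(path)
      val action = triage(a, b)
      log += (path -> action)
      if (action == SendChildren) {
        // Both sides are directories here, so both stats are defined.
        val names = a.get.children ++ b.get.children
        names.toList.sorted.foreach(n => queue.enqueue(s"$path/$n"))
      }
    }
    log.result()
  }
}
```

Sending directory children back to the queue, rather than recursing, keeps each step a flat "stat, triage, act" unit, which matches the feedback edge in the graph.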