rsync a directory so all changes appear atomically
You can use the --link-dest= option. Basically you create a new folder, and all unchanged files are hard-linked into it from the previous copy while rsync transfers only what changed. When everything is done, you can just swap the folder names and remove the old one.
It is impossible to do this 100% atomically on Linux since there is no kernel/VFS support for it. However, swapping the names is only two syscalls away, so it should take well under a second to complete. It is possible only on Darwin (Mac OS X) with the exchangedata system call on HFS filesystems.
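For illustration, here is a minimal bash sketch of that approach; the /source, /data/current, /data/new, and /data/old paths are placeholders I've made up, not anything from the answer above.

# build the new copy, hard-linking unchanged files against the current one
rsync -a --link-dest=/data/current /source/ /data/new/
# the "swap" is two quick renames plus a delete -- fast, but not one atomic operation
mv /data/current /data/old
mv /data/new /data/current
rm -rf /data/old

The two mv calls are quick, but a reader that hits the directory between them can still see the intermediate state, which is the non-atomic window described above.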
I do something similar with rsync backups [to disk] and I've encountered the same problem due to a daemon updating files while the backup is running.
Unlike many programs, rsync has many distinct exit codes [see the bottom of the man page]. Two of them are of interest here:
23 -- partial transfer due to error
24 -- partial transfer due to vanished source files
When rsync encounters one of these situations during a transfer, it doesn't just stop immediately. It skips what it can't handle and continues with the files it can transfer. At the end, it reports the situation in its return code.
So, if you get error 23/24, just rerun the rsync. The subsequent runs will go much faster, usually just transferring the missing files from the previous run. Eventually, you'll get [or should get] a clean run.
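As a rough bash sketch (with SRC and DEST standing in for whatever you are transferring), the rerun-on-23/24 logic could look like this:

# keep rerunning rsync while it reports a partial transfer (23 or 24)
while true; do
    rsync -a "$SRC" "$DEST"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        break                               # clean run
    elif [ "$rc" -eq 23 ] || [ "$rc" -eq 24 ]; then
        continue                            # partial transfer -- just rerun
    else
        echo "rsync failed with exit code $rc" >&2
        exit 1
    fi
done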
As to being atomic, I use a "tmp" dir during the transfer. Then, when the rsync run is clean, I rename it [atomically] to <date>.
I also use the --link-dest option, but I use it to keep delta backups (e.g. --link-dest=yesterday for daily backups).
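Here is a hedged bash sketch of that tmp-dir-plus-rename idea combined with a --link-dest delta; the /source path and the /backup layout are placeholders, not the author's actual setup:

top=/backup                                     # placeholder backup location
date=$(date +%Y%m%d_%H%M%S)
latest=$(ls "$top/backups" | tail -1)           # most recent backup, if any
opts=(-a)
if [ -n "$latest" ]; then
    opts+=(--link-dest="$top/backups/$latest")  # hard-link unchanged files
fi
rsync "${opts[@]}" /source/ "$top/tmp/"
mv "$top/tmp" "$top/backups/$date"              # rename is atomic within one filesystem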
Although I've not used it myself, the --partial-dir=DIR option may keep the hidden partial files from cluttering up the backup directory. Be sure that DIR is on the same filesystem as your backup directory so renames will be atomic.
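For example (paths again placeholders), a relative --partial-dir ends up inside the destination tree and is therefore on the same filesystem:

# partially transferred files go into .rsync-partial under the destination
rsync -a --partial-dir=.rsync-partial /source/ server:/backup/tmp/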
While I do this in perl, I've written a script that summarizes what I've been saying, with a bit more detail/precision for your particular situation. It's in tcsh-like syntax [untested and a bit rough], but treat it as pseudo-code for writing your own bash, perl, or python script as you choose. Note that it has no limit on retries, but you can add one easily enough, according to your wishes.
#!/bin/tcsh -f
# repo_backup -- backup repos even if they change
#
# use_tmp -- use temporary destination directory
# use_partial -- use partial directory
# use_delta -- make delta backup

# enable/disable the features described above
set use_tmp=1
set use_partial=1
set use_delta=1

# set remote server name ...
set remote_server="..."

# directory on server for backups
set backup_top="/path_to_backup_top"
set backup_backups="$backup_top/backups"

# set your rsync options ...
set rsync_opts=(...)

# keep partial files from cluttering backup
# (--partial-dir is a path interpreted on the receiving side, so no host: prefix)
if ($use_partial) then
    set rsync_opts=($rsync_opts --partial-dir=$backup_top/partial)
endif

# do delta backups
if ($use_delta) then
    # get latest backup on the server (if any)
    set latest=(`ssh ${remote_server} ls $backup_backups | tail -1`)
    if ($#latest > 0) then
        # --link-dest is also a path on the receiving side
        set delta_dir="$backup_backups/$latest"
        set rsync_opts=($rsync_opts --link-dest=$delta_dir)
    endif
endif

while (1)
    # get list of everything to backup
    # set this to whatever you need
    cd /local_top_directory
    set transfer_list=(.)

    # use whatever format you'd like
    set date=`date +%Y%m%d_%H%M%S`

    set server_tmp=${remote_server}:$backup_top/tmp
    set server_final=${remote_server}:$backup_backups/$date
    if ($use_tmp) then
        set server_transfer=$server_tmp
    else
        set server_transfer=$server_final
    endif

    # do the transfer
    rsync $rsync_opts $transfer_list $server_transfer
    set code=$status

    # run was clean
    if ($code == 0) then
        # atomically install backup
        if ($use_tmp) then
            ssh ${remote_server} mv $backup_top/tmp $backup_backups/$date
        endif
        break
    endif

    # partial -- some error
    if ($code == 23) then
        continue
    endif

    # partial -- some files disappeared
    if ($code == 24) then
        continue
    endif

    echo "fatal error ..."
    exit 1
end