Favorite rsync tips and tricks
Solution 1:
Use rsync version 3 if you have to sync many files. Version 3 builds its file list incrementally, so it is much faster and uses less memory than version 2.
Depending on your platform, this can make quite a difference. On OS X, version 2.6.3 would take more than an hour, or crash, trying to build an index of 5 million files, while the version 3.0.2 I compiled started copying right away.
Solution 2:
Using --link-dest to create space-efficient, snapshot-based backups: you appear to have multiple complete copies of the backed-up data (one for each backup run), but files that don't change between runs are hard-linked instead of being copied again, which saves space.
(Actually, I still use the rsync-followed-by-cp -al method, which achieves the same thing; see http://www.mikerubel.org/computers/rsync_snapshots/ for an oldish-but-still-very-good rundown of both techniques and related issues.)
The one major disadvantage of this technique is that if a file is corrupted by a disk error, it is just as corrupt in every snapshot that links to it. I have offline backups too, which protect against this to a decent extent. The other thing to look out for is inodes: make sure your filesystem has enough of them, or you'll run out of inodes before you run out of disk space (though I've never had a problem with the ext2/3 defaults).
Also, never forget the very useful --dry-run for a little healthy paranoia, especially when you are using the --delete* options.
Solution 3:
If you need to update a website with some huge files over a slowish link, you can transfer the small files this way:
rsync -a --max-size=100K /var/www/ there:/var/www/
then do this for the big files:
rsync -a --min-size=100K --bwlimit=100 /var/www/ there:/var/www/
rsync has lots of options that are handy for websites. Unfortunately, it does not have a built-in way of detecting simultaneous updates, so you have to add logic to cron scripts to avoid overlapping writes of huge files.
Solution 4:
I use the --existing option when trying to keep a small subset of files from one directory synced to another location.
Solution 5:
--time-limit=T
When this option is used, rsync will stop after T minutes and exit. I think this option is useful when rsyncing a large amount of data during the night (non-busy hours) and stopping when it is time for people to start using the network during the day (busy hours).
--stop-at=y-m-dTh:m
This option allows you to specify at what time to stop rsync.
Batch Mode
Batch mode can be used to apply the same set of updates to many identical systems.