How to make rsync of ~2M files from remote server performant for regular backups
After an initial sync time of 40 hours to download and sync all of the data, a subsequent scan and sync of the same data (just to pull in updates) took only 6.5 hours. The command used to run the rsync was:
rsync -a --quiet USER@REMOTE_SERVER:ROOT/FOLDER/PATH/ /LOCAL/DESTINATION
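As a side note (an assumption on my part, not something I tested in this setup): if you want a per-run summary without per-file output, --stats with the verbose flags left off prints only end-of-run transfer statistics:

rsync -a --stats USER@REMOTE_SERVER:ROOT/FOLDER/PATH/ /LOCAL/DESTINATION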
I think my long initial download time came down to two factors:

1. The initial dataset is 270GB and ~2M files, which is a lot to scan and download over the internet (in our case over a 100 Mbit/s symmetric connection, pulling from a large CDN provider). If the scan itself is the bottleneck on later runs, see the sketch after this list.
2. I had the -P and -v options enabled on the initial sync, which caused a lot of local console chatter displaying every file being synced along with progress information.
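One possible way to cut the scan cost on incremental runs (a sketch only; the find criteria, the changed.txt file name, and the assumption that you can enumerate changes on the remote side are all mine, not part of the original setup) is to build a list of recently modified files and hand it to rsync via --files-from, so it only examines those paths:

# Collect files modified in the last 24 hours on the remote side
# (paths come out relative to the source root, which is what
# --files-from expects).
ssh USER@REMOTE_SERVER 'cd ROOT/FOLDER/PATH && find . -type f -mtime -1' > changed.txt

# Sync only the listed files; --files-from implies --relative, so
# rsync recreates the needed directory structure at the destination.
rsync -a --quiet --files-from=changed.txt USER@REMOTE_SERVER:ROOT/FOLDER/PATH/ /LOCAL/DESTINATION

Note that this trades a full integrity scan for speed: anything the find expression misses (deletions, for example) won't be picked up, so it's best paired with an occasional full rsync run.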
So, the answer here: just use rsync with few or no verbosity options (ideally with --quiet) and it's quite efficient, even for huge datasets.
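For the "regular backups" part, the quiet sync can then be scheduled. A minimal sketch as a cron entry (the schedule, user, and file name below are assumptions, not from the original setup):

# /etc/cron.d/remote-backup (hypothetical): run the quiet sync at 02:00 daily
0 2 * * * backupuser rsync -a --quiet USER@REMOTE_SERVER:ROOT/FOLDER/PATH/ /LOCAL/DESTINATION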