rolling diffs for storage of highly similar files?

Two backup tools that can store binary diffs are rdiff-backup and duplicity. Both are built on librsync, but beyond that they behave quite differently: rdiff-backup keeps the latest copy in full plus reverse diffs for reaching older versions, while duplicity stores a full backup followed by traditional forward incremental diffs. The two tools also offer different sets of peripheral features.


Lately I've been trying out storing database dumps in git. This may get impractical if your database dumps are really large, but it's worked for me for smallish databases (WordPress sites and the like).

My backup script is roughly:

cd /where/I/keep/backups && \
mysqldump > backup.sql && \
git commit -q -m "db dump $(date '+%F-%T')" backup.sql
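Getting an older dump back out of the repository could look roughly like the sketch below. It uses a throwaway repository and made-up dump contents purely for illustration, and the commit messages are placeholders. Note that committing a file by path only works once git tracks it, so the very first dump needs a one-time git add.

```shell
#!/bin/sh
# Sketch: recover an earlier dump from the git history.
# The repository, identity and dump contents are made up for illustration.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity set locally so the commits work on a machine with no git config.
git config user.name "backup"
git config user.email "backup@example.invalid"

# First dump: the file has to be added once before it can be committed by path.
printf 'INSERT INTO t VALUES (1);\n' > backup.sql
git add backup.sql
git commit -q -m "db dump 1"

# Second dump: overwrite the file and commit it by path, as in the script above.
printf 'INSERT INTO t VALUES (1);\nINSERT INTO t VALUES (2);\n' > backup.sql
git commit -q -m "db dump 2" backup.sql

# List the stored dumps, newest first.
git log --oneline -- backup.sql

# Write the previous dump to a separate file without touching the working copy.
git show HEAD~1:backup.sql > previous.sql
```

Any commit hash from the log can be used in place of HEAD~1.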

You could do something like this (with a.sql as your weekly backup).

mysqldump > b.sql
diff a.sql b.sql > a1.diff
scp a1.diff backupserver:~/backup/

Because each diff is taken against the weekly base a.sql, the diff files will grow over the course of the week as changes accumulate.
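To restore a particular day you need the weekly base and that day's diff. A minimal sketch of the round trip, assuming GNU diff and patch are installed and using made-up file contents in place of real dumps:

```shell
#!/bin/sh
# Sketch: rebuild a daily dump from the weekly base plus its diff.
# File names follow the example above; the sample contents are made up.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf 'row 1\nrow 2\nrow 3\n' > a.sql                  # stands in for the weekly dump
printf 'row 1\nrow 2 changed\nrow 3\nrow 4\n' > b.sql   # stands in for today's dump

# diff exits with status 1 when the files differ, so do not treat that as a failure.
diff a.sql b.sql > a1.diff || [ $? -eq 1 ]

# Restore: apply the diff to the base and write the result to a new file,
# leaving a.sql untouched.
patch -s -o restored.sql a.sql a1.diff

# cmp is silent and returns 0 when the two files are identical.
cmp b.sql restored.sql && echo "restored dump matches"
```

Writing to a separate output file with -o keeps the weekly base intact, which matters because every later diff in the week is taken against it.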

My suggestion, though, is to just gzip it (use gzip -9 for maximum compression). We do this at the moment, and it gives us a 59MB .gz file where the original is 639MB.
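As a rough sketch of the gzip approach, the script below compresses a sample file and checks that it round-trips. The repetitive content is made up to stand in for a real dump, so the ratio it prints says nothing about what you will see on your own data.

```shell
#!/bin/sh
# Sketch: compress a dump at maximum level and check that it round-trips.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Made-up, highly repetitive content standing in for a real SQL dump.
i=0
while [ "$i" -lt 2000 ]; do
    echo "INSERT INTO posts VALUES ($i, 'lorem ipsum dolor sit amet');"
    i=$((i + 1))
done > backup.sql

# -9 selects the best (slowest) compression; -c writes to stdout so the
# original file is kept.
gzip -9 -c backup.sql > backup.sql.gz

# Check the archive's integrity, then compare sizes in bytes.
gzip -t backup.sql.gz
orig_size=$(wc -c < backup.sql)
gz_size=$(wc -c < backup.sql.gz)
echo "original: $orig_size bytes, compressed: $gz_size bytes"
```

In a real backup script the dump can be piped straight into gzip -9, so the uncompressed file never has to be written to disk.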