`cp -al` snapshot whose hard links get directed to a new file when edited

That's how hard links work. But there are ways around it:

A couple of options come to mind:

  • Use a filesystem with copy-on-write support, such as btrfs. (Of course, if you were using btrfs, you'd just use its native snapshots.) On a filesystem that supports reflinks, you can use "cp --reflink=always" to make cheap copy-on-write copies. Unfortunately, ext4 doesn't support this.
  • Only share hard links across your snapshots, not with the original. That is, the first time you see a given version of a file, copy it into the snapshot; the next time, hard-link it to the copy in the previous snapshot. (I'm not sure what program I used to do this, a decade ago, but searching turns up dirvish, obnam, storebackup, and rsnapshot.)
  • Depending on how your files are changed, you might be able to guarantee that updates happen by writing a temporary file and renaming it over the original. That breaks the hard link, so the version in the snapshot remains pristine. This is less safe, though, as a bug that overwrites in place could corrupt your snapshot.
  • Take LVM snapshots of the entire filesystem.

Of course, there is the other option: use a proper backup system. Almost all of them can manage to back up only changed files.


What you're looking for is a form of copy-on-write, where multiple files that have the same content use the same space on the disk until one of them is modified. Hard links only implement copy-on-write if the application that does the writing deletes the file and creates a new file by the same name (which is usually done by creating a new file by a different name, then moving it into place). The application you're using is evidently not doing this: it's overwriting the existing file.

Some applications can be configured to use the replacement strategy. Some applications use the replacement strategy by default, but use the overwrite strategy when they see a file with multiple hard links, precisely so as not to break the hard links. Your current snapshot technique will work if you can configure your application to replace instead of overwriting.
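The difference between the two strategies is easy to see by watching the hard-link count. A sketch in a temporary directory ("stat -c %h" is GNU coreutils syntax; adjust on BSD):

```shell
#!/bin/sh
# Overwrite vs. replace, observed through the link count.
set -e
work=$(mktemp -d)
cd "$work"
echo one > file
ln file snap                 # the "snapshot": link count is now 2

# Overwrite strategy: the file is truncated and rewritten in place,
# so both names still point at the same inode, and snap changes too.
echo two > file
stat -c %h file              # prints 2
cat snap                     # prints "two" -- the snapshot was clobbered

# Replacement strategy: write a temp file, then rename it into place.
echo three > file.tmp
mv file.tmp file             # "file" now refers to a new inode
stat -c %h file              # prints 1
cat snap                     # still prints "two" -- frozen from here on
```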

Fl-cow modifies programs to systematically use the replacement strategy on files with multiple hard links.

Alternatively, you can store your files on a filesystem that performs copy-on-write or deduplication, or that has a snapshot feature, and not worry about hard links at all: Btrfs or ZFS, for example. Depending on your partitioning scheme, LVM snapshots may also be an option.

My recommendation is to use a proper snapshot tool. Making reliable backups is surprisingly difficult. You probably want rsnapshot.


The following is a Ruby script I wrote that wraps "cp -al" and rsync so the backup can be run manually or via cron. The destination can be local or remote (via SSH):

Ghetto Timemachine

The basic answer to your question, as mentioned in a previous comment, is that the source needs to be kept apart from the hard links. For example, assume a daily backup of your home directory:

Source:

  • /home/flakrat

Destination:

  • /data/backup/daily
    • /monday
    • /tuesday
    • /wednesday
    • /thursday
    • ...

The hard links are created by running "cp -al" against yesterday's backup. Say it's Tuesday morning when you run it:

cd /data/backup/daily
rm -rf tuesday
cp -al monday tuesday
rsync -a --delete /home/flakrat /data/backup/daily/tuesday/