Is there an easy way to replace duplicate files with hardlinks?

rdfind does exactly what you ask for (and in the order johny why lists). It can delete duplicates or replace them with either soft or hard links. If you choose symlinks, you can also make them either absolute or relative. You can even pick the checksum algorithm (md5 or sha1).
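As a rough illustration of the options above, here is a small Python wrapper that builds an rdfind command line for the hard-link case. The helper names (`rdfind_command`, `run_rdfind`) are made up for this sketch, and the dry run is my own default; the flags are the ones I know from the rdfind man page, so check `man rdfind` for your version before relying on them.

```python
import subprocess


def rdfind_command(folder, dry_run=True, checksum="md5"):
    """Build an rdfind invocation that replaces duplicates with hard links.

    Hypothetical helper for illustration. With dry_run=True rdfind only
    reports what it would do and changes nothing.
    """
    return [
        "rdfind",
        "-dryrun", "true" if dry_run else "false",
        "-checksum", checksum,        # "md5" or "sha1", as mentioned above
        "-makehardlinks", "true",     # replace duplicates with hard links
        folder,
    ]


def run_rdfind(folder, dry_run=True, checksum="md5"):
    """Run rdfind on folder; raises CalledProcessError if rdfind fails."""
    subprocess.run(rdfind_command(folder, dry_run, checksum), check=True)
```

Calling `run_rdfind("/path/to/folder")` does a dry run first; pass `dry_run=False` once the report looks right.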

Since it is compiled, it is faster than most scripted solutions. Running it under time on a 15 GiB folder with 2600 files on my Mac Mini from 2009 returns this:

9.99s user 3.61s system 66% cpu 20.543 total

(using md5).

It is available from most package managers (e.g. MacPorts for Mac OS X).


Use the fdupes tool:

fdupes -r /path/to/folder gives you a list of duplicates in the directory (-r makes it recursive). The output looks like this:


filename1
filename2

filename3
filename4
filename5


with filename1 and filename2 being identical and filename3, filename4 and filename5 also being identical.
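fdupes -r on its own only lists the duplicates, so something still has to turn each group into hard links. A minimal Python sketch of that step, assuming the blank-line-separated output format shown above and file names without embedded newlines; the function names are mine, not part of fdupes:

```python
import os


def parse_fdupes_output(text):
    """Split fdupes output into groups of identical files.

    Each group is a run of non-empty lines; a blank line ends the group.
    """
    groups, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def hardlink_group(paths):
    """Replace every file after the first with a hard link to the first."""
    keeper = paths[0]
    for dup in paths[1:]:
        if os.path.samefile(keeper, dup):
            continue  # already the same inode, nothing to do
        tmp = dup + ".hardlink-tmp"  # assumed not to exist already
        os.link(keeper, tmp)   # create the link under a temporary name
        os.replace(tmp, dup)   # then swap it over the duplicate
```

You could feed it the captured output, e.g. `for group in parse_fdupes_output(output): hardlink_group(group)`. Hard links only work within one filesystem, so `os.link` raises `OSError` if the files in a group live on different devices.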


There is a Perl script at http://cpansearch.perl.org/src/ANDK/Perl-Repository-APC-2.002/eg/trimtrees.pl which does exactly what you want:

Traverse all directories named on the command line, compute MD5 checksums and find files with identical MD5. If they are equal, do a real comparison; if they are really equal, replace the second of two files with a hard link to the first one.
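For readers who would rather not pull in the Perl script, the procedure it describes can be sketched in a few lines of Python. This is my own approximation of the description above, not a port of trimtrees.pl, and the names `md5sum` and `hardlink_duplicates` are hypothetical:

```python
import filecmp
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_duplicates(roots):
    """Walk the given directories and hard-link files with equal content.

    Returns the number of files replaced by hard links.
    """
    first_seen = {}  # MD5 digest -> first path seen with that digest
    replaced = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symlinks and special files
                original = first_seen.setdefault(md5sum(path), path)
                if original == path or os.path.samefile(original, path):
                    continue  # first occurrence, or already linked
                if not filecmp.cmp(original, path, shallow=False):
                    continue  # same MD5 but different bytes: leave alone
                tmp = path + ".hardlink-tmp"  # assumed not to exist already
                os.link(original, tmp)   # link under a temporary name
                os.replace(tmp, path)    # then swap it over the duplicate
                replaced += 1
    return replaced
```

Called as `hardlink_duplicates(["/path/to/folder"])`, it keeps the first file it meets for each checksum and links later ones to it. Try it on a copy first: it ignores ownership, permissions and timestamps, which you may care about, and it only works when all files are on the same filesystem.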