Compare directories but not content of files
By default, rsync compares only file metadata: timestamp, size, and attributes, among others. It does not compare the contents of files.
rsync -n -a -i --delete source/ target/
Explanation:
-n: do not actually copy or delete anything (THIS IS IMPORTANT!)
-a: archive mode; recurse into directories and compare metadata such as timestamps and attributes
-i: print one line of information per file
--delete: also report files that are in target but not in source
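Note: it is important to append a trailing slash to the directory names. This is an rsync thing: source/ means "the contents of source", while source without the slash would be treated as a directory to be placed inside target.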
If you also want to see lines printed for files that are identical, provide -i twice:
rsync -n -a -ii --delete source/ target/
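If you run this comparison from a script, a small helper like the following could take care of the trailing slashes and of choosing between -i and -ii. It is a sketch: rsync_compare_cmd and run_compare are hypothetical names, the first only builds the argument list, and run_compare assumes rsync is installed and on PATH.

```python
import subprocess

def rsync_compare_cmd(source, target, show_unchanged=False):
    """Build the dry-run rsync command shown above as an argument list."""
    # Trailing slashes: compare the contents of source with the contents of target.
    source = source.rstrip("/") + "/"
    target = target.rstrip("/") + "/"
    # -ii additionally prints a line for every unchanged file.
    itemize = "-ii" if show_unchanged else "-i"
    return ["rsync", "-n", "-a", itemize, "--delete", source, target]

def run_compare(source, target, show_unchanged=False):
    """Run the dry run and return rsync's itemized output as a list of lines."""
    # Requires rsync on PATH; -n means nothing is copied or deleted.
    result = subprocess.run(rsync_compare_cmd(source, target, show_unchanged),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

print(rsync_compare_cmd("source", "target/"))
# ['rsync', '-n', '-a', '-i', '--delete', 'source/', 'target/']
```

Passing a list to subprocess.run (instead of a shell string) avoids quoting problems with directory names that contain spaces.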
Example output:
*deleting removedfile (file in target but not in source)
.d..t...... ./ (directory with different timestamp)
>f.st...... modifiedfile (file with different size and timestamp)
>f+++++++++ newfile (file in source but not in target)
.f samefile (file that has same metadata. only with -ii)
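To make these codes easier to read, here is a small Python sketch that classifies one line of rsync -i output. It assumes the layout shown above (a code such as >f.st......, whitespace, then the name) and only inspects the attribute columns after the first two characters; the function name classify and the status labels are made up for this example. The rsync man page documents the full format under --itemize-changes.

```python
def classify(line):
    """Classify one line of `rsync -i` output as (status, name, changed columns)."""
    code, _, rest = line.partition(" ")
    name = rest.lstrip(" ")  # -ii pads unchanged entries with spaces
    if code.startswith("*"):
        # "*deleting" is a message rather than a change code.
        return ("deleted" if code == "*deleting" else "message", name, "")
    flags = code[2:]  # columns after the update-type and file-type characters
    if flags and set(flags) == {"+"}:
        return ("new", name, "")  # all '+' means the item does not exist in target
    changed = "".join(c for c in flags if c not in ". ")
    return ("differs" if changed else "same", name, changed)

for line in [">f.st...... modifiedfile", ".f          samefile"]:
    print(classify(line))
# ('differs', 'modifiedfile', 'st')
# ('same', 'samefile', '')
```

You could feed it the lines of the dry run's standard output, for example from subprocess.run(..., capture_output=True, text=True).stdout.splitlines().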
Remember that rsync only compares metadata. If a file's content changed but its metadata stayed the same, rsync will report the file as the same. This is an unlikely scenario, so either trust that identical metadata means identical data, or compare the file data bit by bit (for example by adding rsync's -c (--checksum) option, which compares checksums instead of size and timestamp).
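To see why that caveat matters, here is a small Python demonstration (a sketch; the file names and the timestamp are arbitrary). Two files with the same size and the same modification time but different bytes look identical to a metadata-only check, while a byte-by-byte comparison with the standard library's filecmp.cmp(..., shallow=False) tells them apart.

```python
import filecmp
import os
import tempfile

def same_metadata(a, b):
    """Metadata-only check: equal size and modification time."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_size, sa.st_mtime_ns) == (sb.st_size, sb.st_mtime_ns)

def same_content(a, b):
    """Byte-by-byte check using the standard library."""
    return filecmp.cmp(a, b, shallow=False)

with tempfile.TemporaryDirectory() as tmp:
    a, b = os.path.join(tmp, "a.txt"), os.path.join(tmp, "b.txt")
    with open(a, "wb") as f:
        f.write(b"hello")
    with open(b, "wb") as f:
        f.write(b"jello")  # same size, different bytes
    ts = 1_600_000_000  # arbitrary fixed timestamp for both files
    os.utime(a, (ts, ts))
    os.utime(b, (ts, ts))
    meta, content = same_metadata(a, b), same_content(a, b)

print(meta, content)  # True False
```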
Bonus: for progress information, see the question "Estimate time or work left to finish for rsync?"
Use the -q (--brief) option with diff -r (diff -qr). From the info page for GNU diff:
1.6 Summarizing Which Files Differ
When you only want to find out whether files are different, and you don't care what the differences are, you can use the summary output format. In this format, instead of showing the differences between the files, `diff' simply reports whether files differ. The `--brief' (`-q') option selects this output format.
This format is especially useful when comparing the contents of two directories. It is also much faster than doing the normal line by line comparisons, because `diff' can stop analyzing the files as soon as it knows that there are any differences.
This does not compare line by line; it only checks whether the files as a whole differ, which greatly speeds up the process (which is what you're looking for).
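The early exit described in the info page can be illustrated in Python. This is only a sketch of the idea (read both files in fixed-size chunks and stop at the first mismatch), not how GNU diff is actually implemented; the function name files_differ, the 64 KiB chunk size, and the up-front size check are my own choices.

```python
import os
import tempfile

def files_differ(path_a, path_b, chunk_size=64 * 1024):
    """Return True as soon as any difference is found, False if identical."""
    # Different sizes mean different contents; nothing needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return True
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return True  # stop at the first differing chunk
            if not chunk_a:  # both files exhausted with no difference
                return False

with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for name, data in (("a", b"hello world"), ("b", b"hello world"), ("c", b"hello there")):
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "wb") as f:
            f.write(data)
    same_pair = files_differ(paths["a"], paths["b"])  # False
    diff_pair = files_differ(paths["a"], paths["c"])  # True
```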
Here's a quick Python script that checks that the filenames, mtimes, and file sizes are all the same:
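import os
import sys

def getStats(path):
    # Yield (path relative to the root, mtime, size) for every file under path.
    for pathname, dirnames, filenames in os.walk(path):
        for filename in (os.path.join(pathname, x) for x in filenames):
            stat = os.stat(filename)
            yield os.path.relpath(filename, path), stat.st_mtime, stat.st_size

# os.walk() does not guarantee an order, so sort before comparing.
# Exit status: 0 if the trees match, 1 if they differ.
sys.exit(sorted(getStats(sys.argv[1])) != sorted(getStats(sys.argv[2])))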
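The script above only tells you whether the trees differ, through its exit status. If you also want to know which files differ, a variant along these lines could work. It is a sketch under the same assumption as the script (matching mtime and size means matching files); file_stats and compare_trees are hypothetical names introduced here, and the demo at the end just exercises it on two throwaway directories.

```python
import os
import tempfile

def file_stats(root):
    """Map each file's path, relative to root, to its (mtime, size)."""
    stats = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            stats[os.path.relpath(full, root)] = (st.st_mtime, st.st_size)
    return stats

def compare_trees(left, right):
    """Return (only in left, only in right, present in both but different)."""
    a, b = file_stats(left), file_stats(right)
    only_left = sorted(a.keys() - b.keys())
    only_right = sorted(b.keys() - a.keys())
    changed = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return only_left, only_right, changed

# Demo on two throwaway directories whose files all get the same fixed mtime.
with tempfile.TemporaryDirectory() as left, tempfile.TemporaryDirectory() as right:
    layout = {
        left: {"same": "x", "changed": "abc", "only_left": ""},
        right: {"same": "x", "changed": "abcd", "only_right": ""},
    }
    for root, files in layout.items():
        for name, text in files.items():
            path = os.path.join(root, name)
            with open(path, "w") as f:
                f.write(text)
            os.utime(path, (1_600_000_000, 1_600_000_000))
    result = compare_trees(left, right)

print(result)  # (['only_left'], ['only_right'], ['changed'])
```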