How to count differences between two files on Linux?
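If using Linux/Unix, what about comm -23 file1 file2
to print lines in file1 that aren't in file2, comm -23 file1 file2 | wc -l
to count them, and similarly comm -13 file1 file2 for lines that are only in file2?
Note that comm expects both files to be sorted, and that comm -1 on its own only hides the lines unique to file1.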
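As a rough sketch of the comm approach, assuming two small sample files whose names and contents are made up for illustration, and sorting them first because comm needs sorted input:

```shell
# Work in a throwaway directory; the file names and contents are illustrative.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/file1"
printf '%s\n' banana cherry date elderberry > "$tmpdir/file2"

# comm expects sorted input, so sort into separate files first.
sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -23 hides columns 2 and 3, leaving the lines found only in file1.
only_in_file1=$(comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted" | wc -l)
# -13 hides columns 1 and 3, leaving the lines found only in file2.
only_in_file2=$(comm -13 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted" | wc -l)

echo "only in file1: $only_in_file1"
echo "only in file2: $only_in_file2"
rm -rf "$tmpdir"
```

With these sample files, apple is the only line unique to file1, while date and elderberry are unique to file2. Keep in mind that comm compares whole lines as a sorted set, so it says nothing about where in the file a line sits.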
Since every differing line in diff's default output starts with a < or > character, I would suggest this:
diff file1 file2 | grep '^[<>]' | wc -l
By using only < or only > in the pattern, you can count the differing lines from just one of the files.
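To make the per-file counts concrete, here is a small sketch. The sample files are invented for illustration, and saving diff's output to a file (rather than piping it) is just one way to avoid running diff three times:

```shell
# Illustrative sample files in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' a b c d > "$tmpdir/file1"
printf '%s\n' a B c e f > "$tmpdir/file2"

# Run diff once and keep its output. diff exits with status 1 when the
# files differ, which is expected here rather than an error.
diff "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/diff.out" || true

removed=$(grep -c '^<' "$tmpdir/diff.out")   # lines that come from file1
added=$(grep -c '^>' "$tmpdir/diff.out")     # lines that come from file2
total=$(grep -c '^[<>]' "$tmpdir/diff.out")  # both directions together

echo "removed=$removed added=$added total=$total"
rm -rf "$tmpdir"
```

Here a modified line (b versus B) shows up once with < and once with >, so the combined count treats one edited line as two differences. Whether that is what you want depends on what you mean by "difference".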
diff -U 0 file1 file2 | grep -v '^@' | wc -l
Subtract 2 from that count to account for the two file-name header lines (--- and +++) at the top of the diff
listing. Unified format is probably a bit faster than side-by-side format.
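A minimal sketch of the unified-format count with the subtraction done in the shell. The sample files are made up, and matching '^[-+]' instead of inverting '^@' is my own variation, chosen so that a possible "\ No newline at end of file" marker is not counted:

```shell
# Illustrative sample files in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' a b c d > "$tmpdir/file1"
printf '%s\n' a B c e f > "$tmpdir/file2"

# -U 0 asks for unified format with zero lines of context.
# diff exits with status 1 when the files differ, which is expected here.
diff -U 0 "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/diff.out" || true

# Count lines starting with - or +, then drop the two header lines
# (--- file1 and +++ file2) that unified output always begins with.
changed=$(( $(grep -c '^[-+]' "$tmpdir/diff.out") - 2 ))

echo "changed lines: $changed"
rm -rf "$tmpdir"
```

Subtracting 2 is safer than filtering out lines that begin with --- or +++, because a removed line whose own text starts with -- would look exactly like a header. Note that the subtraction assumes the files actually differ; identical files produce no output at all, so the result would be -2.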
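If you want to count the changed regions instead (with -U 0, each @@ line marks one run of consecutive differing lines, so this equals the number of differing lines only when no two changes are adjacent), use this:
diff -U 0 file1 file2 | grep '^@' | wc -l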
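A quick illustration of what the @@ count measures, using invented sample files in which two adjacent lines are changed:

```shell
# Illustrative sample files: lines 2 and 3 differ, and they are adjacent.
tmpdir=$(mktemp -d)
printf '%s\n' a b c d e > "$tmpdir/file1"
printf '%s\n' a X Y d e > "$tmpdir/file2"

# diff exits with status 1 when the files differ, which is expected here.
diff -U 0 "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/diff.out" || true

# Each hunk header starts with @@, so this counts hunks.
hunks=$(grep -c '^@@' "$tmpdir/diff.out")
# Lines removed from file1, skipping the "--- file1" header line.
removed=$(( $(grep -c '^-' "$tmpdir/diff.out") - 1 ))

echo "hunks=$hunks removed=$removed"
rm -rf "$tmpdir"
```

Two lines changed, but because they are next to each other diff reports them as a single hunk, so the @@ count is 1 while the per-line count is 2.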
Doesn't John's answer double count the different lines?