Is cmp faster than diff -q?
Prompted by @josten, I ran a comparison of the two. The code is on GitHub. In short¹:
The User+Sys time taken by cmp -s seemed to be a tad more than that of diff in most cases. However, the Real time taken was pretty much arbitrary: cmp was ahead on some runs, diff on others.
Summary:
Any difference in performance is pure coincidence. Use whatever you wish.
¹ The images are 1920x450, so open them in a tab to see them at full size.
Using files similar to Anthon's but larger (100M lines, differing only on the last one):
yes abcd | head -n 100000000 >aa
sed '$ s/d/e/' <aa >ab
I get indistinguishable timings for diff -q and cmp -s:
/tmp% time diff -q aa ab
Files aa and ab differ
diff -q aa ab 0.04s user 0.33s system 99% cpu 0.370 total
/tmp% time cmp -s aa ab
cmp -s aa ab 0.04s user 0.36s system 99% cpu 0.403 total
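Plain cmp (without -s) is noticeably slower than cmp -s. Presumably this is because it has to count newlines to report the line number of the first difference, which is a significant burden on a 100M-line file.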
/tmp% time cmp aa ab
aa ab differ: char 499999999, line 100000000
cmp aa ab 0.84s user 0.36s system 97% cpu 1.225 total
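To print "line 100000000", plain cmp has to count every newline up to the first differing byte, work that cmp -s can skip. The sketch below recomputes that line number by hand on two tiny hypothetical files (small_a, small_b), only to make the extra work visible. It assumes cmp -l (which lists the decimal byte offset of each difference) and head -c, which GNU, BSD and BusyBox all provide.

```shell
#!/bin/sh
# Illustration only: rebuild the "line N" that plain cmp reports.
# small_a/small_b are hypothetical stand-ins for aa/ab.
printf 'abcd\nabcd\nabcd\n' > small_a
printf 'abcd\nabcd\nabce\n' > small_b

# cmp -l prints "offset octal octal" per differing byte; keep the first offset.
off=$(cmp -l small_a small_b | awk 'NR == 1 { print $1; exit }')

# Line number = newlines in the bytes before the difference, plus one.
# This scan over the whole prefix is the extra work cmp -s avoids.
line=$(( $(head -c "$((off - 1))" small_a | wc -l) + 1 ))
echo "byte $off, line $line"   # prints: byte 14, line 3

# For comparison, cmp's own report (older versions say "char", newer "byte").
cmp small_a small_b
```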
This is on Debian wheezy amd64, all running from RAM (on tmpfs).
cmp -s has the advantage of being supported by all POSIX platforms and by BusyBox.
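If portability is the deciding factor, a script can rely on cmp -s and its exit status alone (0 when the files are identical, 1 when they differ, greater than 1 on trouble such as an unreadable file). The helper below is a hypothetical sketch of that pattern; the function name same_contents and the example files f1, f2, f3 are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper built only on cmp -s. Exit status of cmp -s:
#   0 = identical, 1 = different, >1 = trouble (e.g. missing file).
same_contents() {
    cmp -s -- "$1" "$2"
    status=$?
    if [ "$status" -gt 1 ]; then
        echo "same_contents: cannot compare '$1' and '$2'" >&2
    fi
    return "$status"
}

# Usage with throwaway example files.
printf 'abcd\n' > f1
printf 'abcd\n' > f2
printf 'abce\n' > f3
same_contents f1 f2 && echo "f1 and f2 match"
same_contents f1 f3 || echo "f1 and f3 differ"
```

Because the helper only passes cmp's status through, it can be used directly in `if` and `&&`/`||` chains; a caller that needs to tell "different" from "error" can check for a status greater than 1.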
No, diff -q seems to be faster, and you can easily test that:
$ wc x1 x2
10000000 10000000 50000000 x1
10000000 10000000 50000000 x2
20000000 20000000 100000000 total
Two files with 10 million lines of 4 characters each (5 bytes per line, counting the newline).
$ cat x1 x2 > /dev/null
$ diff x1 x2
9999999c9999999
< abcd
---
> abce
They differ only in the second-to-last line.
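The answer does not show how x1 and x2 were created. One way to get files matching the wc and diff output above is sketched here; the generation commands and the N override are assumptions, not necessarily what was actually used.

```shell
#!/bin/sh
# Assumed reconstruction: n lines of "abcd", with x2 differing from x1
# only on line n-1 (abcd -> abce). N is a hypothetical override.
n=${N:-10000000}
yes abcd | head -n "$n" > x1
sed "$((n - 1)) s/d/e/" x1 > x2
wc x1 x2
```

With the default n this should give 50000000-byte files and a diff reporting change 9999999c9999999, as in the transcript.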
$ time diff -q x1 x2
Files x1 and x2 differ
real 0m0.043s
user 0m0.012s
sys 0m0.031s
$ time cmp x1 x2
x1 x2 differ: byte 49999994, line 9999999
real 0m0.085s
user 0m0.048s
sys 0m0.036s
diff -q is almost twice as fast as plain cmp in real time, and it stays faster over repeated runs.
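Since single timings are noisy, it helps to repeat each command several times, and to include cmp -s alongside plain cmp so the line-counting cost is separated from the comparison itself. This is a sketch assuming bash (for the `time` keyword and TIMEFORMAT); RUNS is a hypothetical knob, and small stand-in files are created only if x1/x2 are missing.

```shell
#!/usr/bin/env bash
# Sketch: time each comparison several times so one noisy run does not decide.
# Needs bash (or zsh) for the `time` keyword; RUNS is a hypothetical knob.
runs=${RUNS:-5}
TIMEFORMAT='real %3R  user %3U  sys %3S'   # bash format for `time` output

# Fall back to small stand-in files if x1/x2 do not exist yet.
[ -f x1 ] || yes abcd | head -n 100000 > x1
[ -f x2 ] || sed '$ s/d/e/' x1 > x2

total=0
for cmd in 'diff -q' 'cmp -s' 'cmp'; do
    echo "== $cmd x1 x2 =="
    i=0
    while [ "$i" -lt "$runs" ]; do
        # $cmd is deliberately unquoted so 'diff -q' splits into command + option.
        time $cmd x1 x2 > /dev/null
        i=$((i + 1))
        total=$((total + 1))
    done
done
```

Reading the files once beforehand (as with `cat x1 x2 > /dev/null` above) keeps the first run from paying for disk reads that later runs get from the page cache.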