How to use "cmp" to compare two binaries and find all the byte offsets where they differ?
I think cmp -l file1 file2 might do what you want. From the manpage:
-l --verbose
Output byte numbers and values of all differing bytes.
The output is a table of the offset, the byte value in file1 and the value in file2 for all differing bytes. It looks like this:
4531 66 63
4532 63 65
4533 64 67
4580 72 40
4581 40 55
[...]
So the first difference is at byte 4531 (cmp prints byte numbers in decimal, counting from 1), where file1's byte value is octal 66 and file2's is octal 63.
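If you would rather read zero-based hexadecimal offsets and hex byte values, the cmp -l table can be post-processed. The following is only a sketch: cmp_hex is a made-up helper name, the output layout is an arbitrary choice, and it relies on the shell's printf treating a leading 0 as an octal prefix.

```shell
# cmp_hex is a hypothetical helper name, not something cmp provides.
cmp_hex() {
    # cmp -l prints: 1-based decimal byte number, octal byte of file 1, octal byte of file 2.
    # cmp exits with status 1 when the files differ, which is expected here.
    { cmp -l "$1" "$2" || true; } | while read -r off a b; do
        # Prefixing the values with 0 makes printf parse them as octal.
        printf '0x%08x %02x %02x\n' "$((off - 1))" "0$a" "0$b"
    done
}

# Tiny demo: two 4-byte files that differ only in the third byte.
tmp=$(mktemp -d)
printf 'abcd' > "$tmp/a"
printf 'abXd' > "$tmp/b"
cmp_hex "$tmp/a" "$tmp/b"   # prints: 0x00000002 63 58
rm -rf "$tmp"
```

Like cmp -l itself, this only makes sense when no bytes were inserted or removed, because every byte after such a change would be reported as different.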
Method that works for byte addition / deletion
diff <(od -An -tx1 -w1 -v file1) \
<(od -An -tx1 -w1 -v file2)
Generate a test case with a single removal of byte 64:
for i in $(seq 128); do printf "%02x" "$i"; done | xxd -r -p > file1
for i in $(seq 128); do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2
Output:
64d63
< 40
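The opposite case, an inserted byte, shows up as an "a" (add) hunk instead of a "d" (delete) hunk. Here is a small sketch with made-up 4- and 5-byte files; it uses temporary files instead of process substitution, and it assumes GNU od for the -w1 option.

```shell
tmp=$(mktemp -d)
printf 'ABCD'  > "$tmp/file1"
printf 'ABXCD' > "$tmp/file2"   # extra byte 0x58 ('X') inserted after 'B'

# One hex byte per line, no address column (same od options as above).
od -An -tx1 -w1 -v "$tmp/file1" > "$tmp/file1.hex"
od -An -tx1 -w1 -v "$tmp/file2" > "$tmp/file2.hex"

# diff exits with status 1 when the inputs differ, hence the "|| true".
insertion_diff=$(diff "$tmp/file1.hex" "$tmp/file2.hex" || true)
printf '%s\n' "$insertion_diff"   # hunk header is 2a3, followed by the added byte 58
rm -rf "$tmp"
```

Because there is one byte per line, the line numbers in the hunk header are 1-based byte positions: 2a3 means that after byte 2 of file1, file2 has an extra byte at position 3.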
If you also want to see the ASCII version of the character:
bdiff() (
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)
bdiff file1 file2
Output:
64d63
< 40 @
Tested on Ubuntu 16.04.
I prefer od over xxd because:
- it is POSIX, while xxd is not (it comes with Vim)
- it has -An to remove the address column without awk.
Command explanation:
- -An removes the address column. This is important, otherwise all lines would differ after a byte addition / removal.
- -w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, this is not POSIX, but it is present in GNU.
- -tx1 is the representation you want; change it to any possible value, as long as you keep 1 byte per line.
- -v prevents the asterisk repetition abbreviation (*), which might interfere with the diff.
- paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: Concatenating every other line with the next
- we use parentheses () to define bdiff instead of {} to limit the scope of the inner function f, see also: How to define a function inside another function in bash
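Since -w1 is a GNU extension, one possible workaround on systems without it is to let od print its default several bytes per line and split them afterwards with tr. This is only a sketch: hexlines is a made-up helper name, and the demo uses temporary files rather than process substitution so that it also runs in a plain sh.

```shell
# hexlines is a hypothetical helper name.
hexlines() {
    # od prints several space-separated bytes per line; tr turns every run of
    # spaces into a single newline, and sed drops the empty first line that
    # od's leading space leaves behind.
    od -An -tx1 -v "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

tmp=$(mktemp -d)
printf 'abcd' > "$tmp/a"
printf 'abd'  > "$tmp/b"        # same content with the byte 'c' (0x63) removed
hexlines "$tmp/a" > "$tmp/a.hex"
hexlines "$tmp/b" > "$tmp/b.hex"
# diff exits with status 1 when the inputs differ, hence the "|| true".
diff "$tmp/a.hex" "$tmp/b.hex" || true   # reports hunk 3d2 with the removed byte 63
rm -rf "$tmp"
```

The one-byte-per-line property is what matters to diff; how the lines get produced is a matter of which tools are available.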
See also:
- https://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux
- https://unix.stackexchange.com/questions/59849/diff-binary-files-of-different-sizes
The more efficient workaround I've found is to translate binary files to some form of text using od. Then any flavour of diff works fine.