How to use "cmp" to compare two binaries and find all the byte offsets where they differ?

I think cmp -l file1 file2 might do what you want. From the manpage:

-l  --verbose
      Output byte numbers and values of all differing bytes.

The output is a table of the offset, the byte value in file1 and the value in file2 for all differing bytes. It looks like this:

4531  66  63
4532  63  65
4533  64  67
4580  72  40
4581  40  55

So the first difference is at offset 4531 (decimal, counting from 1), where file1's octal byte value is 66 and file2's is 63.
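If you would rather read zero-based hexadecimal offsets and hex byte values, the cmp -l output can be post-processed. This is only a sketch: cmp_hex is a name I made up, the two demo files are throwaway examples, and it relies on printf parsing a number with a leading 0 as octal (bash's builtin printf does).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cmp): list differing bytes with
# zero-based hex offsets and hex byte values.
cmp_hex() {
  # cmp exits with status 1 when the files differ; that is expected here.
  cmp -l "$1" "$2" | while read -r off a b; do
    # cmp -l prints a 1-based decimal offset and octal byte values.
    # Prefixing the values with 0 makes printf read them as octal.
    printf '0x%08x  %02x  %02x\n' "$((off - 1))" "0$a" "0$b"
  done
}

# Demo: two 4-byte files that differ only in the third byte.
dir=$(mktemp -d)
printf 'abcd' > "$dir/one"
printf 'abXd' > "$dir/two"
cmp_hex "$dir/one" "$dir/two"
```

For the demo files this prints a single line with offset 0x00000002 and the bytes 63 ('c') and 58 ('X').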

Method that works for byte addition / deletion

diff <(od -An -tx1 -w1 -v file1) \
     <(od -An -tx1 -w1 -v file2)

Generate a test case with a single removal of byte 64:

for i in $(seq 128); do printf "%02x" "$i"; done | xxd -r -p > file1
for i in $(seq 128); do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2


Output of the diff command above:

64d63
<  40
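To see why this matters, here is a rough comparison of the two approaches on the same kind of test case. It is a sketch that assumes bash and GNU od; the file names a and b are placeholders, and the bytes are written with printf '%b' so that xxd is not needed.

```shell
#!/usr/bin/env bash
# Assumes bash (process substitution) and GNU od (-w1 is a GNU extension).
dir=$(mktemp -d)

# a holds bytes 1..128; b holds the same bytes with byte 64 removed.
for i in $(seq 128); do
  byte="\\0$(printf '%03o' "$i")"      # e.g. \0100 for byte 64
  printf '%b' "$byte" >> "$dir/a"
  if [ "$i" -ne 64 ]; then
    printf '%b' "$byte" >> "$dir/b"
  fi
done

# cmp -l compares position by position, so every byte after the
# deletion is reported as different (the EOF notice goes to stderr).
cmp_count=$(cmp -l "$dir/a" "$dir/b" 2>/dev/null | wc -l)

# diff on the one-byte-per-line dumps reports only the removed byte.
diff_count=$(diff <(od -An -tx1 -w1 -v "$dir/a") \
                  <(od -An -tx1 -w1 -v "$dir/b") | grep -c '^[<>]')

echo "cmp -l lines: $cmp_count, diff changed lines: $diff_count"
```

Here cmp -l lists 64 differing positions (64 through 127), while the diff shows a single deleted line.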

If you also want to see the ASCII version of the character:

bdiff() (
  # f dumps one byte per line: hex value, then the ASCII character
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)

bdiff file1 file2


Output:

64d63
<   40   @

Tested on Ubuntu 16.04.

I prefer od over xxd because:

  • od is specified by POSIX, xxd is not (it ships with Vim)
  • od has -An to remove the address column, so no awk is needed.

Command explanation:

  • -An removes the address column. This is important, because otherwise every line after a byte addition / removal would differ.
  • -w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would be out of phase and differ. Unfortunately, -w is not POSIX, but it is present in GNU od.
  • -tx1 selects one-byte hexadecimal output. You can change it to any other representation, as long as you keep one byte per line.
  • -v prevents od from abbreviating runs of identical lines with an asterisk (*), which would interfere with the diff.
  • paste -d '' - - joins every two lines. We need it because the hex and ASCII representations go on separate adjacent lines. Taken from: Concatenating every other line with the next
  • we use parentheses () instead of {} to define bdiff so that the inner function f stays scoped to the subshell, see also: How to define a function inside another function in bash
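Since -w is the only non-POSIX od option used here, one possible workaround is to let od use its default line width and split the output into one byte per line afterwards. This is an untested-on-other-systems sketch: hexlines and bdiff_portable are names I made up, and the process substitution still needs bash (or another shell that supports it).

```shell
#!/usr/bin/env bash
# Hypothetical helper: one hex byte per line without od's -w option.
hexlines() {
  # od prints several bytes per line separated by spaces;
  # turn each space into a newline, then drop the empty lines.
  od -An -tx1 -v "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# Hypothetical variant of the diff command that uses hexlines.
bdiff_portable() {
  diff <(hexlines "$1") <(hexlines "$2")
}

# Demo: the second file is missing the byte 'c' (0x63).
dir=$(mktemp -d)
printf 'abcd' > "$dir/one"
printf 'abd'  > "$dir/two"
bdiff_portable "$dir/one" "$dir/two"   # diff exits with 1 when files differ
```

The dump lines no longer start with a space, so the diff output reads "< 63" rather than "<  63".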

The more efficient workaround I've found is to translate binary files to some form of text using od.

Then any flavour of diff works fine.
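As a rough illustration of "any flavour", a small wrapper can pass extra options through to diff. The name oddiff and the demo files are made up for this sketch, and it assumes bash and GNU od.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: oddiff FILE1 FILE2 [diff options...]
# Assumes bash (process substitution) and GNU od (-w1).
oddiff() {
  local f1=$1 f2=$2
  shift 2
  # Remaining arguments, if any, are handed to diff unchanged.
  diff "$@" <(od -An -tx1 -w1 -v "$f1") <(od -An -tx1 -w1 -v "$f2")
}

# Demo: unified diff of two small files; the second lacks 'c' (0x63).
dir=$(mktemp -d)
printf 'abcd' > "$dir/one"
printf 'abd'  > "$dir/two"
oddiff "$dir/one" "$dir/two" -u
```

Replacing -u with -y (side by side) or -c (context) works the same way.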



