md5sum prepends '\' to the checksum
This is documented for Coreutils’ md5sum:
If file contains a backslash or newline, the line is started with a backslash, and each problematic character in the file name is escaped with a backslash, making the output unambiguous even in the presence of arbitrary file names.
(file is the filename, not the file’s contents).
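As a rough illustration of the quoted rule, here is a minimal Python sketch of how one output line could be built. It assumes text-mode output, where GNU md5sum separates the digest from the name with two spaces. The function names are hypothetical, and this is not the Coreutils implementation.

```python
def needs_escape(name: str) -> bool:
    # Only backslashes and newlines trigger escaping.
    return "\\" in name or "\n" in name


def escape_name(name: str) -> str:
    # A backslash becomes two backslashes; a newline becomes backslash + "n".
    out = []
    for ch in name:
        if ch == "\\":
            out.append("\\\\")
        elif ch == "\n":
            out.append("\\n")
        else:
            out.append(ch)
    return "".join(out)


def format_line(digest: str, name: str) -> str:
    # Escaped lines are flagged with a single leading backslash.
    if needs_escape(name):
        return "\\" + digest + "  " + escape_name(name)
    return digest + "  " + name


if __name__ == "__main__":
    empty_md5 = "d41d8cd98f00b204e9800998ecf8427e"  # MD5 of an empty file
    print(format_line(empty_md5, "foo"))
    print(format_line(empty_md5, "foo\nbar"))
```

The first line printed is unchanged. The second starts with a backslash and shows the newline as the two characters \n, so each file still occupies exactly one line.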
b2sum, sha1sum, and the various SHA-2 tools behave in the same way as md5sum. sum and cksum don’t: sum is only provided for backwards compatibility (and its ancestors don’t produce quoted output), and cksum is specified by POSIX and doesn’t allow this type of output.
This behaviour was introduced in November 2015 and released in version 8.25 (January 2016), with the following NEWS entry:

md5sum now ensures a single line per file for status on standard output, by using a '\' at the start of the line, and replacing any newlines with '\n'. This also affects sha1sum, sha224sum, sha256sum, sha384sum and sha512sum.
The backslash at the start of the line serves as a flag: escapes in filenames are only processed if the line starts with a backslash. (Unescaping can’t be the default behaviour: it would break sums generated with older versions of Coreutils containing \\ or \n in the stored filenames.)
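The flag-based parsing can be sketched in Python as follows. This is a hypothetical helper (parse_line and unescape_name are illustrative names, not Coreutils functions). It accepts both the text-mode separator (two spaces) and the binary-mode separator (a space followed by *), and it rejects any escape other than \\ and \n, which is a choice made for this sketch.

```python
def unescape_name(escaped: str) -> str:
    # Turn "\\" back into a backslash and "\n" back into a newline.
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = escaped[i + 1] if i + 1 < len(escaped) else ""
        if nxt == "\\":
            out.append("\\")
        elif nxt == "n":
            out.append("\n")
        else:
            # This sketch treats any other escape (or a trailing backslash) as malformed.
            raise ValueError("invalid escape in file name")
        i += 2
    return "".join(out)


def parse_line(line: str) -> tuple[str, str]:
    # Split a checksum line into (digest, file name).
    flagged = line.startswith("\\")
    if flagged:
        line = line[1:]
    # The digest is hex, so the first space ends it; the next character is
    # " " (text mode) or "*" (binary mode).
    digest, sep, rest = line.partition(" ")
    if not sep or not rest or rest[0] not in " *":
        raise ValueError("malformed checksum line")
    name = rest[1:]
    # Only unescape flagged lines, so names written by non-escaping
    # versions are taken literally.
    if flagged:
        name = unescape_name(name)
    return digest, name
```

For example, an unflagged line ending in foo\nbar yields a name containing a literal backslash followed by n. The same text on a flagged line yields a name containing a real newline.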
Stephen Kitt's answer covers the what; I will try to cover why this change was implemented. First, someone observed that a filename containing newlines¹ could result in ambiguous output. For example, consider this output:
d41d8cd98f00b204e9800998ecf8427e foo
25af89c92254a806b2e93fffd8ac1814 bar
Does this mean there were two files, foo and bar, or only one file whose filename is "foo\n25af89c92254a806b2e93fffd8ac1814 bar"? Granted, the latter possibility is highly unlikely, but it is possible. To resolve the ambiguity, the developers chose to escape newlines with a backslash, writing each newline as \n. The output then becomes distinguishable. However, this introduces a further ambiguity:
764efa883dda1e11db47671c4a3bbd9e foo\nbar
Does this file's name contain a newline, or a backslash followed by an n? To resolve this we need to escape backslashes too, so that the latter case becomes:
764efa883dda1e11db47671c4a3bbd9e foo\\nbar
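To make the ambiguity concrete, here is a small Python comparison. The helper names are made up for this illustration. Escaping only newlines maps two different names to the same text, while also escaping backslashes keeps them apart.

```python
def escape_newlines_only(name: str) -> str:
    # First attempt: only newlines are escaped, as backslash + "n".
    return name.replace("\n", "\\n")


def escape_full(name: str) -> str:
    # Double backslashes first; doing it after the newline replacement
    # would also double the backslash just inserted for each newline.
    return name.replace("\\", "\\\\").replace("\n", "\\n")


with_newline = "foo\nbar"       # "foo", a real newline, "bar"
with_backslash_n = "foo\\nbar"  # "foo", a backslash, "n", "bar"

# Collision: both names become the same escaped text.
print(escape_newlines_only(with_newline) == escape_newlines_only(with_backslash_n))
# Distinct once backslashes are escaped as well.
print(escape_full(with_newline) == escape_full(with_backslash_n))
```

This prints True and then False. Because every backslash in the escaped form now starts a two-character escape, the mapping can be reversed unambiguously.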
Finally, they elected to prepend a single \ to each output line which contains such escapes, to make it easy for a parser to detect whether escaping has been done. Presumably this was done to allow parsers to handle output both from escaping versions of md5sum and from non-escaping ones (older GNU releases and non-GNU implementations). The flag also means that "costly" unescaping does not need to be done when it is not necessary. You can see an example of this parsing in action in md5sum.c itself (line 382 in the linked version).
¹ By newline I mean the character \n, which is sometimes also specifically referred to as a linefeed or LF; see md5sum.c.