Why are end-of-file and end-of-input-signal treated differently by sha256sum?
The difference is the newline. First, let's collect the sha256sums of abc and abc\n:
$ printf 'abc\n' | sha256sum
edeaaff3f1774ad2888673770c6d64097e391bc362d7d6fb34982ddf0efd18cb -
$ printf 'abc' | sha256sum
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad -
So, the ba...ad sum is for the string abc, while the ed...cb one is for abc\n. Now, if your file is giving you the ed...cb output, that means your file ends with a newline. And, given that "text files" require a trailing newline, most editors will add one for you when you create a new file.
To get a file without a newline, use the printf approach above. Note how the file utility will warn you if your file has no newline:
$ printf 'abc' > file
$ file file
file: ASCII text, with no line terminators
And
$ printf 'abc\n' > file2
$ file file2
file2: ASCII text
And now:
$ sha256sum file file2
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad file
edeaaff3f1774ad2888673770c6d64097e391bc362d7d6fb34982ddf0efd18cb file2
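If you would rather check the trailing byte directly than rely on the wording of the file output, one option is to dump the last byte of each file in hex. This is only a sketch: the scratch directory and the variable names (dir, last_file, last_file2) are made up for illustration, and it assumes the usual tail, od and tr utilities are available.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
dir=$(mktemp -d)
printf 'abc' > "$dir/file"
printf 'abc\n' > "$dir/file2"

# Dump the last byte of each file in hex: 63 is 'c', 0a is the newline (LF).
last_file=$(tail -c 1 "$dir/file" | od -An -tx1 | tr -d ' \n')
last_file2=$(tail -c 1 "$dir/file2" | od -An -tx1 | tr -d ' \n')
echo "file ends with 0x$last_file, file2 ends with 0x$last_file2"

rm -r "$dir"
```

A last byte of 0a means the file ends with a newline, so its checksum will be the ed...cb one.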
sha256sum # enter, to read input from stdin
abc
^D

so I tried inserting ^D twice this time (instead of using newline)
When you press ^D (VEOF) on a tty in canonical mode (the default in any command line window, xterm, etc.), the terminal driver ("line discipline") immediately makes the data already buffered available to the process reading from the tty, without waiting for a newline.
When you enter abc, <newline>, then ^D, sha256sum will read the "abc\x0a" string (i.e. terminated by a LF) after the <newline>, and the empty string "" (i.e. a read of size 0) after the ^D, which sha256sum will interpret as end-of-file.
When you enter abc, then ^D twice, sha256sum will read the "abc" string after the first ^D, and then again the empty string "" after the second ^D.
So the input will have an extra newline in the former case, and the sha256sum checksum will be different.
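To see the byte streams from the two cases side by side, you can reproduce them with printf and dump them in hex. This is a rough stand-in, since a pipe takes the place of the terminal here and the variable names are invented for the example. The ^D itself is never part of the data; it only controls when the buffered bytes are handed over.

```shell
# What sha256sum receives for: abc <newline> ^D
with_newline=$(printf 'abc\n' | od -An -tx1 | tr -d ' \n')
# What sha256sum receives for: abc ^D ^D
without_newline=$(printf 'abc' | od -An -tx1 | tr -d ' \n')

echo "abc <newline> ^D -> $with_newline"
echo "abc ^D ^D        -> $without_newline"
```

The first dump ends in 0a (the LF) and the second does not, which is the whole difference between the two checksums.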
In the case of a regular file, sha256sum will keep reading until it reaches the end-of-file, where, just as in the two cases above, a read will return an empty string. The situation is similar, and sha256sum is completely unaware of whether its input is a terminal, a pipe or a regular file.
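A quick way to convince yourself of this is to feed the same three bytes to sha256sum as a named file, as redirected stdin, and through a pipe, and compare the results. This is a sketch with made-up variable names and a scratch directory; the terminal case is left out because it needs interactive input.

```shell
dir=$(mktemp -d)
printf 'abc' > "$dir/file"

# Same bytes delivered three ways; keep only the hash field of each result.
from_file=$(sha256sum "$dir/file" | cut -d ' ' -f 1)
from_redirect=$(sha256sum < "$dir/file" | cut -d ' ' -f 1)
from_pipe=$(cat "$dir/file" | sha256sum | cut -d ' ' -f 1)

printf '%s\n' "$from_file" "$from_redirect" "$from_pipe"
rm -r "$dir"
```

All three lines should show the ba...ad checksum, because only the bytes read before the zero-length read matter.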