Is a SHA-1 hash always the same?

Hash functions are deterministic: the same input yields the same output. Any implementation of a given hash function, regardless of the language it is implemented in, must produce exactly the same output for the same input.
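A quick way to check that an implementation behaves correctly is to hash a published test vector. Here is a minimal Python 3 sketch using the standard `hashlib` module; the expected value is the SHA-1 test vector for the three ASCII bytes "abc" from the FIPS 180 examples.

```python
import hashlib

# Published SHA-1 test vector for the 3-byte input "abc" (FIPS 180 examples).
EXPECTED = "a9993e364706816aba3e25717850c26c9cd0d89d"

digest1 = hashlib.sha1(b"abc").hexdigest()
digest2 = hashlib.sha1(b"abc").hexdigest()

# Deterministic: hashing the same bytes twice gives the same result,
# and any correct implementation must match the published vector.
assert digest1 == digest2 == EXPECTED
print(digest1)
```

Any other correct SHA-1 library, in any language, should print the same 40 hex digits for these three bytes.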

However, note that hash functions take sequences of bits as input. When we "hash a string", we actually convert a sequence of characters into a sequence of bits, and then hash it. There begins the trouble. Consider the string "café": among all the possible conversions to bits, all of the following are common:

63 61 66 e9                             ISO-8859-1 ("latin-1")
63 61 66 c3 a9                          UTF-8
63 61 66 65 cc 81                       UTF-8 (NFD)
ef bb bf 63 61 66 c3 a9                 UTF-8 (with BOM)
ef bb bf 63 61 66 65 cc 81              UTF-8 (NFD with BOM)
63 00 61 00 66 00 e9 00                 UTF-16 little-endian
00 63 00 61 00 66 00 e9                 UTF-16 big-endian
ff fe 63 00 61 00 66 00 e9 00           UTF-16 little-endian (with BOM)
fe ff 00 63 00 61 00 66 00 e9           UTF-16 big-endian (with BOM)
63 00 61 00 66 00 65 00 01 03           UTF-16 little-endian (NFD)
00 63 00 61 00 66 00 65 03 01           UTF-16 big-endian (NFD)
ff fe 63 00 61 00 66 00 65 00 01 03     UTF-16 little-endian (NFD with BOM)
fe ff 00 63 00 61 00 66 00 65 03 01     UTF-16 big-endian (NFD with BOM)

and all will yield very different hash values when processed with a given hash function. You have to be very precise about what you do when dealing with cryptographic functions; every bit counts.
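To see this concretely, here is a minimal Python 3 sketch (assuming Python 3.8+ for `bytes.hex(sep)`) that builds each of the byte sequences listed above for "café" and hashes it with SHA-1. The BOMs and byte orders are added explicitly so the result does not depend on the platform's default.

```python
import hashlib
import unicodedata

s = "caf\u00e9"                          # "café" with precomposed é (U+00E9)
nfd = unicodedata.normalize("NFD", s)    # "cafe" + U+0301 COMBINING ACUTE ACCENT

UTF8_BOM = b"\xef\xbb\xbf"
LE_BOM, BE_BOM = b"\xff\xfe", b"\xfe\xff"

# One entry per conversion of "café" to bytes.
encodings = {
    "ISO-8859-1": s.encode("latin-1"),
    "UTF-8": s.encode("utf-8"),
    "UTF-8 (NFD)": nfd.encode("utf-8"),
    "UTF-8 (with BOM)": UTF8_BOM + s.encode("utf-8"),
    "UTF-8 (NFD with BOM)": UTF8_BOM + nfd.encode("utf-8"),
    "UTF-16 little-endian": s.encode("utf-16-le"),
    "UTF-16 big-endian": s.encode("utf-16-be"),
    "UTF-16 little-endian (with BOM)": LE_BOM + s.encode("utf-16-le"),
    "UTF-16 big-endian (with BOM)": BE_BOM + s.encode("utf-16-be"),
    "UTF-16 little-endian (NFD)": nfd.encode("utf-16-le"),
    "UTF-16 big-endian (NFD)": nfd.encode("utf-16-be"),
    "UTF-16 little-endian (NFD with BOM)": LE_BOM + nfd.encode("utf-16-le"),
    "UTF-16 big-endian (NFD with BOM)": BE_BOM + nfd.encode("utf-16-be"),
}

# Same "string", thirteen different byte sequences, thirteen different digests.
digests = {label: hashlib.sha1(data).hexdigest() for label, data in encodings.items()}

for label, data in encodings.items():
    print(f"{data.hex(' '):<40} {digests[label]}  {label}")
```

Every line of output shows a different SHA-1 value, even though a human would read all of the inputs as the same word.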


I'm not exactly sure what you mean, but yes. The output of a properly written hash function should be the same regardless of language.

The only difference between hash implementations in different programming languages' libraries, or on different platforms, will be speed, and in properly written libraries even that difference should be trivial.


Yes, the exact same "byte sequence" will always yield the exact same digest value regardless of implementation (assuming it's a correct implementation!)
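Within a single library you can see the same property by pushing identical bytes through different code paths. This minimal Python sketch uses only the standard `hashlib` module; the 7-byte chunk size is arbitrary and chosen purely for illustration.

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

# Three ways of running SHA-1 over the same byte sequence.
one_shot = hashlib.sha1(data).digest()
by_name = hashlib.new("sha1", data).digest()

incremental = hashlib.sha1()
for i in range(0, len(data), 7):      # feed the bytes in arbitrary 7-byte chunks
    incremental.update(data[i:i + 7])

# The digest depends only on the bytes, not on how they were delivered.
assert one_shot == by_name == incremental.digest()
print(one_shot.hex())
```

Chunking does not matter because SHA-1 is defined over the whole byte sequence; `update()` simply appends to it.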

The key point is that this is always true for a "byte sequence", but not always for a "string" as you wrote. Depending on a lot of things, the same string can be turned into bytes differently on different systems. There is the potential for whitespace or line-ending differences, or for encoding issues such as ASCII vs. Unicode UTF-16.
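One way to avoid this is to agree on a single canonical byte form before hashing. The helper below, `canonical_sha1`, is hypothetical, and its choices (Unicode NFC, LF line endings, trailing whitespace stripped, UTF-8 without BOM) are assumptions for illustration; what matters is that both sides apply exactly the same rules.

```python
import hashlib
import unicodedata

def canonical_sha1(text: str) -> str:
    """Hypothetical helper: SHA-1 of a string after fixing one canonical byte form.

    Assumed rules: NFC normalization, LF line endings, trailing whitespace
    removed from each line, then UTF-8 encoding without a BOM.
    """
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

windows = "caf\u00e9 \r\nline two\r\n"   # precomposed é, trailing space, CRLF
unix = "cafe\u0301\nline two\n"          # decomposed é, LF

# The raw UTF-8 bytes differ, so the plain digests differ...
raw_windows = hashlib.sha1(windows.encode("utf-8")).hexdigest()
raw_unix = hashlib.sha1(unix.encode("utf-8")).hexdigest()
assert raw_windows != raw_unix

# ...but after canonicalization both strings hash identically.
assert canonical_sha1(windows) == canonical_sha1(unix)
```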

Also, be aware that when you display the digest value, you run into similar issues. Different implementations might write the hexadecimal digits a-f in either upper case or lower case, so a plain string equality test might fail.
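A minimal Python sketch of the problem and two ways around it: normalize the case of the hex strings, or compare the raw digest bytes instead. The upper-case string stands in for output from some other tool; it is the SHA-1 of "abc".

```python
import hashlib

computed = hashlib.sha1(b"abc").hexdigest()              # hashlib emits lower case
published = "A9993E364706816ABA3E25717850C26C9CD0D89D"    # same digest, upper case

naive_match = computed == published                        # False: case differs
case_insensitive = computed == published.lower()           # normalize the case...
raw_match = bytes.fromhex(published) == hashlib.sha1(b"abc").digest()  # ...or compare bytes

print(naive_match, case_insensitive, raw_match)
```

`bytes.fromhex()` accepts either case, so comparing the decoded bytes sidesteps the formatting question entirely.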

Tags: Hash, Sha