What is the difference between a Hash Function and a Cryptographic Hash Function?
Every cryptographic hash function is a hash function. But not every hash function is a cryptographic hash.
A cryptographic hash function aims to guarantee a number of security properties. Most importantly that it's hard to find collisions or pre-images and that the output appears random. (There are a few more properties, and "hard" has well defined bounds in this context, but that's not important here.)
Non cryptographic hash functions just try to avoid collisions for non malicious input. Some aim to detect accidental changes in data (CRCs), others try to put objects into different buckets in a hash table with as few collisions as possible.
In exchange for weaker guarantees they are typically (much) faster.
I'd still call MD5 a cryptographic hash function, since it aimed to provide security. But it's broken, and thus no longer usable as a cryptographic hash. On the other hand when you have a non cryptographic hash function, you can't really call it "broken", since it never tried to be secure in the first place.
There are some properties that cryptographically secure hash functions strongly require, that are not so strongly required for non-cryptographically secure hash functions:
- preimage resistance (given a hash
h
it must be difficult to find a messagem
that yieldsh
when hashed - weak collision resistance (given a message
m1
it must be difficult to find a different messagem2
so thatm1
andm2
yield the same hash) - strong collision resistance (it should be difficult to find any messages
m1
andm2
that yield the same hash)
In those points, you see a lot of difficult, which is a qualitative measure instead of a quantitative one. The best answer here is feasibility: there is a fuzzy line when something becomes feasible and those lines move over time (as computation capabilities grow exponentially according to Moore's Law, once difficult problems can now be solved by your cell phone).
In general it's good practice to assume that difficult means that the time to achieve some goal is NP-complete. This means the time required to break the hash grows strongly as you increase the hash length.
Another point is, that a cryptographically secure hashing algorithm can be useful in some applications, but not in others. It depends on the model of your attacker, the nature of the information you want to protect and things like performance requirements (as a general rule, the better the cryptographic properties of a hash, the worse it's runtime behaviour).
I'd say that the two key things to understand here are:
- The term "hash function" is vague—or more precisely, polysemous: it has a "family" of meanings that are closely related but distinct. If somebody labels a function as a "hash function," the label simply doesn't tell you what properties that function must have. You have to examine the context where the term is used and the requirements of that context.
- The term "cryptographic hash function" is a slight misnomer—it looks like a description, but it has an involved technical definition that the term itself doesn't actually describe. To put it simply, there are functions like message authentication codes (MAC) that are often labeled as hash functions and offer some form of cryptographic security, but are not "cryptographic hash functions" in the conventional definition.
The term "cryptographic hash function" is conventionally used to refer to what might be better labeled as collision-resistant hash functions, which are public functions ("public" = don't require a secret key) that are required to have these three properties:
- Second preimage resistance: For a random value
m1
chosen by an honest party, it's very costly for an attacker to find any valuem2 ≠ m1
such thathash(m1) = hash(m2)
. - Preimage resistance: For a random value
h
chosen by an honest party, it's very costly for an attacker to find any valuem
such thathash(m) = h
. - Collision resistance: It's very costly for an attacker to find any pair of values
m1 ≠ m2
such thathash(m1) = hash(m2)
.
There's a fourth property that older cryptographic hash functions fail trivially but which newer ones like SHA-3 and Blake2 are designed to achieve:
- Random oracle indifferentiability: This one is all but impossible to explain briefly, but let's dumb it down to this: it's very costly for an attacker to find any non-chance correlations between the outputs of inputs of their choice.
The random oracle property (when correctly formulated) implies the three previous properties, as well as additional ones like the absence of efficient length extension attacks. (Length extensions are the most obvious reason why older hash functions like SHA-256 and SHA-512 fail the random oracle property.)