How does hashing work?

Quick, factor 1081.

Or if you prefer, answer this: what's 23 times 47?

Which one is easier? It's easier to perform a multiplication (just follow the rules mechanically) than to recover the operands given only the product. Multiplication. (This, by the way, is the foundation of some cryptographic algorithms such as RSA.)

Cryptographic hash functions have different mathematical foundations, but they have the same property: they're easy to compute going forward (calculate H(x) given x), but practically impossible to compute going backward (given y, calculate x such that H(x) = y). In fact, one of the signs of a good cryptographic hash function is that there is no better way to find x than trying them all and computing H(x) until you find a match.

Another important property of hash functions is that two different inputs have different hashes. So if H(x1) = H(x2), we can conclude that x1 = x2. Mathematically speaking, this is impossible — if the inputs are longer than the length of the hash, there have to be collisions. But with a good cryptographic hash function, there is no known way of finding a collision with all the computing resources in the world.

If you want to understand more about cryptographic hash functions, read this answer by Thomas Pornin. Go on, I'll wait.

Note that a hash function is not an encryption function. Encryption implies that you can decrypt (if you know the key). With a hash, there's no magical number that lets you go back.

The main recommended cryptographic hash functions are SHA-1 and the SHA-2 family (which comes in several output sizes, mainly SHA-256 and SHA-512). MD5 is an older one, now deprecated because it has known collisions. Ultimately, there is no mathematical proof that they are indeed good cryptographic hash functions, only a widespread belief because many professional cryptographers have spent years of their life trying, and failing, to break them.

Ok, that's one part of the story. Now a password hash is not directly a cryptographic hash function. A password hash function (PHF) takes two inputs: the password, and a salt. The salt is randomly generated when the user picks his password, and it is stored together with the hashed password PHF(password, salt). (What matters is that two different accounts always have different salts, and randomly generating a sufficiently large salt is a good way to have this property with overwhelming probability.) When the user logs in again, the verification system reads the salt from the password database, computes PHF(password, salt), and verifies that the result is what is stored in the database.

The point of the salt is that if someone wants to crack a password, they'll have to know the hash before they can start, and they have to attack each account separately. The salt makes it impossible to perform a lot of cracking work in advance, e.g. by generating a rainbow table.

This answers (2) and (3) — the legitimate verifier and the attacker find out in the same way whether the password (entered by the user, or guessed by the attacker) is correct. A final point in the story: a good password hash function has an additional property, it must be slow. The legitimate server only needs to compute it once per login attempt, whereas an attacker has to compute it once per guess, so the slowness hurts the attacker more (which is necessary, because the attacker typically has more, specialized hardware).

If you ever need to hash passwords, don't invent your own method. Use one of the standard methods: scrypt, bcrypt or PBKDF2. 


Cryptographic hash functions are mathematical objects which can be described as "a big mixing and scrambling of some bits". They take as input a sequence of bits (possibly a very long one) and offer an output of fixed size. Roughly speaking, they are so tangled that although there is nothing secret about them (that's just deterministic code), nobody can figure out how to "invert" them (find a matching input for a given output) except by the basic method called "luck": try random inputs until a match is found.

How it may happen, scientifically, that hash functions can exist at all is a good question.

Hashing is not encryption. There is no secret, no key in hashing.

Hash functions have many uses; one of them is "password storage". A hash function looks like a good thing for password storage. We do not want to store passwords directly (otherwise an occasional peek at our databases by the attacker would give him too much information; see this blog post for a discussion); we want to store password verification tokens: something which allows for the verification of a password (that the user presents) but does not reveal the password itself. So the idea is: let's store the hash of the password. When a password is to be verified, we just compute its hash and see if it matches the stored value. But guessing the password from the hash value only is hard, since the hash function is resilient against "inversion" (see above).

Since passwords are a special kind of data (that's data which humans can remember), for proper security, we need a "strengthened" hash function:

  • We want a very slow hash function.
  • We do not want one hash function, but many distinct hash functions, so that each password will be hashed with its own hash function; this is about deterring parallel attacks. This process of turning a single hash function into many variants is called salting.

See this answer for a thorough treatment of the subject of hashing passwords.


Hashing is a function from some bit string (usually variable length) to another bit string (usually smaller, and of fixed length).

Hashing is used in databases for data retrieval, and in in-memory data structures called hash tables. It allows us to reduce arbitrary data, such as a character string or a complicated object with many fields, to a binary number which can then be used directly as an index into a sparse array to fetch the associated data (with some details for handling hash collisions).

The hashing functions used in the above manner are "cousins" of cryptographic hashing functions. They are designed to different requirements. They must be fast to compute, and achieve a good distribution.

In secure computing, cryptographic hashes are used to digest data into some representative, small bitstring. Cryptographic functions have different requirements. They are designed to be difficult to reverse (to be "trap door" or "one way" functions). Not only that, but an important requirement is that it has to be difficult to find, for a given plaintext and hash value, another plaintext which produces the same hash.

Hashing can be used not only for passwords, but as a checksum for verifying data integrity and as part of the implementation of digital signatures. To digitally sign a large document, we simply have to hash the document to produce a "digest" (a name used for the output of a hashing function, when something very long is hashed). Then just this digest is put through the public key crypto-system to produce a signature. You can see the weakness there: what if an attacker succeeds in producing a document which has the same digest? Then it looks like the original signature produced over the genuine document is actually a signature of a counterfeit document: a signature-transplanting forgery has been effectively perpetrated.

Password hashing allows systems not to store the plain text version of a password, yet enables them to verify whether the user trying to gain entry knows that password. Not only does hashing allow systems not to store the plain text passwords (which would have to be very carefully guarded) but it allows for the possibility that even if the hashes are publicly exposed, the passwords are still secure (similarly to how public key crypto systems are able to reveal public keys). Though in practice, hashes are nevertheless protected from public access: for instance /etc/shadow files on Unix-like systems, supplementing world-readable /etc/passwd files.

The hashing function is anything but random. However, randomization is employed to thwart attackers who build large dictionaries of passwords and hashes, that enable them to look up a hash code and retrieve the corresponding password.

To hash a password more securely, we can simply add some random bits to it called a "salt". Different salts added to the same password, of course, lead to different hashes (hopefully with few or no collisions).

If the random salt is, say, 32 bits wide, it means that, in theory, one password can hash in over four billion different ways, making it very impractical to have a precomputed dictionary of all possible hashes of a large number of passwords.

Of course, when the user is being authenticated, she does not know anything about this salt. That is okay because the salt is stored along with the hash in the user's profile (often, combined with the hash into a single compact bitstring). When the user's password entry is being validated, the salt is added to whatever password she entered, so that the hashing is carried out with the correct salt. If the password is correct, the hash will match, since the salt being used is the right one also, having been pulled from the user's profile.

So that is how randomness is incorporated into password hashing, while still allowing it to work.

What makes hashes hard to crack is that they are built from "trap door" or "one way" functions. In mathematics, there are many examples of such things. For instance, simple addition is a trap door. If we add some integers to produce a sum, it is impossible to recover the original numbers, knowing only the sum.

Password hashes are not encrypted passwords. If an attacker has the hash and salt of a password, and happens to guess the password, then she can easily confirm this, exactly in the same way that the login authenticator software does it: she runs the password plus salt through the hashing function and sees that the correct hash emerges.

Tags:

Hash