Does it matter if a brute force search for a password returns a collision and not the password?

Steffen's answer covers this perfectly, but I just wanted to add a few more details.

Anything that gives a match is usually fine

As he says, you generally don't care about finding the actual password, because many applications will be happy to authenticate you with any string that happens to hash to the hash value stored in the database. This is most often true for web applications, but may be less often the case in other contexts. This means that if you perform a brute force search and find something with the same hash, you can login to the account even if it isn't actually the same password. This, as implied in the question, is the nature of hashing and is a direct result of the pigeonhole principle.

Until you need the actual password

However, there may be some cases where you do want to find the original password. This would be the case if for instance a hacker stole a username/password from a valueless service and wanted to try to login as the user on a more valuable service (Facebook, banks, etc...). Since people often use the same password everywhere, then in this case you really do want the original password - not just something that hashes to the same value for a given hashing algorithm (after all, different services may use different hashing methods, and most also use cryptographic salts - h/t @Taemyr).

But the difference doesn't matter

Fortunately (for our attacker), this doesn't really matter. The reason is because an exhaustive brute force is effectively impossible. Instead a hacker will try things that are likely to be passwords (word lists, common passwords, etc...). As a counter example imagine that despite the impossibility of it you have a hash from a web service and manage to perform a brute force search on all possible 256 bit ASCII strings. You find three inputs that hash to the same value as the user's password hash:

  1. BD3EDF42F6D3AF2DAAE93313EB534
  2. 7AF7B8B8F84443872C48EC372DBD1
  3. password

Which would you guess is the actual password? The answer is clearly #3. I mean, there is technically a chance that the user just happened to pick an extremely strong password (i.e. BD3EDF42F6D3AF2DAAE93313EB534) that just happened to hash to the same value as password, but the odds of that are effectively zero.

In this sense the attacker has a nice advantage. They would prefer to find the actual password, and it turns out that because people are bad at picking random passwords, the best way to do that isn't by checking everything anyway - it's by checking things that look like passwords. This makes brute forcing searching much, much, much, much more effective, and also gives the attacker a much more useful result (the actual password, rather than some random string that happens to have the same hash).


The main goal of brute forcing is not to get the original password but to get a password that works. Thus it does not matter much if the found password was not the original one as long as it works.

It is very likely though that with efficient and intelligent brute forcing based on a dictionary of common passwords and typical modifications the resulting password will actually be the real one. This is simply because most users don't use long random passwords but common passwords with typical modifications and thus the real passwords will be found first when doing intelligent brute forcing.


Does it matter if a brute force search for a password returns a collision and not the password?

There are other answers addressing this question quite well, so I'll explore a different angle: It doesn't matter, because brute-force searches are unlikely to ever find a collision instead of the original password.

Assumptions:

  • The password is shorter than the maximum length by some number of characters.
  • The brute-force search tries every character, starting with the shortest possible password.
  • The implementation has a very low probability of collisions. This is true even for deprecated algorithms like MD5, but not for the algorithm you provided.
  • Collisions are uniformly distributed.

Let's say our password is "password", which has eight characters. The number of alphanumeric passwords between one and eight characters is around 2.219e+14.

Let's also say the maximum password length is ten. Between nine and ten characters, there are 8.528e+17 passwords, or about 3800 times as many as between one and eight. Even if we assume five collisions, the likelihood that all five are longer than our "password" is around 99.9%.

The number I've used for max password length is (hopefully) much smaller than what is typically used, and the number of collisions is greatly overestimated. In practice, finding a collision that is shorter than the actual password just doesn't happen.