Is it wrong to use a hash for a unique ID?

With two keys, the theoretical best case is a 1 in 2^X chance of a collision, where X is the number of bits in the hash output. That is the best case because the input is usually ASCII, which doesn't use the full range of byte values, and real hash functions don't distribute perfectly, so in practice collisions are more likely than the theoretical minimum.
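To get a feel for how this scales beyond two keys, here is a rough sketch using the standard birthday approximation, p ≈ 1 - e^(-n(n-1) / (2 * 2^X)). The function name collisionProbability is made up for illustration, and it assumes an ideal, perfectly uniform hash, so treat the result as a lower bound rather than a real-world figure.

```php
<?php
// Approximate probability that at least two of $n keys collide
// in a hash with $bits output bits (birthday approximation,
// assuming a perfectly uniform hash, which real hashes are not).
function collisionProbability(int $n, int $bits): float
{
    $space = pow(2.0, $bits);                 // number of possible hash values
    $x = ($n * ($n - 1)) / (2.0 * $space);    // expected number of colliding pairs
    return -expm1(-$x);                       // 1 - e^(-x), stable for tiny x
}

// Example: a million keys in a 160-bit space (the size of a sha1 digest).
printf("%.3e\n", collisionProbability(1000000, 160));
```

For two keys this reduces to roughly 1 in 2^X, which matches the figure above.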

To answer your final question:

A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

Yeah, that's sort of true. But then you have a different problem: generating unique keys of that size in the first place. The easiest way is usually a checksum, so just choose a digest large enough that the chance of a collision is small enough for your comfort.

As @wayne suggests, a commonly used approach is to concatenate microtime() to your random salt (and base64_encode the result if you need it printable; note that encoding does not add entropy).
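A minimal sketch of that approach, assuming PHP 7 or later so that random_bytes() is available. The function name makeUniqueId and the 16-byte salt length are my own choices for illustration, not something from the question.

```php
<?php
// Build an ID from the current microtime plus a random salt,
// then hash it down to a fixed-length hex string.
function makeUniqueId(): string
{
    $salt = random_bytes(16);          // 128 bits from the system CSPRNG
    return sha1(microtime() . $salt);  // 40 hex characters
}

echo makeUniqueId(), "\n";
```

The randomness comes from the salt; microtime() just makes an accidental repeat even less likely. This still isn't a guarantee of uniqueness, so keep a unique constraint on the column if a duplicate would hurt.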


How horrible would it be if two ended up the same? Murphy's Law applies - if a million-to-one, or even a 100,000:1, chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode when it happens, then that design flaw must be addressed first. Then proceed with confidence.

Here is a question/answer on what the probabilities really are: Probability of SHA1 Collisions


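Use sha1(time()) instead; then you remove the random possibility of a repeating hash for as long as the time can be represented in fewer bits than the sha1 hash (likely longer than you will find a working PHP parser ;)). Note that time() has one-second resolution, so this only holds if you generate at most one ID per second.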
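If you might generate more than one ID in the same second, one way around it is to hash the timestamp together with a per-second sequence number. This is only a sketch: timeBasedId and its $sequence parameter are hypothetical, and keeping the counter unique (per process, per server) is left up to you.

```php
<?php
// Hash a timestamp together with a sequence number so that several
// IDs generated within the same second still get different inputs.
function timeBasedId(int $timestamp, int $sequence): string
{
    return sha1($timestamp . ':' . $sequence);
}

$now = time();
echo timeBasedId($now, 0), "\n";  // first ID this second
echo timeBasedId($now, 1), "\n";  // second ID this second, different input
```

The same timestamp and sequence always give the same hash, which is exactly why plain sha1(time()) repeats when called twice within one second.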
