Can Java's hashCode produce same value for different strings?

A Java hash code is 32bits. The number of possible strings it hashes is infinite.

So yes, there will be collisions. The percentage is meaningless - there is an infinite number of items (strings) and a finite number of possible hashes.


YES. A lot.

Look at following pair

  • "FB" and "Ea"

can return same hash code even though the characters in it are not same.

Basically it is the sum of characters in a string multiplied by an integer.


if it is possible then what is the % of its possibility?

That is not a particularly meaningful question.

However, unless there is some systemic bias in the String::hashcode function or the way that you are generating the String objects, the probability that any two different (non-equal) String objects will have the same hash code will be 1 in 232.

This assumes that the Strings are chosen randomly from the set of all possible String values. If you restrict the set in various ways, the probability will vary from the above number. (For instance, the existence of the "FB" / "Ea" collision means that the probability of a collision in the set of all 2 letter strings is higher than the norm.)


Another thing to note is that the chance of 232 different strings chosen at random (from a much larger unbiased set of strings) having no hash collisions is vanishingly small. To understand why, read the Wikipedia page on the Birthday Paradox.

In reality, the only way you are going to get no hash collisions in a set of 232 different strings is if you select or generate the strings. Even forming the set by selecting randomly generated strings is going to be computationally expensive. To produce such a set efficiently, you would need to exploit the properties of the String::hashCode algorithm, which (fortunately) is specified.

Tags:

Java

Hashcode