hashCode uniqueness

Given a reasonable collection of objects, having two with the same hash code is quite likely. In the best case it becomes the birthday problem, with a clash with tens of thousands of objects. In practice objects a created with a relatively small pool of likely hash codes, and clashes can easily happen with merely thousands of objects.

Using memory address is just a way of obtaining a slightly random number. The Sun JDK source has a switch to enable use of a Secure Random Number Generator or a constant. I believe IBM (used to?) use a fast random number generator, but it was not at all secure. The mention in the docs of memory address appears to be of a historical nature (around a decade ago it was not unusual to have object handles with fixed locations).

Here's some code I wrote a few years ago to demonstrate clashes:

class HashClash {
    public static void main(String[] args) {
        final Object obj = new Object();
        final int target = obj.hashCode();
        Object clash;
        long ct = 0;
        do {
            clash = new Object();
            ++ct;
        } while (clash.hashCode() != target && ct<10L*1000*1000*1000L);
        if (clash.hashCode() == target) {
            System.out.println(ct+": "+obj+" - "+clash);
        } else {
            System.out.println("No clashes found");
        }
    }
}

RFE to clarify docs, because this comes up way too frequently: CR 6321873


Is it possible?

Yes.

Does it happen with any reasonable degree of frequency?

No.


I think the docs for object's hashCode method state the answer.

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)"


Think about it. There are an infinite number of potential objects, and only 4 billion hash codes. Clearly, an infinity of potential objects share each hash code.

The Sun JVM either bases the Object hash code on a stable handle to the object or caches the initial hash code. Compaction during GC will not alter the hashCode(). Everything would break if it did.

Tags:

Java

Hashcode