What hashing function does Java use to implement Hashtable class?
When a key is added to or requested from a HashMap in OpenJDK, the flow of execution is the following:
- The key is transformed into a 32-bit value using the developer-defined
hashCode()
method. - The 32-bit value is then transformed by a second hash function (of which Andrew's answer contains the source code) into an offset inside the hash table. This second hash function is provided by the implementation of HashMap and cannot be overridden by the developer.
- The corresponding entry of the hash table contains a reference to a linked list or null, if the key does not yet exist in the hash table. If there are collisions (several keys with the same offset), the keys together with their values are simply collected in a singly linked list.
If the hash table size was chosen appropriately high, the number of collisions will be limited. Thus, a single lookup takes only constant time on average. This is called expected constant time. However, if an attacker has control over the keys inserted into a hash table and knowledge of the hash algorithm in use, he can provoke a lot of hash collisions and therefore force linear lookup time. This is why some hash table implementations have been changed recently to include a random element that makes it harder for an attacker to predict which keys will cause collisions.
Some ASCII art
key.hashCode()
|
| 32-bit value
| hash table
V +------------+ +----------------------+
HashMap.hash() --+ | reference | -> | key1 | value1 | null |
| |------------| +----------------------+
| modulo size | null |
| = offset |------------| +---------------------+
+--------------> | reference | -> | key2 | value2 | ref |
|------------| +---------------------+
| .... | |
+----------------+
V
+----------------------+
| key3 | value3 | null |
+----------------------+
Hashing in general is divided into two steps: a. HashCode b. Compressing
In step a. an integer corresponding to your key is generated. This can be modified by you in Java.
In step b. a compression technique is applied by Java to map the integer returned by step a. to a slot in the hashmap or hashtable. This compression technique cannot be changed.
According to hashmap's source(java version < 8), every hashCode is hashed using the following method:
/**
* Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
The reason every hashCode is hashed again is to further prevent a collision (see comments above)
HashMap also uses a method to determine the index of a hash code(java version < 8) (since length is always a power of 2, you can use & instead of %):
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
return h & (length-1);
}
The put method looks something like:
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
The purpose of a hash code is to provide a unique integer representation for a given object. It makes sense, then, that Integer's hashCode method simply returns the value because each value would be unique to that Integer object.
Additional Ref:
HashMap for java8
HashMap for java11
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
This is the latest hash function used by hashMap class in java