Why would a higher load factor in HashMap increase lookup cost?
A hash table's load factor is defined as n/s, the ratio of the number of stored entries n to the size s of the table's array of buckets.
A hash table only performs well while the number of collisions stays low. When the load factor is high, the same number of entries is packed into fewer buckets, which increases the probability of collisions and therefore the cost of a lookup.
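As a rough sketch of why that happens (a simplified model that assigns entries to buckets by hash modulo the bucket count, not HashMap's exact indexing; the class and method names are made up for this illustration), spreading the same number of random hash codes over fewer buckets yields noticeably more collisions:

```java
import java.util.Random;

public class LoadFactorDemo {
    // Count entries that land in an already-occupied bucket when n random hash
    // codes are spread over `buckets` slots (index = hash mod buckets).
    static long collisions(int n, int buckets, long seed) {
        int[] counts = new int[buckets];
        Random random = new Random(seed);
        long collided = 0;
        for (int i = 0; i < n; i++) {
            int index = Math.floorMod(random.nextInt(), buckets);
            if (counts[index]++ > 0) {
                collided++;  // this bucket already held at least one entry
            }
        }
        return collided;
    }

    public static void main(String[] args) {
        int n = 10_000;  // same number of entries, different bucket counts s
        System.out.println("load factor 0.5 (s=20000): " + collisions(n, 20_000, 42) + " collisions");
        System.out.println("load factor 1.0 (s=10000): " + collisions(n, 10_000, 42) + " collisions");
        System.out.println("load factor 2.0 (s= 5000): " + collisions(n,  5_000, 42) + " collisions");
    }
}
```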
Here we should first understand what capacity and load factor mean:
capacity: the number of buckets in the hash table at any given point in time.
load factor: a measure of how full the hash table is allowed to get before its capacity is automatically increased.
So the higher the load factor, the more occupied the hash table can get before its capacity is increased. Both values can be passed to HashMap's constructor, as shown below.
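A minimal sketch of the trade-off (the class and variable names here are just for the example; the resize threshold of capacity * loadFactor is HashMap's documented behaviour):

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityVsLoadFactor {
    public static void main(String[] args) {
        // 16 buckets, load factor 0.75 (the defaults): the table is resized
        // (its capacity roughly doubled) once more than 16 * 0.75 = 12 entries
        // have been inserted.
        Map<String, Integer> sparse = new HashMap<>(16, 0.75f);

        // A higher load factor (1.5) lets the same 16-bucket table hold about
        // 24 entries before resizing: less memory wasted on empty buckets, but
        // more entries per bucket and therefore more collisions per lookup.
        Map<String, Integer> dense = new HashMap<>(16, 1.5f);

        for (int i = 0; i < 30; i++) {
            sparse.put("key" + i, i);
            dense.put("key" + i, i);
        }
    }
}
```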
- Given the best possible hashCode() implementation, each bucket holds at most one value, and the lookup cost is minimal.
- In the worst case, all values end up in the same bucket, and the lookup cost is maximal (see the sketch after this list).
- The average case also depends on the hashCode() implementation, but the load factor plays a role as well: the more occupied the collection is, the higher the chance of a collision, so a higher load factor increases the lookup cost in any non-ideal scenario.
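As a concrete (if contrived) illustration of the worst case, the hypothetical BadKey below returns a constant hashCode(), so every entry lands in the same bucket and a lookup has to work through the colliding keys instead of finding its bucket nearly empty:

```java
import java.util.HashMap;
import java.util.Map;

public class WorstCaseLookup {
    // Hypothetical key whose hashCode() is intentionally constant: every
    // instance hashes to the same bucket, producing maximum collisions.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int hashCode() { return 42; }  // every key collides
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < 20_000; i++) {
            map.put(new BadKey(i), "value" + i);
        }

        long start = System.nanoTime();
        map.get(new BadKey(19_999));  // must be disambiguated from every other key
        System.out.println("worst-case lookup took " + (System.nanoTime() - start) + " ns");
    }
}
```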
It has to do with how a hash table is implemented under the hood: it uses hash codes, and since hashCode() implementations are not perfect, collisions can occur. Increasing the load factor increases the probability of collisions and consequently reduces lookup performance.