Best HashMap initial capacity while indexing a List

If you wish to avoid rehashing the HashMap, and you know that no other elements will be placed into the HashMap, then you must take into account the load factor as well as the initial capacity. The load factor for a HashMap defaults to 0.75.

The calculation to determine whether rehashing is necessary occurs whenever an new entry is added, e.g. put places a new key/value. So if you specify an initial capacity of list.size(), and a load factor of 1, then it will rehash after the last put. So to prevent rehashing, use a load factor of 1 and a capacity of list.size() + 1.

EDIT

Looking at the HashMap source code, it will rehash if the old size meets or exceeds the threshold, so it won't rehash on the last put. So it looks like a capacity of list.size() should be fine.

HashMap<Integer, T> map = new HashMap<Integer, T>(list.size(), 1.0);

Here's the relevant piece of HashMap source code:

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

Guava's Maps.newHashMapWithExpectedSize uses this helper method to calculate initial capacity for the default load factor of 0.75, based on some expected number of values:

/**
 * Returns a capacity that is sufficient to keep the map from being resized as
 * long as it grows no larger than expectedSize and the load factor is >= its
 * default (0.75).
 */
static int capacity(int expectedSize) {
    if (expectedSize < 3) {
        checkArgument(expectedSize >= 0);
        return expectedSize + 1;
    }
    if (expectedSize < Ints.MAX_POWER_OF_TWO) {
        return expectedSize + expectedSize / 3;
    }
    return Integer.MAX_VALUE; // any large value
}

reference: source

From the newHashMapWithExpectedSize documentation:

Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth. This behavior cannot be broadly guaranteed, but it is observed to be true for OpenJDK 1.6. It also can't be guaranteed that the method isn't inadvertently oversizing the returned map.


The 'capacity' keyword is incorrect by definition and isn't used in the way typically expected.

By default the 'load factor' of a HashMap is 0.75, this means that when the number of entries in a HashMap reaches 75% of the capacity supplied, it will resize the array and rehash.

For example if I do:

Map<Integer, Integer> map = new HashMap<>(100);

When I am adding the 75th entry, the map will resize the Entry table to 2 * map.size() (or 2 * table.length). So we can do a few things:

  1. Change the load factor - this could impact the performance of the map
  2. Set the initial capacity to list.size() / 0.75 + 1

The best option is the latter of the two, let me explain what's going on here:

list.size() / 0.75

This will return list.size() + 25% of list.size(), for example if my list had a size of 100 it would return 133. We then add 1 to it as the map is resized if the size of it is equal to 75% of the initial capacity, so if we had a list with a size of 100 we would be setting the initial capacity to 134, this would mean that adding all 100 entries from the list would not incur any resize of the map.

End result:

Map<Integer, Integer> map = new HashMap<>(list.size() / 0.75 + 1);