Junit test fails after exchanging implementation with stream API, why?

TL;DR your test is broken, fix that.

First of all this is more easy to re-produce with:

List<String> list = ImmutableList.of("Kumar", "Kumar", "Jens");

public static Map<String, Long> getValueItemOccurrences1(List<String> list) {
    return list
        .stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}

public static Map<String, Long> getValueItemOccurrences2(List<String> list) {
    Map<String, Long> occurrencesOfValueItems = new HashMap<>();

    list.forEach(item -> {
        if (occurrencesOfValueItems.containsKey(item)) {
            occurrencesOfValueItems.put(item, occurrencesOfValueItems.get(item) + 1);
        } else {
            occurrencesOfValueItems.put(item, 1L);
        }
    });

    return occurrencesOfValueItems;
}

The problem is that after the internal HashMap::hash (also called re-hash) and getting the last bits that actually matter when deciding which bucket to choose, they have the same values:

    System.out.println(hash("Kumar".hashCode()) & 15);
    System.out.println(hash("Jens".hashCode()) & 15);

In simpler words, a HashMap decides where to put an entry (bucket is chosen) based on the hashCode of your entries. Well, almost, once the hashCode is computed, internally there is another hash done - to better disperse entries. That final int value of the hashCode is used to decide the bucket. When you create a HashMap with a default capacity of 16 (via new HashMap for example), only the last 4 bits matter where an entry will go (that is why I did the & 15 there - to see the last 4 bits).

where hash is :

// xor first 16 and last 16 bits
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Now, it turns out that ["Kumar" and "Jens"] or ["Xavier", "Kenneth", "Samuel"] have the same last 4 digits after the algorithm above is applied (3 in the first case and 1 in the second case).

Now that we know this info, this actually can be simplified even further:

Map<String, Long> map = new HashMap<>();
map.put("Kumar", 2L);
map.put("Jens", 1L);

System.out.println(map); // {Kumar=2, Jens=1}

map = new HashMap<>();
map.computeIfAbsent("Kumar", x -> 2L);
map.computeIfAbsent("Jens", x -> 1L);
System.out.println(map); // {Jens=1, Kumar=2}

I've used map.computeIfAbsent because this is what Collectors.groupingBy is using under the hood.

It turns out that put and computeIfAbsent, put elements in the HashMap using a different way; this is totally allowed as a Map does not have any order anyway - and these elements end up in the same bucket anyway, which is the import part. So test your code, key by key, the previous testing code was broken.

This is even funner reading if you want:

HashMap::put will add elements in a Linked fashion (until Tree entries are created), so if you have one element existing, all others will be added like:

one --> next --> next ... so on.

elements are appended to the end of this queue as they come in to the put method.

On the other hand computeIfAbsent is a bit different, it adds elements to the beginning of the queue. If we take the example above, first Xavier is added. Then, when Kenneth is added, becoming the first:

 Kenneth -> Xavier // Xavier was "first"

When Samuel is added, it becomes the first:

 Samuel -> [Kenneth -> Xavier]

Junit test fails after exchanging implementation with stream API, why?

Tags:

Hashmap

Java 8

Java Stream

Related

Recent Posts