Junit test fails after exchanging implementation with stream API, why?
TL;DR your test is broken, fix that.
First of all this is more easy to re-produce with:
List<String> list = ImmutableList.of("Kumar", "Kumar", "Jens");
public static Map<String, Long> getValueItemOccurrences1(List<String> list) {
return list
.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}
public static Map<String, Long> getValueItemOccurrences2(List<String> list) {
Map<String, Long> occurrencesOfValueItems = new HashMap<>();
list.forEach(item -> {
if (occurrencesOfValueItems.containsKey(item)) {
occurrencesOfValueItems.put(item, occurrencesOfValueItems.get(item) + 1);
} else {
occurrencesOfValueItems.put(item, 1L);
}
});
return occurrencesOfValueItems;
}
The problem is that after the internal HashMap::hash
(also called re-hash) and getting the last bits that actually matter when deciding which bucket to choose, they have the same values:
System.out.println(hash("Kumar".hashCode()) & 15);
System.out.println(hash("Jens".hashCode()) & 15);
In simpler words, a HashMap
decides where to put an entry (bucket is chosen) based on the hashCode
of your entries. Well, almost, once the hashCode
is computed, internally there is another hash
done - to better disperse entries. That final int
value of the hashCode
is used to decide the bucket. When you create a HashMap with a default capacity of 16
(via new HashMap
for example), only the last 4 bits matter where an entry will go (that is why I did the & 15
there - to see the last 4 bits).
where hash
is :
// xor first 16 and last 16 bits
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Now, it turns out that ["Kumar" and "Jens"]
or ["Xavier", "Kenneth", "Samuel"]
have the same last 4 digits after the algorithm above is applied (3 in the first case and 1 in the second case).
Now that we know this info, this actually can be simplified even further:
Map<String, Long> map = new HashMap<>();
map.put("Kumar", 2L);
map.put("Jens", 1L);
System.out.println(map); // {Kumar=2, Jens=1}
map = new HashMap<>();
map.computeIfAbsent("Kumar", x -> 2L);
map.computeIfAbsent("Jens", x -> 1L);
System.out.println(map); // {Jens=1, Kumar=2}
I've used map.computeIfAbsent
because this is what Collectors.groupingBy
is using under the hood.
It turns out that put
and computeIfAbsent
, put elements in the HashMap
using a different way; this is totally allowed as a Map does not have any order anyway - and these elements end up in the same bucket anyway, which is the import part. So test your code, key by key, the previous testing code was broken.
This is even funner reading if you want:
HashMap::put
will add elements in a Linked
fashion (until Tree
entries are created), so if you have one element existing, all others will be added like:
one --> next --> next ... so on.
elements are appended to the end of this queue
as they come in to the put
method.
On the other hand computeIfAbsent
is a bit different, it adds elements to the beginning of the queue. If we take the example above, first Xavier
is added. Then, when Kenneth
is added, becoming the first:
Kenneth -> Xavier // Xavier was "first"
When Samuel
is added, it becomes the first:
Samuel -> [Kenneth -> Xavier]