Why is Scala hashmap slow?
Scala 2.13 (June 2019) do introduce new, faster HashMap/Set
implementations
Both immutable (d5ae93e) and mutable (#7348) versions were completely replaced. - They substantially outperform the old implementations in most scenarios. - The mutable versions now perform on par with the Java standard library's implementations.
For the immutable HashSet
and HashMap
:
The reimplementations are based upon Compressed Hash-Array Mapped Prefix-trees (CHAMP).
See paper "Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections" by Steindorfer and Vinju (OOPSLA'15) for more details and descriptions of low-level performance optimizations (a pre-print of the paper is available).
Instead of calling apply
i.e scalaMap(i)
, if you do scalaMap.get(i)
then it is as fast as javaMap.get(i)
From source, the code for apply is
def apply(key: A): B = get(key) match {
case None => default(key)
case Some(value) => value
}
which shows that apply method first calls the get
method and then pattern matches on it. Having an extra hop for each call in case of an option
does have performance penalty and has already been discussed on SO(can't find the link though)
First of all: doing JVM benchmarks using nanoTime is extremely error-prone. Use a microbenchmarking framework such as Thyme, Caliper or JMH
Second: you are comparing a mutable java hash map with an immutable scala hash map. Immutable collections can be remarkably fast, but there are some cases where they will never be as fast as mutable data structures.
Here is a proper microbenchmark of mutable java hash map vs. immutable scala hash map: https://gist.github.com/rklaehn/26c277b2b5666ec4b372
As you can see, the scala immutable map is a bit faster than the java mutable map. Note that this will not be the case once you go to larger maps, because an immutable data structure has to do some compromises to enable structural sharing. I would guess that in both cases, the dominant performance issue is boxing of the ints to Integer.
Update: if you really want a mutable hash hap with ints as keys, the right choice from the scala collections library is scala.collection.mutable.LongMap. This uses a long as key, and has much better performance than the generic Map because it does not have to box the value. See results from the gist.
Update 2: If your key extends from AnyRef (like e.g. a String), your best bet for a high performance mutable map is scala.collection.mutable.AnyRefMap