How does Scala's groupBy identity work?

To understand this, start the Scala REPL with the -Xprint:typer option:

val res2: immutable.Map[Char,String] = augmentString(str).groupBy[Char]({
   ((x: Char) => identity[Char](x))
});

Scalac implicitly wraps a plain String in StringOps (via augmentString), which is a subclass of TraversableLike, which has a groupBy method:

def groupBy[K](f: A => K): immutable.Map[K, Repr] = {
  val m = mutable.Map.empty[K, Builder[A, Repr]]
  for (elem <- this) {
    val key = f(elem)
    val bldr = m.getOrElseUpdate(key, newBuilder)
    bldr += elem
  }
  val b = immutable.Map.newBuilder[K, Repr]
  for ((k, v) <- m)
    b += ((k, v.result))

  b.result
}

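So groupBy builds a map whose keys are the values returned by the function you pass in. With identity, the key for each char is the char itself, and each char is appended to the builder stored under its key.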
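To make the mechanics concrete, here is a minimal sketch of the same idea restricted to Strings. The name groupChars and the sample input are made up for illustration; this is a simplified stand-in, not the library source.

```scala
import scala.collection.mutable

// Hypothetical, simplified version of groupBy for Strings only.
def groupChars[K](s: String)(f: Char => K): Map[K, String] = {
  // One StringBuilder per distinct key returned by f.
  val builders = mutable.Map.empty[K, StringBuilder]
  for (c <- s) {
    // Look up the builder for this key, creating it on first use.
    val sb = builders.getOrElseUpdate(f(c), new StringBuilder)
    sb += c
  }
  // Freeze the builders into an immutable Map[K, String].
  builders.map { case (k, sb) => k -> sb.toString }.toMap
}

// Assumed sample input; with identity, each char is its own key.
val groupedChars = groupChars("aaabbbccccdd")(identity)
// groupedChars == Map('a' -> "aaa", 'b' -> "bbb", 'c' -> "cccc", 'd' -> "dd")
```

Passing identity here means f(c) is just c, so every distinct character ends up with its own entry.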


This is your expression:

val list = str.groupBy(identity).toList.sortBy(_._1).map(_._2)

Let's go through it function by function. The first one is groupBy, which partitions your String by the keys returned by the discriminator function, which in your case is identity. The discriminator function is applied to each character in the string, and all characters that return the same result are grouped together. If we wanted to separate the letter a from the rest, we could use x => x == 'a' as our discriminator function. That would group your string's chars by the return value of this function (true or false) in a map:

 Map(false -> bbbccccdd, true -> aaa)

By using identity, which is a "nice" way to say x => x, we get a map in which each distinct character has its own entry; in your case:

Map(c -> cccc, a -> aaa, d -> dd, b -> bbb)

Then we convert the map to a list of tuples (Char, String) with toList.

Finally, we order it by char with sortBy and keep only the String from each tuple with map, which gives your final result.
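The steps above can be written out one at a time. This is only a sketch: the input "aaabbbccccdd" is an assumption chosen to be consistent with the maps shown above, and the intermediate names are made up for illustration.

```scala
// Assumed input, chosen to match the outputs shown above.
val str = "aaabbbccccdd"

// A predicate as discriminator yields at most two keys: true and false.
val byIsA: Map[Boolean, String] = str.groupBy(x => x == 'a')
// byIsA == Map(false -> "bbbccccdd", true -> "aaa")

// identity as discriminator: one entry per distinct character.
val byChar: Map[Char, String] = str.groupBy(identity)

// A Map has no guaranteed order, so the list of tuples is sorted by key next.
val pairs: List[(Char, String)] = byChar.toList
val sortedPairs: List[(Char, String)] = pairs.sortBy(_._1)

// Keep only the second element of each tuple.
val list: List[String] = sortedPairs.map(_._2)
// list == List("aaa", "bbb", "cccc", "dd")
```

The sortBy step matters because the iteration order of the map returned by groupBy is not something to rely on.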


First, let's see what happens when you iterate over a String:

scala> "asdf".toList
res1: List[Char] = List(a, s, d, f)

Next, consider that sometimes we want to group elements on the basis of some specific attribute of an object.

For instance, we might group a list of strings by length:

List("aa", "bbb", "bb", "bbb").groupBy(_.length)

What if you just wanted to group each item by the item itself? You could pass in the identity function like this:

List("aa", "bbb", "bb", "bbb").groupBy(identity)

You could also write it like this, but it would be silly, since calling toString on a String just returns the same String:

List("aa", "bbb", "bb", "bbb").groupBy(_.toString)
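For reference, here is a small sketch of what the three calls above evaluate to. The val names are made up for illustration, and the last line shows one common reason to group by identity: counting occurrences.

```scala
val words = List("aa", "bbb", "bb", "bbb")

// Key = length of each string.
val byLength: Map[Int, List[String]] = words.groupBy(_.length)
// byLength == Map(2 -> List("aa", "bb"), 3 -> List("bbb", "bbb"))

// Key = the string itself, so equal strings end up in the same group.
val bySelf: Map[String, List[String]] = words.groupBy(identity)
// bySelf == Map("aa" -> List("aa"), "bbb" -> List("bbb", "bbb"), "bb" -> List("bb"))

// Same result as bySelf, because toString on a String is a no-op.
val byToString: Map[String, List[String]] = words.groupBy(_.toString)

// Counting occurrences: the size of each group.
val counts: Map[String, Int] = bySelf.map { case (w, ws) => w -> ws.size }
// counts == Map("aa" -> 1, "bbb" -> 2, "bb" -> 1)
```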
