Scala Partition/Collect Usage
collect (defined on TraversableLike and available in all subclasses) works with a collection and a PartialFunction. It also just so happens that a bunch of case clauses defined inside braces are a partial function (see section 8.5 of the Scala Language Specification [warning - PDF]).
As in exception handling:
try {
... do something risky ...
} catch {
//The contents of this catch block are a partial function
case e: IOException => ...
case e: OtherException => ...
}
It's a handy way to define a function that will only accept some values of a given type.
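As a rough illustration of that point, the sketch below assigns a block of case clauses to a PartialFunction value and asks it which inputs it accepts. The names PartialFunctionDemo and describe are made up for this example; isDefinedAt and lift are standard PartialFunction methods.

```scala
object PartialFunctionDemo {
  // A block of case clauses used where a PartialFunction is expected.
  // `describe` is a hypothetical name, not something from the library.
  val describe: PartialFunction[Any, String] = {
    case s: String => "String:" + s
    case i: Int    => "Int:" + i.toString
  }

  // The function is only defined for Strings and Ints.
  val definedForString: Boolean = describe.isDefinedAt("a")
  val definedForDouble: Boolean = describe.isDefinedAt(42.0)

  // lift turns "not defined" into None instead of throwing a MatchError.
  val lifted: Option[String] = describe.lift(42.0)
}
```

Calling describe(42.0) directly would throw a MatchError, which is why isDefinedAt (or lift) is the safer way to probe it.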
Consider using it on a list of mixed values:
val mixedList = List("a", 1, 2, "b", 19, 42.0) //this is a List[Any]
val results = mixedList collect {
case s: String => "String:" + s
case i: Int => "Int:" + i.toString
}
The argument to the collect method is a PartialFunction[Any,String]: PartialFunction because it's not defined for all possible inputs of type Any (that being the element type of the List), and String because that's what all the clauses return.
If you tried to use map instead of collect, the double value at the end of mixedList would cause a MatchError. Using collect just discards this, as well as any other value for which the PartialFunction is not defined.
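To see the difference concretely, here is a small sketch that runs the same case clauses through both collect and map. The map call is wrapped in scala.util.Try purely so the failure can be inspected instead of crashing; the object name CollectVsMap is an assumption for this example.

```scala
import scala.util.Try

object CollectVsMap {
  val mixedList: List[Any] = List("a", 1, 2, "b", 19, 42.0)

  // collect skips 42.0 because no case clause is defined for a Double.
  val collected: List[String] = mixedList.collect {
    case s: String => "String:" + s
    case i: Int    => "Int:" + i.toString
  }

  // map applies the clauses to every element, so 42.0 triggers a MatchError.
  val mapped: Try[List[String]] = Try {
    mixedList.map {
      case s: String => "String:" + s
      case i: Int    => "Int:" + i.toString
    }
  }
}
```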
One possible use would be to apply different logic to elements of the list:
var strings = List.empty[String]
var ints = List.empty[Int]
mixedList collect {
case s: String => strings :+= s
case i: Int => ints :+= i
}
Although this is just an example, using mutable variables like this is considered by many to be a war crime, so please don't do it!
A much better solution is to use collect twice:
val strings = mixedList collect { case s: String => s }
val ints = mixedList collect { case i: Int => i }
Or if you know for certain that the list only contains two types of values, you can use partition, which splits a collection into two depending on whether or not each value matches some predicate:
//if the list only contains Strings and Ints:
val (strings, ints) = mixedList partition { case s: String => true; case _ => false }
The catch here is that both strings and ints are of type List[Any], though you can easily coerce them back to something more typesafe (perhaps by using collect...).
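One way that coercion might look is sketched below: partition first, then run collect over each half to recover the element types. The names TypedPartition, anyStrings and anyInts are invented for this example, and the input is assumed to hold only Strings and Ints.

```scala
object TypedPartition {
  // Assumption: the list holds nothing but Strings and Ints.
  val mixed: List[Any] = List("a", 1, 2, "b", 19)

  // Both halves come back as List[Any].
  val (anyStrings, anyInts) = mixed.partition(_.isInstanceOf[String])

  // collect narrows each half to its real element type.
  val strings: List[String] = anyStrings.collect { case s: String => s }
  val ints: List[Int]       = anyInts.collect { case i: Int => i }
}
```

If a value of some third type slipped in, it would land in anyInts and then be dropped silently by the second collect.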
If you already have a type-safe collection and want to split on some other property of the elements, then things are a bit easier for you:
val intList = List(2,7,9,1,6,5,8,2,4,6,2,9,8)
val (big,small) = intList partition (_ > 5)
//big and small are both now List[Int]s
Hope that sums up how the two methods can help you out here!
Not sure how to do it with collect without using mutable lists, but partition can use pattern matching as well (just a little more verbose):
List("a", 1, 2, "b", 19).partition {
case s:String => true
case _ => false
}
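If a single pass with typed results and no mutable state is what you are after, a fold is one option. This is only a sketch using foldRight rather than collect or partition, and the object name SinglePassSplit is made up for the example.

```scala
object SinglePassSplit {
  val mixedList: List[Any] = List("a", 1, 2, "b", 19, 42.0)

  // Walk the list once from the right, prepending each element to the
  // accumulator that matches its type; anything else is ignored.
  val (strings, ints) =
    mixedList.foldRight((List.empty[String], List.empty[Int])) {
      case (s: String, (ss, ns)) => (s :: ss, ns)
      case (i: Int, (ss, ns))    => (ss, i :: ns)
      case (_, acc)              => acc
    }
}
```

Here strings is a List[String] and ints is a List[Int], with the original order preserved because foldRight builds each list back to front.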
The signature of the normally-used collect on, say, Seq, is
collect[B](pf: PartialFunction[A,B]): Seq[B]
which is really a particular case of
collect[B, That](pf: PartialFunction[A,B])(
implicit bf: CanBuildFrom[Seq[A], B, That]
): That
So if you use it in default mode, the answer is no, assuredly not: you get exactly one sequence out from it. If you follow CanBuildFrom through Builder, you see that it would be possible to make That actually be two sequences, but it would have no way of being told which sequence an item should go into, since the partial function can only say "yes, I belong" or "no, I do not belong".
So what do you do if you want to have multiple conditions that result in your list being split into a bunch of different pieces? One way is to create an indicator function A => Int, where your A is mapped into a numbered class, and then use groupBy. For example:
def optionClass(a: Any) = a match {
case None => 0
case Some(x) => 1
case _ => 2
}
scala> List(None,3,Some(2),5,None).groupBy(optionClass)
res11: scala.collection.immutable.Map[Int,List[Any]] =
Map((2,List(3, 5)), (1,List(Some(2))), (0,List(None, None)))
Now you can look up your sub-lists by class (0, 1, and 2 in this case). Unfortunately, if you want to ignore some inputs, you still have to put them in a class (e.g. you probably don't care about the multiple copies of None in this case).
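One possible workaround, sketched here under the assumption that you are happy to filter before grouping, is to make the indicator a PartialFunction that is simply undefined for the inputs you want to ignore. The names GroupIgnoringNone and classify are hypothetical.

```scala
object GroupIgnoringNone {
  // Not defined for None, so those elements can be filtered out up front.
  val classify: PartialFunction[Any, Int] = {
    case Some(_)        => 1
    case x if x != None => 2
  }

  val input: List[Any] = List(None, 3, Some(2), 5, None)

  // Keep only the elements the classifier accepts, then group as before.
  val grouped: Map[Int, List[Any]] =
    input.filter(classify.isDefinedAt).groupBy(classify)
}
```

The resulting map has no entry for the ignored class at all, at the cost of evaluating the classifier twice per retained element.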