Spark: FlatMapValues query
The flatMapValues method is a combination of flatMap and mapValues.
Let's start with the given rdd.
val sampleRDD = sc.parallelize(Array((1,2),(3,4),(3,6)))
mapValues maps the values while keeping the keys.
For example, sampleRDD.mapValues(x => x to 5) returns (after collect())
Array((1,Range(2, 3, 4, 5)), (3,Range(4, 5)), (3,Range()))
Notice that for the key-value pair (3, 6), it produces (3,Range()), since 6 to 5 produces an empty collection of values.
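To see why the last pair ends up empty, it can help to evaluate the ranges on their own in plain Scala, without Spark. This is only a small sketch; the names RangeCheck and upToFive are illustrative and not part of any Spark API.

```scala
object RangeCheck {
  // `a to b` builds an inclusive range; it is empty whenever a > b.
  def upToFive(x: Int): List[Int] = (x to 5).toList

  def main(args: Array[String]): Unit = {
    println(upToFive(2)) // List(2, 3, 4, 5)
    println(upToFive(4)) // List(4, 5)
    println(upToFive(6)) // List()
  }
}
```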
flatMap "breaks down" collections into the elements of the collection. More precise descriptions of flatMap are easy to find online.
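For example, given val rdd2 = sampleRDD.mapValues(x => x to 5), if we do rdd2.flatMap { case (k, vs) => vs.map(v => (k, v)) } and collect the result, you will get
Array((1,2),(1,3),(1,4),(1,5),(3,4),(3,5)).
(A plain rdd2.flatMap(x => x) does not compile here, because each element is a (key, collection) tuple rather than a collection.)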
That is, for every element in the collection of each key, we create a (key, element) pair.
Also notice that (3, Range()) does not produce any additional (key, element) pair, since the sequence is empty.
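The flattening step can also be modelled with ordinary Scala collections, which makes it easy to try without a cluster. The sketch below is a stand-in under stated assumptions: it uses a List of pairs instead of an RDD, and the names FlattenSketch and flattenPairs are hypothetical.

```scala
object FlattenSketch {
  // Expand each (key, values) pair into one (key, value) pair per element.
  // A key whose collection is empty contributes nothing to the result.
  def flattenPairs[K, V](pairs: Seq[(K, Seq[V])]): Seq[(K, V)] =
    pairs.flatMap { case (k, vs) => vs.map(v => (k, v)) }

  def main(args: Array[String]): Unit = {
    // Same data as sampleRDD, with each value replaced by the range up to 5.
    val mapped = List((1, 2), (3, 4), (3, 6)).map { case (k, v) => (k, (v to 5).toList) }
    println(flattenPairs(mapped)) // List((1,2), (1,3), (1,4), (1,5), (3,4), (3,5))
  }
}
```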
Now, combining flatMap and mapValues, you get flatMapValues.
flatMapValues applies the function to each value and pairs every element of the result with that value's key. In the case above, x to 5 produces every integer from x up to and including 5.
Taking the first pair, (1,2), the key is 1 and the value is 2, so after applying the transformation it becomes {(1,2),(1,3),(1,4),(1,5)}.
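Putting it together, the two-step and one-step versions can be compared side by side. This is a minimal sketch, assuming Spark is on the classpath and a local master is acceptable; the object name FlatMapValuesDemo and the helper run are made up for illustration. In spark-shell you would skip the session setup and use the existing sc.

```scala
import org.apache.spark.sql.SparkSession

object FlatMapValuesDemo {
  // Returns (two-step result, one-step result) so the two can be compared.
  def run(spark: SparkSession): (List[(Int, Int)], List[(Int, Int)]) = {
    val sampleRDD = spark.sparkContext.parallelize(Seq((1, 2), (3, 4), (3, 6)))

    // Two steps: build a range per value, then flatten each (key, range) pair.
    val twoStep = sampleRDD
      .mapValues(x => x to 5)
      .flatMap { case (k, vs) => vs.map(v => (k, v)) }

    // One step: flatMapValues does the mapping and the flattening together.
    val oneStep = sampleRDD.flatMapValues(x => x to 5)

    (twoStep.collect().toList, oneStep.collect().toList)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatMapValues-demo")
      .getOrCreate()
    try {
      val (twoStep, oneStep) = run(spark)
      println(twoStep) // List((1,2), (1,3), (1,4), (1,5), (3,4), (3,5))
      println(oneStep) // same pairs as above
    } finally spark.stop()
  }
}
```

The pair (3,6) disappears in both versions because 6 to 5 is empty, so there is nothing to pair with the key 3.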
Hope this helps.