Which function in Spark is used to combine two RDDs by key?
Just use join and then map the resulting RDD. Note that join produces (key, (leftValue, rightValue)) pairs, and a pattern-matching lambda needs braces in Scala:

rdd1.join(rdd2).map { case (k, (ls, rs)) => (k, ls ++ rs) }

This assumes the values are collections, so ls ++ rs concatenates them.
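A minimal self-contained sketch of this approach, assuming an existing SparkContext named sc and values that are Seqs (both are assumptions, not from the original answer):

```scala
// Hypothetical data: values are sequences so ++ can concatenate them.
val rdd1 = sc.parallelize(Seq(("a", Seq(1, 2)), ("b", Seq(3))))
val rdd2 = sc.parallelize(Seq(("a", Seq(4)), ("c", Seq(5))))

// join is an inner join: only keys present in BOTH RDDs survive.
// It yields (key, (leftValue, rightValue)); map merges the two sides.
val merged = rdd1.join(rdd2).map { case (k, (ls, rs)) => (k, ls ++ rs) }
// merged contains ("a", Seq(1, 2, 4)); "b" and "c" are dropped.
```

Keep in mind that join drops keys that appear in only one of the two RDDs; if you need to keep them, look at cogroup or the outer-join variants instead.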
I would union the two RDDs and do a reduceByKey to merge the values.
(rdd1 union rdd2).reduceByKey(_ ++ _)
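A self-contained sketch of this approach, again assuming an existing SparkContext sc and Seq values (both assumptions for illustration):

```scala
// Hypothetical data with collection values.
val rdd1 = sc.parallelize(Seq(("a", Seq(1, 2)), ("b", Seq(3))))
val rdd2 = sc.parallelize(Seq(("a", Seq(4)), ("c", Seq(5))))

// union keeps every record; reduceByKey then concatenates the value
// collections that share a key. Unlike an inner join, keys that appear
// in only one RDD are kept with their values unchanged.
val merged = (rdd1 union rdd2).reduceByKey(_ ++ _)
// merged contains ("a", Seq(1, 2, 4)), ("b", Seq(3)), ("c", Seq(5)).
```

This variant also handles keys that occur more than once within a single RDD, since reduceByKey merges all values per key regardless of which side they came from.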