Which function in Spark is used to combine two RDDs by key?
Just use join and then map the resulting RDD. Note that join produces (key, (leftValue, rightValue)) pairs, and a pattern-matching lambda needs braces in Scala:

rdd1.join(rdd2).map { case (k, (ls, rs)) => (k, ls ++ rs) }

This assumes the values are collections, so ls ++ rs concatenates them.
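A minimal self-contained sketch of this approach, assuming an existing SparkContext named sc and values that are Seqs (both are assumptions, not from the original answer):

```scala
// Hypothetical data: values are sequences so ++ can concatenate them.
val rdd1 = sc.parallelize(Seq(("a", Seq(1, 2)), ("b", Seq(3))))
val rdd2 = sc.parallelize(Seq(("a", Seq(4)), ("c", Seq(5))))

// join is an inner join: only keys present in BOTH RDDs survive.
// It yields (key, (leftValue, rightValue)); map merges the two sides.
val merged = rdd1.join(rdd2).map { case (k, (ls, rs)) => (k, ls ++ rs) }
// merged contains ("a", Seq(1, 2, 4)); "b" and "c" are dropped.
```

Keep in mind that join drops keys that appear in only one of the two RDDs; if you need to keep them, look at cogroup or the outer-join variants instead.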
I would union the two RDDs and do a reduceByKey to merge the values.
(rdd1 union rdd2).reduceByKey(_ ++ _)
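A self-contained sketch of this approach, again assuming an existing SparkContext sc and Seq values (both assumptions for illustration):

```scala
// Hypothetical data with collection values.
val rdd1 = sc.parallelize(Seq(("a", Seq(1, 2)), ("b", Seq(3))))
val rdd2 = sc.parallelize(Seq(("a", Seq(4)), ("c", Seq(5))))

// union keeps every record; reduceByKey then concatenates the value
// collections that share a key. Unlike an inner join, keys that appear
// in only one RDD are kept with their values unchanged.
val merged = (rdd1 union rdd2).reduceByKey(_ ++ _)
// merged contains ("a", Seq(1, 2, 4)), ("b", Seq(3)), ("c", Seq(5)).
```

This variant also handles keys that occur more than once within a single RDD, since reduceByKey merges all values per key regardless of which side they came from.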