How to print the contents of an RDD?
If you want to view the contents of an RDD, one way is to use collect():
myRDD.collect().foreach(println)
That's not a good idea, though, when the RDD has billions of lines, because collect() pulls every element into the driver's memory. Use take() to fetch just a few elements to print:
myRDD.take(n).foreach(println)
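As a rough, self-contained sketch of the take() approach: the local master, the app name, and the sample data below are assumptions made for illustration, not part of the original question.

```scala
import org.apache.spark.sql.SparkSession

object PrintRddSample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("print-rdd-sample")
      .getOrCreate()

    // Hypothetical stand-in for a much larger RDD.
    val myRDD = spark.sparkContext.parallelize(1 to 1000)

    // take(n) returns an Array of at most n elements to the driver,
    // so only a small sample is ever held in driver memory.
    val firstFew = myRDD.take(5)
    firstFew.foreach(println)

    spark.stop()
  }
}
```

Because take() returns a plain Array, the foreach here runs on the driver, so the output shows up in the driver's console.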
The map function is a transformation, which means that Spark will not actually evaluate your RDD until you run an action on it.
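To print it, you can use foreach (which is an action). Note that on a cluster, println runs on the executors, so the output appears in their stdout rather than on the driver: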
linesWithSessionId.foreach(println)
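To make the transformation/action split concrete, here is a minimal sketch. The input lines and the filter condition are made up for illustration; only the name linesWithSessionId comes from the question.

```scala
import org.apache.spark.sql.SparkSession

object LazyMapSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lazy-map-sketch")
      .getOrCreate()

    // Hypothetical input data.
    val lines = spark.sparkContext.parallelize(
      Seq("a=1", "sessionId=42", "b=2", "sessionId=7"))

    // Transformations only: Spark records the lineage but runs nothing yet.
    val linesWithSessionId = lines
      .filter(_.contains("sessionId"))
      .map(_.toUpperCase)

    // The action triggers evaluation of the filter and map above.
    linesWithSessionId.foreach(println)

    spark.stop()
  }
}
```

In local mode the executors share the driver's JVM, so the lines print to the same console; the order may vary between runs because partitions are processed in parallel.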
To write it to disk, you can use one of the saveAs... functions (also actions) from the RDD API.
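For example, saveAsTextFile is one of those functions. The sketch below assumes a local session and writes to a temporary directory chosen purely for illustration:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SaveRddSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("save-rdd-sketch")
      .getOrCreate()

    // Hypothetical sample data.
    val myRDD = spark.sparkContext.parallelize(Seq("first", "second", "third"))

    // saveAsTextFile fails if the target directory already exists,
    // so point it at a fresh subdirectory of a temp directory.
    val outputDir = Files.createTempDirectory("rdd-out").resolve("result").toString

    // Writes one part file per partition under outputDir.
    myRDD.saveAsTextFile(outputDir)
    println(s"Wrote part files under $outputDir")

    spark.stop()
  }
}
```

The result is a directory of part files rather than a single file, since each partition is written separately.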