How to convert a Spark Dataset of Row into strings?
You can use the map function to convert every row into a string, e.g.:

df.map(row => row.mkString())
Instead of just mkString, you can of course do more sophisticated work, as in the sketch below.
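For instance, a minimal Scala sketch that formats individual columns rather than concatenating the whole row (the column names name and age are hypothetical, and a SparkSession named spark is assumed to be in scope):

import spark.implicits._  // provides the implicit Encoder[String] that map needs

val formatted = df.map { row =>
  // pull out typed values by (hypothetical) column name
  val name = row.getAs[String]("name")
  val age = row.getAs[Int]("age")
  s"$name is $age years old"
}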
The collect method can then retrieve the whole thing into an array:
val strings = df.map(row => row.mkString()).collect
(This is Scala syntax; the Java equivalent is quite similar.)
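To make that concrete, here is a small self-contained Scala sketch (the sample data and column names are made up for illustration; in spark-shell the SparkSession already exists, so you would skip creating it):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("RowToString").master("local[*]").getOrCreate()
import spark.implicits._

// two-column DataFrame with made-up data
val df = Seq(("one", 1), ("two", 2)).toDF("word", "number")

// mkString with a separator joins all columns of each row
val strings = df.map(_.mkString(", ")).collect()  // Array("one, 1", "two, 2")
strings.foreach(println)

spark.stop()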
Here is the sample code in Java.
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("SparkSample")
            .master("local[*]")
            .getOrCreate();

        // create a single-column DataFrame from a list of strings
        List<String> myList = Arrays.asList("one", "two", "three", "four", "five");
        Dataset<Row> df = spark.createDataset(myList, Encoders.STRING()).toDF();
        df.show();

        // option 1: df.as -- works here because the DataFrame has exactly one string column
        List<String> listOne = df.as(Encoders.STRING()).collectAsList();
        System.out.println(listOne);

        // option 2: df.map -- works for rows with any number of columns;
        // the cast to MapFunction disambiguates the overloaded map method
        List<String> listTwo = df.map(
            (MapFunction<Row, String>) row -> row.mkString(),
            Encoders.STRING()).collectAsList();
        System.out.println(listTwo);

        spark.stop();
    }
}
"row" is java 8 lambda parameter. Please check developer.com/java/start-using-java-lambda-expressions.html