How to export DataFrame to csv in Scala?
The easiest way to do this is to use the spark-csv library. You can check the documentation in the provided link; here is a Scala example of how to save data from a DataFrame.
Code (Spark 1.4+):
dataFrame.write.format("com.databricks.spark.csv").save("myFile.csv")
Edit:
Spark creates part-files when saving CSV data. If you want to merge the part-files into a single CSV, see the following:
Merge Spark's CSV output folder to Single File
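One way to do that merge from Scala is Hadoop's FileUtil.copyMerge, which concatenates every file in a directory into one destination file. A minimal sketch, assuming Hadoop 2.x (copyMerge was removed in Hadoop 3.0) and example paths:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)

// Spark wrote a directory of part-files here (example path):
val srcDir = new Path("/your/location/data.csv")
// Single merged file to produce (example path):
val dstFile = new Path("/your/location/merged-data.csv")

// copyMerge concatenates every file in srcDir into dstFile;
// the boolean flag deletes the source directory afterwards.
FileUtil.copyMerge(fs, srcDir, fs, dstFile, true, conf, null)
```

Note that copyMerge concatenates files in listing order, so with a header row in each part-file you would get repeated headers; write without a header first, or strip duplicates afterwards.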
In Spark versions 2+ you can simply use the following:
df.write.csv("/your/location/data.csv")
If you want to make sure the output is written as a single partition, add a .coalesce(1) as follows:
df.coalesce(1).write.csv("/your/location/data.csv")
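Putting the Spark 2+ pieces together, here is a hedged, self-contained sketch (the session setup, sample data, and paths are examples, not part of the original answer) that writes a single-partition CSV with a header and reads it back:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming a local Spark session; paths are examples.
val spark = SparkSession.builder()
  .appName("csv-export")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "count")

// coalesce(1) collapses the output to one part-file; "header" writes
// column names, and "overwrite" replaces any existing output directory.
df.coalesce(1)
  .write
  .option("header", "true")
  .mode("overwrite")
  .csv("/your/location/data.csv")

// Read it back, treating the first row as the header.
val back = spark.read.option("header", "true").csv("/your/location/data.csv")
back.show()
```

Keep in mind that coalesce(1) forces all the data through a single task, so it is only appropriate when the result fits comfortably on one executor.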
The solution above exports the CSV as multiple partitions. I found another solution by zero323 on this Stack Overflow page that exports a DataFrame into one single CSV file when you use coalesce:
df.coalesce(1)
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("/your/location/mydata")
This would create a directory named mydata where you'll find a csv file that contains the results.
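If you need the output to be a file literally named mydata.csv rather than a directory, one option is to rename the single part-file after the write. A sketch using the Hadoop FileSystem API, assuming example paths and that coalesce(1) produced exactly one part-file:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val outDir = new Path("/your/location/mydata")

// With coalesce(1) there is exactly one part-file; find and rename it.
val partFile = fs.globStatus(new Path(outDir, "part-*"))(0).getPath
fs.rename(partFile, new Path("/your/location/mydata.csv"))
```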