Specifying the output file name in Apache Spark

Spark uses Hadoop's output classes under the hood, so you can probably get what you want through them. This is how saveAsTextFile is implemented:

def saveAsTextFile(path: String) {
  this.map(x => (NullWritable.get(), new Text(x.toString)))
    .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)
}

You could pass a customized OutputFormat to saveAsHadoopFile to control how the output files are named. I don't know how to do that from Python, though, so this answer is incomplete.
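For the Scala side, here is a minimal sketch of that idea, not a definitive implementation. It assumes a pair RDD of (String, String) and uses Hadoop's old-API MultipleTextOutputFormat, which lets a subclass choose the file name per record. The names KeyNamedOutput and SaveWithCustomNames, the local master, and the /tmp output path are all made up for illustration.

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical OutputFormat: routes each record to a file named after its key.
class KeyNamedOutput extends MultipleTextOutputFormat[Any, Any] {
  // Replace the key with NullWritable so only the value is written to the file.
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()

  // `name` is the default leaf name (for example part-00000). Keeping it as a
  // suffix stops different partitions from writing to the same file.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    s"$key-$name"
}

object SaveWithCustomNames {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a self-contained run.
    val conf = new SparkConf().setAppName("custom-output-names").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val records = sc.parallelize(Seq(("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")))
      // The output directory is an assumed path and must not already exist.
      records.saveAsHadoopFile(
        "/tmp/custom-output-names",
        classOf[String],
        classOf[String],
        classOf[KeyNamedOutput])
    } finally {
      sc.stop()
    }
  }
}
```

With this sketch the output directory should contain files whose names start with the record key, followed by the usual part suffix. Dropping the suffix would give cleaner names, but then two partitions holding the same key would compete for one file.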

Tags:

Python

Apache Spark
