How do I parse a CSV file that uses ^A (i.e. \001) as the delimiter with spark-csv?

The spark-csv README on GitHub documents a delimiter option (as you also noted). Pass the control character as a Unicode escape:

val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true") // Use first line of all files as header
    .option("inferSchema", "true") // Automatically infer data types
    .option("delimiter", "\u0001") // ^A (\001) as the field delimiter
    .load("cars.csv")
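
If the columns still come back merged, it can help to confirm what the delimiter string really contains before involving Spark. The following is a minimal sketch in plain Scala; the object name, the helper `splitLine`, and the sample record are hypothetical and only serve the illustration.

```scala
object CtrlADelimiter {
  // ^A written as a Unicode escape; this is a one-character string.
  val Sep: String = "\u0001"

  // Split one raw line on the delimiter; limit -1 keeps trailing empty fields.
  def splitLine(line: String): Array[String] =
    line.split(Sep, -1)

  def main(args: Array[String]): Unit = {
    // Hypothetical record with an empty last field.
    val line = Seq("Tesla", "2012", "").mkString(Sep)

    println(Sep.length)                      // number of characters in the delimiter
    println(Sep.head.toInt)                  // code point of the delimiter
    println(splitLine(line).mkString("|"))   // fields, re-joined with a visible separator
  }
}
```

A length of 1 and a code point of 1 indicate that the escape was interpreted as the control character rather than as literal backslash text.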

With Spark 2.x and later, the CSV reader is built in and no external package is needed. Use the sep option (delimiter is accepted as an alias):

val df = spark.read
  .option("sep", "\u0001") // ^A (\001) as the field separator
  .csv("path_to_csv_files")
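
To try the sep option end to end, a small round trip can be sketched as follows. This assumes Spark 2.x or later on the classpath and a local master; the object name, the helper names, and the sample rows are hypothetical and not part of any Spark API.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

object CtrlACsvRoundTrip {
  val Sep: String = "\u0001" // ^A, code point 1

  // Write a small ^A-delimited file (hypothetical sample data) to a temp path.
  def writeSample(): Path = {
    val rows = Seq(
      Seq("make", "year"),
      Seq("Tesla", "2012"),
      Seq("Ford", "1997")
    )
    val content = rows.map(_.mkString(Sep)).mkString("\n")
    val path = Files.createTempFile("cars", ".csv")
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))
    path
  }

  // Read the file back with the built-in CSV reader and the sep option.
  def read(spark: SparkSession, path: Path): DataFrame =
    spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // let Spark guess the column types
      .option("sep", Sep)
      .csv(path.toUri.toString)

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ctrl-a-csv")
      .getOrCreate()
    try {
      val df = read(spark, writeSample())
      df.printSchema()
      df.show()
    } finally spark.stop()
  }
}
```

If the separator is picked up correctly, the schema should list two columns, make and year, rather than a single column containing the whole line.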
