Reading TSV into Spark Dataframe with Scala API
All of the option parameters are passed in the option()
function as below:
val segments = sqlContext.read.format("com.databricks.spark.csv")
.option("delimiter", "\t")
.load("s3n://michaeldiscenza/data/test_segments")
With Spark 2.0+ use the built-in CSV connector to avoid third party dependancy and better performance:
val spark = SparkSession.builder.getOrCreate()
val segments = spark.read.option("sep", "\t").csv("/path/to/file")