Create Spark Dataset from a CSV file

Use schema inference:

val cities = spark.read
  .option("inferSchema", "true")
  ...

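For reference, a complete version of the inference approach might look like the following sketch (it assumes the same location path and City case class used further down; inferSchema costs an extra pass over the file):

import spark.implicits._

val cities = spark.read
  .option("header", "true")
  .option("inferSchema", "true") // Spark scans the data to guess column types
  .csv(location)
  .as[City]
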
or provide schema:

val cities = spark.read
  .schema(StructType(Array(StructField("name", StringType), ...)))

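Spelled out fully, the explicit-schema approach could look like this (the three fields mirror the City case class defined below; LongType for number_of_people is an assumption matching that class):

import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
import spark.implicits._

val citySchema = StructType(Array(
  StructField("name", StringType),
  StructField("state", StringType),
  StructField("number_of_people", LongType)
))

val cities = spark.read
  .option("header", "true")
  .schema(citySchema) // no inference pass; column types are fixed up front
  .csv(location)
  .as[City]
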
or cast:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType
import spark.implicits._

val cities = spark.read
  .option("header", "true")
  .csv(location) // without a schema, every column is read as a string
  .withColumn("number_of_people", col("number_of_people").cast(LongType))
  .as[City]

With your case class defined as case class City(name: String, state: String, number_of_people: Long), you just need one line

private val cityEncoder = Seq(City("", "", 0)).toDS // needs import spark.implicits._ in scope

and then this code

val cities = spark.read
  .option("header", "true")
  .option("charset", "UTF8")
  .option("delimiter", ",")
  .csv(location)
  .as[City]

will just work.
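
Putting it all together, here is a minimal self-contained sketch; the SparkSession setup, object name, and file path are assumptions for illustration:

import org.apache.spark.sql.SparkSession

object ReadCities {
  case class City(name: String, state: String, number_of_people: Long)

  def main(args: Array[String]): Unit = {
    // Local session for the example; adjust master and appName for your cluster.
    val spark = SparkSession.builder()
      .appName("read-cities")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    val location = "data/cities.csv" // hypothetical path

    val cities = spark.read
      .option("header", "true")
      .option("charset", "UTF8")
      .option("delimiter", ",")
      .csv(location)
      .as[City]

    cities.show()
    spark.stop()
  }
}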

This is the [official source](http://spark.apache.org/docs/latest/sql-programming-guide.html#overview).