How to create a custom Encoder in Spark 2.X Datasets?
Did you import the implicit encoders?
import spark.implicits._
http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.sql.Encoder
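One detail worth spelling out: the implicits object hangs off a SparkSession instance, not a package, so the import only compiles once the session value exists. A minimal sketch (the app name is illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("encoder-example")
  .master("local[*]")
  .getOrCreate()

// `spark` is a stable identifier (a val), so its instance implicits can be imported
import spark.implicits._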
As far as I am aware, nothing has really changed since 1.6, and the solutions described in How to store custom objects in Dataset? are still the only available options. Nevertheless, your current code should work just fine with the default encoders for product types.
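For instance, any case class is a product type and picks up a derived encoder once the implicits are in scope. A small sketch, reusing the spark session from above with a hypothetical Person class:

// Hypothetical product type; case classes get an Encoder derived automatically
case class Person(name: String, age: Int)

import spark.implicits._
val ds = Seq(Person("Alice", 29), Person("Bob", 31)).toDS()
ds.filter(_.age >= 30).show()  // typed operations, no custom Encoder required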
To get some insight into why your code worked in 1.x and may not work in 2.0.0, you'll have to check the signatures. In 1.x, DataFrame.map is a method which takes a function Row => T and transforms an RDD[Row] into an RDD[T].
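For contrast, here is a sketch of what that looked like against the 1.x API (df and the column access are illustrative; this shape only compiles on 1.x):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Spark 1.x: map on a DataFrame takes Row => T and returns RDD[T],
// so an Encoder never enters the picture (only an implicit ClassTag)
def firstColumn(df: DataFrame): RDD[String] =
  df.map((row: Row) => row.getString(0))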
In 2.0.0, DataFrame.map takes a function of type Row => T as well, but it transforms a Dataset[Row] (a.k.a. DataFrame) into a Dataset[T], hence T requires an Encoder. If you want the "old" behavior, you should use RDD explicitly:
df.rdd.map(row => ???)  // back on RDD[Row]; ??? stands in for your Row => T function
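Conversely, if you stay inside the Dataset API, the map compiles as long as an Encoder for the result type is in scope. A sketch assuming df has a string column called name:

import spark.implicits._  // supplies Encoder[String] among the built-ins

// Dataset[Row].map needs an implicit Encoder for the result type
val names = df.map(row => row.getAs[String]("name"))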
For mapping over a Dataset[Row], see Encoder error while trying to map dataframe row to updated row.
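The gist of the linked answer: when the mapped function returns Row itself, none of the implicit encoders apply, so you have to provide one explicitly, typically built from a schema with RowEncoder. A sketch assuming the output keeps df's schema:

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder

// Row has no implicit Encoder; construct one from the expected output schema
implicit val rowEncoder = RowEncoder(df.schema)
val updated = df.map(row => Row.fromSeq(row.toSeq))  // identity-style Row => Row map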