Access Array column in Spark
ArrayType is represented in a Row as a scala.collection.mutable.WrappedArray. You can extract it using, for example:
val arr: Seq[Double] = r.getAs[Seq[Double]]("x")
or:
val i: Int = ???  // ordinal position of the column in the Row
val arr = r.getSeq[Double](i)
or even:
import scala.collection.mutable.WrappedArray
val arr: WrappedArray[Double] = r.getAs[WrappedArray[Double]]("x")
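For context, here is a minimal, self-contained sketch putting the three variants together. It assumes Spark 2.x with Scala 2.11/2.12 (where the runtime representation really is WrappedArray) and a toy column named "x" of type array&lt;double&gt;; the data is purely illustrative:

import org.apache.spark.sql.{Row, SparkSession}
import scala.collection.mutable.WrappedArray

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Toy DataFrame with a single array<double> column named "x"
val df = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0)).toDF("x")
val r: Row = df.head()

val asSeq: Seq[Double] = r.getAs[Seq[Double]]("x")              // by name
val byIndex: Seq[Double] = r.getSeq[Double](r.fieldIndex("x"))  // by ordinal
val asWrapped: WrappedArray[Double] = r.getAs[WrappedArray[Double]]("x")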
If the DataFrame is relatively thin, then pattern matching could be a better approach:
import org.apache.spark.sql.Row
df.rdd.map { case Row(x: Seq[Double]) => (x.toArray, x.sum) }
although you have to keep in mind that the element type of the sequence is unchecked (it is lost to type erasure).
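To make that caveat concrete, here is a hedged sketch reusing the toy df from above: the Double in Row(x: Seq[Double]) is erased at runtime, so the match only verifies that the value is a Seq, and annotating the type argument with @unchecked merely silences the compiler warning rather than adding a check:

// The element type is not verified at runtime; an array<string> column
// would still match here and only fail later with a ClassCastException.
val sums = df.rdd.map { case Row(x: Seq[Double @unchecked]) => (x.toArray, x.sum) }
sums.collect().foreach { case (a, s) => println(s"${a.mkString(", ")} -> $s") }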
In Spark >= 1.6 you can also use Dataset as follows:
df.select("x").as[Seq[Double]].rdd
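A minimal sketch of that route, assuming the Spark 2.x SparkSession named spark from the earlier snippet (on 1.6 the implicits would come from a SQLContext instead):

import spark.implicits._  // brings in the Encoder[Seq[Double]] needed by .as[...]

val xs: org.apache.spark.rdd.RDD[Seq[Double]] = df.select("x").as[Seq[Double]].rdd
xs.map(_.sum).collect().foreach(println)  // e.g. 6.0 and 9.0 for the toy data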