Spark: Convert column of string to an array
There are various methods. The best way is to use the split function and cast the result to array<long>:
import org.apache.spark.sql.functions.{col, split}
data.withColumn("b", split(col("b"), ",").cast("array<long>"))
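For context, here is a minimal end-to-end sketch of that approach; the sample rows and the local SparkSession are assumptions invented to match the shape of the data in the question:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

// Sample rows invented for illustration: a long column "a" and a
// comma-separated string column "b".
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val data = Seq((1L, "2,3"), (2L, "4,5")).toDF("a", "b")

// split produces array<string>; the cast converts each element to long.
val result = data.withColumn("b", split(col("b"), ",").cast("array<long>"))
result.printSchema() // b: array (element: long)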
You can also create a simple UDF to convert the values:
import org.apache.spark.sql.functions.udf
val tolong = udf((value: String) => value.split(",").map(_.toLong))
data.withColumn("newB", tolong(data("b"))).show
Hope this helps!
Using a UDF would give you the exact required schema, like this:
import org.apache.spark.sql.functions.{col, udf}
val toArray = udf((b: String) => b.split(",").map(_.toLong))
val test1 = test.withColumn("b", toArray(col("b")))
It would give you the schema as follows:
scala> test1.printSchema
root
|-- a: long (nullable = true)
|-- b: array (nullable = true)
| |-- element: long (containsNull = true)
scala> test1.show
+---+-----+
|  a|    b|
+---+-----+
|  1|[2,3]|
|  2|[4,5]|
+---+-----+
As far as applying the schema on the file read itself is concerned, I think that is a tough task. So, for now, you can apply the transformation after reading the file into the DataFrame test.
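As a sketch of that read-then-transform pattern (the file path, delimiter, and column names here are assumptions for illustration, not from the original question):

import org.apache.spark.sql.functions.{col, udf}

// Hypothetical input file where fields are separated by "|" so the
// comma-separated list in "b" survives parsing, e.g. lines like "1|2,3".
val raw = spark.read
  .option("sep", "|")
  .option("inferSchema", "true")
  .csv("/path/to/test.txt")
  .toDF("a", "b") // "b" arrives as a plain string such as "2,3"

val toArray = udf((b: String) => b.split(",").map(_.toLong))
val test1 = raw.withColumn("b", toArray(col("b")))
test1.printSchema() // b: array (element: long), as shown above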
I hope this helps!