Unpacking a list to select multiple columns from a Spark data frame
You can convert each String to a Spark Column like this:
import org.apache.spark.sql.functions._

// cols is a Seq[String] of column names; map each to a Column, then unpack
df.select(cols.map(col): _*)
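If it helps, here is a minimal self-contained version of that; the SparkSession setup and the sample data and column names are my own assumptions for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
val cols = Seq("id", "name")

// map each name to a Column, then unpack the Seq[Column] into select's varargs
df.select(cols.map(col): _*).show()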
Use df.select(cols.head, cols.tail: _*)
Let me know if it works :)
Explanation from @Ben:
The key is the method signature of select:
select(col: String, cols: String*)
The cols: String* entry takes a variable number of arguments. : _* unpacks the arguments so that they can be handled by this varargs parameter, very similar to unpacking in Python with *args. See here and here for other examples.
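To see the unpacking in isolation, here is a tiny Spark-free sketch of the same varargs pattern; the printAll name is made up for illustration:

// a method with a varargs parameter, analogous to select(col: String, cols: String*)
def printAll(first: String, rest: String*): Unit =
  (first +: rest).foreach(println)

val names = Seq("col1", "col2", "col3")
printAll(names.head, names.tail: _*) // : _* expands the Seq into the varargs slot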
Another option that I've just learnt:
import org.apache.spark.sql.functions.col

val columns = Seq[String]("col1", "col2", "col3")
val colNames = columns.map(name => col(name))
// bind the result to a new name; val df = df.select(...) won't compile
// outside the REPL (it's a forward reference to df itself)
val selected = df.select(colNames: _*)
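As a small follow-up, the mapping step can be written more tersely; both lines below use standard Spark API (col lifted to a function value, and DataFrame's apply method, which also returns a Column):

val colNames2 = columns.map(col)   // eta-expansion of functions.col, same result
val colNames3 = columns.map(df(_)) // df("col1") also yields a Column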