How to change a column position in a Spark DataFrame?
You can get the column names, reorder them however you want, and then use select
on the original DataFrame to get a new one with this new order:
val columns: Array[String] = dataFrame.columns
val reorderedColumnNames: Array[String] = ??? // do the reordering you want
val result: DataFrame = dataFrame.select(reorderedColumnNames.head, reorderedColumnNames.tail: _*)
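For example, if the reordering you want is simply alphabetical, the placeholder step could be filled in like this (a sketch; `dataFrame` is assumed to be an existing DataFrame):

```scala
// One possible reordering: sort the column names alphabetically.
// Assumes `dataFrame` is an existing DataFrame in scope.
val columns: Array[String] = dataFrame.columns
val reorderedColumnNames: Array[String] = columns.sorted
val result = dataFrame.select(reorderedColumnNames.head, reorderedColumnNames.tail: _*)
```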
Like others have commented, I'm curious why you would do this, since column order is rarely relevant when you can query columns by name.
Anyway, using a select reorders the columns in the resulting schema:
import spark.implicits._ // needed for toDF

val data = Seq(
  ("a", "hello", 1),
  ("b", "spark", 2)
).toDF("field1", "field2", "field3")

data.show()

data.select("field3", "field2", "field1").show()
A slightly different version compared to @Tzach Zohar's; this one reverses the column order using Column objects instead of names:
val cols = df.columns.map(df(_)).reverse
val reversedColDF = df.select(cols:_*)
The spark-daria library has a reorderColumns method that makes it easy to reorder the columns in a DataFrame.
import com.github.mrpowers.spark.daria.sql.DataFrameExt._
val actualDF = sourceDF.reorderColumns(
Seq("field1", "field3", "field2")
)
The reorderColumns method uses @Rockie Yang's solution under the hood.

If you want the column ordering of df1 to equal the column ordering of df2, something like this should work better than hardcoding all the columns:

df1.reorderColumns(df2.columns)
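If you'd rather not pull in spark-daria, the same effect can be achieved with a plain select (a sketch, assuming every column of df2 also exists in df1):

```scala
import org.apache.spark.sql.functions.col

// Plain-Spark equivalent: select df1's columns in df2's order.
// Assumes every column named in df2.columns also exists in df1.
val reordered = df1.select(df2.columns.map(col): _*)
```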
The spark-daria library also defines a sortColumns transformation to sort columns in ascending or descending order (if you don't want to spell out all the columns in a sequence).
import com.github.mrpowers.spark.daria.sql.transformations._
df.transform(sortColumns("asc"))
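Without the library, the same column sort can be expressed with plain Spark (a sketch; `df` is assumed to be an existing DataFrame):

```scala
import org.apache.spark.sql.functions.col

// Sort the column names, then select in that order.
val sortedAsc  = df.select(df.columns.sorted.map(col): _*)
// Descending order:
val sortedDesc = df.select(df.columns.sorted.reverse.map(col): _*)
```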