PySpark: select a specific column by its position
You can always get the name of the column with df.columns[n] and then select it:
df = spark.createDataFrame([[1,2], [3,4]], ['a', 'b'])
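Note that df.columns is a plain Python list of the column names, in order, so ordinary list indexing and slicing work on it:
print(df.columns)     # ['a', 'b']
print(df.columns[1])  # 'b'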
To select the column at position n:
n = 1
df.select(df.columns[n]).show()
+---+
|  b|
+---+
|  2|
|  4|
+---+
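Since select also accepts a list of names, the same idea extends to several positions at once (the positions list below is just an example):
positions = [0, 1]
df.select([df.columns[i] for i in positions]).show()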
To select all but column n:
n = 1
You can either use drop:
df.drop(df.columns[n]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+
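In recent Spark versions drop also takes several column names as separate arguments, so multiple positions can be removed in one call (the to_drop list is just for illustration):
to_drop = [1]
df.drop(*[df.columns[i] for i in to_drop]).show()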
Alternatively, you can select with a manually constructed list of column names:
df.select(df.columns[:n] + df.columns[n+1:]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+
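If you do this often, the pattern can be wrapped in a small helper. select_by_position is just a hypothetical name, and the example reuses the df defined above:
def select_by_position(df, *positions):
    # Turn the integer positions into column names, then select them.
    return df.select([df.columns[i] for i in positions])

select_by_position(df, 1).show()  # same result as df.select(df.columns[1])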