PySpark: select a specific column by its position
You can always get the name of the column with df.columns[n] and then select it:
df = spark.createDataFrame([[1,2], [3,4]], ['a', 'b'])
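Note that df.columns is a plain Python list of the column names, in order, so ordinary list indexing and slicing work on it:
print(df.columns)     # ['a', 'b']
print(df.columns[1])  # 'b'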
To select the column at position n:
n = 1
df.select(df.columns[n]).show()
+---+
|  b|
+---+
|  2|
|  4|
+---+
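Since select also accepts a list of names, the same idea extends to several positions at once (the positions list below is just an example):
positions = [0, 1]
df.select([df.columns[i] for i in positions]).show()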
To select all but column n:
n = 1
You can either use drop:
df.drop(df.columns[n]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+
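In recent Spark versions drop also takes several column names as separate arguments, so multiple positions can be removed in one call (the to_drop list is just for illustration):
to_drop = [1]
df.drop(*[df.columns[i] for i in to_drop]).show()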
Alternatively, you can select with a manually constructed list of column names:
df.select(df.columns[:n] + df.columns[n+1:]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+
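If you do this often, the pattern can be wrapped in a small helper. select_by_position is just a hypothetical name, and the example reuses the df defined above:
def select_by_position(df, *positions):
    # Turn the integer positions into column names, then select them.
    return df.select([df.columns[i] for i in positions])

select_by_position(df, 1).show()  # same result as df.select(df.columns[1])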