Python/pyspark data frame rearrange columns
If you only want to move some of the columns to the front, keep the rest, and don't care about the order of the rest:
def get_cols_to_front(df, columns_to_front):
    original = df.columns
    # Keep only the requested columns that actually exist in df
    columns_to_front = [c for c in columns_to_front if c in original]
    # Keep the rest of the columns and sort them for consistency
    columns_other = list(set(original) - set(columns_to_front))
    columns_other.sort()
    # Apply the order
    df = df.select(*columns_to_front, *columns_other)
    return df
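The function above sorts the remaining columns alphabetically because a set difference does not preserve order. If you would rather keep the remaining columns where they were, a sketch like the following computes the order as a plain list of names. `move_to_front` is a hypothetical helper, not part of PySpark. It only works on names, so it assumes you pass it `df.columns` and hand the result to `df.select`.

```python
def move_to_front(columns, to_front):
    """Return a new column order: `to_front` first (in the given order),
    then the remaining columns in their original relative order."""
    # Keep only requested names that actually exist, skipping duplicates
    front = []
    for c in to_front:
        if c in columns and c not in front:
            front.append(c)
    # Everything else keeps the position it had in the original list
    rest = [c for c in columns if c not in front]
    return front + rest


cols = ["time", "city", "id", "name", "score"]
print(move_to_front(cols, ["id", "name", "missing"]))
# ['id', 'name', 'time', 'city', 'score']
```

With a real DataFrame this would be used as `df.select(*move_to_front(df.columns, ["id", "name"]))`.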
If you have a large number of columns and simply want them in alphabetical order, sort the column names:
df.select(sorted(df.columns))
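Note that `sorted` compares strings by code point, so every name starting with an uppercase letter comes before every lowercase one. If your column names mix cases, a case-insensitive key may give the order you expect. `alphabetical` below is a hypothetical helper, shown as plain Python on a list of names, so it assumes you pass it `df.columns`.

```python
def alphabetical(columns, ignore_case=True):
    """Return column names sorted alphabetically."""
    # key=str.lower makes "City" sort among the lowercase names
    # instead of ahead of all of them
    return sorted(columns, key=str.lower) if ignore_case else sorted(columns)


cols = ["name", "City", "id", "age"]
print(alphabetical(cols, ignore_case=False))  # ['City', 'age', 'id', 'name']
print(alphabetical(cols))                     # ['age', 'City', 'id', 'name']
```

The result would then be passed on as `df.select(alphabetical(df.columns))`.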
You can use select to change the order of the columns. List every column in the order you want; any column you leave out is dropped from the result:
df.select("id", "name", "time", "city")
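Because `select` keeps only the columns it is given, a typo or a forgotten name silently changes the result. One way to guard against that is to check the explicit list against `df.columns` before selecting. This is a minimal sketch assuming plain lists of names; `explicit_order` is a hypothetical helper, not a PySpark function.

```python
def explicit_order(columns, desired):
    """Return `desired` if it names every column exactly once.

    Raises ValueError if a column is missing, unknown, or repeated,
    since select() would otherwise drop or duplicate it.
    """
    missing = [c for c in columns if c not in desired]
    unknown = [c for c in desired if c not in columns]
    repeated = sorted({c for c in desired if desired.count(c) > 1})
    if missing or unknown or repeated:
        raise ValueError(
            f"missing={missing}, unknown={unknown}, repeated={repeated}"
        )
    return list(desired)


cols = ["city", "id", "name", "time"]
print(explicit_order(cols, ["id", "name", "time", "city"]))
# ['id', 'name', 'time', 'city']
```

Usage would look like `df.select(*explicit_order(df.columns, ["id", "name", "time", "city"]))`.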