Python/PySpark: rearranging DataFrame columns

If you only want to move some columns to the front and keep the rest, without caring about the order of the rest:

def get_cols_to_front(df, columns_to_front):
    """Return df with columns_to_front first, then the remaining columns sorted by name."""
    original = df.columns
    # Keep only the requested columns that actually exist in df
    columns_to_front = [c for c in columns_to_front if c in original]
    # Collect the remaining columns and sort them so the result is deterministic
    columns_other = sorted(set(original) - set(columns_to_front))
    # Apply the new order
    return df.select(*columns_to_front, *columns_other)
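
The function above sorts the remaining columns by name. If you would rather keep them in their original order, a minimal sketch of the ordering logic could look like the following. The helper name front_first_order is hypothetical, and a plain list stands in for df.columns so the snippet runs without Spark.

```python
def front_first_order(columns, columns_to_front):
    """Return column names with columns_to_front first, the rest in original order."""
    existing = set(columns)
    # Keep requested columns that exist, dropping duplicates while preserving order
    front = list(dict.fromkeys(c for c in columns_to_front if c in existing))
    front_set = set(front)
    # Everything not moved to the front keeps its original relative position
    rest = [c for c in columns if c not in front_set]
    return front + rest


# Plain list standing in for df.columns (assumption for illustration)
columns = ["id", "name", "time", "city"]
new_order = front_first_order(columns, ["city", "missing", "id"])
print(new_order)  # ['city', 'id', 'name', 'time']

# With a PySpark DataFrame named df (assumed), this would be applied as:
# df = df.select(*front_first_order(df.columns, ["city", "id"]))
```

Computing the order as a plain list first makes it easy to inspect or test before it is passed to select.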

If you have a large number of columns and alphabetical order is acceptable, sort the column names instead of listing them by hand:

df.select(sorted(df.columns))
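
Note that Python's sorted compares strings by code point, so uppercase names come before lowercase ones. The sketch below shows the difference on a plain list standing in for df.columns (the column names are made up for illustration), along with a case-insensitive alternative.

```python
# Plain list standing in for df.columns (hypothetical column names)
columns = ["id", "Name", "time", "City"]

# Default sort: uppercase letters order before lowercase ones
default_order = sorted(columns)
print(default_order)  # ['City', 'Name', 'id', 'time']

# Case-insensitive sort: compare on the lowercased name
case_insensitive_order = sorted(columns, key=str.lower)
print(case_insensitive_order)  # ['City', 'id', 'Name', 'time']

# With a PySpark DataFrame named df (assumed), this would be applied as:
# df = df.select(sorted(df.columns, key=str.lower))
```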

You can use select to change the order of the columns by listing them in the order you want:

df.select("id","name","time","city")
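
Keep in mind that select returns only the columns you name, so any column left out of the list is dropped from the result. If that matters, one option is to check the list before selecting. This is a minimal sketch; check_full_order is a hypothetical helper, and a plain list stands in for df.columns.

```python
def check_full_order(existing, desired):
    """Return desired as a list if it names exactly the existing columns, else raise."""
    existing_set, desired_set = set(existing), set(desired)
    missing = sorted(existing_set - desired_set)   # would be dropped by select
    unknown = sorted(desired_set - existing_set)   # not present in the DataFrame
    if missing or unknown:
        raise ValueError(f"would drop {missing}; not in DataFrame: {unknown}")
    return list(desired)


# Plain list standing in for df.columns (assumption for illustration)
existing = ["name", "city", "id", "time"]
order = check_full_order(existing, ["id", "name", "time", "city"])
print(order)  # ['id', 'name', 'time', 'city']

# With a PySpark DataFrame named df (assumed), this would be applied as:
# df = df.select(*check_full_order(df.columns, ["id", "name", "time", "city"]))
```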

