How to add suffix and prefix to all columns in python/pyspark dataframe

Use a list comprehension to build the aliased columns and pass them to select().

from pyspark.sql import functions as F

df = ...

df_new = df.select([F.col(c).alias("prefix_" + c + "_suffix") for c in df.columns])

This approach also lets you put custom Python logic inside alias(), for example: "prefix_" + c + "_suffix" if c in list_of_cols_to_change else c (where list_of_cols_to_change is a list you define).
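As a minimal sketch of that conditional logic, the naming rule can live in a small plain function that is then applied inside alias(). The helper names affixed_name and rename_selected and their parameters are illustrative, not part of PySpark, and df[c] is used in place of F.col(c) only so that the snippet needs no imports.

```python
def affixed_name(name, cols_to_change, prefix="", suffix=""):
    """Return the new name for one column; unchanged if it is not selected."""
    if name in cols_to_change:
        return prefix + name + suffix
    return name


def rename_selected(df, cols_to_change, prefix="", suffix=""):
    """Alias only the selected columns; df is assumed to be a PySpark DataFrame."""
    targets = set(cols_to_change)
    return df.select(
        [df[c].alias(affixed_name(c, targets, prefix, suffix)) for c in df.columns]
    )
```

For example, rename_selected(df, ["col_1"], prefix="prefix_", suffix="_suffix") would rename col_1 and leave every other column name unchanged.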

To add a prefix or suffix:

Start from df.columns, which returns the list of column names ([col_1, col_2...]) of the DataFrame whose columns we want to prefix or suffix.

df.columns

Iterate through the list above and create another list of aliased columns that can be used inside a select expression.

from pyspark.sql.functions import col

select_list = [col(col_name).alias("prefix_" + col_name)  for col_name in df.columns]

When passing the list to select, you can unpack it with an asterisk (*); select() also accepts the list itself. The result can be assigned back to the same DataFrame variable or to a different one.

df.select(*select_list).show()
df = df.select(*select_list)

df.columns will now return the list of new (aliased) column names.
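When every column is renamed, the same result can also be sketched without alias() by passing the complete list of new names to DataFrame.toDF, which assigns names by position. This is a different technique from the select approach above, and the helper names affixed_names and add_affixes are made up for illustration.

```python
def affixed_names(columns, prefix="", suffix=""):
    """Build the new column names, keeping the original order."""
    return [prefix + name + suffix for name in columns]


def add_affixes(df, prefix="", suffix=""):
    """Rename all columns at once; df is assumed to be a PySpark DataFrame."""
    # toDF takes the new names as positional arguments, hence the asterisk.
    return df.toDF(*affixed_names(df.columns, prefix, suffix))
```

Because toDF matches names by position, the list must contain exactly one new name per existing column.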

If you would like to add a prefix or suffix to multiple columns in a PySpark DataFrame, you could use a for loop with .withColumnRenamed().

For example:

def add_prefix(sdf, prefix):
    # Rename one column at a time; each call returns a new DataFrame.
    for c in sdf.columns:
        sdf = sdf.withColumnRenamed(c, '{}{}'.format(prefix, c))
    return sdf

To rename only some columns, iterate over a subset of sdf.columns instead of the full list.
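One way to amend the loop, sketched here with hypothetical helper names, is to build an old-to-new name mapping for just the columns of interest and rename from that mapping. The parameter only is an assumption of this sketch: it holds the subset to rename, and passing None renames everything.

```python
def prefix_mapping(columns, prefix, only=None):
    """Map old names to prefixed names, restricted to `only` when given."""
    wanted = set(columns) if only is None else set(only)
    return {c: prefix + c for c in columns if c in wanted}


def add_prefix_to(sdf, prefix, only=None):
    """Rename the chosen columns; sdf is assumed to be a PySpark DataFrame."""
    for old, new in prefix_mapping(sdf.columns, prefix, only).items():
        sdf = sdf.withColumnRenamed(old, new)
    # On Spark 3.4 or later, sdf.withColumnsRenamed(mapping) accepts the
    # whole dict in a single call instead of this loop.
    return sdf
```

Names listed in only that are not present in the DataFrame are simply ignored, since the mapping is built from the existing columns.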
