Renaming columns for PySpark DataFrame aggregates
Although I still prefer dplyr syntax, this code snippet will do:
import pyspark.sql.functions as sf

(df.groupBy("group")
   .agg(sf.sum('money').alias('money'))
   .show(100))
It gets verbose.
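The same alias pattern extends to several aggregates at once; a minimal sketch, assuming the same df, with the extra output names purely illustrative:

import pyspark.sql.functions as sf

# Each aggregate is named inline, so no renaming pass is needed afterwards.
(df.groupBy("group")
   .agg(sf.sum("money").alias("total_money"),
        sf.avg("money").alias("avg_money"),
        sf.count("*").alias("n_rows"))   # count("*") counts rows per group
   .show(100))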
withColumnRenamed should do the trick; see the pyspark.sql API documentation for details.
df.groupBy("group") \
  .agg({"money": "sum"}) \
  .withColumnRenamed("SUM(money)", "money") \
  .show(100)
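The auto-generated name varies by Spark version (older releases upper-case the function name, as above; newer ones produce sum(money)), so one option is to read the name off the DataFrame instead of hard-coding it. A small sketch, assuming the grouping key comes first in the schema:

# Look up the generated column name rather than assuming its casing.
agg_df = df.groupBy("group").agg({"money": "sum"})
print(agg_df.columns)  # e.g. ['group', 'sum(money)']

# The aggregate column sits after the grouping key.
agg_df.withColumnRenamed(agg_df.columns[-1], "money").show(100)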
It's as simple as:
import org.apache.spark.sql.functions.max

val maxVideoLenPerItemDf = requiredItemsFiltered
  .groupBy("itemId")
  .agg(max("playBackDuration").as("customVideoLength"))
maxVideoLenPerItemDf.show()
Use .as in agg to name the new column it creates.
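For PySpark users, Column.alias plays the same role as Scala's .as; the sketch below just transliterates the example above:

import pyspark.sql.functions as sf

# alias() names the aggregate column directly, mirroring .as in the Scala API.
max_video_len_per_item_df = (
    requiredItemsFiltered
    .groupBy("itemId")
    .agg(sf.max("playBackDuration").alias("customVideoLength"))
)
max_video_len_per_item_df.show()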