Pandas aggregate with dynamic column names

If your group_color is always the same within one group, you can do:

df.pivot_table(index=['group','group_color'],aggfunc='mean')

Output:

                       val1      val2
group group_color                    
A     green        3.333333  4.666667
B     blue         4.500000  6.000000

In the other case, you can build the dictionary and pass it to agg:

agg_dict = {f: 'first' if f=='group_color' else 'mean' for f in df.columns[1:]}
df.groupby('group').agg(agg_dict)

Which output:

      group_color      val1      val2
group                                
A           green  3.333333  4.666667
B            blue  4.500000  6.000000

More easy like

df.groupby('group').agg(lambda x : x.head(1) if x.dtype=='object' else x.mean())
Out[63]: 
      group_color      val1      val2
group                                
A           green  3.333333  4.666667
B            blue  4.500000  6.000000

Unfortunately you will have to apply both aggregation functions separately (that or repeat "valn": "mean" as many times as valx columns). Groupby.agg can take a dictionary but the keys must be individual columns.

The way I'd do this is using DataFrame.filter to select the subset of the dataframe with the columns following the format of valx, aggregate with the mean, and then assign new columns with the aggregated results on the other columns:

(df.filter(regex=r'^val').groupby(df.group).mean()
   .assign(color = df.group_color.groupby(df.group).first()))

         val1      val2    color
group                           
A      3.333333  4.666667  green
B      4.500000  6.000000   blue

Pandas aggregate with dynamic column names

Tags:

Python

Pandas

Aggregate

Pandas Groupby

Related

Recent Posts