Python pandas: How to assign groupby operation results back to columns in the parent DataFrame?

In [96]: import numpy as np; import pandas

In [97]: df = pandas.DataFrame({'month': np.random.randint(0,11, 100), 'A': np.random.randn(100), 'B': np.random.randn(100)})

In [98]: df.join(df.groupby('month')['A'].sum(), on='month', rsuffix='_r')
Out[98]:
           A         B  month       A_r
0  -0.040710  0.182269      0 -0.331816
1  -0.004867  0.642243      1  2.448232
2  -0.162191  0.442338      4  2.045909
3  -0.979875  1.367018      5 -2.736399
4  -1.126198  0.338946      5 -2.736399
5  -0.992209 -1.343258      1  2.448232
6  -1.450310  0.021290      0 -0.331816
7  -0.675345 -1.359915      9  2.722156
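
Since the frame above is random, here is a small deterministic sketch of the same join idea. The values and the column name A_sum are made up for illustration; renaming the aggregated Series is just an alternative to rsuffix, not something the session above does.

```python
import pandas as pd

# Hypothetical toy data so the result can be checked by eye.
df = pd.DataFrame({
    "month": [0, 1, 1, 4, 4, 4],
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Aggregating gives a Series indexed by month, one entry per group.
totals = df.groupby("month")["A"].sum().rename("A_sum")

# join(on="month") looks up each row's month value in the index of `totals`,
# so every row receives the total of its own group.
joined = df.join(totals, on="month")
print(joined)
```

Each row keeps its original position and index; only the new column is added.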

While I'm still exploring all of the incredibly smart ways that apply concatenates the pieces it's given, here's another way to add a new column in the parent after a groupby operation.

In [236]: df
Out[236]: 
  yearmonth    return
0    201202  0.922132
1    201202  0.220270
2    201202  0.228856
3    201203  0.277170
4    201203  0.747347

In [237]: def add_mkt_return(grp):
   .....:     # return a copy with the new column instead of mutating the group in place
   .....:     return grp.assign(mkt_return=grp['return'].sum())
   .....: 

In [238]: df.groupby('yearmonth').apply(add_mkt_return)
Out[238]: 
  yearmonth    return  mkt_return
0    201202  0.922132    1.371258
1    201202  0.220270    1.371258
2    201202  0.228856    1.371258
3    201203  0.277170    1.024516
4    201203  0.747347    1.024516

As a general rule, when you call .transform() on a groupby() object, pandas returns a result with the same length (and index) as your original table. When you use aggregations like .sum() or .first(), pandas instead returns a table where each row is a group.
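
To make the difference concrete, here is a minimal sketch that reuses the yearmonth/return numbers shown earlier in the thread. The variable names per_group and per_row are mine, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "yearmonth": [201202, 201202, 201202, 201203, 201203],
    "return": [0.922132, 0.220270, 0.228856, 0.277170, 0.747347],
})

grouped = df.groupby("yearmonth")["return"]

# Aggregation: one row per group, indexed by the group key.
per_group = grouped.sum()

# Transform: one row per original row, indexed like df.
per_row = grouped.transform("sum")

# Because the index matches df, the result can be assigned directly.
df["mkt_return"] = per_row

print(len(per_group), len(per_row))
print(df)
```

The aggregated result cannot be assigned straight to a column because its index is the group key, which is why the transform version is the convenient one here.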

I'm not sure how this works with apply, but implementing elaborate lambda functions with transform can be fairly tricky. The strategy I find most helpful is to create the variables I need, place them in the original dataset, and then do my operations there.

If I understand correctly what you're trying to do, you can first calculate the total market cap for each group:

bdata['group_MarketCap'] = bdata.groupby('yearmonth')['MarketCap'].transform('sum')

This will add a column called "group_MarketCap" to your original data, containing the sum of market caps for each group (repeated on every row of that group). Then you can calculate the weighted values directly:

bdata['weighted_P'] = bdata['PriceReturn'] * (bdata['MarketCap']/bdata['group_MarketCap'])

And finally you would calculate the weighted average for each group using the same transform function:

bdata['MarketReturn'] = bdata.groupby('yearmonth')['weighted_P'].transform('sum')

I tend to build my variables this way. Sometimes you can pull off putting it all in a single command, but that doesn't always work with groupby(), because an intermediate column usually has to exist in the full dataset before a later step can operate on it (i.e. you can't add two columns together if one doesn't exist yet).
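
For anyone who wants to try the three steps end to end, here is a small self-contained sketch. The numbers are made up purely for illustration; only the column names come from the steps above. The cross-check with np.average is my own addition, not part of the original approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two months, two stocks each.
bdata = pd.DataFrame({
    "yearmonth": [201202, 201202, 201203, 201203],
    "MarketCap": [100.0, 300.0, 50.0, 150.0],
    "PriceReturn": [0.10, 0.02, -0.04, 0.08],
})

# Step 1: total market cap of each group, broadcast back to every row.
bdata["group_MarketCap"] = bdata.groupby("yearmonth")["MarketCap"].transform("sum")

# Step 2: each row's return scaled by its share of the group's market cap.
bdata["weighted_P"] = bdata["PriceReturn"] * (bdata["MarketCap"] / bdata["group_MarketCap"])

# Step 3: sum the weighted pieces within each group.
bdata["MarketReturn"] = bdata.groupby("yearmonth")["weighted_P"].transform("sum")

# Cross-check: numpy's weighted average, computed group by group.
expected = {
    key: np.average(grp["PriceReturn"], weights=grp["MarketCap"])
    for key, grp in bdata.groupby("yearmonth")
}

print(bdata)
print(expected)
```

With these numbers the weights are 0.25 and 0.75 in both months, so the market return works out to 0.04 for 201202 and 0.05 for 201203.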

Hope this helps :)