Pandas: GroupBy Shift And Cumulative Sum

While working on this problem, as the DataFrame size grows, using lambdas on transform starts to get very slow. I found out that using some DataFrameGroupBy methods (like cumsum and shift instead of lambdas are much faster.

So here's my proposed solution, creating a 'temp' column to save the cumsum for each ID and then shifting in a different groupby:

df['temp'] = df.groupby("ID")['X'].cumsum()
df['transformed'] = df.groupby("ID")['temp'].shift()
df = df.drop(columns=["temp"])

You need using apply , since one function is under groupby object which is cumsum another function shift is for all df

temp['transformed'] = temp.groupby('ID')['X'].apply(lambda x  : x.cumsum().shift())
temp
Out[287]: 
  ID  X  transformed
0  a  1          NaN
1  a  1          1.0
2  a  1          2.0
3  b  1          NaN
4  b  1          1.0
5  b  1          2.0
6  c  1          NaN
7  c  1          1.0

You could use transform() to feed the separate groups that are created at each level of groupby into the cumsum() and shift() methods.

temp['transformed'] = \
    temp.groupby('ID')['X'].transform(lambda x: x.cumsum().shift())
  ID  X   transformed
0  a  1   NaN
1  a  1   1.0
2  a  1   2.0
3  b  1   NaN
4  b  1   1.0
5  b  1   2.0
6  c  1   NaN
7  c  1   1.0

For more info on transform() please see here:

  • https://jakevdp.github.io/PythonDataScienceHandbook/03.08-aggregation-and-grouping.html#Transformation
  • https://pandas.pydata.org/pandas-docs/version/0.22/groupby.html#transformation