Pandas equivalent of Oracle Lead/Lag function

You can combine groupby with shift, which lags the values within each group:

In [15]: df['Data_lagged'] = df.groupby(['Group'])['Data'].shift(1)

In [16]: df
Out[16]: 
                Date Group  Data  Data_lagged
2014-05-14  09:10:00     A     1          NaN
2014-05-14  09:20:00     A     2            1
2014-05-14  09:30:00     A     3            2
2014-05-14  09:40:00     A     4            3
2014-05-14  09:50:00     A     5            4
2014-05-14  10:00:00     B     1          NaN
2014-05-14  10:10:00     B     2            1
2014-05-14  10:20:00     B     3            2
2014-05-14  10:30:00     B     4            3

[9 rows x 4 columns]
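
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The construction of df is an assumption on my part (the original frame was not shown); only the column names and values are taken from the printed table.

```python
import pandas as pd

# Assumed reconstruction of the sample frame: nine rows spaced 10 minutes
# apart, five in group A followed by four in group B.
df = pd.DataFrame({
    "Date": pd.date_range("2014-05-14 09:10:00", periods=9, freq="10min"),
    "Group": list("AAAAABBBB"),
    "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
})

# LAG(Data, 1) OVER (PARTITION BY Group): shift by one row inside each group.
# The first row of every group has no predecessor, so it becomes NaN.
df["Data_lagged"] = df.groupby("Group")["Data"].shift(1)

print(df)
```

Note that the lagged column is float, because NaN cannot be stored in an integer column.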

To obtain the ORDER BY Date ASC effect, you must sort the DataFrame first:

df['Data_lagged'] = (df.sort_values(by=['Date'], ascending=True)
                       .groupby(['Group'])['Data'].shift(1))
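
The sort does not have to be applied to df itself: the shifted Series keeps the original index labels, so the assignment aligns each value back to its row. A small sketch on deliberately shuffled rows (the frame named shuffled is hypothetical sample data, and this relies on the index being unique):

```python
import pandas as pd

# Hypothetical sample data, intentionally out of chronological order.
shuffled = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-05-14 09:30:00",
        "2014-05-14 09:10:00",
        "2014-05-14 10:10:00",
        "2014-05-14 09:20:00",
        "2014-05-14 10:00:00",
    ]),
    "Group": ["A", "A", "B", "A", "B"],
    "Data": [3, 1, 2, 2, 1],
})

# Sort by Date, lag within each group, then assign back. The assignment
# matches on index labels, so the row order of `shuffled` is left untouched.
shuffled["Data_lagged"] = (shuffled.sort_values(by="Date", ascending=True)
                                   .groupby("Group")["Data"].shift(1))

print(shuffled)
```

Each row receives the Data value of the chronologically previous row in its group, and the earliest row of each group gets NaN.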

For a lead operation, simply use shift(-1) instead of shift(1):

df['Data_lead'] = df.groupby(['Group'])['Data'].shift(-1)
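
Oracle's LEAD and LAG also accept an offset and a default value. A rough equivalent, assuming a pandas version in which the groupby shift accepts fill_value (0.24 or later, as far as I know), might look like this; the frame and the column name Data_lead2 are illustrative only:

```python
import pandas as pd

# Hypothetical sample data with the same groups as the example above.
df = pd.DataFrame({
    "Group": list("AAAAABBBB"),
    "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
})

# Roughly LEAD(Data, 2, 0) OVER (PARTITION BY Group): look two rows ahead
# within each group and use 0 where no such row exists.
df["Data_lead2"] = df.groupby("Group")["Data"].shift(-2, fill_value=0)

print(df)
```

A positive first argument with the same fill_value gives the corresponding LAG with a default.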
