How to group a pandas dataframe by a defined time interval?
Using DataFrame.resample
which is a dedicated method for resampling time series, this way we dont need DataFrame.GroupBy
and pd.Grouper
:
df.resample('60min', base=30, label='right').first()
Output
data
index
2017-02-14 06:30:00 11198648.0
2017-02-14 07:30:00 11198650.0
2017-02-14 08:30:00 NaN
2017-02-14 09:30:00 NaN
2017-02-14 10:30:00 NaN
2017-02-14 11:30:00 NaN
2017-02-14 12:30:00 NaN
2017-02-14 13:30:00 NaN
2017-02-14 14:30:00 NaN
2017-02-14 15:30:00 NaN
2017-02-14 16:30:00 NaN
2017-02-14 17:30:00 NaN
2017-02-14 18:30:00 NaN
2017-02-14 19:30:00 NaN
2017-02-14 20:30:00 NaN
2017-02-14 21:30:00 NaN
2017-02-14 22:30:00 NaN
2017-02-14 23:30:00 11207728.0
Notice: when you have multiple columns in your dataframe, you have to specify the column you want to aggregate on:
df.resample('60min', base=30, label='right')['data'].first()
Use base=30
in conjunction with label='right'
parameters in pd.Grouper
.
Specifying label='right'
makes the time-period to start grouping from 6:30 (higher side) and not 5:30.
Also, base
is set to 0 by default, hence the need to offset those by 30 to account for the forward propagation of dates.
Suppose, you want to aggregate the first element of every sub-group, then:
df.groupby(pd.Grouper(freq='60Min', base=30, label='right')).first()
# same thing using resample - df.resample('60Min', base=30, label='right').first()
yields:
data
index
2017-02-14 06:30:00 11198648.0
2017-02-14 07:30:00 11198650.0
2017-02-14 08:30:00 NaN
2017-02-14 09:30:00 NaN
2017-02-14 10:30:00 NaN
2017-02-14 11:30:00 NaN
2017-02-14 12:30:00 NaN
2017-02-14 13:30:00 NaN
2017-02-14 14:30:00 NaN
2017-02-14 15:30:00 NaN
2017-02-14 16:30:00 NaN
2017-02-14 17:30:00 NaN
2017-02-14 18:30:00 NaN
2017-02-14 19:30:00 NaN
2017-02-14 20:30:00 NaN
2017-02-14 21:30:00 NaN
2017-02-14 22:30:00 NaN
2017-02-14 23:30:00 11207728.0