Efficient way of filtering by datetime in groupby

Generally, avoid groupby().apply(): it calls a Python function once per group instead of running vectorized across groups, and it adds memory-allocation overhead when each call returns a new DataFrame, as in your case.

Instead, compute the time threshold for each row with groupby().transform, then use boolean indexing on the whole DataFrame:

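# Per-row threshold: the latest time_entered in each id group, minus one day
time_max_by_id = df.groupby('id')['time_entered'].transform('max') - pd.Timedelta('1D')
# Keep only the rows that fall within the last day of their own group
df[df['time_entered'] > time_max_by_id]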

Output:

    id        time_entered       val
2    1 2015-02-24 18:00:00  0.978738
3    1 2015-02-25 03:00:00  2.240893
4    1 2015-02-25 12:00:00  1.867558
5    2 2015-02-25 21:00:00 -0.977278
6    2 2015-02-26 06:00:00  0.950088
11   3 2015-02-28 03:00:00  1.454274
12   3 2015-02-28 12:00:00  0.761038
13   3 2015-02-28 21:00:00  0.121675
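
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The sample frame is hypothetical (made-up timestamps and values, not the data from the question); only the column names id, time_entered and val are taken from the snippet above. It also builds the same result with an explicit per-group loop, purely as a cross-check that the vectorized filter keeps the same rows.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time_entered": pd.to_datetime([
        "2015-02-23 06:00", "2015-02-24 18:00", "2015-02-25 12:00",
        "2015-02-24 12:00", "2015-02-25 21:00", "2015-02-26 06:00",
    ]),
    "val": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# transform('max') returns a Series aligned with df, so every row carries
# its own group's latest timestamp; subtracting one day gives the threshold.
threshold = df.groupby("id")["time_entered"].transform("max") - pd.Timedelta("1D")

# One boolean mask over the whole frame, no per-group Python calls.
fast = df[df["time_entered"] > threshold]

# Cross-check only: the same filter done group by group in a Python loop.
expected = pd.concat([
    g[g["time_entered"] > g["time_entered"].max() - pd.Timedelta("1D")]
    for _, g in df.groupby("id")
])

print(fast)
```

With this sample, the rows at index 1, 2, 4 and 5 are kept. The comparison is strict (>), so a row exactly one day before its group's maximum is dropped; use >= if it should be included.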
