Efficient way of filtering by datetime in groupby
As a rule, avoid groupby().apply(): it calls your function once per group in Python instead of running vectorized across groups, and in your case it also allocates a new DataFrame for every group.
Instead, compute the time threshold for every row with groupby().transform, then use boolean indexing on the whole DataFrame:
time_max_by_id = df.groupby('id')['time_entered'].transform('max') - pd.Timedelta('1D')
df[df['time_entered'] > time_max_by_id]
Output:
    id        time_entered       val
2    1 2015-02-24 18:00:00  0.978738
3    1 2015-02-25 03:00:00  2.240893
4    1 2015-02-25 12:00:00  1.867558
5    2 2015-02-25 21:00:00 -0.977278
6    2 2015-02-26 06:00:00  0.950088
11   3 2015-02-28 03:00:00  1.454274
12   3 2015-02-28 12:00:00  0.761038
13   3 2015-02-28 21:00:00  0.121675
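If you want to try this without your own data, here is a minimal self-contained sketch of the same transform-then-mask idea. The small DataFrame is made up purely for illustration (it is not the data from the question), and the names threshold and recent are my own; only the column names id, time_entered and val are taken from above.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time_entered": pd.to_datetime([
        "2015-02-24 00:00", "2015-02-24 18:00", "2015-02-25 12:00",
        "2015-02-24 09:00", "2015-02-25 21:00", "2015-02-26 06:00",
    ]),
    "val": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# transform('max') broadcasts each group's latest timestamp back to every
# row of that group, so the result is aligned with df's index.
threshold = df.groupby("id")["time_entered"].transform("max") - pd.Timedelta(days=1)

# A single boolean mask over the whole frame; no per-group Python calls.
recent = df[df["time_entered"] > threshold]

print(recent)  # rows with index 1, 2, 4 and 5 remain
```

Note that the comparison is strict, so a row exactly one day older than its group's latest entry is dropped; switch to >= if you want to keep it.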