Pandas group by on one column with max date on another column python
Tack 1
Sort by dealer and by date, then use drop_duplicates to keep only the last (latest) row for each dealer. Unlike Tack 2 below, this method can never return multiple records for a dealer: if a dealer's max date appears more than once, only one of those rows is kept. This may or may not be an issue for you depending on your data and your use case.
df.sort_values(['dealer', 'date'], inplace=True)
df.drop_duplicates('dealer', keep='last', inplace=True)
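As a rough illustration of the sort-then-deduplicate idea, here is a minimal sketch on the first four rows of the sample data used further down. It avoids inplace=True and stores the result in a new variable, latest, a name chosen only for this example.

```python
import pandas as pd

# Sample rows taken from the example data used later in this thread.
df = pd.DataFrame({
    'invoice_no': [110, 100, 5505, 5635],
    'dealer': [1, 1, 2, 2],
    'billing_change_previous_month': [0, -41981, 0, 58730],
    'date': pd.to_datetime(['2016-12-31', '2017-01-30',
                            '2017-01-30', '2016-12-31']),
})

# Sort so each dealer's latest date comes last, then keep that last row.
latest = (
    df.sort_values(['dealer', 'date'])
      .drop_duplicates('dealer', keep='last')
)
print(latest)
```

Because only one row per dealer survives, ties on the max date are resolved silently.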
Tack 2
This is a less direct way to do it, with a groupby and a merge. Use groupby to find the max date for each dealer, then merge that result back onto the original frame. The how='inner' parameter keeps only those dealer and date combinations that appear in the grouped result containing the maximum date for each dealer.
However, please note that this will return multiple records per dealer if the max date is duplicated in the original table. You might need to use drop_duplicates depending on your data and your use case.
df.merge(df.groupby('dealer')['date'].max().reset_index(),
on=['dealer', 'date'], how='inner')
   invoice_no  dealer  billing_change_previous_month       date
0         100       1                         -41981 2017-01-30
1        5505       2                              0 2017-01-30
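The duplicate-max-date caveat can be seen in a small sketch. The data here is made up for illustration: dealer 1 has two invoices on its latest date, so the merge returns both, and drop_duplicates is one possible way to reduce them to a single row.

```python
import pandas as pd

# Hypothetical data: dealer 1 has two rows sharing its max date.
df = pd.DataFrame({
    'invoice_no': [110, 100, 101, 5505],
    'dealer': [1, 1, 1, 2],
    'date': pd.to_datetime(['2016-12-31', '2017-01-30',
                            '2017-01-30', '2017-01-30']),
})

# Max date per dealer, merged back onto the original rows.
max_dates = df.groupby('dealer')['date'].max().reset_index()
merged = df.merge(max_dates, on=['dealer', 'date'], how='inner')

# merged holds both tied rows for dealer 1; keep one row per dealer.
deduped = merged.drop_duplicates('dealer')
print(merged)
print(deduped)
```

Whether dropping the extra rows is appropriate depends on whether the tied records are meaningful in your data.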
A more correct solution is given at https://stackoverflow.com/a/41531127/9913319:
df.sort_values('date').groupby('dealer').tail(1)
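For comparison, a minimal sketch of how this one-liner behaves when a dealer's max date is tied, again on made-up data. tail(1) takes exactly one row per group, so each dealer appears once; which of the tied rows is returned depends on the sort order, so the sketch does not rely on it.

```python
import pandas as pd

# Hypothetical data: dealer 1 has two rows sharing its max date.
df = pd.DataFrame({
    'invoice_no': [110, 100, 101, 5505],
    'dealer': [1, 1, 1, 2],
    'date': pd.to_datetime(['2016-12-31', '2017-01-30',
                            '2017-01-30', '2017-01-30']),
})

# Sort by date, then take the last row within each dealer group.
last_rows = df.sort_values('date').groupby('dealer').tail(1)
print(last_rows)
```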
You can use boolean indexing with groupby and transform:
df_new = df[df.groupby('dealer').date.transform('max') == df['date']]
   invoice_no  dealer  billing_change_previous_month       date
1         100       1                         -41981 2017-01-30
2        5505       2                              0 2017-01-30
The solution works as expected even if there are more than two dealers (to address the question posted by Ben Smith):
import pandas as pd
df = pd.DataFrame({'invoice_no':[110,100,5505,5635,10000,10001], 'dealer':[1,1,2,2,3,3],'billing_change_previous_month':[0,-41981,0,58730,9000,100], 'date':['2016-12-31','2017-01-30','2017-01-30','2016-12-31', '2019-12-31', '2020-01-31']})
df['date'] = pd.to_datetime(df['date'])
df[df.groupby('dealer').date.transform('max') == df['date']]
   invoice_no  dealer  billing_change_previous_month       date
1         100       1                         -41981 2017-01-30
2        5505       2                              0 2017-01-30
5       10001       3                            100 2020-01-31