Pandas, how to filter a df to get unique entries?
Use groupby "type" and grab only the first object - df.groupby("type").first()
one way is to sort the dataframe and then take the first after a groupby.
# first way
sorted = df.sort_values(['type', 'value'], ascending = [True, False])
first = sorted.groupby('type').first().reset_index()
another way does not necessarily take only the first one, so potentially it would keep all IDs corresponding to the same maximum (and not take just 1 of them)
# second way
grouped = df.groupby('type').agg({'value': max}).reset_index()
grouped = grouped.set_index(['type','value'])
second = grouped.join(df.set_index(['type', 'value']))
example:
data
ID type value
1 A 8
2 A 5
3 B 11
4 C 12
5 D 1
6 D 22
7 D 13
8 D 22
first method results in
type ID value
A 1 8
B 3 11
C 4 12
D 6 22
second method keeps ID=8
ID
type value
A 8 1
B 11 3
C 12 4
D 22 6
22 8
(you can reset_index()
again here if you don't like the multiindex)
I prefer my way. Because groupby will create new df. You will get unique values. But tecnically this will not filter your df, this will create new one. My way will keep your indexes untouched, you will get the same df but without duplicates.
df = df.sort_values('value', ascending=False)
# this will return unique by column 'type' rows indexes
idx = df['type'].drop_duplicates().index
#this will return filtered df
df.loc[idx,:]
df[['type', 'value']].drop_duplicates(subset=['type'])
This works generally, if you would have more columns, you can select the interested columns, in our case we chose all, 'type', 'value'.