Python: Removing Rows on Count condition

This is one way using pd.Series.value_counts.

counts = df['city'].value_counts()

res = df[~df['city'].isin(counts[counts < 5].index)]

counts is a pd.Series object. counts < 5 returns a Boolean series. We filter the counts series by the Boolean counts < 5 series (that's what the square brackets achieve). We then take the index of the resultant series to find the cities with < 5 counts. ~ is the negation operator.

Remember a series is a mapping between index and value. The index of a series does not necessarily contain unique values, but this is guaranteed with the output of value_counts.

Here you go with filter

df.groupby('city').filter(lambda x : len(x)>3)
Out[1743]: 
  city
0  NYC
1  NYC
2  NYC
3  NYC

Solution two transform

sub_df = df[df.groupby('city').city.transform('count')>3].copy() 
# add copy for future warning when you need to modify the sub df

Python: Removing Rows on Count condition

Tags:

Python

Pandas

Indexing

Counter

Dataframe

Related

Recent Posts