Python: Removing Rows on Count condition
This is one way, using pd.Series.value_counts:
counts = df['city'].value_counts()
res = df[~df['city'].isin(counts[counts < 5].index)]
counts is a pd.Series object. counts < 5 returns a Boolean series. We filter the counts series by that Boolean series (that's what the square brackets achieve), then take the index of the result to find the cities that appear fewer than 5 times. ~ is the negation operator, so the final mask keeps only the rows whose city is not in that set.
Remember a series is a mapping between index and value. The index of a series does not necessarily contain unique values, but uniqueness is guaranteed for the output of value_counts.
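To make the intermediate steps concrete, here is a minimal sketch on made-up data. The sample DataFrame and the name rare are assumptions for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: NYC appears 5 times, LA twice, SF once.
df = pd.DataFrame({'city': ['NYC'] * 5 + ['LA'] * 2 + ['SF']})

# Series mapping each city (index) to its number of rows (value).
counts = df['city'].value_counts()

# Boolean-filter the counts, then keep only the index: the rare cities.
rare = counts[counts < 5].index

# Keep rows whose city is NOT among the rare ones.
res = df[~df['city'].isin(rare)]
print(res)
```

With this data only the NYC rows survive, because LA and SF each appear fewer than 5 times.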
Here is another way, using groupby with filter:
df.groupby('city').filter(lambda x : len(x)>3)
Out[1743]:
city
0 NYC
1 NYC
2 NYC
3 NYC
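A self-contained version of the filter approach might look like the sketch below. The sample data is an assumption, chosen so that only NYC has more than 3 rows.

```python
import pandas as pd

# Hypothetical sample data: NYC appears 4 times, LA twice, SF once.
df = pd.DataFrame({'city': ['NYC'] * 4 + ['LA'] * 2 + ['SF']})

# filter calls the function once per group and keeps every row of the
# groups for which it returns True; original index and order are preserved.
kept = df.groupby('city').filter(lambda g: len(g) > 3)
print(kept)
```

Because the lambda runs once per group in Python, this can be slower than a vectorized mask when there are many distinct cities.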
Solution two, using transform:
sub_df = df[df.groupby('city').city.transform('count')>3].copy()
# .copy() avoids a SettingWithCopyWarning if you later modify sub_df
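The transform approach works because transform returns a series aligned with the original rows, so it can be used directly as a mask. A minimal sketch, assuming the same kind of made-up data (the names group_size and flag are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: NYC appears 4 times, LA twice, SF once.
df = pd.DataFrame({'city': ['NYC'] * 4 + ['LA'] * 2 + ['SF']})

# One value per original row: the number of rows in that row's city group.
group_size = df.groupby('city')['city'].transform('count')

# Boolean mask on the row-aligned sizes; copy so the subset is independent.
sub_df = df[group_size > 3].copy()

# Modifying the copy leaves the original DataFrame untouched.
sub_df['flag'] = True
print(sub_df)
```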