Remove rows from Pandas dataframe where value only appears once
Change the len to count:
df[df.groupby('ID').ID.transform('count') > 1]
Out[589]:
   ID      Month  Metric1  Metric2
0   1 2018-01-01        4        3
1   1 2018-02-01        3        2
3   3 2018-01-01        4        2
4   3 2018-02-01        6        3
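For anyone who wants to see what the transform is doing, here is a minimal self-contained sketch. The question's input frame is not shown here, so the frame below is rebuilt from the output above; row 2 (ID 2 and its metric values) is an assumption, filled with placeholder values.

```python
import pandas as pd

# Sample frame rebuilt from the output above. Row 2 (ID 2) is an assumption:
# the original input is not shown, so its values are placeholders.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3],
    "Month": pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2018-01-01", "2018-01-01", "2018-02-01"]
    ),
    "Metric1": [4, 3, 0, 4, 6],
    "Metric2": [3, 2, 0, 2, 3],
})

# transform returns one value per original row: the size of that row's group.
group_sizes = df.groupby("ID")["ID"].transform("count")
print(group_sizes.tolist())  # [2, 2, 1, 2, 2]

# The boolean mask keeps only rows whose ID occurs more than once.
result = df[group_sizes > 1]
print(result)
```

Because the transformed Series keeps the original index, it can be used directly as a row mask.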
Try with pd.Series.duplicated():
df1=df[df.ID.duplicated(keep=False)]
print(df1)
   ID      Month  Metric1  Metric2
0   1 2018-01-01        4        3
1   1 2018-02-01        3        2
3   3 2018-01-01        4        2
4   3 2018-02-01        6        3
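The keep=False argument is what makes this work: with the default, the first occurrence of each value is not flagged and would be dropped along with the singletons. A small sketch of the three settings, using a made-up Series of IDs:

```python
import pandas as pd

# Made-up IDs for illustration: 2 appears once, 1 and 3 appear twice.
ids = pd.Series([1, 1, 2, 3, 3])

# Default keep="first": the first occurrence is NOT marked as a duplicate.
first = ids.duplicated()
print(first.tolist())  # [False, True, False, False, True]

# keep="last": the last occurrence is NOT marked as a duplicate.
last = ids.duplicated(keep="last")
print(last.tolist())   # [True, False, False, True, False]

# keep=False: every member of a repeated value is marked.
every = ids.duplicated(keep=False)
print(every.tolist())  # [True, True, False, True, True]
```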
filter
I cannot vouch for the speed of this, but this is what this API was intended for...
df.groupby('ID').filter(lambda d: len(d) > 1)
   ID      Month  Metric1  Metric2
0   1 2018-01-01        4        3
1   1 2018-02-01        3        2
3   3 2018-01-01        4        2
4   3 2018-02-01        6        3
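Since speed is the open question here, a rough way to check it on your own machine is sketched below. The data is synthetic (random IDs, an assumed frame size of 5,000 rows), the helper names are made up for this example, and no timings are quoted because they depend on the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data, purely an assumption for benchmarking: random integer IDs,
# so some IDs repeat and some appear only once.
rng = np.random.default_rng(0)
n = 5_000
big = pd.DataFrame({"ID": rng.integers(0, n, size=n), "Metric1": rng.random(n)})


def with_filter(frame):
    # Calls the Python lambda once per group.
    return frame.groupby("ID").filter(lambda d: len(d) > 1)


def with_transform(frame):
    return frame[frame.groupby("ID")["ID"].transform("count") > 1]


def with_duplicated(frame):
    return frame[frame["ID"].duplicated(keep=False)]


for fn in (with_filter, with_transform, with_duplicated):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```

All three helpers should select the same rows, so any difference you see is purely about run time.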
Numpy'd version of @Wen-Ben's answer
u, i = np.unique(df.ID.values, return_inverse=True)
df[np.bincount(i)[i] > 1]
   ID      Month  Metric1  Metric2
0   1 2018-01-01        4        3
1   1 2018-02-01        3        2
3   3 2018-01-01        4        2
4   3 2018-02-01        6        3
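If the two-liner is hard to follow, here is a step-by-step sketch of the intermediate arrays, using a small made-up array of IDs instead of df.ID.values:

```python
import numpy as np

# Made-up IDs for illustration.
ids = np.array([1, 1, 2, 3, 3])

# u holds the sorted unique values; i maps each element to its position in u.
u, i = np.unique(ids, return_inverse=True)
print(u.tolist())  # [1, 2, 3]
print(i.tolist())  # [0, 0, 1, 2, 2]

# bincount counts how often each position occurs, i.e. the size of each group.
counts = np.bincount(i)
print(counts.tolist())  # [2, 1, 2]

# Indexing counts by i broadcasts each group size back onto its rows.
per_row = counts[i]
print(per_row.tolist())  # [2, 2, 1, 2, 2]

mask = per_row > 1
print(mask.tolist())  # [True, True, False, True, True]
```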
Because I was curious...
s0 = set()  # IDs seen at least once
s1 = set()  # IDs seen more than once
for i in df.ID:
    if i in s0:
        s1.add(i)
    s0.add(i)
df[df.ID.map(s1.__contains__)]
   ID      Month  Metric1  Metric2
0   1 2018-01-01        4        3
1   1 2018-02-01        3        2
3   3 2018-01-01        4        2
4   3 2018-02-01        6        3
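The same single-pass idea can be wrapped in a small helper. This is only a sketch: repeated_values is a hypothetical name, the IDs are made up, and Series.isin is swapped in for map(s1.__contains__) as a more conventional way to build the mask.

```python
import pandas as pd


def repeated_values(values):
    """Return the set of values that occur more than once, in a single pass."""
    seen, repeated = set(), set()
    for v in values:
        if v in seen:
            repeated.add(v)
        seen.add(v)
    return repeated


# Made-up IDs for illustration.
ids = pd.Series([1, 1, 2, 3, 3])

repeats = repeated_values(ids.tolist())
print(sorted(repeats))  # [1, 3]

# isin builds the row mask from the set of repeated IDs.
mask = ids.isin(repeats)
print(mask.tolist())  # [True, True, False, True, True]
```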