Pandas Data Frame Filtering Multiple Conditions

You could do:

mask = ~df[['year', 'month']].apply(tuple, 1).isin([(1990, 7), (1990, 8), (1991, 1)])
print(df[mask])

Output

   year  month  data1
2  1990      9   2500
3  1990      9   1500
5  1991      2    350
6  1991      3    350
7  1991      7    450
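
The frame `df` itself is not shown above. Here is a minimal self-contained sketch that reproduces the output, assuming the three removed rows sit at positions 0, 1 and 4. Their `data1` values (100, 200, 300) are placeholders, not taken from the original. It also names the intermediate Series of tuples, which may make the mask easier to follow.

```python
import pandas as pd

# Hypothetical input: rows 2, 3, 5, 6, 7 match the output shown above;
# the data1 values of rows 0, 1 and 4 are placeholders, not from the original.
df = pd.DataFrame({
    "year":  [1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991],
    "month": [7, 8, 9, 9, 1, 2, 3, 7],
    "data1": [100, 200, 2500, 1500, 300, 350, 350, 450],
})

excluded = [(1990, 7), (1990, 8), (1991, 1)]

# Each row's (year, month) becomes one tuple, so isin compares whole pairs
# rather than testing year and month independently.
pairs = df[["year", "month"]].apply(tuple, axis=1)
mask = ~pairs.isin(excluded)

result = df[mask]
print(result)
```

Note that row 7 (1991, 7) survives even though month 7 appears in the exclusion list, because only the pair (1990, 7) is excluded.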

Even faster (roughly 3x faster than @DaniMesejo's elegant tuple-based version), but it relies on knowing that months are bounded (well below 100), so it is less general:

mask = ~(df.year*100 + df.month).isin({199007, 199008, 199101})
df[mask]

# out:
   year  month  data1
2  1990      9   2500
3  1990      9   1500
5  1991      2    350
6  1991      3    350
7  1991      7    450
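
If the bounded-month assumption is a concern, one possible alternative that still avoids `apply` is to build a `MultiIndex` from the key columns and call its `isin` with a list of tuples. This is only a sketch: the sample frame is hypothetical (placeholder `data1` values for the removed rows), and its speed relative to the versions above has not been measured here.

```python
import pandas as pd

# Hypothetical input; data1 values for rows 0, 1 and 4 are placeholders.
df = pd.DataFrame({
    "year":  [1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991],
    "month": [7, 8, 9, 9, 1, 2, 3, 7],
    "data1": [100, 200, 2500, 1500, 300, 350, 350, 450],
})

excluded = [(1990, 7), (1990, 8), (1991, 1)]

# Build a MultiIndex from the two key columns; its isin accepts a list of
# tuples and returns a NumPy boolean array with one entry per row of df.
keys = pd.MultiIndex.from_frame(df[["year", "month"]])
mask = ~keys.isin(excluded)

result = df[mask]
print(result)
```

Unlike the `year*100 + month` encoding, this makes no assumption about the range of either column.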

How come this is 3x faster than the tuples solution? (Tricks for speed):

All vectorized operations and no apply.
No string operations, all ints.
Using .isin() with a set as argument (not a list).
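
To check the claim on your own data, a rough sketch like the following confirms that both masks agree and times them. The frame is hypothetical (placeholder `data1` values, repeated an arbitrary number of times to get measurable timings), and the function names are made up for illustration. Timings depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import pandas as pd

# Hypothetical sample rows; data1 values for the excluded rows are placeholders.
small = pd.DataFrame({
    "year":  [1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991],
    "month": [7, 8, 9, 9, 1, 2, 3, 7],
    "data1": [100, 200, 2500, 1500, 300, 350, 350, 450],
})
# Repeat the rows so the difference is measurable; the factor is arbitrary.
df = pd.concat([small] * 2_000, ignore_index=True)

excluded_pairs = [(1990, 7), (1990, 8), (1991, 1)]
excluded_codes = {y * 100 + m for y, m in excluded_pairs}


def mask_tuples():
    # Row-wise apply: builds one Python tuple per row.
    return ~df[["year", "month"]].apply(tuple, axis=1).isin(excluded_pairs)


def mask_ints():
    # Fully vectorized: integer arithmetic on whole columns.
    return ~(df["year"] * 100 + df["month"]).isin(excluded_codes)


same = mask_tuples().equals(mask_ints())
print("masks agree:", same)

for fn in (mask_tuples, mask_ints):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```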

You can add a yyyymm column and then filter on it to remove the rows you want.

# zero-pad the month via the .str accessor (a Series has no zfill method of its own)
df['yyyymm'] = df['year'].astype(str) + df['month'].astype(str).str.zfill(2)
df = df.loc[(df.yyyymm != '199007') & (df.yyyymm != '199008') & (df.yyyymm != '199101')]

Let's try merge:

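exclude = pd.DataFrame({'year': [1990, 1990, 1991], 'month': [7, 8, 1]})
# the inner merge keeps only the rows to exclude; 'index' holds their original labels
out = df.drop(df.reset_index().merge(exclude)['index'])
print(out)

# out:
   year  month  data1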
2  1990      9   2500
3  1990      9   1500
5  1991      2    350
6  1991      3    350
7  1991      7    450

And a small improvement, using the merge indicator:

exclude = pd.DataFrame({'year': [1990, 1990, 1991], 'month': [7, 8, 1]})
out = df.merge(exclude, indicator=True, how='left').loc[lambda x: x['_merge'] == 'left_only']
print(out)

# out:
   year  month  data1     _merge
2  1990      9   2500  left_only
3  1990      9   1500  left_only
5  1991      2    350  left_only
6  1991      3    350  left_only
7  1991      7    450  left_only

Based on my tests, this should be faster than the apply-tuple method.
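
The indicator approach is essentially an anti-join, and it can be wrapped so that the original index is kept and no `_merge` column is left behind. The helper below is a sketch under assumptions: `anti_join` is a made-up name, the sample frame uses placeholder `data1` values for the removed rows, and the `drop_duplicates` call is an addition so that the left merge returns exactly one row per row of `left`.

```python
import pandas as pd


def anti_join(left, right, on):
    """Return the rows of `left` whose `on` columns have no match in `right`.

    Hypothetical helper, not part of pandas.
    """
    # A left merge preserves the row order of `left`; with duplicate keys
    # removed from `right`, it also yields exactly one row per left row.
    merged = left[on].merge(
        right[on].drop_duplicates(), on=on, how="left", indicator=True
    )
    keep = (merged["_merge"] == "left_only").to_numpy()
    # Boolean indexing on the original frame keeps its index and columns.
    return left[keep]


# Hypothetical input; data1 values for rows 0, 1 and 4 are placeholders.
df = pd.DataFrame({
    "year":  [1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991],
    "month": [7, 8, 9, 9, 1, 2, 3, 7],
    "data1": [100, 200, 2500, 1500, 300, 350, 350, 450],
})
exclude = pd.DataFrame({"year": [1990, 1990, 1991], "month": [7, 8, 1]})

out = anti_join(df, exclude, on=["year", "month"])
print(out)
```

Merging only the key columns keeps the intermediate frame small, and the positional boolean mask means the result works for any index on `left`, not just the default one.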
