Pandas filter rows based on multiple conditions
You could also use query
:
df.query('risk in ["Small","Medium","High"]')
You can refer to variables in the environment by prefixing them with @
. For example:
lst = ["Small","Medium","High"]
df.query("risk in @lst")
If the column name is multiple words, e.g. "risk factor"
, you can refer to it by surrounding it with backticks ` `
:
df.query('`risk factor` in @lst')
query
method comes in handy if you need to chain multiple conditions. For example, the outcome of the following filter:
df[df['risk factor'].isin(lst) & (df['value']**2 > 2) & (df['value']**2 < 5)]
can be derived using the following expression:
df.query('`risk factor` in @lst and 2 < value**2 < 5')
I think you want:
df = df[(df.risk.isin(["Small","Medium","High"]))]
Example:
In [5]:
import pandas as pd
df = pd.DataFrame({'risk':['Small','High','Medium','Negligible', 'Very High']})
df
Out[5]:
risk
0 Small
1 High
2 Medium
3 Negligible
4 Very High
[5 rows x 1 columns]
In [6]:
df[df.risk.isin(['Small','Medium','High'])]
Out[6]:
risk
0 Small
1 High
2 Medium
[3 rows x 1 columns]
Another nice and readable approach is the following:
small_risk = df["risk"] == "Small"
medium_risk = df["risk"] == "Medium"
high_risk = df["risk"] == "High"
Then you can use it like this:
df[small_risk | medium_risk | high_risk]
or
df[small_risk & medium_risk]