case_when function from R to Python
You want to use np.select
:
conditions = [
(df["age"].lt(10)),
(df["age"].ge(10) & df["age"].lt(20)),
(df["age"].ge(20) & df["age"].lt(30)),
(df["age"].ge(30) & df["age"].lt(50)),
(df["age"].ge(50)),
]
choices = ["baby", "kid", "young", "mature", "grandpa"]
df["elderly"] = np.select(conditions, choices)
# Results in:
# name age preTestScore postTestScore elderly
# 0 Jason 42 4 25 mature
# 1 Molly 52 24 94 grandpa
# 2 Tina 36 31 57 mature
# 3 Jake 24 2 62 young
# 4 Amy 73 3 70 grandpa
The conditions
and choices
lists must be the same length.
There is also a default
parameter that is used when all conditions
evaluate to False
.
np.select
is great because it's a general way to assign values to elements in choicelist depending on conditions.
However, for the particular problem OP tries to solve, there is a succinct way to achieve the same with the pandas' cut
method.
bin_cond = [-np.inf, 10, 20, 30, 50, np.inf] # think of them as bin edges
bin_lab = ["baby", "kid", "young", "mature", "grandpa"] # the length needs to be len(bin_cond) - 1
df["elderly2"] = pd.cut(df["age"], bins=bin_cond, labels=bin_lab)
# name age preTestScore postTestScore elderly elderly2
# 0 Jason 42 4 25 mature mature
# 1 Molly 52 24 94 grandpa grandpa
# 2 Tina 36 31 57 mature mature
# 3 Jake 24 2 62 young young
# 4 Amy 73 3 70 grandpa grandpa