Numpy "where" with multiple conditions
Try this: Using the setup from @Maxu
col = 'consumption_energy'
conditions = [ df2[col] >= 400, (df2[col] < 400) & (df2[col]> 200), df2[col] <= 200 ]
choices = [ "high", 'medium', 'low' ]
df2["energy_class"] = np.select(conditions, choices, default=np.nan)
consumption_energy energy_class
0 459 high
1 416 high
2 186 low
3 250 medium
4 411 high
5 210 medium
6 343 medium
7 328 medium
8 208 medium
9 223 medium
You can use a ternary:
np.where(consumption_energy > 400, 'high',
(np.where(consumption_energy < 200, 'low', 'medium')))
I like to keep the code clean. That's why I prefer np.vectorize
for such tasks.
def conditions(x):
if x > 400: return "High"
elif x > 200: return "Medium"
else: return "Low"
func = np.vectorize(conditions)
energy_class = func(df_energy["consumption_energy"])
Then just add numpy array as a column in your dataframe using:
df_energy["energy_class"] = energy_class
The advantage in this approach is that if you wish to add more complicated constraints to a column, it can be done easily. Hope it helps.
I would use the cut() method here, which will generate very efficient and memory-saving category
dtype:
In [124]: df
Out[124]:
consumption_energy
0 459
1 416
2 186
3 250
4 411
5 210
6 343
7 328
8 208
9 223
In [125]: pd.cut(df.consumption_energy,
[0, 200, 400, np.inf],
labels=['low','medium','high']
)
Out[125]:
0 high
1 high
2 low
3 medium
4 high
5 medium
6 medium
7 medium
8 medium
9 medium
Name: consumption_energy, dtype: category
Categories (3, object): [low < medium < high]