pandas if else conditions on multiple columns

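Use numpy.select, which checks the conditions in order and takes the value from the first one that matches, falling back to default when none do: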

df['value'] = np.select([df.a > 0, df.b > 0], [df.a, df.b], default=df.c)
print(df)
   a  b  c  value
0  0  0  6      6
1  0  3  7      3
2  1  4  8      1
3  2  5  9      2
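For reference, here is a self-contained version of the snippet above. The sample frame is reconstructed from the printed output, so the construction line is an assumption about how the original df was built; treat it as a sketch rather than the exact original setup.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output above (assumed construction).
df = pd.DataFrame({'a': [0, 0, 1, 2],
                   'b': [0, 3, 4, 5],
                   'c': [6, 7, 8, 9]})

# Conditions are checked in order; the first one that is True picks the value.
conditions = [df['a'] > 0, df['b'] > 0]
choices = [df['a'], df['b']]

# Rows where no condition holds fall back to column c.
df['value'] = np.select(conditions, choices, default=df['c'])
print(df)
```

Naming the conditions and choices lists separately tends to stay readable once there are more than two branches.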

Difference between the vectorized and loop solutions on 400k rows (np.select is roughly 40 times faster in this run):

df = pd.concat([df] * 100000, ignore_index=True)

In [158]: %timeit df['value2'] = np.select([df.a > 0 , df.b > 0], [df.a, df.b], default=df.c)
9.86 ms ± 611 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [159]: %timeit df['value1'] = [x if x > 0 else y if y>0 else z for x,y,z in zip(df['a'],df['b'],df['c'])]
399 ms ± 52.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

You can also use a list comprehension:

df['value'] = [x if x > 0 else y if y>0 else z for x,y,z in zip(df['a'],df['b'],df['c'])]

You can write a function that takes a row as a parameter, tests whatever conditions you need, and returns True or False; you can then use that result to select rows. (On rereading your question, this may not be what you're looking for; see Part 2 below.)

Part 1 - Perform a Selection

Apply this function to your dataframe, and use the returned Series of True/False values as a mask to select rows from the dataframe itself.

e.g.

def selector(row):
    if row['a'] > 0 and row['b'] == 3 :
        return True
    elif row['c'] > 2:
        return True
    else:
        return False

You can build whatever logic you like, just ensure it returns True when you want a match and False when you don't.

Then try something like

df.apply(selector, axis=1)

This returns a Series of True/False values. Use it to index df and keep only the rows for which the function returned True.

df[df.apply(selector, axis=1)]

And that should give you what you want.
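If the row-wise apply turns out to be slow on a large frame, the same test as selector() can usually be written as a boolean mask over whole columns. The following is a minimal sketch; the sample data is hypothetical and only there to make the example runnable.

```python
import pandas as pd

# Hypothetical sample data, chosen so that some rows match and some do not.
df = pd.DataFrame({'a': [0, 1, 2, 0],
                   'b': [3, 3, 0, 1],
                   'c': [1, 0, 5, 2]})

# Same logic as selector(): (a > 0 and b == 3) or c > 2.
# Use & and | with parentheses; plain `and`/`or` do not work on whole columns.
mask = ((df['a'] > 0) & (df['b'] == 3)) | (df['c'] > 2)

# Keep only the rows where the mask is True.
selected = df[mask]
print(selected)
```

Here only the rows with index 1 and 2 are kept: the first matches on a and b, the second on c.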

Part 2 - Perform a Calculation

If you want to create a new column containing some calculated result, the operation is similar. Create a function that performs your calculation:

def mycalc(row):
    if row['a'] > 5 :
        return row['a'] + row['b']
    else:
        return 66

Only this time, assign the result of apply to a new column:

df['value'] = df.apply(mycalc, axis=1)

And this will give you that result.
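For a simple two-way branch like mycalc(), numpy.where is a possible vectorized alternative to the row-wise apply. This is a sketch under assumptions: the sample data below is hypothetical, and it relies on both branches being expressible as column arithmetic.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'a': [1, 6, 10],
                   'b': [2, 3, 4]})

# Same logic as mycalc(): a + b where a > 5, otherwise the constant 66.
df['value'] = np.where(df['a'] > 5, df['a'] + df['b'], 66)
print(df)
```

With more than two branches, numpy.select with a list of conditions is likely the better fit.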
