Equivalent of R/ifelse in Python/Pandas? Compare string columns?

Try upgrading pandas to the latest version; in version 0.17.1 this works as expected.

Sample (the first value in the install_month column has been changed so that it matches log_month):

print(maupayment)
  log_month  user_id install_month  payment
1   2013-06        1       2013-06        0
2   2013-06        2       2013-04        0
3   2013-06        3       2013-04    14994

import numpy as np
print(np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing'))
['install' 'existing' 'existing']
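
For a version that can be run end to end, here is a minimal sketch that rebuilds the sample frame from the printed output above and stores the result in a new column. The column name user_type is only an illustrative choice and does not come from the original question.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame (values copied from the printed output above).
maupayment = pd.DataFrame(
    {
        "log_month": ["2013-06", "2013-06", "2013-06"],
        "user_id": [1, 2, 3],
        "install_month": ["2013-06", "2013-04", "2013-04"],
        "payment": [0, 0, 14994],
    },
    index=[1, 2, 3],
)

# np.where(condition, value_if_true, value_if_false) plays the role of R's ifelse().
# "user_type" is a hypothetical column name used only for this example.
maupayment["user_type"] = np.where(
    maupayment["log_month"] == maupayment["install_month"],
    "install",
    "existing",
)
print(maupayment["user_type"].tolist())  # ['install', 'existing', 'existing']
```

The comparison is evaluated for the whole column at once, so no explicit loop or apply call is needed.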

One option is to use an anonymous function (a lambda) in combination with the pandas apply method.

First, set up some branching logic in a function:

def if_this_else_that(x, list_of_checks, yes_label, no_label):
    # Return yes_label when x is one of list_of_checks, otherwise no_label.
    if x in list_of_checks:
        return yes_label
    return no_label

This takes the x from lambda (see below), a list of things to look for, the yes label, and the no label.

For example, say we are looking at the IMDB dataset (imdb_df) and want to add a new column called "new_rating" that shows whether each movie is mature or not, based on its Rated column.

I can use the pandas apply method along with the branching logic above:

imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))
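
As an alternative to apply, the same membership test can be done for the whole column at once with isin and np.where. This is a sketch on a small made-up frame (the four ratings below are placeholder data, not rows from the real IMDB dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for imdb_df; only the 'Rated' column matters here.
imdb_df = pd.DataFrame({"Rated": ["PG", "R", "PG-13", "NC-17"]})

# isin() builds a boolean mask; np.where() picks a label for each row.
imdb_df["new_rating"] = np.where(
    imdb_df["Rated"].isin(["PG", "PG-13"]), "not mature", "mature"
)
print(imdb_df)
```

On large frames this usually runs faster than calling a Python function once per row, although the apply version may be easier to extend with custom logic.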

There are also times we need to combine this with another check. For example, some entries in the IMDB dataset are NaN. I can check for both NaN and the maturity rating as follows:

imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: 'not provided' if x in ['nan'] else if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))

In this case my NaNs had first been converted to the string 'nan'. With genuine NaN values, test with pd.isnull(x) instead of a membership check, because NaN does not compare equal to itself.
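
For genuine NaN values, one possible approach is np.select, which checks a list of conditions in order and takes the label of the first one that matches. This swaps the apply/lambda pattern for a vectorized one; the frame below is made-up placeholder data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for imdb_df with one genuinely missing rating.
imdb_df = pd.DataFrame({"Rated": ["PG", np.nan, "R", "PG-13"]})

# Conditions are checked in order; the first match decides the label.
conditions = [
    imdb_df["Rated"].isnull(),               # missing rating
    imdb_df["Rated"].isin(["PG", "PG-13"]),  # family-friendly ratings
]
labels = ["not provided", "not mature"]

# Rows matching none of the conditions fall back to the default label.
imdb_df["new_rating"] = np.select(conditions, labels, default="mature")
print(imdb_df)
```

Listing the missing-value check first mirrors the order of the checks in the lambda version above.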
