KeyError: False in pandas dataframe

If your data contains spelling variations or alternative restaurant-related terms, the following may help. Put your restaurant-related terms in restaurant_lst. The lambda function returns True if any item in restaurant_lst is contained in a given entry of the categories column of businesses, and the .loc indexer keeps only the rows for which it returns True.

restaurant_lst = ['Restaurant','restaurantes','diner','bistro']
restaurant = businesses.loc[businesses['categories'].apply(lambda x: any(restaurant_str in x for restaurant_str in restaurant_lst))]
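
A minimal sketch of this kind of multi-term filter, assuming businesses is a DataFrame with a string column named categories. The sample rows and the name column are made up for illustration, and the match is case-sensitive:

```python
import pandas as pd

# Hypothetical sample data; the real businesses frame will differ.
businesses = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'categories': ['Restaurants, Pizza', 'Auto Repair', 'French bistro', 'restaurantes'],
})

restaurant_lst = ['Restaurant', 'restaurantes', 'diner', 'bistro']

# True for rows whose categories string contains any of the terms.
mask = businesses['categories'].apply(
    lambda x: any(restaurant_str in x for restaurant_str in restaurant_lst)
)

# Keep only the rows where the mask is True.
restaurant = businesses.loc[mask]
```

Building the mask as a separate variable makes it easy to inspect which rows matched before filtering.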

The expression 'Restaurants' in businesses['categories'] returns the boolean value False, because the in operator on a Series tests membership in the index labels rather than in the values. That False is then passed to the bracket indexing operator of the DataFrame businesses, which has no column called False, so a KeyError is raised.
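
The behaviour can be reproduced on a tiny frame. This is only a sketch with made-up data, assuming a default integer index:

```python
import pandas as pd

# Hypothetical sample data for illustration.
businesses = pd.DataFrame({'categories': ['Restaurants', 'Auto Repair']})

# `in` on a Series checks the index labels (0 and 1 here), not the values.
found = 'Restaurants' in businesses['categories']
label_found = 0 in businesses['categories']

# Indexing with that single boolean looks up a column named False.
try:
    businesses[found]
    error = None
except KeyError as exc:
    error = exc
```

Here found is False even though the value is present in the column, while label_found is True because 0 is an index label.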

What you are looking for is called boolean indexing, which works like this:

businesses[businesses['categories'] == 'Restaurants']
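
To see the intermediate step, the comparison can be stored as a mask first. A small sketch with made-up rows (the name column is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data for illustration.
businesses = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'categories': ['Restaurants', 'Auto Repair', 'Restaurants'],
})

# == compares element-wise and yields a boolean Series aligned with the rows.
mask = businesses['categories'] == 'Restaurants'

# Indexing with the mask keeps only the rows where it is True.
restaurants = businesses[mask]
```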

The reason for this is that the Series class implements the in operator so that it returns a single boolean rather than an element-wise boolean Series like == does. Here's a workaround:

businesses[['Restaurants' in c for c in list(businesses['categories'])]]

Hopefully this helps someone who is looking for a substring in the column rather than a full match.
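
For the substring case, pandas also provides the vectorized Series.str.contains, which could stand in for the list comprehension. A sketch, assuming categories holds strings and possibly missing values; regex=False treats the term as a literal substring and na=False counts missing entries as non-matches. The sample rows are made up:

```python
import pandas as pd

# Hypothetical sample data, including a missing category.
businesses = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'categories': ['Restaurants, Pizza', None, 'Auto Repair'],
})

# Element-wise substring test that tolerates missing values.
mask = businesses['categories'].str.contains('Restaurants', regex=False, na=False)
restaurants = businesses[mask]
```

Unlike the list comprehension, this does not raise a TypeError when a row's category is missing.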

Tags:

Python

Pandas