KeyError: False in pandas dataframe
If your data contains spelling variations or alternative restaurant-related terms, the following may help. Put your restaurant-related terms in restaurant_lst. The lambda function returns True if any of the items in restaurant_lst is contained in a row's value of the categories series, and the .loc indexer then filters out the rows for which the lambda returned False.
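restaurant_lst = ['Restaurant', 'restaurantes', 'diner', 'bistro']
# Apply the check to each value of the 'categories' column (not to the whole DataFrame)
restaurant = businesses.loc[businesses['categories'].apply(lambda x: any(restaurant_str in x for restaurant_str in restaurant_lst))]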
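A self-contained sketch of the same filter, assuming a small made-up businesses DataFrame with name and categories columns (the sample values are hypothetical). It also shows a vectorized alternative that builds one regex alternation from restaurant_lst and passes it to Series.str.contains; re.escape is used in case a term contains regex metacharacters.

```python
import re

import pandas as pd

# Hypothetical sample data; in practice `businesses` comes from your own source
businesses = pd.DataFrame({
    "name": ["Joe's", "Quick Fix", "Le Bistro", "Taqueria"],
    "categories": ["Restaurant, Burgers", "Auto Repair", "French bistro", "restaurantes, Mexican"],
})

restaurant_lst = ["Restaurant", "restaurantes", "diner", "bistro"]

# Row-wise approach: True if any term is a substring of that row's categories string
mask_apply = businesses["categories"].apply(
    lambda cats: any(term in cats for term in restaurant_lst)
)
restaurant = businesses.loc[mask_apply]

# Vectorized alternative: a single "term1|term2|..." regex, matched case-sensitively
pattern = "|".join(re.escape(term) for term in restaurant_lst)
mask_vec = businesses["categories"].str.contains(pattern, regex=True)
restaurant_vec = businesses.loc[mask_vec]

print(restaurant["name"].tolist())  # ["Joe's", 'Le Bistro', 'Taqueria']
```

If the column can contain missing values, the lambda will fail on NaN (a float), whereas str.contains accepts na=False to treat those rows as non-matches.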
The expression 'Restaurants' in businesses['categories'] returns the single boolean value False. That value is then passed to the bracket indexing operator of the DataFrame businesses, which does not contain a column named False, and so pandas raises a KeyError.
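A minimal reproduction, assuming a tiny hypothetical DataFrame with a default integer index. The in test looks at the Series' index labels (0 and 1 here), not its values, so it yields False, and businesses[False] then looks for a column labelled False.

```python
import pandas as pd

# Hypothetical two-row frame with the default RangeIndex (0, 1)
businesses = pd.DataFrame({"categories": ["Restaurants", "Shopping"]})

# `in` on a Series checks the index labels, so this is a single bool
flag = "Restaurants" in businesses["categories"]
print(flag)  # False

# businesses[False] asks for a column named False, which does not exist
raised = False
try:
    businesses[flag]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```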
What you want here is boolean indexing, which works like this:
businesses[businesses['categories'] == 'Restaurants']
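To show what happens under the hood, here is a sketch on hypothetical data: == compares element-wise and produces a boolean Series with one entry per row, and passing that Series to [] keeps the rows marked True. Note that this is an exact match, so "Restaurants, Bars" is not selected.

```python
import pandas as pd

# Hypothetical sample data
businesses = pd.DataFrame({
    "name": ["A", "B", "C"],
    "categories": ["Restaurants", "Shopping", "Restaurants, Bars"],
})

# Element-wise comparison: a boolean Series aligned with the rows
mask = businesses["categories"] == "Restaurants"
print(mask.tolist())  # [True, False, False]

# Boolean indexing keeps only the rows where the mask is True
only_restaurants = businesses[mask]
print(only_restaurants["name"].tolist())  # ['A']
```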
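The reason for this is that the Series class implements its own in operator (__contains__), which checks whether the value is one of the Series' index labels and returns a single bool, rather than comparing element-wise and returning a boolean Series the way == does. Here's a workaround: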
businesses[['Restaurants' in c for c in list(businesses['categories'])]]
Hopefully this helps anyone who is looking for a substring in the column rather than an exact match.
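For comparison, a sketch on hypothetical data of the list-comprehension workaround next to a vectorized equivalent using Series.str.contains. regex=False makes the pattern a literal substring, and na=False treats missing categories as non-matches instead of propagating NaN into the mask.

```python
import pandas as pd

# Hypothetical sample data
businesses = pd.DataFrame({
    "name": ["A", "B", "C"],
    "categories": ["Restaurants", "Shopping", "Bars, Restaurants"],
})

# The workaround: Python's substring `in` applied to each string value
by_list = businesses[["Restaurants" in c for c in businesses["categories"]]]

# Vectorized equivalent with a literal (non-regex) substring match
by_str = businesses[businesses["categories"].str.contains("Restaurants", regex=False, na=False)]

print(by_list["name"].tolist())  # ['A', 'C']
```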