How to filter rows containing a string pattern from a Pandas dataframe
>>> mask = df['ids'].str.contains('ball')
>>> mask
0 True
1 True
2 False
3 True
Name: ids, dtype: bool
>>> df[mask]
ids vals
0 aball 1
1 bball 2
3 fball 4
df[df['ids'].str.contains('ball', na = False)] # valid for (at least) pandas version 0.17.1
Step-by-step explanation (from inner to outer):
df['ids']
selects theids
column of the data frame (technically, the objectdf['ids']
is of typepandas.Series
)df['ids'].str
allows us to apply vectorized string methods (e.g.,lower
,contains
) to the Seriesdf['ids'].str.contains('ball')
checks each element of the Series as to whether the element value has the string 'ball' as a substring. The result is a Series of Booleans indicatingTrue
orFalse
about the existence of a 'ball' substring.df[df['ids'].str.contains('ball')]
applies the Boolean 'mask' to the dataframe and returns a view containing appropriate records.na = False
removes NA / NaN values from consideration; otherwise a ValueError may be returned.
In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
ids vals
0 aball 1
1 bball 2
3 fball 4