Select rows containing certain values from pandas dataframe

You can create a boolean mask from comparing the entire df against your string and call dropna passing param how='all' to drop rows where your string doesn't appear in all cols:

In [59]:
df[df == 'banana'].dropna(how='all')

Out[59]:
        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

To test for multiple values you can use multiple masks:

In [90]:
banana = df[(df=='banana')].dropna(how='all')
banana

Out[90]:
        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

In [91]:    
apple = df[(df=='apple')].dropna(how='all')
apple

Out[91]:
       A      B      C
1  apple    NaN    NaN
2    NaN    NaN  apple
4  apple  apple    NaN

You can use index.intersection to index just the common index values:

In [93]:
df.loc[apple.index.intersection(banana.index)]

Out[93]:
       A       B     C
1  apple  banana  pear

Introduction

At the heart of selecting rows, we would need a 1D mask or a pandas-series of boolean elements of length same as length of df, let's call it mask. So, finally with df[mask], we would get the selected rows off df following boolean-indexing.

Here's our starting df :

In [42]: df
Out[42]: 
        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear

I. Match one string

Now, if we need to match just one string, it's straight-foward with elementwise equality :

In [42]: df == 'banana'
Out[42]: 
       A      B      C
1  False   True  False
2  False  False  False
3   True  False  False
4  False  False  False

If we need to look ANY one match in each row, use .any method :

In [43]: (df == 'banana').any(axis=1)
Out[43]: 
1     True
2    False
3     True
4    False
dtype: bool

To select corresponding rows :

In [44]: df[(df == 'banana').any(axis=1)]
Out[44]: 
        A       B     C
1   apple  banana  pear
3  banana    pear  pear

II. Match multiple strings

1. Search for ANY match

Here's our starting df :

In [42]: df
Out[42]: 
        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear

NumPy's np.isin would work here (or use pandas.isin as listed in other posts) to get all matches from the list of search strings in df. So, say we are looking for 'pear' or 'apple' in df :

In [51]: np.isin(df, ['pear','apple'])
Out[51]: 
array([[ True, False,  True],
       [ True,  True,  True],
       [False,  True,  True],
       [ True,  True,  True]])

# ANY match along each row
In [52]: np.isin(df, ['pear','apple']).any(axis=1)
Out[52]: array([ True,  True,  True,  True])

# Select corresponding rows with masking
In [56]: df[np.isin(df, ['pear','apple']).any(axis=1)]
Out[56]: 
        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear

2. Search for ALL match

Here's our starting df again :

In [42]: df
Out[42]: 
        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear

So, now we are looking for rows that have BOTH say ['pear','apple']. We will make use of NumPy-broadcasting :

In [66]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1)
Out[66]: 
array([[ True,  True],
       [ True,  True],
       [ True, False],
       [ True,  True]])

So, we have a search list of 2 items and hence we have a 2D mask with number of rows = len(df) and number of cols = number of search items. Thus, in the above result, we have the first col for 'pear' and second one for 'apple'.

To make things concrete, let's get a mask for three items ['apple','banana', 'pear'] :

In [62]: np.equal.outer(df.to_numpy(copy=False),  ['apple','banana', 'pear']).any(axis=1)
Out[62]: 
array([[ True,  True,  True],
       [ True, False,  True],
       [False,  True,  True],
       [ True, False,  True]])

The columns of this mask are for 'apple','banana', 'pear' respectively.

Back to 2 search items case, we had earlier :

In [66]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1)
Out[66]: 
array([[ True,  True],
       [ True,  True],
       [ True, False],
       [ True,  True]])

Since, we are looking for ALL matches in each row :

In [67]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1).all(axis=1)
Out[67]: array([ True,  True, False,  True])

Finally, select rows :

In [70]: df[np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1).all(axis=1)]
Out[70]: 
       A       B      C
1  apple  banana   pear
2   pear    pear  apple
4  apple   apple   pear

For single search value

df[df.values  == "banana"]

 df[df.isin(['banana'])]

For multiple search terms:

  df[(df.values  == "banana")|(df.values  == "apple" ) ]

df[df.isin(['banana', "apple"])]

  #         A       B      C
  #  1   apple  banana    NaN
  #  2     NaN     NaN  apple
  #  3  banana     NaN    NaN
  #  4   apple   apple    NaN

From Divakar: lines with both are returned.

select_rows(df,['apple','banana'])

 #         A       B     C
 #   0  apple  banana  pear

Select rows containing certain values from pandas dataframe

Introduction

I. Match one string

II. Match multiple strings

Tags:

Python

Pandas

Related

Recent Posts