4 ways to select rows from a DataFrame based on column values

There are several ways to select rows from a Pandas dataframe:

Boolean indexing (DataFrame[DataFrame['col'] == value])
Positional indexing (DataFrame.iloc[...])
Label indexing (DataFrame.xs(...))
DataFrame.query(...) API

Below I show you examples of each, with advice when to use certain techniques. Assume our criterion is column 'A' == 'bar'

Setup

The first thing we'll need is to identify a condition that will act as our criterion for selecting rows. We'll start with the OP's case column_name == some_value, and include some other common use cases.

An example would be:

Click to copy


import pandas as pd, numpy as np

df = pd.DataFrame({
    'A':
    'Contrary bar popular bar Lorem bar Ipsum is not simply'.split(),
    'B':
    'Lorem Ipsum comes from sections one two three four five'.split(),
    'C':
    np.arange(10),
    'D':
    np.arange(10) * 2
})

print(df)

Output:

Click to copy

          A         B  C   D
0  Contrary     Lorem  0   0
1       bar     Ipsum  1   2
2   popular     comes  2   4
3       bar      from  3   6
4     Lorem  sections  4   8
5       bar       one  5  10
6     Ipsum       two  6  12
7        is     three  7  14
8       not      four  8  16
9    simply      five  9  18

Method 1: Boolean indexing (DataFrame[DataFrame['col'] == value] ) #

This is one of the simplest ways to accomplish this task and if performance or intuitiveness isn't an issue, this should be your chosen method. However, if performance is a concern, then you might want to consider an alternative way of creating the mask.

Click to copy

mask = df['A'] == 'bar'
print(mask)

Output:

Click to copy

     A      B  C   D
1  bar  Ipsum  1   2
3  bar   from  3   6
5  bar    one  5  10

Method 2: Positional indexing (DataFrame.iloc[...]) #

Positional indexing (DataFrame.iloc[...]) has its use cases, but this isn't one of them. In order to identify where to slice, we first need to perform the same boolean analysis we did above. This leaves us performing one extra step to accomplish the same task.

Click to copy

mask = df['A'] == 'bar'
pos = np.flatnonzero(mask)
print(df.iloc[pos])

Output as same as the previous method:

Click to copy

     A      B  C   D
1  bar  Ipsum  1   2
3  bar   from  3   6
5  bar    one  5  10

Method 3: Label indexing (DataFrame.xs(...)) #

Label indexing can be very handy, but in this case, we are again doing more work for no benefit

Click to copy

res = df.set_index('A', append=True, drop=False).xs('bar', level=1)

print(res)

Output as same as the previous methods:

Click to copy

     A      B  C   D
1  bar  Ipsum  1   2
3  bar   from  3   6
5  bar    one  5  10

Method 4: DataFrame.query(...) API #

pd.DataFrame.query is a very elegant/intuitive way to perform this task, but is often slower. However, if you pay attention to the timings below, for large data, the query is very efficient. More so than the standard approach and of similar magnitude as my best suggestion.

Click to copy

res = df.query('A == "bar"')

print(res)

Output as same as the previous methods:

Click to copy

     A      B  C   D
1  bar  Ipsum  1   2
3  bar   from  3   6
5  bar    one  5  10

4 ways to select rows from a DataFrame based on column values

Method 1: Boolean indexing (DataFrame[DataFrame['col'] == value] ) #

Method 2: Positional indexing (DataFrame.iloc[...]) #

Method 3: Label indexing (DataFrame.xs(...)) #

Method 4: DataFrame.query(...) API #

Tags:

Python

Pandas

Related

Recent Posts