Select rows by partial string with query with pandas
The petobens solution now works with Query
without engine spec, what increases the speed, acconding the manual.
Uses contains
in query spec it's a powerful feature to handling string content because allow use regex.
import numpy as np
import pandas as pd
A = np.array(["Paulo", "Lucas", "Luana", "Larra", "BaLu","Bela"])
B = np.array([111, 222, 222, 333, 333, 777])
C = np.random.randint(10, 99, 6)
dt = pd.DataFrame(zip(A, B, C), columns=['A', 'B', 'C'])
dt.set_index(['A', 'B'], inplace=True)
print(dt)
print("=============")
print(dt.query('A.str.contains("Lu")'))
print("=============")
print(dt.query('A.str.contains("L(a|u)", regex=True)'))
print("=============")
print(dt.query('A.str.contains("^L", regex=True)')) # starts with L
The result is
A B
1.1 Paulo 57
Lucas 49
3.3 Luana 38
Larra 82
5.5 BaLu 37
6.6 Bela 14
=============
C
A B
1.1 Lucas 49
3.3 Luana 38
5.5 BaLu 37
=============
C
A B
1.1 Lucas 49
3.3 Luana 38
Larra 82
5.5 BaLu 37
=============
C
A B
1.1 Lucas 49
3.3 Luana 38
Larra 82
This answer is out of date. Please check @petobens' answer.
As of version 0.20.2, query
doesn't support partial string matching. There is an open future request about it and one of the core developers seems to agree that it would be a nice addition.
The issue that @ayhan refers to now shows how this can be achieved by using query
's python engine:
print(df.query('name.str.contains("lu")', engine='python').head())
should work.