select columns based on columns names containing a specific string in pandas
In case @Pedro answer doesn't work here is official way of doing it for pandas 0.25
Sample dataframe:
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
... index=['mouse', 'rabbit'],
... columns=['one', 'two', 'three'])
one two three mouse 1 2 3 rabbit 4 5 6
Select columns by name
df.filter(items=['one', 'three'])
one three
mouse 1 3
rabbit 4 6
Select columns by regular expression
df.filter(regex='e$', axis=1) #ending with *e*, for checking containing just use it without *$* in the end
one three
mouse 1 3
rabbit 4 6
Select rows containing 'bbi'
df.filter(like='bbi', axis=0)
one two three
rabbit 4 5 6
You've several options, here's a couple:
1 - filter
with like
:
df.filter(like='alp')
2 - filter
with regex
:
df.filter(regex='alp')
alternative methods:
In [13]: df.loc[:, df.columns.str.startswith('alp')]
Out[13]:
alp1 alp2
0 0.357564 0.108907
1 0.341087 0.198098
2 0.416215 0.644166
3 0.814056 0.121044
4 0.382681 0.110829
5 0.130343 0.219829
6 0.110049 0.681618
7 0.949599 0.089632
8 0.047945 0.855116
9 0.561441 0.291182
In [14]: df.loc[:, df.columns.str.contains('alp')]
Out[14]:
alp1 alp2
0 0.357564 0.108907
1 0.341087 0.198098
2 0.416215 0.644166
3 0.814056 0.121044
4 0.382681 0.110829
5 0.130343 0.219829
6 0.110049 0.681618
7 0.949599 0.089632
8 0.047945 0.855116
9 0.561441 0.291182
option 1
Full numpy
+ pd.DataFrame
m = np.core.defchararray.find(df.columns.values.astype(str), 'alp') >= 0
pd.DataFrame(df.values[:, m], df.index, df.columns[m])
alp1 alp2
0 0.819189 0.356867
1 0.900406 0.968947
2 0.201382 0.658768
3 0.700727 0.946509
4 0.176423 0.290426
5 0.132773 0.378251
6 0.749374 0.983251
7 0.768689 0.415869
8 0.292140 0.457596
9 0.214937 0.976780
option 2numpy
+ loc
m = np.core.defchararray.find(df.columns.values.astype(str), 'alp') >= 0
df.loc[:, m]
alp1 alp2
0 0.819189 0.356867
1 0.900406 0.968947
2 0.201382 0.658768
3 0.700727 0.946509
4 0.176423 0.290426
5 0.132773 0.378251
6 0.749374 0.983251
7 0.768689 0.415869
8 0.292140 0.457596
9 0.214937 0.976780
timingnumpy
is faster