Get index of a row of a pandas dataframe as an integer
To answer the original question on how to get the index as an integer for the desired selection, the following will work :
df[df['A']==5].index.item()
The easier is add [0]
- select first value of list with one element:
dfb = df[df['A']==5].index.values.astype(int)[0]
dfbb = df[df['A']==8].index.values.astype(int)[0]
dfb = int(df[df['A']==5].index[0])
dfbb = int(df[df['A']==8].index[0])
But if possible some values not match, error is raised, because first value not exist.
Solution is use next
with iter
for get default parameetr if values not matched:
dfb = next(iter(df[df['A']==5].index), 'no match')
print (dfb)
4
dfb = next(iter(df[df['A']==50].index), 'no match')
print (dfb)
no match
Then it seems need substract 1
:
print (df.loc[dfb:dfbb-1,'B'])
4 0.894525
5 0.978174
6 0.859449
Name: B, dtype: float64
Another solution with boolean indexing
or query
:
print (df[(df['A'] >= 5) & (df['A'] < 8)])
A B
4 5 0.894525
5 6 0.978174
6 7 0.859449
print (df.loc[(df['A'] >= 5) & (df['A'] < 8), 'B'])
4 0.894525
5 0.978174
6 0.859449
Name: B, dtype: float64
print (df.query('A >= 5 and A < 8'))
A B
4 5 0.894525
5 6 0.978174
6 7 0.859449
The nature of wanting to include the row where A == 5
and all rows upto but not including the row where A == 8
means we will end up using iloc
(loc
includes both ends of slice).
In order to get the index labels we use idxmax
. This will return the first position of the maximum value. I run this on a boolean series where A == 5
(then when A == 8
) which returns the index value of when A == 5
first happens (same thing for A == 8
).
Then I use searchsorted
to find the ordinal position of where the index label (that I found above) occurs. This is what I use in iloc
.
i5, i8 = df.index.searchsorted([df.A.eq(5).idxmax(), df.A.eq(8).idxmax()])
df.iloc[i5:i8]
numpy
you can further enhance this by using the underlying numpy objects the analogous numpy functions. I wrapped it up into a handy function.
def find_between(df, col, v1, v2):
vals = df[col].values
mx1, mx2 = (vals == v1).argmax(), (vals == v2).argmax()
idx = df.index.values
i1, i2 = idx.searchsorted([mx1, mx2])
return df.iloc[i1:i2]
find_between(df, 'A', 5, 8)
timing