Using in operator with Pandas series
In the first case:
The in operator is interpreted as a call to df['name'].__contains__('Adam'). If you look at the implementation of __contains__ in pandas.Series, you will find that it is the following (inherited from pandas.core.generic.NDFrame):
def __contains__(self, key):
    """True if the key is in the info axis"""
    return key in self._info_axis
So your first use of in is interpreted as:
'Adam' in df['name']._info_axis
This gives False, as expected, because df['name']._info_axis holds the index labels (here a RangeIndex), not the data itself:
In [37]: df['name']._info_axis
Out[37]: RangeIndex(start=0, stop=3, step=1)
In [38]: list(df['name']._info_axis)
Out[38]: [0, 1, 2]
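To make the label-versus-value distinction concrete, here is a minimal sketch. It assumes a Series with the same three names as in the question; the variable names and the second Series (by_name, with made-up numbers) are hypothetical and only there for illustration.

```python
import pandas as pd

s = pd.Series(['Adam', 'Ben', 'Chris'])  # default RangeIndex: 0, 1, 2

label_hit = 0 in s                  # True: 0 is an index label
value_as_label = 'Adam' in s        # False: 'Adam' is a value, not a label
value_hit = 'Adam' in s.to_numpy()  # True: membership is tested against the data

# Hypothetical variant: the same names used as index labels instead of values.
by_name = pd.Series([30, 25, 41], index=['Adam', 'Ben', 'Chris'])
label_hit_named = 'Adam' in by_name  # True: 'Adam' is now an index label

print(label_hit, value_as_label, value_hit, label_hit_named)
```

In other words, in on a Series behaves like in on a dict: it checks the keys (index labels), not the values.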
In the second case:
'Adam' in list(df['name'])
The use of list converts the pandas.Series to a list of its values. So, the actual operation is this:
In [42]: list(df['name'])
Out[42]: ['Adam', 'Ben', 'Chris']
In [43]: 'Adam' in ['Adam', 'Ben', 'Chris']
Out[43]: True
Here are a few more idiomatic ways to do what you want (with their timings):
In [56]: df.name.str.contains('Adam').any()
Out[56]: True
In [57]: timeit df.name.str.contains('Adam').any()
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 144 µs per loop
In [58]: df.name.isin(['Adam']).any()
Out[58]: True
In [59]: timeit df.name.isin(['Adam']).any()
The slowest run took 5.13 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 191 µs per loop
In [60]: df.name.eq('Adam').any()
Out[60]: True
In [61]: timeit df.name.eq('Adam').any()
10000 loops, best of 3: 178 µs per loop
Note: the last way was also suggested by @Wen in a comment above.
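One thing the timings do not show: the three expressions are not strictly equivalent. str.contains does substring (by default, regex) matching, while isin and eq test for exact equality. The sketch below uses a made-up Series with no exact 'Adam' entry to show the difference; the data and variable names are assumptions for illustration only.

```python
import pandas as pd

# Made-up data: 'Adamson' contains 'Adam' as a substring, but no entry equals 'Adam'.
names = pd.Series(['Adamson', 'Ben', 'Chris'])

substring_match = bool(names.str.contains('Adam').any())  # substring/regex match
exact_isin = bool(names.isin(['Adam']).any())             # exact membership
exact_eq = bool(names.eq('Adam').any())                   # exact equality

print(substring_match, exact_isin, exact_eq)  # True False False
```

If you want to know whether a value is present exactly, isin or eq is probably the safer choice; str.contains fits when a partial match is what you are after.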