Convert Pandas series containing string to boolean
You can just use map:
In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
'SomethingElse']})
In [8]: df
Out[8]:
Status
0 Delivered
1 Delivered
2 Undelivered
3 SomethingElse
In [9]: d = {'Delivered': True, 'Undelivered': False}
In [10]: df['Status'].map(d)
Out[10]:
0 True
1 True
2 False
3 NaN
Name: Status, dtype: object
An example of the replace method that replaces values only in the specified column C2 and returns the result as a DataFrame.
import pandas as pd
df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})
C1 C2
0 X Y
1 Y Y
2 X X
3 Y X
df.replace({'C2': {'X': True, 'Y': False}})
C1 C2
0 X False
1 Y False
2 X True
3 Y True
You've got everything you need. You'll be happy to discover replace:
df.replace(d)
Expanding on the previous answers:
Map method explained:
- Pandas looks up each row's value in the dictionary d and replaces any value found among its keys with the corresponding value from d.
- Values without a matching key in d are set to NaN. This can be corrected with fillna().
- Does not work on multiple columns at once, since map here operates on a single pd.Series.
- Documentation: pd.Series.map
d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)
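As a minimal sketch of the fillna() point above: the snippet below assumes that any status missing from d should count as False (that choice is an assumption for illustration, not something the question specifies) and then casts the result to a real boolean dtype.

```python
import pandas as pd

df = pd.DataFrame({"Status": ["Delivered", "Delivered", "Undelivered", "SomethingElse"]})
d = {"Delivered": True, "Undelivered": False}

# 'SomethingElse' has no key in d, so map() leaves NaN there (dtype: object).
mapped = df["Status"].map(d)

# Assumption: treat unmapped statuses as False, then make the dtype explicit.
as_bool = mapped.fillna(False).astype(bool)
print(as_bool.tolist())
```

If unmapped values should stay visibly missing instead, skip the fillna() step and keep the object-dtype result.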
Replace method explained:
- Pandas looks up each row's value in the dictionary d and attempts to replace any value found among its keys with the corresponding value from d.
- Values without a matching key in d are retained.
- Works with single and multiple columns (pd.Series or pd.DataFrame objects).
- Documentation: pd.DataFrame.replace
d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)
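To illustrate the multi-column point, here is a small sketch. The second column, Backup, is hypothetical and exists only to show the difference between replacing across the whole DataFrame and restricting the replacement to one column with a nested dictionary.

```python
import pandas as pd

df = pd.DataFrame({
    "Status": ["Delivered", "Undelivered", "SomethingElse"],
    "Backup": ["Undelivered", "Delivered", "Delivered"],  # hypothetical column
})
d = {"Delivered": True, "Undelivered": False}

# Applied to the DataFrame, the mapping is used in every column;
# values without a key in d (here 'SomethingElse') are kept as they are.
whole = df.replace(d)

# A nested dict limits the replacement to the named column.
only_status = df.replace({"Status": d})

print(whole)
print(only_status)
```

Because unmapped values are retained, a column that still contains strings such as 'SomethingElse' will not end up with a boolean dtype.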
Overall, the replace method is more robust and allows finer control over how data is mapped and how missing or NaN values are handled.