How to lowercase a pandas dataframe string column if it has missing values?
Another possible solution, in case the column has not only strings but numbers too, is to use astype(str).str.lower()
or to_string(na_rep='')
because otherwise, given that a number is not a string, when lowered it will return NaN
, therefore:
import pandas as pd
import numpy as np
df=pd.DataFrame(['ONE','Two', np.nan,2],columns=['x'])
xSecureLower = df['x'].to_string(na_rep='').lower()
xLower = df['x'].str.lower()
then we have:
>>> xSecureLower
0 one
1 two
2
3 2
Name: x, dtype: object
and not
>>> xLower
0 one
1 two
2 NaN
3 NaN
Name: x, dtype: object
edit:
if you don't want to lose the NaNs, then using map will be better, (from @wojciech-walczak, and @cs95 comment) it will look something like this
xSecureLower = df['x'].map(lambda x: x.lower() if isinstance(x,str) else x)
use pandas vectorized string methods; as in the documentation:
these methods exclude missing/NA values automatically
.str.lower()
is the very first example there;
>>> df['x'].str.lower()
0 one
1 two
2 NaN
Name: x, dtype: object
you can try this one also,
df= df.applymap(lambda s:s.lower() if type(s) == str else s)