How to treat NULL as a normal string with pandas?
You can specify a converters
argument for the string
column.
pd.read_csv(StringIO(data), converters={'strings' : str})
strings numbers
0 foo 1
1 bar 2
2 null 3
This will by-pass pandas' automatic parsing.
Another option is setting na_filter=False
:
pd.read_csv(StringIO(data), na_filter=False)
strings numbers
0 foo 1
1 bar 2
2 null 3
This works for the entire DataFrame, so use with caution. I recommend first option if you want to surgically apply this to select columns instead.
The reason this happens is that the string 'null'
is treated as NaN
on parsing, you can turn this off by passing keep_default_na=False
in addition to @coldspeed's answer:
In[49]:
data = u'strings,numbers\nfoo,1\nbar,2\nnull,3'
df = pd.read_csv(io.StringIO(data), keep_default_na=False)
df
Out[49]:
strings numbers
0 foo 1
1 bar 2
2 null 3
The full list is:
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.