How to treat NULL as a normal string with pandas?

You can specify a converters argument for the string column.

pd.read_csv(StringIO(data), converters={'strings' : str})

  strings  numbers
0     foo        1
1     bar        2
2    null        3

This will by-pass pandas' automatic parsing.

Another option is setting na_filter=False:

pd.read_csv(StringIO(data), na_filter=False)

  strings  numbers
0     foo        1
1     bar        2
2    null        3

This works for the entire DataFrame, so use with caution. I recommend first option if you want to surgically apply this to select columns instead.

The reason this happens is that the string 'null' is treated as NaN on parsing, you can turn this off by passing keep_default_na=False in addition to @coldspeed's answer:

In[49]:
data = u'strings,numbers\nfoo,1\nbar,2\nnull,3'
df = pd.read_csv(io.StringIO(data), keep_default_na=False)
df

Out[49]: 
  strings  numbers
0     foo        1
1     bar        2
2    null        3

The full list is:

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

How to treat NULL as a normal string with pandas?

Tags:

Python

Pandas

Csv

String

Dataframe

Related

Recent Posts