Pandas DataFrame stored list as string: How to convert back to list?

You can directly use pandas -

import pandas as pd
df = pd.read_csv(DF_NAME, converters={'COLUMN_NAME': pd.eval})

This will read that column as a it's corresponding dtype in python instead of a string.

UPDATE:

As rightfully pointed out by @ctwardy in comments. It is wiser to use pd.eval instead of eval to avoid unintended regex related consequences. Details - https://realpython.com/python-eval-function/#minimizing-the-security-issues-of-eval


  • Use ast.literal_eval to safely evaluate a string containing a Python literal or container datatype.
    • It's part of the standard library

    • Using python's eval() vs. ast.literal_eval()? explains why literal_eval is safer than using eval.

    • Examples:

      • literal_eval("[1.23, 2.34]") works
      • literal_eval("['KB4523205','KB4519569','KB4503308']") works
        • Other answers mention pd.eval, but its usage is limited; it results in ValueError: NumExpr 2 does not support Unicode as a dtype. for this simple example.
      • literal_eval("[KB4523205, KB4519569, KB4503308]") doesn't work (no quotes around the str values)
        • See pandas - convert string into list of strings for dealing with this representation.
  • Convert the column when reading the file, by using the converters parameter of pandas.read_csv.

data in test.csv

col1
"[1.23, 2.34]"
"['KB4523205','KB4519569','KB4503308']"

Convert the column when creating the csv

from ast import literal_eval
import pandas as pd

# convert the column during import
df = pd.read_csv('test.csv', converters={'col1': literal_eval})

# display(df)
                                col1
0                       [1.23, 2.34]
1  [KB4523205, KB4519569, KB4503308]

# check type
print(type(df.iloc[0, 0]))
list

print(type(df.iloc[1, 0]))
list

Convert the column of an existing dataframe

df.col1 = df.col1.apply(literal_eval)

%%timeit

  • pd.eval is 28 times slower than literal_eval
  • Given test.csv with 2,820,511 rows of "[1.23, 2.34]"

enter image description here


As you pointed out, this can commonly happen when saving and loading pandas DataFrames as .csv files, which is a text format.

In your case this happened because list objects have a string representation, allowing them to be stored as .csv files. Loading the .csv will then yield that string representation.

If you want to store the actual objects, you should use DataFrame.to_pickle() (note: objects must be picklable!).

To answer your second question, you can convert it back with ast.literal_eval:

>>> from ast import literal_eval
>>> literal_eval('[1.23, 2.34]')
[1.23, 2.34]