Pandas DataFrame stored list as string: How to convert back to list?
You can directly use pandas -
import pandas as pd
df = pd.read_csv(DF_NAME, converters={'COLUMN_NAME': pd.eval})
This will read that column as a it's corresponding dtype in python instead of a string.
UPDATE:
As rightfully pointed out by @ctwardy in comments. It is wiser to use pd.eval
instead of eval
to avoid unintended regex related consequences. Details - https://realpython.com/python-eval-function/#minimizing-the-security-issues-of-eval
- Use
ast.literal_eval
to safely evaluate a string containing a Python literal or container datatype.It's part of the standard library
Using python's eval() vs. ast.literal_eval()? explains why
literal_eval
is safer than usingeval
.Examples:
literal_eval("[1.23, 2.34]")
worksliteral_eval("['KB4523205','KB4519569','KB4503308']")
works- Other answers mention
pd.eval
, but its usage is limited; it results inValueError: NumExpr 2 does not support Unicode as a dtype.
for this simple example.
- Other answers mention
literal_eval("[KB4523205, KB4519569, KB4503308]")
doesn't work (no quotes around thestr
values)- See pandas - convert string into list of strings for dealing with this representation.
- Convert the column when reading the file, by using the
converters
parameter ofpandas.read_csv
.
data in test.csv
col1
"[1.23, 2.34]"
"['KB4523205','KB4519569','KB4503308']"
Convert the column when creating the csv
from ast import literal_eval
import pandas as pd
# convert the column during import
df = pd.read_csv('test.csv', converters={'col1': literal_eval})
# display(df)
col1
0 [1.23, 2.34]
1 [KB4523205, KB4519569, KB4503308]
# check type
print(type(df.iloc[0, 0]))
list
print(type(df.iloc[1, 0]))
list
Convert the column of an existing dataframe
df.col1 = df.col1.apply(literal_eval)
%%timeit
pd.eval
is 28 times slower thanliteral_eval
- Given
test.csv
with 2,820,511 rows of"[1.23, 2.34]"
As you pointed out, this can commonly happen when saving and loading pandas DataFrames as .csv
files, which is a text format.
In your case this happened because list objects have a string representation, allowing them to be stored as .csv
files. Loading the .csv
will then yield that string representation.
If you want to store the actual objects, you should use DataFrame.to_pickle()
(note: objects must be picklable!).
To answer your second question, you can convert it back with ast.literal_eval
:
>>> from ast import literal_eval
>>> literal_eval('[1.23, 2.34]')
[1.23, 2.34]