How to duplicate rows in pandas, based on items in a list

You could write a simple cleaning function to turn the string into a list (assuming none of the items is itself a comma, and you can't simply use ast.literal_eval):

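def clean_string_to_list(s):
    # Keep every character except brackets and commas, e.g. '[A,B,C]' -> ['A', 'B', 'C'].
    # This only works for single-character items; you might need to catch errors.
    return [c for c in s if c not in '[,]']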

df['data'] = df['data'].apply(clean_string_to_list)
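If the items are longer than one character, a character filter like clean_string_to_list would break them apart. Here is a minimal sketch of a split-based variant, assuming the strings look like '[A,B,C]' with unquoted, comma-separated items; the name split_string_to_list and the sample inputs are hypothetical:

```python
def split_string_to_list(s):
    # Hypothetical helper: assumes s looks like '[A,B,C]' (unquoted, comma-separated).
    inner = s.strip().strip('[]')
    # Empty brackets give an empty list rather than [''].
    return [item.strip() for item in inner.split(',')] if inner else []


print(split_string_to_list('[foo, bar,baz]'))  # ['foo', 'bar', 'baz']
print(split_string_to_list('[]'))              # []
```

It can be applied the same way, with df['data'].apply(split_string_to_list).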

Iterating through the rows seems like a reasonable choice:

In [11]: pd.DataFrame([(row['COL'], d)
                       for _, row in df.iterrows()  # outer loop first, so row is defined
                       for d in row['data']],
                      columns=df.columns)
Out[11]:
     COL data
0  line1    A
1  line1    B
2  line1    C

I'm afraid I don't think pandas caters specifically for this kind of manipulation.


You can use df.explode(), which is available since pandas 0.25. Refer to the documentation; I believe this is exactly the functionality you need.
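As a rough sketch of how that might look, assuming the 'data' column already holds real Python lists (the sample frame below is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: each 'data' cell already holds a list.
df = pd.DataFrame({'COL': ['line1', 'line2'],
                   'data': [['A', 'B', 'C'], ['D']]})

# explode() emits one row per list element and repeats the other columns.
# The original index labels are repeated as well, so reset them afterwards.
result = df.explode('data').reset_index(drop=True)
print(result)
```

If the column still contains strings such as '[A,B,C]', they need to be converted to lists first, otherwise explode() leaves those rows unchanged.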
