Split pandas dataframe column list values to duplicate rows

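Since pandas 0.25.0 we have the explode method. First we copy the authors column under a new name, author, using assign. Then we explode this new column so that each list element gets its own row; the values in the other columns, including the original authors list, are repeated on each of those rows: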

df.assign(author=df['authors']).explode('author')

Output

  publication_title                      authors         type   author
0           title_1  [author1, author2, author3]  proceedings  author1
0           title_1  [author1, author2, author3]  proceedings  author2
0           title_1  [author1, author2, author3]  proceedings  author3
1           title_2           [author4, author5]  collections  author4
1           title_2           [author4, author5]  collections  author5
2           title_3           [author6, author7]        books  author6
2           title_3           [author6, author7]        books  author7
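
The original does not show how df was built, so here is a minimal, self-contained sketch for reproducing the result. It assumes the frame has the three columns seen in the output and that authors holds Python lists; the data is reconstructed from that output, not taken from the asker's code.

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above.
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

# Copy the list column under a new name, then give each list element its own row.
exploded = df.assign(author=df["authors"]).explode("author")
print(exploded)
```

Note that explode keeps the original index label for every row it produces, which is why 0, 1 and 2 repeat.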

If you want to remove the duplicated index, use reset_index with drop=True so the old index is discarded rather than added as a column:

df.assign(author=df['authors']).explode('author').reset_index(drop=True)

Output

  publication_title                      authors         type   author
0           title_1  [author1, author2, author3]  proceedings  author1
1           title_1  [author1, author2, author3]  proceedings  author2
2           title_1  [author1, author2, author3]  proceedings  author3
3           title_2           [author4, author5]  collections  author4
4           title_2           [author4, author5]  collections  author5
5           title_3           [author6, author7]        books  author6
6           title_3           [author6, author7]        books  author7
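
On pandas 1.1.0 or later, explode also accepts ignore_index=True, which should produce the same renumbered index without a separate reset_index call. A minimal sketch comparing the two, assuming an input frame reconstructed from the output above (the construction of df is an assumption, not part of the original):

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above.
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

with_author = df.assign(author=df["authors"])

# Works on pandas >= 0.25.0: explode, then discard the repeated index labels.
via_reset = with_author.explode("author").reset_index(drop=True)

# Requires pandas >= 1.1.0: let explode renumber the rows itself.
via_ignore = with_author.explode("author", ignore_index=True)

print(via_ignore)
```

If the code has to run on pandas versions older than 1.1.0, the reset_index form is the safer choice.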
