Split pandas dataframe column list values to duplicate rows
Since pandas 0.25.0 we have the explode method. First we duplicate the authors column and rename it at the same time using assign, then we explode this new column into rows, which repeats the values of the other columns:
df.assign(author=df['authors']).explode('author')
Output
  publication_title                      authors         type   author
0           title_1  [author1, author2, author3]  proceedings  author1
0           title_1  [author1, author2, author3]  proceedings  author2
0           title_1  [author1, author2, author3]  proceedings  author3
1           title_2           [author4, author5]  collections  author4
1           title_2           [author4, author5]  collections  author5
2           title_3           [author6, author7]        books  author6
2           title_3           [author6, author7]        books  author7
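For anyone who wants to reproduce this, here is a minimal sketch. The sample DataFrame is an assumption reconstructed from the output shown here, since the original data is not part of this answer; the name exploded is only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption,
# not the original poster's actual DataFrame).
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

# Copy the list column under a new name, then turn each list element
# into its own row. The original index labels are repeated per element.
exploded = df.assign(author=df["authors"]).explode("author")
print(exploded)
```

Because assign returns a new DataFrame, df itself is left unchanged and still has one row per publication.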
If you want to remove the duplicated index, use reset_index:
df.assign(author=df['authors']).explode('author').reset_index(drop=True)
Output
  publication_title                      authors         type   author
0           title_1  [author1, author2, author3]  proceedings  author1
1           title_1  [author1, author2, author3]  proceedings  author2
2           title_1  [author1, author2, author3]  proceedings  author3
3           title_2           [author4, author5]  collections  author4
4           title_2           [author4, author5]  collections  author5
5           title_3           [author6, author7]        books  author6
6           title_3           [author6, author7]        books  author7
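As a possible alternative on pandas 1.1.0 or later, explode accepts an ignore_index parameter, which should give the same fresh 0..n-1 index without a separate reset_index call. A minimal sketch, assuming sample data reconstructed from the output shown here (the variable name result is only for illustration):

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    "publication_title": ["title_1", "title_2", "title_3"],
    "authors": [
        ["author1", "author2", "author3"],
        ["author4", "author5"],
        ["author6", "author7"],
    ],
    "type": ["proceedings", "collections", "books"],
})

# ignore_index=True (pandas >= 1.1.0) relabels the rows 0..n-1,
# so no reset_index(drop=True) is needed afterwards.
result = df.assign(author=df["authors"]).explode("author", ignore_index=True)
print(result)
```

On older versions (0.25.0 up to 1.0.x) this parameter is not available, so the reset_index(drop=True) form is the one to use there.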