Repeat rows in a pandas DataFrame based on column value
Not enough reputation to comment, but building on @cs95's answer and @lmiguelvargasf's comment, one can preserve dtypes with:
pd.DataFrame(
    df.values.repeat(df.persons, axis=0),
    columns=df.columns,
).astype(df.dtypes)
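To see why the cast matters, here is a minimal sketch using the example frame from this thread (the frame is rebuilt by hand here, so its exact dtypes are an assumption). Going through df.values mixes integers and strings in one array, so the repeated frame typically comes back with generic object columns until astype restores the originals.

```python
import pandas as pd

# Example data from the thread, rebuilt by hand (assumed dtypes: int, str, int).
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeating the raw array: mixed columns are upcast to one common dtype,
# so the per-column dtypes are lost at this point.
repeated = pd.DataFrame(
    df.values.repeat(df.persons, axis=0),
    columns=df.columns,
)
print(repeated.dtypes)

# Casting with the original dtypes (a Series mapping column -> dtype)
# restores them.
restored = repeated.astype(df.dtypes)
print(restored.dtypes)
```

The row count of the result equals df.persons.sum().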
reindex + repeat
df.reindex(df.index.repeat(df.persons))
Out[951]:
   code  .     role  ..1  persons
0   123  .  Janitor    .        3
0   123  .  Janitor    .        3
0   123  .  Janitor    .        3
1   123  .  Analyst    .        2
1   123  .  Analyst    .        2
2   321  .   Vallet    .        2
2   321  .   Vallet    .        2
3   321  .  Auditor    .        5
3   321  .  Auditor    .        5
3   321  .  Auditor    .        5
3   321  .  Auditor    .        5
3   321  .  Auditor    .        5
PS: you can add .reset_index(drop=True) to replace the repeated labels with a new 0..n-1 index.
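Put together, the reindex + repeat approach with the index reset might look like the sketch below. The frame is rebuilt by hand from the thread's example without the stray "." columns, which is an assumption about the input.

```python
import pandas as pd

# Example data from the thread, rebuilt by hand.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each index label `persons` times, then pull rows by those labels.
out = df.reindex(df.index.repeat(df.persons))

# The labels are now duplicated (0, 0, 0, 1, 1, ...); drop them in favour
# of a fresh 0..n-1 index.
out = out.reset_index(drop=True)
print(out)
```

Because the rows are selected from the original frame rather than rebuilt from an array, the column dtypes carry over unchanged.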
Wen's solution is really nice and intuitive; however, it fails when the index contains duplicate labels, raising ValueError: cannot reindex from a duplicate axis.
Here's an alternative which avoids this by calling repeat on df.values.
df
   code     role  persons
0   123  Janitor        3
1   123  Analyst        2
2   321   Vallet        2
3   321  Auditor        5
pd.DataFrame(df.values.repeat(df.persons, axis=0), columns=df.columns)
    code     role  persons
0    123  Janitor        3
1    123  Janitor        3
2    123  Janitor        3
3    123  Analyst        2
4    123  Analyst        2
5    321   Vallet        2
6    321   Vallet        2
7    321  Auditor        5
8    321  Auditor        5
9    321  Auditor        5
10   321  Auditor        5
11   321  Auditor        5
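A small sketch of the failure case, assuming the same data with a deliberately non-unique index (the index values are made up for illustration). It also shows a positional variant with iloc, which is not from the answers above but seems to sidestep both the duplicate-label error and the dtype loss.

```python
import numpy as np
import pandas as pd

# Same data, but labels 0 and 1 each appear twice (hypothetical index).
df = pd.DataFrame(
    {
        "code": [123, 123, 321, 321],
        "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
        "persons": [3, 2, 2, 5],
    },
    index=[0, 1, 0, 1],
)

# reindex looks rows up by label, so duplicate labels are ambiguous.
try:
    df.reindex(df.index.repeat(df.persons))
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print("reindex failed:", exc)

# Repeating the underlying array never consults the index labels.
via_values = pd.DataFrame(
    df.values.repeat(df.persons, axis=0),
    columns=df.columns,
)

# Positional variant (not from the answers): repeat row positions and
# select with iloc, which keeps the original column dtypes.
positions = np.arange(len(df)).repeat(df.persons.to_numpy())
via_iloc = df.iloc[positions].reset_index(drop=True)
print(via_iloc)
```

The exact wording of the ValueError message differs between pandas versions, so the sketch catches the exception type rather than matching the text.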