Python Pandas - Reshape Dataframe
Data is transformed every three rows : we can use numpy's reshape method to transform the data, and create a cartesian product of range(1,3) with the columns to get the new columns :
from itertools import product
row = len(df)//3
#create new columns
new_columns = df.columns.union(["_".join((letter,str(num)))
for letter,num in product(df.columns,range(1,3))],
sort=False)
#create new dataframe
new_df = pd.DataFrame(np.reshape(df.to_numpy(),(row,-1)),
columns=new_columns)
new_df
A B C A_1 A_2 B_1 B_2 C_1 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
One idea is create MultiIndex with integer and modulo division and reshape by DataFrame.unstack
:
a = np.arange(len(df))
df.index = [a // 3, a % 3]
df = df.unstack().sort_index(axis=1, level=1)
df.columns = [f'{a}_{b}' for a, b in df.columns]
print (df)
A_0 B_0 C_0 A_1 B_1 C_1 A_2 B_2 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
For reverse operation is possible use str.split
with DataFrame.stack
:
a = np.arange(len(df))
df1 = (df.set_index(pd.MultiIndex.from_arrays([a // 3, a % 3]))
.unstack().sort_index(axis=1, level=1))
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
print (df1)
A_0 B_0 C_0 A_1 B_1 C_1 A_2 B_2 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
df1.columns = df1.columns.str.split('_', expand=True)
df2 = df1.stack().reset_index(drop=True)
print (df2)
A B C
0 1 4 6
1 2 5 7
2 3 6 8
3 11 14 16
4 12 15 17
5 13 16 18
You can try this:
pd.DataFrame(
data=df.values.reshape([-1, df.values.shape[1]*3]),
columns=list(df.columns) + sum([[c+'_'+str(i) for c in df.columns] for i in range(1, 3)], [])
)
Output for your input dataframe:
A B C A_1 B_1 C_1 A_2 B_2 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
new = pd.concat([df[a::3].reset_index(drop=True) for a in range(3)], axis=1)
new.columns = ['{}_{}'.format(a,b) for b in range(3) for a in 'ABC']