How to efficiently change data layout of a DataFrame in pandas?
You could use NumPy's as_strided:
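import pandas as pd
from numpy.lib.stride_tricks import as_strided

window = 3
values = df['a'].values
stride = values.strides[0]  # bytes between consecutive elements
# Build overlapping windows as a view; shape and strides must stay within the buffer.
pd.DataFrame(as_strided(values,
                        shape=(len(df) - window + 1, window),
                        strides=(stride, stride)))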
Output:
    0   1   2
0  41  42  43
1  42  43  44
2  43  44  45
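A possibly safer variant of the same idea, sketched here as an alternative rather than as the answer's own method: as_strided does no bounds checking, so a wrong shape or stride can read memory outside the array. NumPy 1.20 and later provide sliding_window_view, which computes the strides itself and returns a read-only view. The sample frame below is an assumption reconstructed from the outputs shown in this thread.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Sample frame assumed from the outputs shown in this thread.
df = pd.DataFrame({"a": [41, 42, 43, 44, 45], "b": [5, 6, 7, 8, 9]})

window = 3
# Each row is one read-only window of length `window` over column "a".
windows = sliding_window_view(df["a"].to_numpy(), window_shape=window)
result = pd.DataFrame(windows)
print(result)
```

The result has len(df) - window + 1 rows, the same shape as the as_strided version above.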
This should do the trick:
df = df.rename(columns={"b": "D", "a": "A"})
df["B"] = df["A"].shift(-1)
df["C"] = df["A"].shift(-2)
df["D"] = df["D"].shift(-2)
df = df.sort_index(axis=1)
Output:
    A     B     C    D
0  41  42.0  43.0  7.0
1  42  43.0  44.0  8.0
2  43  44.0  45.0  9.0
3  44  45.0   NaN  NaN
4  45   NaN   NaN  NaN
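The shift approach extends to any window size. The sketch below is illustrative only: shifted_columns is a hypothetical helper, not a pandas function, and the sample frame is assumed from the outputs in this thread. It builds one shifted column per offset, then drops the trailing rows that do not hold a full window so the remaining values can go back to integers.

```python
import pandas as pd

def shifted_columns(series, window):
    """Return a frame whose k-th column is `series` shifted up by k rows."""
    # Hypothetical helper: the dict keys 0..window-1 become the column labels.
    return pd.concat({k: series.shift(-k) for k in range(window)}, axis=1)

# Sample frame assumed from the outputs shown in this thread.
df = pd.DataFrame({"a": [41, 42, 43, 44, 45], "b": [5, 6, 7, 8, 9]})

wide = shifted_columns(df["a"], 3)
wide.columns = list("ABC")
# D is the value of b aligned with the last element of each window.
wide["D"] = df["b"].shift(-2)
# Rows without a full window contain NaN; drop them and restore integer dtype.
complete = wide.dropna().astype(int)
print(complete)
```

Dropping the incomplete rows is a design choice. Keep them, as in the output above, if the result must stay aligned with the original index.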
You can use as_strided:
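import numpy as np
import pandas as pd

stride = np.lib.stride_tricks.as_strided
window = 3
# Overlapping windows of column a; each row holds `window` consecutive values.
v = stride(df.a.values, (len(df) - (window - 1), window), df.a.values.strides * 2)
# Align the windows with the original index; rows without a full window become NaN.
df = df.assign(**pd.DataFrame(v.tolist(), columns=list('ABC')).reindex(df.index))
# Look up b for the last value of each window (assumes the values in a are unique).
df = df.assign(D=df.iloc[:, -1].map(df.set_index('a')['b']))
print(df)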
Output:
    a  b     A     B     C    D
0  41  5  41.0  42.0  43.0  7.0
1  42  6  42.0  43.0  44.0  8.0
2  43  7  43.0  44.0  45.0  9.0
3  44  8   NaN   NaN   NaN  NaN
4  45  9   NaN   NaN   NaN  NaN