Change pandas DataFrame column order in place
There is no easy way to do this without making a copy. In theory it is possible if you ONLY have a single dtype (or are only reordering columns without the labels changing dtypes), but it is fairly complicated and hence not implemented.
That said, if you are careful you can do this. You should ONLY do this with a single-dtyped frame (you are forewarned).
In [21]: import numpy as np; from pandas import DataFrame
In [22]: df = DataFrame(np.random.randn(5,3),columns=list('ABC'))
In [23]: df
Out[23]:
          A         B         C
0 -0.696593 -0.459067  1.935033
1  1.783658  0.612771  1.553773
2 -0.572515  0.634174  0.113974
3 -0.908203  1.454289  0.509968
4  0.776575  1.629816  1.630023
If df is multi-dtyped, then df.values WILL NOT BE A VIEW (of course, you can subselect a single-dtyped frame, which is itself a view). Also note that it is NOT ALWAYS POSSIBLE to have this come out as a view; it depends on what you are doing, YMMV.
For example, df.values.take([2,0,1],axis=1) gives you the same result as the indexing below BUT IS A COPY.
In [24]: df2 = DataFrame(df.values[:,[2,0,1]],columns=list('ABC'))
In [25]: df2
Out[25]:
          A         B         C
0  1.935033 -0.696593 -0.459067
1  1.553773  1.783658  0.612771
2  0.113974 -0.572515  0.634174
3  0.509968 -0.908203  1.454289
4  1.630023  0.776575  1.629816
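df2.values reports a base array, so it looks like a view. Be careful how you read this, though: the base printed here already holds the columns in the new order (C, A, B), and NumPy integer-list (fancy) indexing such as [:,[2,0,1]] returns a copy, so this base is most likely a reordered copy rather than df's original buffer.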
In [26]: df2.values.base
Out[26]:
array([[ 1.93503267,  1.55377291,  0.1139739 ,  0.5099681 ,  1.63002264],
       [-0.69659276,  1.78365777, -0.5725148 , -0.90820288,  0.7765751 ],
       [-0.45906706,  0.61277136,  0.63417392,  1.45428912,  1.62981613]])
Note that if you then assign to df2 (adding another float column, for instance), you will trigger a copy, so you have to be extremely careful with this.
That said, creating a frame from a view of another frame takes almost no memory, since it is just a pointer, so it is very fast.
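Since so much here hinges on whether you got a view or a copy, it can help to check directly instead of inferring it from .base. Below is a minimal sketch, assuming a recent NumPy and pandas; the variable names (values, fancy, sliced) are only for illustration and are not from the answer above. It compares integer-list indexing with basic slicing using np.shares_memory.

```python
import numpy as np
import pandas as pd

# Single-dtype frame, so the values come back as one 2-D float array.
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))
values = df.to_numpy()

# Integer-list (fancy) indexing: NumPy allocates a new array for the result.
fancy = values[:, [2, 0, 1]]

# Basic slicing with a step: NumPy returns a view onto the same buffer.
sliced = values[:, ::-1]

shares_fancy = np.shares_memory(values, fancy)
shares_sliced = np.shares_memory(values, sliced)

print("fancy index shares memory:", shares_fancy)
print("basic slice shares memory:", shares_sliced)
```

Basic slicing can only express regular patterns such as a reversal or a stride, so an arbitrary order like C, A, B generally cannot be obtained as a view this way.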
Hmm... no one proposed drop and insert:
import pandas as pd

df = pd.DataFrame([['a','b','c']],columns=list('ABC'))
print('Before', id(df))
for i,col in enumerate(['C','B', 'A']):
    tmp = df[col]                               # keep the column's data
    df.drop(labels=[col],axis=1,inplace=True)   # remove it from the frame
    df.insert(i,col,tmp)                        # re-insert it at position i
print('After ', id(df))
df.head()
The result is still the original DataFrame object (same id before and after), with only the column order changed:
Before 140441780394360
After  140441780394360
   C  B  A
----------
0  c  b  a
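The same loop can be wrapped in a small function. This is only a sketch: reorder_columns_inplace is a hypothetical helper name, not a pandas API, and it uses DataFrame.pop in place of the separate lookup and drop. It keeps the same DataFrame object, but it makes no promise that pandas avoids copying data internally.

```python
import pandas as pd

def reorder_columns_inplace(frame, new_order):
    """Hypothetical helper: reorder columns while keeping the same DataFrame object."""
    new_order = list(new_order)
    # new_order must be a permutation of the existing (unique) column labels.
    if len(new_order) != len(frame.columns) or set(new_order) != set(frame.columns):
        raise ValueError("new_order must contain each existing column exactly once")
    for position, label in enumerate(new_order):
        column = frame.pop(label)               # remove the column and keep its data
        frame.insert(position, label, column)   # put it back at its new position

df = pd.DataFrame([['a', 'b', 'c']], columns=list('ABC'))
id_before = id(df)
reorder_columns_inplace(df, ['C', 'B', 'A'])
id_after = id(df)
print(list(df.columns), id_before == id_after)
```

Any other reference to df sees the new column order as well, because the object itself was modified rather than rebound to a new frame.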