In pandas, how do I use set_index with column positions instead of column names?
If the column labels are unique, you could use:
df.set_index(list(df.columns[cols]))
where cols is a list of ordinal (positional) indices.
For example,
In [77]: np.random.seed(2016)
In [79]: df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('ABCD'))
In [80]: df
Out[80]:
   A  B  C  D
0  3  7  2  3
1  8  4  8  7
2  9  2  6  3
3  4  1  9  1
4  2  2  8  9
In [81]: df.set_index(list(df.columns[[0,2]]))
Out[81]:
     B  D
A C
3 2  7  3
8 8  4  7
9 6  2  3
4 9  1  1
2 8  2  9
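The one-liner above can be wrapped in a small helper that checks the uniqueness assumption before using it. This is only a sketch: the name set_index_by_position is hypothetical (it is not part of pandas), and the sample data is a fixed two-row frame rather than the random one above.

```python
import pandas as pd

def set_index_by_position(df, cols):
    """Set the index from column positions; assumes unique column labels.

    Hypothetical helper for illustration, not a pandas API.
    """
    if not df.columns.is_unique:
        raise ValueError("column labels must be unique to set the index by label")
    # Translate ordinal positions into labels, then defer to set_index.
    return df.set_index(list(df.columns[cols]))

df = pd.DataFrame([[3, 7, 2, 3], [8, 4, 8, 7]], columns=list("ABCD"))
result = set_index_by_position(df, [0, 2])
print(result)
```

Failing early on duplicate labels keeps the helper from silently picking the wrong columns; the non-unique case needs a different approach.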
If the DataFrame's column labels are not unique, setting the index by label is not possible, and setting it by ordinal position is more complicated:
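import numpy as np
import pandas as pd
np.random.seed(2016)

def set_ordinal_index(df, cols):
    # Work on a copy so the caller's column labels are left untouched.
    df = df.copy()
    # Temporarily relabel the columns 0..n-1 so they can be selected by position.
    columns, df.columns = df.columns, np.arange(len(df.columns))
    mask = df.columns.isin(cols)
    df = df.set_index(cols)
    # Restore the original labels on the remaining columns and on the index levels.
    df.columns = columns[~mask]
    # Index by cols (not mask) so the names follow the order given in cols.
    df.index.names = columns[cols]
    return df

df = pd.DataFrame(np.random.randint(10, size=(5,4)), columns=list('AAAA'))
print(set_ordinal_index(df, [0,2]))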
yields
     A  A
A A
3 2  7  3
8 8  4  7
9 6  2  3
4 9  1  1
2 8  2  9
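As an alternative to temporarily relabeling the columns, the same result can be built purely by position with iloc. This is a sketch under assumptions, not the approach used above: the helper name set_index_by_iloc is hypothetical, and it always produces a MultiIndex, even when cols has a single entry.

```python
import pandas as pd

def set_index_by_iloc(df, cols):
    """Build the index from column positions without touching the labels.

    Hypothetical helper for illustration, not a pandas API.
    """
    cols = list(cols)
    # Positions of the columns that stay in the frame, in their original order.
    rest = [i for i in range(df.shape[1]) if i not in cols]
    # One index level per requested position, named after that column's label.
    new_index = pd.MultiIndex.from_arrays(
        [df.iloc[:, i].to_numpy() for i in cols],
        names=[df.columns[i] for i in cols],
    )
    out = df.iloc[:, rest].copy()
    out.index = new_index
    return out

# Duplicate column labels: positions 0 and 1 are both 'A', 2 and 3 both 'B'.
dup = pd.DataFrame([[3, 7, 2, 3], [8, 4, 8, 7]], columns=list("AABB"))
out = set_index_by_iloc(dup, [0, 2])
print(out)
```

Because every selection goes through iloc, duplicate labels never have to be resolved, and the input frame is not modified.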