Pandas - select column using other column value as column name
Solution using pd.factorize
(from https://github.com/pandas-dev/pandas/issues/39171#issuecomment-773477244):
idx, cols = pd.factorize(df['names'])
df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
You can use lookup
:
df['new_col'] = df.lookup(df.index, df.names)
df
# a b names new_col
#0 1 -1 a 1
#1 2 -2 b -2
#2 3 -3 a 3
#3 4 -4 b -4
EDIT
lookup
has been deprecated, here's the currently recommended solution:
idx, cols = pd.factorize(df['names'])
df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Because DataFrame.lookup
is deprecated as of Pandas 1.2.0, the following is what I came up with using DataFrame.melt
:
df['new_col'] = df.melt(id_vars='names', value_vars=['a', 'b'], ignore_index=False).query('names == variable').loc[df.index, 'value']
Output:
>>> df
a b names new_col
0 1 -1 a 1
1 2 -2 b -2
2 3 -3 a 3
3 4 -4 b -4
Can this be simplified? For correctness, the index must not be ignored.
Additional reference:
- Looking up values by index/column labels (archive)