pandas get mapping of categories to integer value

I use:

dict([(category, code) for code, category in enumerate(df_labels.col2.cat.categories)])

# {'a': 0, 'b': 1, 'c': 2}
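Here is a self-contained version of the line above. It assumes a hypothetical `df_labels` whose `col2` column is converted to a categorical dtype. A category's code is its position in `cat.categories`, so enumerating the categories gives the mapping. A dict comprehension is an equivalent and slightly more idiomatic way to write it.

```python
import pandas as pd

# Hypothetical example frame; col2 is converted to a categorical dtype
df_labels = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["b", "a", "c", "a"]})
df_labels["col2"] = df_labels["col2"].astype("category")

# Each category's code is its position in cat.categories
category_to_code = {
    category: code
    for code, category in enumerate(df_labels.col2.cat.categories)
}
print(category_to_code)  # {'a': 0, 'b': 1, 'c': 2}
```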

Edited answer (removed cat.categories and changed list to dict):

>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))

{0: 'a', 1: 'b', 2: 'c'}
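The row-based `zip` above only contains the categories that actually occur in the column. The following sketch uses a hypothetical series with an unused category `'d'` to compare it with `dict(enumerate(...))` over `cat.categories`, which always covers every code.

```python
import pandas as pd

# Hypothetical series whose categories include 'd', which never occurs in the data
s = pd.Series(pd.Categorical(["c", "a", "b", "a"], categories=["a", "b", "c", "d"]))

# Row-based: pairs each row's code with its value, so only observed categories appear
from_rows = dict(zip(s.cat.codes, s))

# Category-based: code i always refers to s.cat.categories[i]
from_categories = dict(enumerate(s.cat.categories))

print(from_rows)        # only codes 0, 1, 2 appear
print(from_categories)  # {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
```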

The original answer which some of the comments are referring to:

>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))

[(0, 'a'), (1, 'b'), (2, 'c')]

As the comments note, the original answer works in this example only because the first three values happened to be [a, b, c]. It would fail if they were instead [c, b, a] or [b, c, a].
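The failure is easy to reproduce with a hypothetical series whose first values are not in category order. The original `zip` pairs row codes positionally with the category list, which produces wrong pairs. Enumerating the categories does not have this problem.

```python
import pandas as pd

# Hypothetical data whose first three values are NOT in category order
s = pd.Series(["c", "b", "a", "c"], dtype="category")

# Original (buggy) approach: pairs row codes with the category list positionally
buggy = list(zip(s.cat.codes, s.cat.categories))

# Correct approach: the code of categories[i] is i
correct = list(enumerate(s.cat.categories))

print(buggy)    # codes 2, 1, 0 paired with 'a', 'b', 'c': wrong
print(correct)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```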

If you want to convert each column/data series from categorical back to the original, you just need to reverse what you did in the for loop over the dataframe. There are two ways to do that:

To get back to the original Series or NumPy array, use Series.astype(original_dtype) or np.asarray(categorical).
If you already have the codes and categories, you can use the from_codes() constructor, which skips the factorize step that the normal constructor performs.

See pandas: Categorical Data
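The following is a small sketch of both routes. It assumes a hypothetical series `s` whose original values were strings, so `object` stands in for `original_dtype`.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column built from plain strings
s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Method 1: cast back to the original dtype, or take a NumPy array of the values
back_to_object = s.astype(object)
as_array = np.asarray(s)

# Method 2: rebuild a Categorical from integer codes plus the category list
codes = s.cat.codes.to_numpy()
rebuilt = pd.Categorical.from_codes(codes, categories=s.cat.categories)

print(back_to_object.tolist())  # ['a', 'b', 'c', 'a']
print(list(rebuilt))            # ['a', 'b', 'c', 'a']
```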

Usage of from_codes

As described in the official documentation, it creates a Categorical from arrays of codes and categories.

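import numpy as np
import pandas as pd

# Randomly assign each of 5 rows code 0 or 1 with equal probability
splitter = np.random.choice([0, 1], 5, p=[0.5, 0.5])
s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
print(splitter)
print(s)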

gives

[0 1 1 0 0]
0    train
1     test
2     test
3    train
4    train
dtype: category
Categories (2, object): [train, test]

For your codes

# after your previous conversion
print(df['col2'])
# apply from_codes; the 2nd argument is the list of categories from the mapping dict, in code order
s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
print(s)

gives

0    0
1    1
2    2
3    0
4    1
Name: col2, dtype: int8
0    a
1    b
2    c
3    a
4    b
dtype: category
Categories (5, object): [a, b, c, d, e]
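You can also build the category list from the mapping dict instead of hard-coding `list('abcde')`. The following sketch assumes a hypothetical `mapping` dict (category -> code) was used for the earlier conversion. `from_codes` expects the categories in code order, so the dict keys are sorted by their values.

```python
import pandas as pd

# Hypothetical mapping dict used for the earlier conversion (category -> code)
mapping = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}

# Integer codes, as in df['col2'] after the conversion
df = pd.DataFrame({"col2": [0, 1, 2, 0, 1]})

# from_codes needs categories listed in code order, so sort the dict keys by value
categories = sorted(mapping, key=mapping.get)
s = pd.Series(pd.Categorical.from_codes(df["col2"].to_numpy(), categories=categories))

print(s.tolist())  # ['a', 'b', 'c', 'a', 'b']
```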
