Sklearn: changing string class labels to int
You can use the pandas factorize function, which assigns integer codes in order of first appearance:
In [13]: df['fruit'] = pd.factorize(df['fruit'])[0].astype(np.uint16)
In [14]: df
Out[14]:
   index  fruit  quantity  price
0      0      0         5   0.99
1      1      0         2   0.99
2      2      1         4   0.89
3      4      2         1   1.64
4  10023      3        10   0.92
In [15]: df.dtypes
Out[15]:
index         int64
fruit        uint16
quantity      int64
price       float64
dtype: object
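The factorize call above keeps only the codes (element [0] of the result). A minimal sketch of how the second element, the unique labels, can be kept to map codes back to strings. The fruit names and the fruit_code column are made up for illustration, since the original data is not shown:

```python
import pandas as pd

# Hypothetical sample data; the fruit names are assumptions, not the original data.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "grape", "mango"],
    "quantity": [5, 2, 4, 1, 10],
})

# factorize returns (codes, uniques); codes follow the order of first appearance.
codes, uniques = pd.factorize(df["fruit"])
df["fruit_code"] = codes

# Explicit label -> code lookup built from the unique labels.
label_to_code = {label: code for code, label in enumerate(uniques)}

# Decode the integer codes back to the original strings.
decoded = uniques.take(codes)
```

Note that factorize marks missing values with -1 by default, so casting to an unsigned dtype such as uint16 is only safe when the column has no NaN values.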
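Alternatively, you can convert the column to the category dtype and take its codes (here the codes follow the sorted order of the labels, not the order of appearance):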
In [21]: df['fruit'] = df.fruit.astype('category').cat.codes
In [22]: df
Out[22]:
   index  fruit  quantity  price
0      0      0         5   0.99
1      1      0         2   0.99
2      2      3         4   0.89
3      4      1         1   1.64
4  10023      2        10   0.92
In [23]: df.dtypes
Out[23]:
index         int64
fruit          int8
quantity      int64
price       float64
dtype: object
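A short sketch of how the category codes relate to the labels, assuming made-up fruit names (the original strings are not shown). For string labels the categories are sorted, which is why the codes differ from the factorize result:

```python
import pandas as pd

# Hypothetical labels; chosen only for illustration.
fruit = pd.Series(["apple", "apple", "orange", "grape", "mango"])
cat = fruit.astype("category")

# One integer code per row; categories are stored in sorted order.
codes = cat.cat.codes

# Code -> label lookup taken from the category index.
code_to_label = dict(enumerate(cat.cat.categories))

# Map the codes back to the original strings.
restored = codes.map(code_to_label)
```

Keeping cat.cat.categories (or the code_to_label dict) around is what makes the encoding reversible once the original column has been overwritten.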
Use factorize, and then convert the result to categorical if necessary:
df.fruit = pd.factorize(df.fruit)[0]
print(df)
   fruit  quantity  price
0      0         5   0.99
1      0         2   0.99
2      1         4   0.89
3      2         1   1.64
4      3        10   0.92
df.fruit = pd.Categorical(pd.factorize(df.fruit)[0])
print(df)
   fruit  quantity  price
0      0         5   0.99
1      0         2   0.99
2      1         4   0.89
3      2         1   1.64
4      3        10   0.92
print(df.dtypes)
fruit       category
quantity       int64
price        float64
dtype: object
If you need the codes to start from 1 instead of 0, add 1 to the factorized values:
df.fruit = pd.Categorical(pd.factorize(df.fruit)[0] + 1)
print(df)
   fruit  quantity  price
0      1         5   0.99
1      1         2   0.99
2      2         4   0.89
3      3         1   1.64
4      4        10   0.92
You can use LabelEncoder from sklearn.preprocessing:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df.fruit)
df['categorical_label'] = le.transform(df.fruit)
To transform the integer labels back to the original strings, use inverse_transform:
le.inverse_transform(df['categorical_label'])
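For reference, a self-contained sketch of the same round trip, assuming made-up fruit names since the original data is not shown. It uses fit_transform to combine the fit and transform steps, and classes_ to inspect which label each integer stands for:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data; the fruit names are assumptions.
df = pd.DataFrame({"fruit": ["apple", "apple", "orange", "grape", "mango"]})

le = LabelEncoder()
# fit_transform learns the classes and encodes the column in one call.
df["categorical_label"] = le.fit_transform(df["fruit"])

# classes_ holds the sorted unique labels; position i corresponds to code i.
classes = list(le.classes_)

# inverse_transform turns the integer codes back into the original strings.
restored = le.inverse_transform(df["categorical_label"])
```

Because the fitted encoder stores its classes, the same le object can be reused to encode new data consistently, as long as the new data contains only labels seen during fit.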