Feature names from OneHotEncoder
A list with the original column names can be passed to get_feature_names.
>>> encoder.get_feature_names(['Sex', 'AgeGroup'])
array(['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',
'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75'],
dtype=object)
- DEPRECATED: get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
  - As per sklearn.preprocessing.OneHotEncoder.
>>> encoder.get_feature_names_out(['Sex', 'AgeGroup'])
array(['Sex_female', 'Sex_male', 'AgeGroup_0', 'AgeGroup_15',
'AgeGroup_30', 'AgeGroup_45', 'AgeGroup_60', 'AgeGroup_75'],
dtype=object)
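If the same code has to run on scikit-learn releases both before and after the rename, one option is a small wrapper that picks whichever method exists. This is only a sketch: the helper name `feature_names` and the two-row sample frame are assumptions made for illustration, not part of scikit-learn or of the original data.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


def feature_names(encoder, input_features):
    """Hypothetical helper: return one-hot column names on old and new releases."""
    # get_feature_names_out exists from scikit-learn 1.0 onwards.
    if hasattr(encoder, "get_feature_names_out"):
        return encoder.get_feature_names_out(input_features)
    # Older releases only provide the deprecated get_feature_names.
    return encoder.get_feature_names(input_features)


# Tiny assumed sample, only to have a fitted encoder to query.
sample = pd.DataFrame({"Sex": ["male", "female"], "AgeGroup": [0, 15]})
encoder = OneHotEncoder().fit(sample)

names = feature_names(encoder, ["Sex", "AgeGroup"])
print(list(names))
```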
type(train_X_encoded)
→scipy.sparse.csr.csr_matrix
- Use pandas.DataFrame.sparse.from_spmatrix to load a sparse matrix, otherwise convert to a dense matrix and load with pandas.DataFrame.
# pandas.DataFrame.sparse.from_spmatrix will load this sparse matrix
>>> print(train_X_encoded)
(0, 1) 1.0
(0, 2) 1.0
(1, 0) 1.0
(1, 3) 1.0
(2, 1) 1.0
(2, 4) 1.0
(3, 0) 1.0
(3, 5) 1.0
(4, 1) 1.0
(4, 6) 1.0
(5, 0) 1.0
(5, 7) 1.0
# pandas.DataFrame will load this dense matrix
>>> print(train_X_encoded.todense())
[[0. 1. 1. 0. 0. 0. 0. 0.]
[1. 0. 0. 1. 0. 0. 0. 0.]
[0. 1. 0. 0. 1. 0. 0. 0.]
[1. 0. 0. 0. 0. 1. 0. 0.]
[0. 1. 0. 0. 0. 0. 1. 0.]
[1. 0. 0. 0. 0. 0. 0. 1.]]
import pandas as pd
column_name = encoder.get_feature_names_out(['Sex', 'AgeGroup'])
one_hot_encoded_frame = pd.DataFrame.sparse.from_spmatrix(train_X_encoded, columns=column_name)
# display(one_hot_encoded_frame)
Sex_female Sex_male AgeGroup_0 AgeGroup_15 AgeGroup_30 AgeGroup_45 AgeGroup_60 AgeGroup_75
0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0
1 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
2 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0
3 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
4 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
5 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
- From scikit-learn v1.0, use get_feature_names_out instead of get_feature_names.