sklearn.compose.ColumnTransformer: fit_transform() takes 2 positional arguments but 3 were given
There are two major reasons why this will not work for your purpose.
- LabelEncoder() is designed to be used for the target variable (y). That is the reason you get the positional-argument error when ColumnTransformer() tries to feed it X, y=None, fit_params={}.
From the documentation:

Encode labels with value between 0 and n_classes-1.

fit(y)
    Fit label encoder.
    Parameters:
    y : array-like of shape (n_samples,)
        Target values.
- Even if you work around the empty dictionary, LabelEncoder() still cannot take a 2D array (i.e., multiple features at a time), because it only accepts 1D y values.
Short answer: we should not use LabelEncoder() for input features.
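For context, here is a minimal sketch of how the mismatch shows up (the data is illustrative, and the exact message can vary by scikit-learn version): ColumnTransformer calls each transformer's fit_transform(X, y), while LabelEncoder.fit_transform() accepts only a single 1D y.

>>> import numpy as np
>>> from sklearn.compose import ColumnTransformer
>>> from sklearn.preprocessing import LabelEncoder
>>> X = np.array([['apple', 'green'],
...               ['orange', 'blue']])
>>> ct = ColumnTransformer([("le", LabelEncoder(), [0, 1])])
>>> ct.fit_transform(X)
Traceback (most recent call last):
    ...
TypeError: fit_transform() takes 2 positional arguments but 3 were given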
Now, what is the solution to encode the input features? Use OrdinalEncoder() if your features are ordinal, or OneHotEncoder() if they are nominal.
Example:
>>> import numpy as np
>>> from sklearn.compose import ColumnTransformer
>>> from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
>>> X = np.array([[1000., 100., 'apple', 'green'],
...               [1100., 100., 'orange', 'blue']])
>>> ct = ColumnTransformer(
...     [("ordinal", OrdinalEncoder(), [0, 1]),
...      ("nominal", OneHotEncoder(), [2, 3])])
>>> ct.fit_transform(X)
array([[0., 0., 1., 0., 0., 1.],
       [1., 0., 0., 1., 1., 0.]])
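If your data is in a pandas DataFrame, ColumnTransformer also accepts column names instead of indices. A quick sketch with made-up column names, using the same values as above:

>>> import pandas as pd
>>> X_df = pd.DataFrame({'price': [1000., 1100.],
...                      'weight': [100., 100.],
...                      'fruit': ['apple', 'orange'],
...                      'color': ['green', 'blue']})
>>> ct = ColumnTransformer(
...     [("ordinal", OrdinalEncoder(), ['price', 'weight']),
...      ("nominal", OneHotEncoder(), ['fruit', 'color'])])
>>> ct.fit_transform(X_df)
array([[0., 0., 1., 0., 0., 1.],
       [1., 0., 0., 1., 1., 0.]])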
I believe this is actually an issue with LabelEncoder. The LabelEncoder.fit method only accepts self and y as arguments (which is odd, as most transformer objects follow the fit(X, y=None, **fit_params) paradigm). Anyway, in a pipeline the transformer gets called with fit_params regardless of what you have passed. In this particular situation the exact arguments passed to LabelEncoder.fit are X and an empty dictionary {}, thus raising the error.

From my point of view this is a bug in LabelEncoder, but you should take that up with the sklearn folks, as they may have some reason for implementing the fit method differently.
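If you really need LabelEncoder-style behaviour inside a ColumnTransformer or Pipeline, one workaround is to wrap it in a small custom transformer that exposes the usual fit(X, y=None) signature. A rough sketch (the ColumnLabelEncoder name is made up, not part of sklearn):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelEncoder

class ColumnLabelEncoder(BaseEstimator, TransformerMixin):
    """Fit a separate LabelEncoder to each column of X."""
    def fit(self, X, y=None):
        X = np.asarray(X)
        self.encoders_ = [LabelEncoder().fit(X[:, i]) for i in range(X.shape[1])]
        return self

    def transform(self, X):
        X = np.asarray(X)
        return np.column_stack(
            [enc.transform(X[:, i]) for i, enc in enumerate(self.encoders_)])

In practice, though, OrdinalEncoder (shown in the other answer) already does exactly this for input features, so the wrapper is rarely worth it.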