Error "Expected 2D array, got 1D array instead" Using OneHotEncoder
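On the line that transforms the categorical feature, you need to add another pair of brackets so that the column is selected as a 2D array. Store the result in a new variable, because the one-hot output usually has more than one column and cannot be assigned back into the single column X[:, 0]:
X_encoded = onehotencoder1.fit_transform(X[:, [0]]).toarray()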
This is a known issue with scikit-learn's OneHotEncoder, raised in https://github.com/scikit-learn/scikit-learn/issues/3662. Most scikit-learn estimators need a 2D array rather than a 1D array.
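To make the 1D versus 2D difference concrete, here is a minimal sketch using plain NumPy. The toy array X is hypothetical and only stands in for your data; the point is that an integer column index drops an axis, while a list index keeps it.

```python
import numpy as np

# Hypothetical toy data: column 0 is a label-encoded category, column 1 is numeric.
X = np.array([[0, 1.5],
              [1, 2.0],
              [2, 0.5],
              [1, 3.0]])

col_1d = X[:, 0]    # integer index drops the column axis
col_2d = X[:, [0]]  # list index keeps the column axis

print(col_1d.shape)  # (4,)
print(col_2d.shape)  # (4, 1)
```

The second form, with shape (n_samples, 1), is the layout that fit_transform expects for a single feature.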
The standard practice is to pass a multidimensional array. Since you have already specified which column to treat as categorical with categorical_features = [0], you can rewrite the next line as follows to pass the whole dataset or a part of it. Only the first column will be converted to dummy variables, while the encoder still receives a multidimensional array to work with.
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = onehotencoder1.fit_transform(X).toarray()
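As far as I know, the categorical_features argument was deprecated in scikit-learn 0.20 and removed in 0.22, so the snippet above only runs on older versions. On newer versions the same idea (pass the whole 2D array, encode only column 0) can be sketched with ColumnTransformer. The toy data and the transformer name "onehot" are assumptions made for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 is categorical, column 1 is numeric.
X = np.array([["France", 44.0],
              ["Spain", 27.0],
              ["Germany", 30.0],
              ["Spain", 38.0]], dtype=object)

ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), [0])],  # encode only column 0
    remainder="passthrough",                          # keep the other columns unchanged
    sparse_threshold=0,                               # always return a dense array
)
X_encoded = ct.fit_transform(X)

print(X_encoded.shape)  # (4, 4): three dummy columns plus the numeric column
```

The encoded columns come first and the passed-through columns follow, so the numeric column ends up last.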
(I hope your dataset doesn't have any more categorical columns. I'd advise you to label-encode everything first, then one-hot encode.)
I got the same error, and after the error message there was a suggestion as follows:
"Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
Since my data was an array, I used X.values.reshape(-1, 1) and it works. (There was another suggestion to use X.values.reshape instead of X.reshape.)
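The two reshape calls from the error message can be compared in a short sketch. It assumes the column lives in a pandas Series (the name s and its values are made up), which is why .values is used to get the underlying NumPy array before reshaping.

```python
import numpy as np
import pandas as pd

# Hypothetical single-feature column held in a pandas Series.
s = pd.Series([0, 1, 2, 1])

arr = s.values                   # underlying 1D NumPy array, shape (4,)
as_feature = arr.reshape(-1, 1)  # 4 samples, 1 feature
as_sample = arr.reshape(1, -1)   # 1 sample, 4 features

print(arr.shape)         # (4,)
print(as_feature.shape)  # (4, 1)
print(as_sample.shape)   # (1, 4)
```

For encoding one column of many rows, the (-1, 1) form is the one that matches "a single feature".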
I came across a fix by adding
X = X.reshape(-1, 1)
The error appears to be gone now, but I'm not sure if this is the right way to fix it.