sklearn-LinearRegression: could not convert string to float: '--'

A quick fix is to use pd.to_numeric to convert the strings in your data (such as '--') to numeric values. With errors='coerce', any value that cannot be converted is replaced with NaN instead of raising an error.

import pandas as pd
from sklearn.linear_model import LinearRegression

# Coerce every value to a number; unparseable strings such as '--' become NaN
X = X.apply(pd.to_numeric, errors='coerce')
Y = Y.apply(pd.to_numeric, errors='coerce')
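To see what the coercion does, here is a small self-contained sketch. The toy X and Y values are hypothetical, chosen only so that '--' appears in both. For a Series, pd.to_numeric can also be called on it directly, which gives the same result as Y.apply(...).

```python
import pandas as pd

# Hypothetical toy data where '--' marks a missing reading
X = pd.DataFrame({"a": ["1.5", "--", "3"], "b": [4, 5, "--"]})
Y = pd.Series(["10", "20", "--"])

# Column-wise coercion: anything that cannot be parsed becomes NaN
X = X.apply(pd.to_numeric, errors="coerce")
# A Series can be passed to pd.to_numeric directly
Y = pd.to_numeric(Y, errors="coerce")

print(X.dtypes.tolist())         # both columns are now float64
print(X.isna().sum().tolist())   # one NaN per column, where '--' was
print(Y.tolist())                # the '--' label is now NaN
```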

Furthermore, you can choose to fill those values with some default:

X.fillna(0, inplace=True)
Y.fillna(0, inplace=True)

Replace the fill value with whatever makes sense for your problem. I don't recommend dropping these rows from X and Y independently, because you may end up removing different rows from each and misaligning the features with their labels.
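If you do want to drop incomplete rows rather than fill them, one way to avoid the mismatch is to build a single mask over both X and Y and apply it to each. The following is a minimal sketch, assuming X is a DataFrame and Y a Series that share the same index. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical already-coerced data with NaNs in different rows of X and Y
X = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0, 5.0],
                  "b": [4.0, 5.0, np.nan, 7.0, 8.0]})
Y = pd.Series([10.0, 20.0, 30.0, np.nan, 50.0])

# One mask: keep a row only if every feature AND the label are present
keep = X.notna().all(axis=1) & Y.notna()
X_clean = X[keep]
Y_clean = Y[keep]

print(X_clean.index.tolist(), Y_clean.index.tolist())  # same rows in both
```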

Finally, split the data and fit your model (LinearRegression is a regressor, not a classifier):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)
clf = LinearRegression().fit(X_train, y_train)
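Putting the steps together, here is an end-to-end sketch. The column names x1/x2 and all values are hypothetical stand-ins for data read in as strings with '--' placeholders. Filling with 0 follows the answer above and may distort the fit on real data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical raw data read as strings, with '--' marking missing entries
X = pd.DataFrame({
    "x1": ["1", "2", "--", "4", "5", "6", "7", "8"],
    "x2": ["0", "1", "1", "--", "2", "2", "3", "3"],
})
Y = pd.Series(["2", "7", "9", "8", "--", "18", "23", "25"])

# Step 1: coerce to numeric ('--' -> NaN)
X = X.apply(pd.to_numeric, errors="coerce")
Y = pd.to_numeric(Y, errors="coerce")

# Step 2: fill the NaNs with a default (0 here, as in the answer)
X = X.fillna(0)
Y = Y.fillna(0)

# Step 3: split and fit; this no longer raises the string-to-float ValueError
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)
clf = LinearRegression().fit(X_train, y_train)

preds = clf.predict(X_test)
print(clf.coef_.shape, preds.shape)  # one coefficient per feature
```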

