sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()
sklearn expects the feature array X to be two-dimensional, with shape (number of rows, number of columns).
If your data is one-dimensional, with a shape like (999,), fit() does not work.
Use numpy.reshape() to change the shape of the array to (999, 1), e.g.:
data = data.reshape((999, 1))
That fixed the error in my case.
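A minimal sketch of the failure and the fix, assuming synthetic data (the arrays x and y below are made up for illustration and are not from the question). It uses reshape(-1, 1), which lets NumPy infer the row count instead of hard-coding 999. The exact error message varies between sklearn versions, so the sketch only catches ValueError.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, assumed for illustration: y = 3x + 2
x = np.arange(999, dtype=float)   # shape (999,), one-dimensional
y = 3.0 * x + 2.0

regr = LinearRegression()

# Passing the 1-D array as X is rejected with a ValueError
try:
    regr.fit(x, y)
    raised = False
except ValueError as err:
    raised = True
    print("fit failed:", err)

# Reshape to (999, 1): one column (feature), 999 rows (samples)
X = x.reshape(-1, 1)
regr.fit(X, y)
print(X.shape, regr.coef_, regr.intercept_)
```

The target y can stay one-dimensional; only X has to be 2-D.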
It looks like you are using a pandas DataFrame (judging by the name df2).
In that case, you could also do the following:
from sklearn.linear_model import LinearRegression

regr = LinearRegression()
regr.fit(df2.iloc[1:1000, 5].to_frame(), df2.iloc[1:1000, 2].to_frame())
NOTE: I have removed "values" because it converts the pandas Series to a numpy.ndarray, and numpy.ndarray has no to_frame() method.
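To see why to_frame() helps, here is a small sketch that compares the shapes involved. The DataFrame contents and column names are hypothetical stand-ins for the df2 in the question.

```python
import pandas as pd

# Hypothetical stand-in for df2; the column names are made up
df2 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]})

col = df2.iloc[:, 0]      # a single column is a Series, shape (3,)
frame = col.to_frame()    # to_frame() gives a one-column DataFrame, shape (3, 1)
arr = col.values          # .values gives a numpy.ndarray, shape (3,)

print(col.shape, frame.shape, arr.shape)

# The ndarray has no to_frame(); it would need reshape(-1, 1) instead
has_to_frame = hasattr(arr, "to_frame")
arr_2d = arr.reshape(-1, 1)
print(has_to_frame, arr_2d.shape)
```

Either the one-column DataFrame or the reshaped array can be passed to fit() as X.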
Seen on the Udacity deep learning foundation course:
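import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('my.csv')
...
regr = LinearRegression()
# Double brackets select a one-column DataFrame (2-D) rather than a Series (1-D)
regr.fit(df[['column x']], df[['column y']])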