Significant mismatch between `r2_score` of `scikit-learn` and the R^2 calculation

I think you have misinterpreted Wikipedia. The example on Wikipedia does not state:

y=[1,2,3,4,5]
f=[1.9, 3.7, 5.8, 8.0, 9.6]
R^2 = 0.998

Instead, it says that the R^2 for a linear least-squares fit to the data:

x=[1,2,3,4,5]
y=[1.9, 3.7, 5.8, 8.0, 9.6]

is equal to 0.998.

Consider this script, which first uses np.linalg.lstsq to find the least-squares fit and then computes R^2 with both methods, getting 0.998 from each:

import numpy as np
from sklearn.metrics import r2_score

x = np.arange(1, 6, 1)
y = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

A = np.vstack([x, np.ones(len(x))]).T

# Use numpy's least squares function
m, c = np.linalg.lstsq(A, y, rcond=None)[0]

print(m, c)
# 1.97 -0.11

# Define the values of our least squares fit
f = m * x + c

print(f)
# [ 1.86  3.83  5.8   7.77  9.74]

# Calculate R^2 explicitly
yminusf2 = (y - f) ** 2
sserr = sum(yminusf2)
mean = float(sum(y)) / float(len(y))
yminusmean2 = (y - mean) ** 2
sstot = sum(yminusmean2)
R2 = 1. - (sserr / sstot)

print(R2)
# 0.99766066838

# Use scikit-learn
print(r2_score(y, f))
# 0.99766066838

print(r2_score(y, f) == R2)
# True

The question you referred to is correct -- if you work through the calculation of the residual sum of squares and the total sum of squares, you do get the same value as sklearn:

In [85]: import numpy as np

In [86]: y = [1,2,3,4,5]

In [87]: f = [1.9, 3.7, 5.8, 8.0, 9.6]

In [88]: SSres = sum(map(lambda x: (x[0]-x[1])**2, zip(y, f)))

In [89]: SStot = sum([(x-np.mean(y))**2 for x in y])

In [90]: SSres, SStot
Out[90]: (48.699999999999996, 10.0)

In [91]: 1-(SSres/SStot)
Out[91]: -3.8699999999999992
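
As a quick cross-check (a minimal sketch; it reuses the same y and f and assumes scikit-learn is installed), r2_score returns that very value:

from sklearn.metrics import r2_score

y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]

# r2_score computes 1 - SSres/SStot, so it agrees with the manual calculation
print(r2_score(y, f))
# approximately -3.87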

The idea behind a negative value is that you'd have been closer to the actual values had you just predicted the mean each time (which would correspond to an R^2 of 0).
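
To make that concrete (a small sketch, reusing the same toy data), a constant prediction equal to the mean of y scores exactly 0:

import numpy as np
from sklearn.metrics import r2_score

y = [1, 2, 3, 4, 5]

# Always predicting the mean of y (3.0) makes SSres equal to SStot, so R^2 = 0
baseline = [np.mean(y)] * len(y)
print(r2_score(y, baseline))
# 0.0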


The coefficient of determination effectively compares the variance in the data to the variance of the residuals. The residual is the difference between the predicted and observed value, and its variance is the sum of squares of this difference.

If the prediction is perfect, the variance of the residuals is zero. Hence, the coefficient of determination is one. If the prediction is not perfect, some of the residuals are non-zero and the variance of the residuals is positive. Hence, the coefficient of determination is lower than one.
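
For instance (a tiny sketch with the same toy data), passing the observed values as their own prediction gives a coefficient of exactly one:

from sklearn.metrics import r2_score

y = [1, 2, 3, 4, 5]

# Perfect prediction: every residual is zero, so the coefficient of determination is 1
print(r2_score(y, y))
# 1.0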

The toy problem obviously has a low coefficient of determination since most of the predicted values are way off. A coefficient of determination of -3.87 means that the variance of the residuals is 4.87 times as large as the variance in the observed values.
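
That ratio can be read off directly from the sums of squares (a quick sketch using the toy data):

import numpy as np

y = np.array([1, 2, 3, 4, 5])
f = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

ss_res = np.sum((y - f) ** 2)          # 48.7, the "variance" of the residuals
ss_tot = np.sum((y - y.mean()) ** 2)   # 10.0, the "variance" of the observed values

print(ss_res / ss_tot)      # about 4.87
print(1 - ss_res / ss_tot)  # about -3.87, the value r2_score reports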

The 0.998 value is the coefficient of determination of a linear least-squares fit to the data set. This means that the observed values are related to the predicted values by a linear relation (plus a constant) that minimizes the variance of the residuals. The observed and predicted values from the toy problem are highly linearly correlated, so the coefficient of determination of the linear least-squares fit is very close to one.
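
As a sanity check (a small sketch; np.polyfit is just one convenient way to do the fit), fitting a straight line that maps the predicted values onto the observed ones and scoring that fit reproduces a value of roughly 0.998:

import numpy as np
from sklearn.metrics import r2_score

y = np.array([1, 2, 3, 4, 5])            # observed values of the toy problem
f = np.array([1.9, 3.7, 5.8, 8.0, 9.6])  # predicted values

# Least-squares line mapping the predictions onto the observations
a, b = np.polyfit(f, y, 1)
fitted = a * f + b

print(r2_score(y, fitted))
# roughly 0.998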