Least squares / residual sum of squares in closed form

According to Randal J. Barnes, *Matrix Differentiation*, Prop. 7, if $\alpha=y^TAx$, where $y$ and $x$ are vectors and $A$ is a matrix, we have $$\frac{\partial\alpha}{\partial x}=y^TA\quad\text{and}\quad\frac{\partial\alpha}{\partial y}=x^TA^T$$ (the proof is very simple). Also, according to his Prop. 8, if $\alpha=x^TAx$, then $$\frac{\partial \alpha}{\partial x}=x^T(A+A^T).$$

Therefore, in Alecos's solution below, I would rather write $$ \frac{\partial\mathrm{RSS}(\beta)}{\partial\beta}=-y^TX-y^TX+\beta^T\left(X^TX+(X^TX)^T\right), $$ where the last term is indeed $2\beta^TX^TX$ since $X^TX$ is symmetric, i.e. $(X^TX)^T=X^TX$. Setting this to zero gives the equation $$ (y^T-\beta^TX^T)X=0, $$ which provides the same result as in Alecos's answer if we take the transpose of both sides. The difference is one of convention: Barnes defines the derivative with respect to a column vector as a row vector (numerator layout), while Alecos works with column-vector gradients (denominator layout); the final result is, of course, the same.
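If you want to sanity-check these matrix-calculus identities, here is a minimal NumPy sketch comparing Props. 7 and 8 against central finite differences. All names, shapes, seeds, and tolerances are illustrative, not taken from Barnes:

```python
import numpy as np

# Central-difference check of Barnes's Prop. 7 and Prop. 8
# (row-vector derivative convention).
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, m))
x = rng.standard_normal(m)
y = rng.standard_normal(n)
eps = 1e-6

# Prop. 7: d(y^T A x)/dx = y^T A  (a row vector).
fd = np.array([(y @ A @ (x + eps * e) - y @ A @ (x - eps * e)) / (2 * eps)
               for e in np.eye(m)])
assert np.allclose(fd, y @ A)

# Prop. 8: d(x^T B x)/dx = x^T (B + B^T), for square B.
B = rng.standard_normal((m, m))
fd = np.array([((x + eps * e) @ B @ (x + eps * e)
                - (x - eps * e) @ B @ (x - eps * e)) / (2 * eps)
               for e in np.eye(m)])
assert np.allclose(fd, x @ (B + B.T))
```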


This follows from the standard multiplication and differentiation rules for matrices.

We have

$$RSS(\beta) = (y - X \beta)^T (y - X \beta) = (y^T - \beta^TX^T)(y - X \beta) \\ =y^Ty-y^TX \beta-\beta^TX^Ty+\beta^TX^TX \beta$$

Then $$\frac {\partial RSS(\beta)}{\partial \beta} = -X^Ty-X^Ty+2X^TX\beta$$

The last term takes the form $2X^TX\beta$ because the matrix $X^TX$ is symmetric, so $(X^TX+(X^TX)^T)\beta=2X^TX\beta$; the two middle terms coincide because $y^TX\beta$ is a scalar and therefore equals its own transpose $\beta^TX^Ty$, so each contributes $-X^Ty$.
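As a quick numerical sanity check, here is a minimal NumPy sketch (names, shapes, and the seed are illustrative) comparing this gradient against a finite-difference gradient of $RSS(\beta)$:

```python
import numpy as np

# Check the closed-form gradient -2 X^T y + 2 X^T X beta against
# a central-difference gradient of RSS(beta).
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta = rng.standard_normal(p)

def rss(b):
    r = y - X @ b          # residual vector
    return r @ r           # residual sum of squares

grad = -2 * X.T @ y + 2 * X.T @ X @ beta

eps = 1e-6
fd = np.array([(rss(beta + eps * e) - rss(beta - eps * e)) / (2 * eps)
               for e in np.eye(p)])
assert np.allclose(grad, fd)
```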

So $$\frac {\partial RSS(\beta)}{\partial \beta} =0 \Rightarrow -2X^Ty+2X^TX\beta =0 \Rightarrow -X^Ty+X^TX\beta = 0$$

$$\Rightarrow X^T(-y + X\beta) = 0\Rightarrow X^T(y-X\beta)=0$$
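This last line is the normal equation $X^TX\beta = X^Ty$. Here is a minimal NumPy sketch, assuming $X^TX$ is invertible (all names and shapes are illustrative), confirming that solving it agrees with `np.linalg.lstsq`:

```python
import numpy as np

# Solve the normal equation X^T X beta = X^T y and compare with
# NumPy's least-squares solver.
rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)   # assumes X^T X invertible
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(beta_normal, beta_lstsq)
# The first-order condition X^T (y - X beta) = 0 holds at the solution.
assert np.allclose(X.T @ (y - X @ beta_normal), 0)
```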