Matrix calculus in multiple linear regression OLS estimate derivation
Consider the full matrix case of the regression $$\eqalign{ Y &= XB+E \cr E &= Y-XB \cr }$$ In this case the function to be minimized is $$\eqalign{f &= \|E\|^2_F = E:E}$$ where the colon denotes the Frobenius inner product, i.e. $A:B = {\rm tr}(A^TB)$.
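As a quick numerical sanity check (a sketch with arbitrary shapes and random data, not values tied to any particular problem), the three expressions $\|E\|_F^2$, $E:E$, and ${\rm tr}(E^TE)$ all agree:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # hypothetical design matrix
B = rng.normal(size=(4, 3))    # arbitrary coefficient matrix
Y = rng.normal(size=(50, 3))   # hypothetical response matrix
E = Y - X @ B                  # residual matrix

# three equivalent expressions for f = ||E||_F^2 = E:E
f_frob  = np.linalg.norm(E, 'fro')**2
f_colon = np.sum(E * E)        # elementwise product, then sum
f_trace = np.trace(E.T @ E)    # tr(E^T E)
assert np.isclose(f_frob, f_colon) and np.isclose(f_colon, f_trace)
```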
Now find the differential and gradient, using $dE=-X\,dB$ and the fact that a matrix can be moved across the inner product via $A:XC = X^TA:C$ $$\eqalign{ df &= 2\,E:dE \cr &= -2\,E:X\,dB \cr &= 2\,(XB-Y):X\,dB \cr &= 2\,X^T(XB-Y):dB \cr\cr \frac{\partial f}{\partial B} &= 2\,X^T(XB-Y) \cr }$$ Set the gradient to zero and solve, assuming $X$ has full column rank so that $X^TX$ is invertible $$\eqalign{ X^TXB &= X^TY \cr B &= (X^TX)^{-1}X^TY \cr }$$ This result remains valid when $B$ is an $(N\times 1)$ matrix, i.e. a vector.
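Here is a minimal NumPy sketch of the result, assuming a full-column-rank $X$ and illustrative random data: solving the normal equations yields a $B$ at which the gradient $2\,X^T(XB-Y)$ vanishes, and the answer matches NumPy's built-in least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))   # assumed to have full column rank
Y = rng.normal(size=(50, 3))

# solve the normal equations  X^T X B = X^T Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# the gradient 2 X^T (X B - Y) vanishes at the minimizer
grad = 2 * X.T @ (X @ B_hat - Y)
assert np.allclose(grad, 0)

# cross-check against NumPy's least-squares solver
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(B_hat, B_lstsq)
```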
The problem is that, in the vector case, people tend to write the function as the transpose product $e^Te$ instead of the inner product $e:e$, and then fall down rabbit holes over the details of the transpositions.
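Indeed, for an $(N\times 1)$ residual $e$ the transpose product and the inner product are the same number, which is easy to verify numerically (again with illustrative random data):

```python
import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(size=(50, 1))     # residual as an (N x 1) column vector

f_transpose = (e.T @ e).item()   # e^T e, a 1x1 matrix collapsed to a scalar
f_colon = np.sum(e * e)          # the Frobenius inner product e:e
assert np.isclose(f_transpose, f_colon)
```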