Matrix-by-matrix derivative formula

Start with a matrix function, then take the differential, then vectorize and identify the gradient.

$$\eqalign{ F &= X^TMX \cr dF &= dX^TMX + X^TMdX \cr {\rm vec}(dF) &= {\rm vec}(dX^TMX) + {\rm vec}(X^TMdX) \cr df&=(X^TM^T\otimes I){\rm vec}(dX^T) + (I\otimes X^TM){\rm vec}(dX)\cr &= \Big((X^TM^T\otimes I)K + (I\otimes X^TM)\Big)\,{\rm vec}(dX) \cr \frac{\partial f}{\partial x} &= (X^TM^T\otimes I)K + (I\otimes X^TM) \cr }$$ where $K$ is the Commutation Matrix for Kronecker products.


Let $f(X) = X^T M X$. Then for a variation $\epsilon Y$, with $\epsilon$ a real number, we have by direct calculation $$ f(X+\epsilon Y) = f(X) + \epsilon \left( Y^T M X + X^T M Y \right) + \epsilon^2 \left( Y^T M Y \right) $$ Therefore, we can compute the directional derivative as follows: $$ \nabla_Y f(X) := \lim_{\epsilon \to 0} \frac{f(X+\epsilon Y)-f(X)}{\epsilon} = Y^T MX + X^T M Y. $$ Hence, the derivative $\nabla f$ at $X$ is the linear map $$ \nabla f(X): Y \mapsto Y^T M X + X^T M Y $$


$\Phi: M_n \times M_n \to M_n$ given by $\Phi(X,Y) = X^T M Y$ is a bilinear form on a finite dimensional nvs, so $\Phi$ is bounded. Hence, $\Phi$ is differentiable and:

$$D \Phi(X,Y)(H,K) = \Phi(H,Y) + \Phi(X,K) = H^T M Y + X^T M K$$

for all $X,Y,H,K \in M_n$.

Let $q: M_n \to M_n$, $q(X) = \Phi(X,X)$. Then, $q$ is differentiable and for all $X, H \in M_n$,

$$Dq(X)H = D\Phi(X,X)(H,H) = H^T M X + X^T M H$$