Proof of a fact about traces

I guess $\Delta_A$ denotes the derivative with respect to the elements of the matrix $A$ (more conventionally denoted by $\partial_{A}$).

To evaluate the derivative with respect to $A_{ij}$, write out the trace in terms of components and then use $\partial_{A_{ij}} A_{mn} = \delta_{im} \delta_{jn}$, $$\partial_{A_{ij}} \text{tr}(A B A^T C) = \partial_{A_{ij}}\sum_{mnkl} A_{mn} B_{nk} A_{lk} C_{lm}= \sum_{kl} B_{jk}A_{lk} C_{li} + \sum_{mn} A_{mn} B_{nj} C_{im} $$ $$= ( C^T A B^T+ C A B )_{ij}.$$ This is the component-wise version of your identity.

A note on Todd Trimble's comment: the matrices $A$, $B$, and $C$ do not necessarily have to be square. Their dimensions just have to "match" ($A \in \mathbb{R}^{m \times n}$, $B\in \mathbb{R}^{n \times n}$, $C \in \mathbb{R}^{m \times m}$, with $m$ and $n$ arbitrary positive integers).
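For anyone who wants a quick numerical sanity check of the identity, here is a small NumPy sketch that compares a central finite-difference approximation of $\partial_{A_{ij}} \text{tr}(A B A^T C)$ with the claimed formula $C^T A B^T + C A B$, using a non-square $A$ as in the remark above. The $4 \times 3$ shape, the random seed, and the step size $10^{-6}$ are arbitrary choices made only for this check.

```python
# Finite-difference check of  d tr(A B A^T C) / dA  =  C^T A B^T + C A B.
# Shapes follow the note above: A is m x n, B is n x n, C is m x m, with m != n.
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, m))

def f(A):
    """The scalar function tr(A B A^T C)."""
    return np.trace(A @ B @ A.T @ C)

eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = eps                      # perturb a single entry A_ij
        grad_fd[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

grad_formula = C.T @ A @ B.T + C @ A @ B   # the claimed identity
print(np.max(np.abs(grad_fd - grad_formula)))  # agrees to several decimal places
```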


I'm going to decompose Fabian's answer into something a little more newbie-friendly. An important concept is the partial derivative of a matrix with respect to one of its elements. An example of this is

$$ \partial_{\mathbf{A}_{31}} \mathbf{A} = \partial_{\mathbf{A}_{31}} \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \\ \mathbf{A}_{31} & \mathbf{A}_{32} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \end{bmatrix}. $$

In general, if we take the derivative with respect to the $(i,j)$ entry, then the $(m,n)$ entry of the resulting matrix is

$$ \partial_{A_{ij}} A_{mn} = \delta_{im} \delta_{jn}, $$ where $\delta$ is the Kronecker delta. This is simply the basic statement of multivariate calculus that $\partial_x x = 1$ and $\partial_x y = 0$. In particular, summing this derivative against any coefficients $X_{mn}$, the Kronecker deltas pick out the single term with $m = i$ and $n = j$:

\begin{align*} \sum_{mn} (\partial_{A_{ij}} A_{mn})\, X_{mn} = \sum_{mn} \delta_{im} \delta_{jn} X_{mn} = X_{ij}. \qquad \text{(1)} \end{align*}
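To see Eqn. (1) concretely, here is a tiny NumPy sketch that builds the single-entry matrix $\partial_{\mathbf{A}_{31}} \mathbf{A}$ from the $3 \times 2$ example above and checks the sifting property against an arbitrary coefficient array $X$; the array $X$ is just a placeholder introduced for this check.

```python
# Build (dA / dA_31) for the 3 x 2 example above and verify Eqn. (1).
import numpy as np

m, n = 3, 2        # shape of A in the example above
i, j = 3, 1        # differentiate with respect to A_31 (1-based, as in the text)

# (dA/dA_ij)_{mn} = delta_{im} delta_{jn}: a 1 in position (i, j), zeros elsewhere.
dA = np.zeros((m, n))
dA[i - 1, j - 1] = 1.0
print(dA)          # 1 in position (3, 1), zeros elsewhere

# Eqn. (1): summing the deltas against arbitrary coefficients X_mn picks out X_ij.
X = np.arange(1.0, m * n + 1).reshape(m, n)   # placeholder coefficients
assert np.sum(dA * X) == X[i - 1, j - 1]
```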

To begin the proof, consider the following chain of matrix equations:

\begin{align*} \partial_{\mathbf{A}} \text{tr}(\mathbf{A} \mathbf{B} \mathbf{A}^\text{T} \mathbf{C}) &= \partial_{\mathbf{A}} \sum_{m} \left(\mathbf{A} \mathbf{B} \mathbf{A}^\text{T} \mathbf{C}\right)_{mm} \qquad \text{(Trace [2])} \\ &= \partial_{\mathbf{A}} \sum_{m} \sum_{n k \ell} A_{mn} B_{nk} \left(A^T\right)_{k \ell} C_{\ell m} \qquad \text{(Matrix multiplication [3])} \\ &= \partial_{\mathbf{A}} \sum_{m n k \ell} A_{mn} B_{nk} A_{ \ell k} C_{\ell m} \qquad \text{(Transpose [4])} \\ &= \mathbf{C}^T \mathbf{A B}^T + \mathbf{C A B}, \end{align*}

where we justify the last step component-wise. For all elements $(i,j)$, it follows that

\begin{align*} &\Bigg( \partial_{\mathbf{A}} \sum_{m n k \ell} A_{mn} B_{nk} A_{ \ell k} C_{\ell m} \Bigg)_{ij} \\ &= \partial_{\mathbf{A}_{ij}} \sum_{m n k \ell} A_{mn} B_{nk} A_{ \ell k} C_{\ell m} \qquad \text{(Scalar-by-matrix derivative [5])} \\ &= \sum_{mnk \ell} \partial_{A_{ij}} \big( A_{mn} B_{nk} A_{ \ell k} C_{\ell m} \big) \qquad \text{(Linearity of differentiation [6])} \\ &= \sum_{mnk \ell} \Big[ (\partial_{A_{ij}} A_{mn}) (B_{nk} A_{\ell k} C_{\ell m}) + (A_{mn}) (\partial_{A_{ij}} B_{nk} A_{\ell k} C_{\ell m}) \Big] \qquad \text{(Product rule [7])} \\ &= \sum_{mnk \ell} (\partial_{A_{ij}} A_{mn}) B_{nk} A_{\ell k} C_{\ell m} + \sum_{mnk \ell} A_{mn} (\partial_{A_{ij}} B_{nk} A_{\ell k} C_{\ell m}) \\ &= \sum_{k \ell} B_{jk} A_{\ell k} C_{\ell i} + \sum_{mnk \ell} A_{mn} (\partial_{A_{ij}} B_{nk} A_{\ell k} C_{\ell m}) \qquad \text{(Eqn. 1)} \\ &= \sum_{k \ell} B_{jk} A_{\ell k} C_{\ell i} + \sum_{mnk \ell} A_{mn} \Big[(\partial_{A_{ij}} A_{\ell k}) (B_{nk} C_{\ell m}) + (A_{\ell k}) (\partial_{A_{ij}} B_{nk} C_{\ell m})\Big] \qquad \text{(Product rule)} \\ &= \sum_{k \ell} B_{jk} A_{\ell k} C_{\ell i} + \sum_{mnk \ell} A_{mn} (\partial_{A_{ij}} A_{\ell k}) B_{nk} C_{\ell m} \qquad \text{($B$ and $C$ do not depend on $A$)} \\ &= \sum_{k \ell} B_{jk} A_{\ell k} C_{\ell i} + \sum_{mn} A_{mn} B_{nj} C_{i m} \qquad \text{(Eqn. 1)} \\ &= \sum_{k \ell} C_{\ell i} A_{\ell k} B_{jk} + \sum_{mn} C_{i m} A_{mn} B_{nj} \\ &= \sum_{k \ell} \left(C^T\right)_{i \ell} A_{\ell k} \left(B^T\right)_{kj} + \sum_{mn} C_{im} A_{mn} B_{nj} \qquad \text{(Transpose)} \\ &= \left( \mathbf{C}^T \mathbf{A} \mathbf{B}^T + \mathbf{C} \mathbf{A} \mathbf{B} \right)_{ij}, \qquad \text{(Matrix multiplication)} \end{align*}

which completes the proof.
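If you want to double-check the index bookkeeping numerically, the following NumPy sketch evaluates $\sum_{k \ell} B_{jk} A_{\ell k} C_{\ell i} + \sum_{mn} A_{mn} B_{nj} C_{im}$ entry by entry with explicit loops and compares it with $\left( \mathbf{C}^T \mathbf{A} \mathbf{B}^T + \mathbf{C} \mathbf{A} \mathbf{B} \right)_{ij}$. The $4 \times 3$ shape and the random seed are arbitrary choices made only for this check.

```python
# Check of the final index manipulation: compute, entry by entry,
#   sum_{k,l} B_{jk} A_{lk} C_{li}  +  sum_{m,n} A_{mn} B_{nj} C_{im}
# with explicit loops, and compare with (C^T A B^T + C A B)_{ij}.
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 3                       # A is M x N, B is N x N, C is M x M
A = rng.standard_normal((M, N))
B = rng.standard_normal((N, N))
C = rng.standard_normal((M, M))

lhs = np.zeros((M, N))
for i in range(M):
    for j in range(N):
        s = 0.0
        for k in range(N):        # first sum: indices k, l
            for l in range(M):
                s += B[j, k] * A[l, k] * C[l, i]
        for m in range(M):        # second sum: indices m, n
            for n in range(N):
                s += A[m, n] * B[n, j] * C[i, m]
        lhs[i, j] = s

rhs = C.T @ A @ B.T + C @ A @ B
print(np.max(np.abs(lhs - rhs)))  # agrees up to floating-point rounding
```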

  • [2]: https://en.wikipedia.org/wiki/Trace_(linear_algebra)
  • [3]: https://en.wikipedia.org/wiki/Matrix_multiplication#General_definition_of_the_matrix_product
  • [4]: https://en.wikipedia.org/wiki/Transpose
  • [5]: https://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix
  • [6]: https://en.wikipedia.org/wiki/Linearity_of_differentiation
  • [7]: https://en.wikipedia.org/wiki/Product_rule