Differentiating functions of vectors/matrices?

In this post I discuss a function MatrixD which attempts to take a matrix derivative following the guidelines given in the The Matrix Cookbook.

I still want to take advantage of the normal partial derivative function D, but I need to override the default handling of matrix functions. The basic approach is the following:

  1. Add Inverse, Tr, Transpose, Det, MatrixPower, and MatrixFunction to the list of "ExcludedFunctions" in SystemOptions[]. The default handling of derivatives of these symbols needs to be avoided (D[Tr[X], X] evaluating to Tr'[X] is not useful).

  2. The matrix derivative of a matrix is not 1, but rather is given by the following: $$\frac{\partial M_{\text{ij}}}{\partial M_{\text{kl}}}=\mathbb{J}_{\text{ijkl}}$$ where $\mathbb{J}_{\text{ijkl}}\equiv \delta _{\text{ik}} \delta _{\text{jl}}$. Since D[X, X] invariably returns 1, I will instead use the following construct:

    MatrixD /: D[X, MatrixD, NonConstants->{X}] := $SingleEntryMatrix
    

    MatrixD here acts as a variable, and the NonConstants option prevents the derivative from being 0. This DownValue for D (as well as all of the others) are hidden inside the implementation of MatrixD.

  3. The result of MatrixD will need to be simplified further. For this purpose, I wrote a MatrixReduce function. Some of the simplifications performed by MatrixReduce depend on whether the matrix variable is invertible or not. This is controlled by the "Invertible" option of MatrixD and MatrixReduce.

  4. Scalar functions of a matrix (e.g., Log[X]) are not supported. Instead, one must use MatrixFunction (or MatrixPower), e.g., MatrixFunction[Log, X] instead of Log[X].

  5. MatrixFunction has a simple differentiation rule when it is inside of a Tr/Det, but much more complicated rules if it is outside of a Tr/Det. Hence, derivatives of MatrixFunction are only supported when they occur inside of a Tr/Det wrapper.

  6. For test purposes, I also created a TestMatrixD function. This function tests the MatrixD output by performing the matrix derivative using ordinary D rules as well as with the MatrixD, and comparing the output after substituting random real matrices for the variables and matrix constants.

  7. There is at least one bug in the MatrixD implementation, when the argument to one of the matrix functions Tr, Det, Inverse and Transpose is actually a scalar and not a matrix (e.g., Tr[Det[X]]). There may be more.

The code is a little lengthy, so I put it on GitHub. You can download it at:

https://github.com/carlwoll/MatrixD/releases

The source code can be viewed there as well. Download the file "MatrixD-1.0.paclet" and then install it with:

PacletInstall[file]

Load the package with:

<<MatrixD`

Now, for your first couple elementary examples:

MatrixD[Tr[X\[Transpose].X], X]//TeXForm

$2 X$

MatrixD[Tr[X.X], X]//TeXForm

$2 X^T$

Your slightly more complicated Det example:

MatrixD[Det[A.X.B], X, "Invertible"->False]//TeXForm

$\left| A.X.B\right| A^T.\left(B^T.X^T.A^T\right)^{-1}.B^T$

And the output for your last example:

MatrixD[Tr[A.X.B.X\[Transpose].C], X, "Invertible"->False]//TeXForm

$A^T.C^T.X.B^T+C.A.X.B$

I'm planning on adding support for vectors as well as matrices in the future.


It is certainly fairly easy to check these relations for specific sizes of array:

X = Array[x, {7, 11}];
Map[D[Tr[Transpose[X].X], #] &, X, {2}] == 2 X
(* True *)

Y = Array[y, {17, 17}];
Map[D[Tr[Y.Y], #] &, Y, {2}] == 2 Transpose[Y]
(* True *)

Generalisation to the determinant expression should be relatively simple...

EDIT

Without being too rigorous about it, we can use the following relations to find some matrix derivatives (I assume in the following that all arguments to Tr are square )

With[{m = 4, n = 2, p = 3},
 A = Array[a, {m, n}];
 X = Array[x, {p, n}];
 B = Array[b, {p, p}];
 F = Array[c, {n, m}];
 Y = Array[y, {n, p}];]


Tr[B] == Tr[Transpose[B]]
(* True *)

Tr[X.Y] == Tr[Y.X]
(* True *)

This allows us to derive (for example)

Tr[A.Transpose[X].B.X.F] == Tr[B.X.F.A.Transpose[X]] == Tr[Transpose[B].X.Transpose[A].Transpose[F].Transpose[X]] // Simplify
(* True *)

and guess the form of the derivative

Map[D[Tr[A.Transpose[X].B.X.F], #] &, X, {2}] == 
  B.X.F.A + Transpose[B].X.Transpose[A].Transpose[F] // Expand
(* True *)