Differentiating functions of vectors/matrices?
In this post I discuss a function MatrixD
which attempts to take a matrix derivative following the guidelines given in the The Matrix Cookbook.
I still want to take advantage of the normal partial derivative function D, but I need to override the default handling of matrix functions. The basic approach is the following:
Add
Inverse
,Tr
,Transpose
,Det
,MatrixPower
, andMatrixFunction
to the list of "ExcludedFunctions" inSystemOptions[]
. The default handling of derivatives of these symbols needs to be avoided (D[Tr[X], X]
evaluating toTr'[X]
is not useful).The matrix derivative of a matrix is not 1, but rather is given by the following: $$\frac{\partial M_{\text{ij}}}{\partial M_{\text{kl}}}=\mathbb{J}_{\text{ijkl}}$$ where $\mathbb{J}_{\text{ijkl}}\equiv \delta _{\text{ik}} \delta _{\text{jl}}$. Since
D[X, X]
invariably returns 1, I will instead use the following construct:MatrixD /: D[X, MatrixD, NonConstants->{X}] := $SingleEntryMatrix
MatrixD
here acts as a variable, and theNonConstants
option prevents the derivative from being 0. This DownValue forD
(as well as all of the others) are hidden inside the implementation ofMatrixD
.The result of
MatrixD
will need to be simplified further. For this purpose, I wrote aMatrixReduce
function. Some of the simplifications performed byMatrixReduce
depend on whether the matrix variable is invertible or not. This is controlled by the "Invertible" option ofMatrixD
andMatrixReduce
.Scalar functions of a matrix (e.g.,
Log[X]
) are not supported. Instead, one must useMatrixFunction
(orMatrixPower
), e.g.,MatrixFunction[Log, X]
instead ofLog[X]
.MatrixFunction
has a simple differentiation rule when it is inside of aTr
/Det
, but much more complicated rules if it is outside of aTr
/Det
. Hence, derivatives ofMatrixFunction
are only supported when they occur inside of aTr
/Det
wrapper.For test purposes, I also created a
TestMatrixD
function. This function tests theMatrixD
output by performing the matrix derivative using ordinaryD
rules as well as with theMatrixD
, and comparing the output after substituting random real matrices for the variables and matrix constants.There is at least one bug in the
MatrixD
implementation, when the argument to one of the matrix functionsTr
,Det
,Inverse
andTranspose
is actually a scalar and not a matrix (e.g.,Tr[Det[X]]
). There may be more.
The code is a little lengthy, so I put it on GitHub. You can download it at:
https://github.com/carlwoll/MatrixD/releases
The source code can be viewed there as well. Download the file "MatrixD-1.0.paclet" and then install it with:
PacletInstall[file]
Load the package with:
<<MatrixD`
Now, for your first couple elementary examples:
MatrixD[Tr[X\[Transpose].X], X]//TeXForm
$2 X$
MatrixD[Tr[X.X], X]//TeXForm
$2 X^T$
Your slightly more complicated Det
example:
MatrixD[Det[A.X.B], X, "Invertible"->False]//TeXForm
$\left| A.X.B\right| A^T.\left(B^T.X^T.A^T\right)^{-1}.B^T$
And the output for your last example:
MatrixD[Tr[A.X.B.X\[Transpose].C], X, "Invertible"->False]//TeXForm
$A^T.C^T.X.B^T+C.A.X.B$
I'm planning on adding support for vectors as well as matrices in the future.
It is certainly fairly easy to check these relations for specific sizes of array:
X = Array[x, {7, 11}];
Map[D[Tr[Transpose[X].X], #] &, X, {2}] == 2 X
(* True *)
Y = Array[y, {17, 17}];
Map[D[Tr[Y.Y], #] &, Y, {2}] == 2 Transpose[Y]
(* True *)
Generalisation to the determinant expression should be relatively simple...
EDIT
Without being too rigorous about it, we can use the following relations to find some matrix derivatives (I assume in the following that all arguments to Tr
are square )
With[{m = 4, n = 2, p = 3},
A = Array[a, {m, n}];
X = Array[x, {p, n}];
B = Array[b, {p, p}];
F = Array[c, {n, m}];
Y = Array[y, {n, p}];]
Tr[B] == Tr[Transpose[B]]
(* True *)
Tr[X.Y] == Tr[Y.X]
(* True *)
This allows us to derive (for example)
Tr[A.Transpose[X].B.X.F] == Tr[B.X.F.A.Transpose[X]] == Tr[Transpose[B].X.Transpose[A].Transpose[F].Transpose[X]] // Simplify
(* True *)
and guess the form of the derivative
Map[D[Tr[A.Transpose[X].B.X.F], #] &, X, {2}] ==
B.X.F.A + Transpose[B].X.Transpose[A].Transpose[F] // Expand
(* True *)