How to differentiate formally?
Here is the simplest answer:
sum[n_] := Sum[i x[i], {i, 1, n}]
x /: D[x[i_], x[j_], NonConstants -> {x}] :=
KroneckerDelta[i, j]
D[sum[n], x[2], NonConstants -> x]
$\begin{cases} 2 & n>1 \\ 1-n & \text{True} \end{cases}$
The trick here is the use of the NonConstants
option of the derivative operator. This then has to be combined with a definition stating that the x[i]
are independent variables for the purposes of this differentiation (hence the KroneckerDelta
on the second line).
Edit: more discussion
And here is another cool result, completely symbolic:
Assuming[m ∈ Integers, D[sum[n], x[m], NonConstants -> x]]
$\left( \begin{array}{cc} \{ & \begin{array}{cc} m & m\geq 1 \\ 0 & \text{True} \\ \end{array} \\ \end{array} \right)-\left( \begin{array}{cc} \{ & \begin{array}{cc} m & m>2\land m\geq n+1 \\ n+1 & m=2\land n=1 \\ \end{array} \\ \end{array} \right)$
This isn't easy to absorb, but it works if you check it with specific examples by doing
condition = %;
Simplify[condition /. m -> 10]
$\begin{cases} 10 & n>9 \\ 0 & \text{True} \end{cases}$
In summary, it's worth pointing out that a lot of symbolic differentiation tasks can be achieved by using either NonConstants
specifications in D
or conversely using Constants
specifications in Dt
.
I did some computation of formal derivatives a while back which might be of interest in this context (though keep in mind that this is anything but bullet proof! it does work for the cases I bothered to check though).
Clear[a]; Format[a[k_]] = Subscript[a, k]
Let us say we have an objective function which is formally a function of
the vector a[i]
Q = Sum[Log[Sum[a[r] Subscript[B, r][Subscript[x, i]], {r, 1, p}]/
Sum[a[r] , {r, 1, p}]], {i, 1, n}]
Let us define a couple of rules for formal differentiation as follows
Clear[d];
d[Log[x_], a[k_]] := 1/x d[x, a[k]]
d[Sum[x_, y__], a[k_]] := Sum[d[x, a[k]], y]
d[ a[k_] b_., a[k_]] := b /; FreeQ[b, a]
d[ a[q_] b_., a[k_]] := b Subscript[δ, k, q] /; FreeQ[b, a]
d[ c_ b_, a[k_]] := d[c, a[k]] b + d[b, a[k]] c
d[ b_ + c_, a[k_]] := d[c, a[k]] + d[b, a[k]]
d[Subscript[δ, r_, q_], a[k_]] := 0
d[x_, a[k_]] := 0 /; FreeQ[x, a]
d[G_^n_, a[k_]] := n G^(n - 1) d[G , a[k]] /; ! FreeQ[G, a]
d[Exp[G_], a[q_]] := Exp[G] d[G , a[q]] /; ! FreeQ[G, a]
Unprotect[Sum]; Attributes[Sum] = {ReadProtected};Protect[Sum];
And a rule to deal with Kroneckers
ds = {Sum[a_ + b_, {s_, 1, p_}] :> Sum[a, {s, 1, p}] + Sum[b, {s, 1, p}],
Sum[ y_ Subscript[δ, r_, s_], {s_, 1, p_}] :> (y /. s -> r),
Sum[ y_ Subscript[δ, s_, r_], {s_, 1, p_}] :> (y /. s -> r),
Sum[ Subscript[δ, s_, r_], {r_, 1, p_}] :> 1,
Sum[δ[i_, k_] δ[j_, k_] y_. , {k_, n_}] -> δ[i, j] (y /. k -> i),
Sum[a_ b_, {r_, 1, p_}] :> a Sum[b, {r, 1, p}] /; NumberQ[a],
Sum[a__, {r_, 1, p_}] :> Sum[Simplify[a], {r, 1, p}] }
Then, for instance, the gradient of Q
with respect to one of the a[k]
reads
grad = d[Q, a[k]] /. ds // Simplify;
Similarly the tensor of second derivatives w.r.t. a[k]
and a[s]
is given by
hess = d[d[Q, a[k]], a[s]] /. ds // Simplify
As a less trivial example let us consider the 4th order derivatives of Q
d[d[d[d[Q, a[k]], a[s]], a[m]], a[t]]; /. ds // Simplify
For the problem at hand we check easily that
Q = Sum[r a[r] , {r, 1, p}];
grad = d[Q, a[k]] // Simplify;
grad //. ds
returns k
as it should
EDIT
This process can be made a bit more general, say, on this Objective function
Q = 1/2 Sum[(Sum[a[r] Subscript[B, r, i][a[q]], {r, 1, p}] -
Subscript[y, i])^2, {i, 1, n}]
which depends non linearly on a[k]
via B
.
All we need is to add a new rule for d
d[H_[a[q_]],
a[k_]] := (D[H[x] , x] /. x -> a[k] ) Subscript[δ, k, q]
Now we readily have
grad = d[Q, a[k]] // Simplify;
hess = d[d[Q, a[k]], a[s]];
grad //. ds
hess /. ds // Simplify
As a other example, let us look at a parametrized entropy distance,
Q = -Sum[(Sum[a[r] Subscript[B, r, i], {r, 1, p}]/
Subscript[y, i]) Log[(Sum[a[r] Subscript[B, r, i], {r, 1, p}]/
Subscript[y, i])], {i, 1, n}]
we can compute its Hessian while mapping twice the sum rule
Map[# /. ds &, d[d[Q, a[k]], a[s]] /. ds]
As a final example, consider a Poisson likelihood
Q = Sum[Log[Exp[-a[k]] a[k]^Subscript[y, k]/Subscript[y, k]!], {k, 1, n}]
so that
grad = d[Q, a[k]] // Simplify
and
hess =d[d[Q, a[k]], a[s]] /. ds // Simplify
Of course these algebraic rules are not bullet proof but illustrate nicely the way mathematica handles new grammar.
Starting in M11.1, this works:
sum[n_] = Sum[i x[i],{i,1,n}];
D[sum[n],x[2]] //InputForm
Piecewise[{{2, n >= 2}}, 0]