How can I implement the Kullback–Leibler divergence using Mathematica's probability and distribution functions?
In the continuous case, the Kullback–Leibler divergence from distribution $Q$ to distribution $P$ is defined as
\begin{equation} D_{KL}( P ||Q) = \int \limits_{-\infty}^{+\infty} p(x)\cdot \log \frac{p(x)}{q(x)} dx \end{equation}
So what you need is the probability density function, which Mathematica provides as PDF:
distP = NormalDistribution[]; (* start user symbols with a lowercase letter to avoid clashes with built-ins *)
distQ = GumbelDistribution[];
klDivergenceContinuous = Function[ { dist1, dist2 },
  NIntegrate[
    PDF[ dist1, x ] * ( Log @ PDF[ dist1, x ] - Log @ PDF[ dist2, x ] ),
    { x, -Infinity, Infinity }
  ]
];
klDivergenceContinuous[ distP, distQ ]
0.229783
Update: a more general solution
The definition of the expected value in the continuous case is given as
\begin{equation} \mathbb{E}[ f(X) ] = \int \limits_{-\infty}^{+\infty} f(x)\cdot p(x) dx \end{equation}
We can use this definition to compute the KL divergence. Since Mathematica's implementation of Expectation also handles discrete distributions (effectively summing over all values instead of integrating over them), the following routine turns out to be more general:
Options[ klDivergence ] = Options @ Expectation;
klDivergence[ distP_ , distQ_, opts:OptionsPattern[ klDivergence ] ] := Expectation[
Log @ PDF[ distP, \[FormalX] ] - Log @ PDF[ distQ, \[FormalX] ],
\[FormalX] \[Distributed] distP,
opts
]
klDivergence[ 1, distP_, distQ_, opts:OptionsPattern[ klDivergence ] ] := klDivergence[
distP,
distQ,
opts
]
klDivergence[ n_Integer, distP_, distQ_, opts:OptionsPattern[ klDivergence ] ]
/; n > 0 := Module[
{
vars = Table[ Unique[ \[FormalX] ], n ]
},
Expectation[
Log @ PDF[ distP, vars ] - Log @ PDF[ distQ, vars ],
vars \[Distributed] distP,
opts
]
]
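As a quick sanity check of the $n$-variate dispatch (not part of the original answer), one can compare it against the known closed form for two Gaussians with equal covariance $\Sigma$ and mean difference $\mu$, where $D_{KL} = \frac{1}{2}\,\mu^\top \Sigma^{-1} \mu$:

    (* illustrative check, assuming the klDivergence definitions above *)
    klDivergence[ 2,
      MultinormalDistribution[ { 0, 0 }, IdentityMatrix[ 2 ] ],
      MultinormalDistribution[ { 1, 0 }, IdentityMatrix[ 2 ] ]
    ]

With identical unit covariances and a mean shift of $(1,0)$, the analytic value is $1/2$.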
This works (in principle) for discrete as well as for continuous distributions:
Continuous distributions (the OP's example)
klDivergence[ distP, distQ ]
$-\frac{1}{2}+ \sqrt{\mathrm{e}} - \frac{1}{2} \log 2\pi $
N[%]
0.229783
Discrete Distributions
We may use the example given in (104506) to check how this works in the discrete case:
pmfA = EmpiricalDistribution[ {1/6, 1/6, 1/6, 1/6, 1/6, 1/6} -> Range[6] ];
pmfB = EmpiricalDistribution[ {1/4, 1/4, 1/8, 1/8, 1/8, 1/8} -> Range[6] ];
klDivergence[ pmfA, pmfB ]
1/3 (Log[4] - 3 Log[6] + 2 Log[8])
The problems arising when a discrete distribution contains zero probabilities (cf. the question linked above) can be solved by dropping the zero entries from the distribution:
(* conceptually pmfBzero = {1/4, 1/4, 1/2, 0, 0, 0}; the zero entries are dropped *)
pmfBzero = EmpiricalDistribution[ {1/4, 1/4, 1/2} -> Range[3] ];
klDivergence[ pmfBzero, pmfA ]
Log[ 3/Sqrt[2] ]
I know that this question already has a very good answer, but I still wanted to present the Wolfram Function Repository function I wrote specifically for this purpose (before I even found this question). Like the solution by gwr, it uses (N)Expectation to compute the result, though I use LogLikelihood instead of Log @ PDF[...]. The function also checks the domains of the distributions for you.
Edit: the current version will complain if you use an EmpiricalDistribution, but I have submitted a new version that fixes that problem.
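For readers who want to roll their own along the same lines, here is a minimal sketch of the LogLikelihood-based approach (the name klDivLL is illustrative; this is not the repository function's actual implementation, and it omits the domain checks):

    klDivLL[ p_, q_ ] := NExpectation[
      LogLikelihood[ p, { x } ] - LogLikelihood[ q, { x } ],
      x \[Distributed] p
    ]

    klDivLL[ NormalDistribution[], GumbelDistribution[] ]
    (* 0.229783, matching the result above *)

LogLikelihood[ dist, { x } ] is simply the log of the PDF evaluated at the single data point x, so this is the same integrand as before, written more compactly.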