"Entropy" proof of Brunn-Minkowski Inequality?
The similarity between the entropy power inequality and the Brunn-Minkowski inequality is not directly related to convexity - after all, Brunn-Minkowski can be generalised to bounded open sets that are not necessarily convex. (However, many of the proofs of both inequalities use ideas from convexity theory, of course.)
Taking the microstate (i.e. Boltzmann) interpretation of entropy, one can heuristically view the entropy power inequality as a high-dimensional "99%" analogue of Brunn-Minkowski, in which one considers the "99%-sumset" $A \stackrel{99\%}{+} B$ of two high-dimensional sets A and B (this is a somewhat vaguely defined concept, but roughly speaking it is the bulk of the distribution of the sum of two independent random variables distributed uniformly on A and B respectively) rather than the "100%-sumset" A+B. However, thanks to the concentration of measure phenomenon, the 99%-sumset is considerably smaller than the 100%-sumset in high dimensions, leading to the additional exponent of 2 in the entropy power inequality.

For instance, consider in high dimensions two balls $B(0,R)$, $B(0,r)$ of radii $R,r$ respectively. Their 100%-sumset is $B(0,R+r)$ of course. But if one takes a random element of $B(0,R)$ and adds it to an independent random element of $B(0,r)$, the sum concentrates in a much smaller ball - asymptotically, $B(0,\sqrt{R^2+r^2})$ (in fact it concentrates to the boundary of this ball). This boils down to the basic fact that pairs of vectors in high dimensions are typically almost orthogonal to each other. So one morally has
$$ B(0,R) \stackrel{99\%}{+} B(0,r) = B(0,\sqrt{R^2+r^2}) \qquad (1)$$
in the high-dimensional limit.
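Here is the back-of-the-envelope computation behind (1) (just a heuristic sketch, with $x,y$ denoting the two random elements): if $x$ is drawn uniformly from $B(0,R)$ and $y$ independently and uniformly from $B(0,r)$ in $\mathbb{R}^N$ with $N$ large, then with high probability $|x| = (1-o(1))R$ and $|y| = (1-o(1))r$ (most of the volume of a high-dimensional ball lies near its boundary), while the near-orthogonality mentioned above gives $\langle x, y \rangle = o(Rr)$. Hence
$$ |x+y|^2 = |x|^2 + 2\langle x, y \rangle + |y|^2 = (1+o(1))(R^2+r^2), $$
so the sum indeed concentrates near the sphere of radius $\sqrt{R^2+r^2}$.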
Anyway, the EPI can be thought of as a rigorous formulation of a "99% Brunn-Minkowski inequality"
$$ |A \stackrel{99\%}{+} B|^{1/N} \geq \sqrt{ (|A|^{1/N})^2 + (|B|^{1/N})^2 } \qquad (2)$$
in the high-dimensional limit $N \to \infty$ (with $A,B \subset \mathbb{R}^N$ varying appropriately with $N$), which is of course consistent with (1). To see this interpretation, one observes from the microstate interpretation of entropy that if $X$ is a continuous random variable on $\mathbb{R}^n$ with a nice probability density (e.g. $C^\infty_c$), and $M$ is large, then taking $M$ independent copies $X_1,\dots,X_M$ of $X$ gives a random vector $X^{\otimes M} := (X_1,\dots,X_M)$ in $\mathbb{R}^{N}$ for $N := Mn$ which (by the asymptotic equipartition property) is concentrated in a subset of $\mathbb{R}^N$ of measure $e^{M (H(X)+o(1))}$ in the limit $M \to \infty$ (this is a nice calculation using Stirling's formula; it may help to first work out the case when the probability density of $X$ is a simple function rather than a test function). Similarly, if $Y$ is another random variable independent of $X$, then $Y^{\otimes M}$ will be concentrated in a set of measure $e^{M(H(Y)+o(1))}$, while $(X+Y)^{\otimes M}$ is concentrated in a set of measure $e^{M(H(X+Y)+o(1))}$. EPI is then morally a consequence of (2) in the limit $M \to \infty$.
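To spell out that last step (again only heuristically, writing $A_M$, $B_M$, $C_M$ for the concentration sets just described): $X^{\otimes M}$, $Y^{\otimes M}$ and $(X+Y)^{\otimes M} = X^{\otimes M} + Y^{\otimes M}$ are concentrated in sets $A_M, B_M, C_M \subset \mathbb{R}^N$ of measure $e^{M(H(X)+o(1))}$, $e^{M(H(Y)+o(1))}$, $e^{M(H(X+Y)+o(1))}$ respectively, and $C_M$ is morally the 99%-sumset of $A_M$ and $B_M$. Since $N = Mn$, applying (2) to these sets gives
$$ e^{(H(X+Y)+o(1))/n} \geq \sqrt{ e^{2(H(X)+o(1))/n} + e^{2(H(Y)+o(1))/n} }, $$
and squaring and letting $M \to \infty$ yields
$$ e^{2H(X+Y)/n} \geq e^{2H(X)/n} + e^{2H(Y)/n}, $$
which is the entropy power inequality (including the customary normalising factor $\frac{1}{2\pi e}$ in the entropy power makes no difference, since it multiplies both sides equally).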
Despite the significant differences between the $99\%$-sumset and $100\%$-sumset in high dimensions, it is still good to think of these concepts as being closely analogous, so that almost any sumset inequality should have an entropy counterpart and vice versa (although in most cases we do not have a direct logical implication between the sumset inequality and the entropy inequality; instead, the inequalities typically have analogous, but not completely identical, proofs). See e.g. this recent paper of Kontoyiannis and Madiman (and the references therein) for some instances of this.
EDIT: Of course, by bounding the $99\%$-sumset by the $100\%$-sumset one can get some connection between the two types of inequalities, but usually one gets an inferior estimate when one uses this approach (it completely ignores the effect of concentration of measure), so this is not the "right" way to relate sumset inequalities with their entropy counterparts. For instance, by directly applying the EPI to uniform random variables on $A,B \subset \mathbb{R}^n$ and then using Jensen's inequality, one only gets a weak form $$ |A+B|^{1/n} \geq \sqrt{ (|A|^{1/n})^2 + (|B|^{1/n})^2 }$$ of the Brunn-Minkowski inequality (compare with (2)). The problem here, of course, is that the sum of two uniformly distributed independent random variables is almost never uniformly distributed.
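In more detail (a sketch only): let $X, Y$ be independent and uniformly distributed on $A, B$ respectively, so that $H(X) = \log |A|$ and $H(Y) = \log |B|$. The sum $X+Y$ is supported on $A+B$ but is typically not uniform there, and the fact that the uniform distribution maximises entropy among all distributions supported on a set of given volume (this is the Jensen step) only gives $H(X+Y) \leq \log |A+B|$. Combining this with the EPI in the form $e^{2H(X+Y)/n} \geq e^{2H(X)/n} + e^{2H(Y)/n}$ yields
$$ |A+B|^{2/n} \geq e^{2H(X+Y)/n} \geq |A|^{2/n} + |B|^{2/n}, $$
which is exactly the weak form displayed above.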
$\newcommand{\R}{\mathbb{R}}$Actually, for compact subsets $A, B\subset \R^n$, the inequality \begin{equation*} m(A+B)^{1/n} \ge m(A)^{1/n} + m(B)^{1/n} \end{equation*} is ultimately nothing but the convexity of $\log(1+e^x)$ in disguise (and that convexity in turn follows immediately from the AM-GM inequality).
This idea, and the entire inductive proof, may be found in Theorem 2.3.9 of Notions of Convexity by L. Hörmander (Birkhäuser, 2007); this is a superb book which anybody with an interest in convexity should definitely own!
The idea of the proof is to establish this inequality when $A$ and $B$ are unions of finitely many disjoint boxes (products of intervals), by induction on the total number of boxes that make up $A$ and $B$.
I'll just cite the base case: here $A$ and $B$ are single boxes with side lengths $a_1,\ldots,a_n$ and $b_1,\ldots,b_n$ respectively, and then
\begin{equation*} m(A)^{1/n} +m(B)^{1/n} = \prod\nolimits_j a_j^{1/n} + \prod\nolimits_j b_j^{1/n} \le \prod\nolimits_j (a_j+b_j)^{1/n}, \end{equation*} where the final inequality follows from the convexity of $\log(1+e^x)$ combined with the degree-one homogeneity of both sides; the short computation is spelled out below.
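To spell out that computation (with all side lengths assumed positive, the only nontrivial case): dividing the claimed inequality by $\prod_j b_j^{1/n}$ (equivalently, using the homogeneity of both sides to normalise $b_1\cdots b_n = 1$) and writing $x_j := \log(a_j/b_j)$, it becomes
\begin{equation*} e^{\frac{1}{n}\sum_j x_j} + 1 \le \prod\nolimits_j (1+e^{x_j})^{1/n}, \end{equation*}
or, after taking logarithms,
\begin{equation*} \log\Bigl(1 + e^{\frac{1}{n}\sum_j x_j}\Bigr) \le \frac{1}{n}\sum\nolimits_j \log(1+e^{x_j}), \end{equation*}
which is exactly Jensen's inequality for the convex function $\log(1+e^x)$.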
Regarding your question "I would like to understand better why Convex geometry should have anything to do with information theory, and that the natural probability distributions are associated to a convex set.", James Melbourne, Peng Xu and I recently wrote a survey article (https://arxiv.org/abs/1604.04225) that exhaustively explores parallels between information theory and convex geometry, including in particular the resemblance between the Brunn-Minkowski inequality (BMI) and the entropy power inequality (EPI).
Let me elucidate some of the key points:
1) As Terry points out, a priori the resemblance between the two inequalities has nothing to do with convexity. Indeed, the BMI holds for arbitrary Borel sets in $\mathbb{R}^d$, while the EPI holds for arbitrary Borel probability measures on $\mathbb{R}^d$. (For the latter, one should use the convention that $N(X)=e^{2h(X)/d}$ is set to be 0 if the entropy $h(X)=-\int f\log f$ with $f$ being the density of $X$ is not defined or if the distribution of $X$ does not have a density. See Bobkov and Chistyakov, "Entropy Power Inequality for the Rényi Entropy", IEEE Trans. on Info. Theory, 2015, for why this convention is essential.)
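For reference, with the normalization just described, the two inequalities being compared read as follows (for independent random vectors $X, Y$ and Borel sets $A, B$ in $\mathbb{R}^d$):
$$ N(X+Y) \geq N(X) + N(Y) \quad \text{(EPI)}, \qquad |A+B|^{1/d} \geq |A|^{1/d} + |B|^{1/d} \quad \text{(BMI)}. $$
(Whether one defines the entropy power as $e^{2h(X)/d}$ or includes the customary factor $\frac{1}{2\pi e}$ makes no difference to the EPI, since the constant multiplies both sides.)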
2) Both the BMI and EPI can be seen as special cases of Rényi entropy inequalities-- one way to see this is via Young's inequality with sharp constant as pointed out by Ofer (indeed, Dembo-Cover-Thomas noted that the Young-Beckner inequality can be rewritten in a form similar to the EPI but with three different Rényi entropies showing up), and the other is to see it via a rearrangement-based unification developed in this paper of Liyao Wang and myself: https://arxiv.org/abs/1307.6018. The latter unification, which is based on the Rogers-Brascamp-Lieb-Luttinger rearrangement inequality, has the following rather neat formulation: for independent random vectors $X_1,\ldots,X_k$, $$ (3) \quad\quad\quad h_p(X_1 + \cdots + X_k)\geq h_p(X_1^* + \cdots + X_k^*) $$ where $h_p(X)=\frac{1}{1-p}\log \int f^p$ is the Rényi entropy of $X$ of order $p$, and for any $X$ with density $f$, $X^*$ is a random vector drawn from the density $f^*$ that is the spherically symmetric, decreasing rearrangement of $f$. Note that $h_1(X)$ is just the Shannon entropy $h(X)$, $h_0(X)=\log |K|$ if $K$ is the support of $X$, and $h_\infty(X)=-\log \|f\|_\infty$.
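As a quick sanity check on those endpoint cases (a sketch, say for a bounded density $f$ whose support $K$ has finite measure): writing $h_p(X) = \frac{p}{1-p}\log \|f\|_p$, one has $\|f\|_p \to \|f\|_\infty$ and $\frac{p}{1-p} \to -1$ as $p \to \infty$, giving $h_\infty(X) = -\log \|f\|_\infty$; as $p \downarrow 0$, $\int f^p \to |\{f>0\}| = |K|$, giving $h_0(X) = \log |K|$; and as $p \to 1$, l'Hôpital's rule applied to $\frac{1}{1-p}\log\int f^p$ recovers the Shannon entropy $-\int f \log f$.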
3) To see that (3) is a valid unification, the following observation is useful. The BMI can be rewritten in the form $|A_1 +A_2|\geq |B_1+B_2|$ where $B_i$ are Euclidean balls satisfying $|B_i|=|A_i|$ for each $i$. Similarly the EPI can be rewritten in the form $N(X_1+X_2)\geq N(Z_1+Z_2)$ where $Z_i$ are independent, spherically symmetric Gaussian random vectors satisfying $N(X_i)=N(Z_i)$ for each $i$. Now we see that the $p=0$ case of the inequality (3) is precisely this alternate form of the BMI. Furthermore, as shown in our paper with Wang, the EPI can be derived from the $p=1$ version of (3) (which in turn can be seen as a strengthening of the EPI, in the sense that it inserts the term $h(X_1^* + \cdots + X_k^*)$ in between the terms $h(X_1 + \cdots + X_k)$ and $h(Z_1 + \cdots + Z_k)$ that appear in the alternate form of the EPI).
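To check the $p=0$ identification concretely (a sketch, with measurability fine points suppressed and $A_1, A_2$ taken compact, say): let $X_i$ be uniform on $A_i$. Then $X_1+X_2$ is supported inside $A_1+A_2$, so $h_0(X_1+X_2) \leq \log|A_1+A_2|$; on the other hand, the rearrangement of the uniform density on $A_i$ is the uniform density on the centred ball $B_i$ with $|B_i|=|A_i|$, and the convolution of two such uniform densities is positive on all of $B_1+B_2$, so $h_0(X_1^*+X_2^*) = \log|B_1+B_2|$. The $p=0$ case of (3) therefore gives $\log|A_1+A_2| \geq \log|B_1+B_2|$, which is exactly the alternate form of the BMI.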
4) There is a special role for convexity in the Brunn-Minkowski world, of course. Even though the BMI itself holds for much more general sets, many of the more sophisticated results in convex geometry (examples include the Rogers-Shephard inequality, mixed volumes and the Alexandrov-Fenchel theory, and the reverse Brunn-Minkowski inequality of V. Milman that plays an important role in geometric functional analysis) are special to convex sets. There is, as you suspected, an analogue in the world of probability measures. Indeed, C. Borell developed in the 1970s a beautiful and historically underappreciated theory of "convex measures"-- of which the more well known class of log-concave measures forms a distinguished subset. Let us say (I am skipping some fine points so as not to make this answer too long) that a measure is log-concave when the logarithm of its density is a concave function; thus log-concave measures include all Gaussian measures, uniform measures on convex bodies, exponential measures, etc. Then it turns out that many of the inequalities that hold for convex sets but not general sets have analogues that can be stated as entropy inequalities holding for log-concave probability measures but not general probability measures. This story is only partially developed and there remain many interesting open questions; all of what is known up to late 2016, together with many open questions, is described in our survey at https://arxiv.org/abs/1604.04225.