Probability two products are equal

I came across this problem a few days ago, and I must say it was a lot of fun to think about. Thank you!

The following is, I believe, a fairly complete solution of the problem for a "typical", general $M$, making no assumptions about the relative sizes of $n$ and $m$ (even $m>n$ is allowed) or about the rank of $M$.

Like kodlu in his/her answer from Jan 5, I will start by slightly reformulating the question:

We want to find the probability $P(Ms=0)$, where the $n$ components of the vector $s=\frac{x-y}2$ are i.i.d. random variables taking the values $1$ and $-1$ with probability $\frac1 4$ each and the value $0$ with probability $\frac1 2$. $P(Ms=0)$ can then be expressed as the expectation value of a product of Kronecker deltas for the $m$ components of $Ms$: $$ P(Ms=0) = \left\langle \prod_{i=1}^m \delta_{0,\sum_{j=1}^nM_{ij}s_j} \right\rangle_s \ , \hspace{2cm} (1) $$ where $\left\langle \cdots \right\rangle_s$ indicates the average over all $s_j$.
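As a quick sanity check of this reformulation (not part of the derivation), here is a minimal Python sketch that estimates $P(Ms=0)$ by Monte Carlo for a fixed sign matrix $M$; the function name, sample size, and example matrix are my own choices, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_ms_zero_mc(M, trials=200_000):
    """Monte Carlo estimate of P(Ms = 0) for a fixed (-1,1)-matrix M.

    Each component of s is drawn independently: 0 with probability 1/2,
    +1 or -1 with probability 1/4 each, matching the reformulation above.
    """
    n = M.shape[1]
    # choosing uniformly from [-1, 0, 0, 1] gives P(0)=1/2, P(+-1)=1/4 each
    s = rng.choice([-1, 0, 0, 1], size=(trials, n))
    return np.mean(np.all(M @ s.T == 0, axis=0))

# hypothetical example: a random 3 x 8 sign matrix
M = rng.choice([-1, 1], size=(3, 8))
print(p_ms_zero_mc(M))
```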

For any integer $t$ with $|t|\leq n$ we can substitute $\delta_{0,t} = \frac1{n+1} \sum_{r=0}^n \omega^{rt}$, where $\omega = {\rm e}^\frac{2\pi{\rm i}}{n+1}$ is a primitive $(n+1)^{\rm th}$ root of unity ("discrete Fourier transform"), so that $(1)$ becomes $$ P(Ms=0) = \left\langle \prod_{i=1}^m \left( \frac1{n+1} \sum_{r_i=0}^n \omega^{r_i\sum_{j=1}^n M_{ij}s_j} \right) \right\rangle_s = \left\langle \prod_{i=1}^m \prod_{j=1}^n \omega^{M_{ij} r_i s_j} \right\rangle_{r,s} \ , \hspace{2cm} (2) $$ where in the second step the $r_i$ have been re-interpreted as independent, uniformly distributed random variables with values in $\{0,...,n\}$, so that $\left\langle \cdots \right\rangle_{r_i} = \frac1{n+1} \sum_{r_i=0}^n (\cdots)$, and $\left\langle \cdots \right\rangle_{r,s}$ indicates the simultaneous average over all $r_i$ and $s_j$.
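The finite geometric sum behind this substitution is also easy to verify numerically; here is a minimal sketch (function name mine) checking that $\frac1{n+1} \sum_{r=0}^n \omega^{rt}$ reproduces $\delta_{0,t}$ for all integers $|t|\leq n$:

```python
import numpy as np

def delta_via_dft(t, n):
    """Evaluate (1/(n+1)) * sum_{r=0}^{n} omega^(r*t), omega = exp(2*pi*i/(n+1)).

    For any integer |t| <= n this is 1 if t == 0 and 0 otherwise, since
    t is then a multiple of n+1 only for t = 0.
    """
    omega = np.exp(2j * np.pi / (n + 1))
    return np.sum(omega ** (t * np.arange(n + 1))) / (n + 1)

n = 7
for t in range(-n, n + 1):
    assert np.isclose(delta_via_dft(t, n), 1.0 if t == 0 else 0.0)
```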

Now, obviously $P(Ms=0)$ does not have the same value for all $(-1,1)$-matrices $M$ of a given size $m \times n$. As an example, if $M$ has rank $1$ (i.e., each of its column vectors is equal to either $v$ or $-v$, with some fixed $(-1,1)$-vector $v$), then the problem is equivalent to the case $m=1$, and the result is $$ P(Ms=0) = 2^{-n} \sum_{l=0}^{\lfloor n/2 \rfloor} 2^{-2l} \begin{pmatrix} {n} \\ {2l} \end{pmatrix} \begin{pmatrix} {2l} \\ {l} \end{pmatrix} \ , \hspace{2cm} (3) $$ whereas usually it will be much smaller (for instance, if $M$ has rank $n$, then trivially $P(Ms=0) = 2^{-n}$). Thus $(2)$ cannot be evaluated without further assumptions on $M$.

Alternatively, we can try to calculate a "typical" value of $(2)$, assuming that $M$ itself is a random sample out of a distribution of matrices with i.i.d. matrix elements $M_{ij}$ (each $\pm1$ with probability $\frac12$), and averaging $P(Ms=0)$ over this distribution. This approach leads to $$ \Big\langle P(Ms=0) \Big\rangle_M = \left\langle \prod_{j=1}^n \prod_{i=1}^m \left\langle \omega^{M_{ij} r_i s_j} \right\rangle_M \right\rangle_{r,s} = \left\langle \prod_{j=1}^n \prod_{i=1}^m \frac{ \omega^{r_i s_j} + \omega^{-r_i s_j} }2 \right\rangle_{r,s} \ . \hspace{1cm} (4) $$ Note that we have also changed the order of the product operations over $i$ and $j$ here. Now the average over the $s_j$ can be performed easily: for each $j$ (the values $s_j = 1$ and $s_j = -1$ give equal contributions), $$ \left\langle \prod_{i=1}^m \frac{ \omega^{r_i s_j} + \omega^{-r_i s_j} }2 \right\rangle_{s_j} = \frac1 2 \left( 1 + \prod_{i=1}^m \frac{\omega^{r_i} + \omega^{-r_i}}2 \right) \ , $$ which is independent of $j$, so that $$ \Big\langle P(Ms=0) \Big\rangle_M = 2^{-n} \left\langle \left( 1 + \prod_{i=1}^m \frac{\omega^{r_i} + \omega^{-r_i}}2 \right)^n \right\rangle_{r} \ . \hspace{2cm} (5) $$

To evaluate the right hand side of $(5)$ we expand the $n^{\rm th}$ power, $$ \Big\langle P(Ms=0) \Big\rangle_M = 2^{-n} \sum_{k=0}^n \begin{pmatrix} {n} \\ {k} \end{pmatrix} 2^{-km} \prod_{i=1}^m \left\langle \Big( \omega^{r_i} + \omega^{-r_i} \Big)^k \right\rangle_{r} \ , \hspace{2cm} (6) $$ and likewise the $k^{\rm th}$ power in $(6)$: $$ \left\langle \Big( \omega^{r_i} + \omega^{-r_i} \Big)^k \right\rangle_{r} = \sum_{l=0}^k \begin{pmatrix} {k} \\ {l} \end{pmatrix} \left\langle \omega^{(k-2l)r_i} \right\rangle_{r} = \begin{cases} \begin{pmatrix} {k} \\ {k/2} \end{pmatrix} , \ k\ {\rm even} \\ \quad \ 0 \ , \quad k\ {\rm odd} \end{cases} \ , \hspace{2cm} (7) $$ where $\left\langle \omega^{(k-2l)r_i} \right\rangle_{r} = \delta_{k,2l}$ has been used (valid since $|k-2l| \leq k \leq n$, so $k-2l$ is a multiple of $n+1$ only for $k = 2l$). Inserting $(7)$ into $(6)$ then leads to $$ \Big\langle P(Ms=0) \Big\rangle_M = 2^{-n} \sum_{l=0}^{\lfloor n/2 \rfloor} 2^{-2lm} \begin{pmatrix} {n} \\ {2l} \end{pmatrix} \begin{pmatrix} {2l} \\ {l} \end{pmatrix} ^m \ . \hspace{2cm} (8) $$

Note the close similarity to Eq. $(3)$. The $l=0$ term in $(8)$ alone yields $2^{-n}$, which is the lower bound mentioned in the question. For large $n$, $m$, and $l$, Stirling's approximation gives, to leading order, $$ 2^{-2lm} \begin{pmatrix} {n} \\ {2l} \end{pmatrix} \begin{pmatrix} {2l} \\ {l} \end{pmatrix} ^m \sim \sqrt{\frac{n}{4\pi l(n-2l)}} \ {\rm e}^{n I(\frac{2l}n)} (\pi l)^{-\frac m2} \ , \hspace{2cm} (9) $$ where $I(p)$ denotes the "information entropy" of a 2-state system with probabilities $p$ and $1-p$: $$ I(p) = - p \ln(p) - (1-p) \ln(1-p) \ . \hspace{2cm} (10) $$
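As a numerical cross-check of $(8)$, and of its reduction to $(3)$ for $m=1$, here is a short sketch (function name mine) that evaluates the sum with exact rational arithmetic; it also illustrates that the average approaches the lower bound $2^{-n}$ for large $m$:

```python
from fractions import Fraction
from math import comb

def avg_p(n, m):
    """M-averaged probability P(Ms = 0), Eq. (8), evaluated exactly."""
    terms = (Fraction(comb(n, 2 * l) * comb(2 * l, l) ** m, 4 ** (l * m))
             for l in range(n // 2 + 1))
    return sum(terms) / 2 ** n

n = 12
print(float(avg_p(n, 1)))            # m = 1: identical to Eq. (3)
print(float(avg_p(n, 40)), 2.0**-n)  # large m: close to the lower bound 2^(-n)
```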
Note that the divergence of the right hand side of $(9)$ for $l \to 0$ is an artefact of Stirling's approximation; the original expression on the left hand side has an $l \to 0$ limit equal to $1$.

To find the dominant term(s) in the sum of $(8)$, we look for the maximum of the exponential part of $(9)$ as a function of $l$, disregarding for the moment the square root. That is, setting $p = \frac{2l}n$ and $q = \frac{m}n$, we look for the maximum of $$ {\rm e}^{n \left( I(p) - \frac q 2 \ln\frac{\pi p n}2 \right)} \hspace{2cm} (11) $$ as a function of $p \in ]0,1]$. The sum in $(8)$ is either dominated (for large enough $q$) by the $l=0$ term $1$, or otherwise by a term with $l \approx \frac{pn}2$, where $p$ is a solution of the equation $$ \frac q{2p} = I'(p) = \ln\frac{1-p}p \ . \hspace{2cm} (12) $$

If $q$ is larger than about $0.56$, $(12)$ has no solution $p \in ]0,1]$. For $q$ smaller than about $0.56$ there are two solutions, the smaller of the two corresponding to a local minimum of ${\rm e}^{n \left( I(p) - \frac q 2 \ln\frac{\pi p n}2 \right)}$ and the larger one to a local maximum. For large $n$, this local maximum dominates the sum in $(8)$ if and only if $I(p) - \frac q 2 \ln\frac{\pi p n}2 > 0$ for this $p$ value. The maximal $q$ that meets this criterion decreases logarithmically with increasing $n$, so in that case we may assume $\frac q{2p} \ll 1$ and thus obtain $p \approx \frac1 2$ from $(12)$. Inserting this back into the exponential expression $(11)$ yields an estimate of its maximum for large $n$, $m$, and $l$: $$ \max_{p \gg \frac1n} \left\{ {\rm e}^{n \left( I(p) - \frac q 2 \ln\frac{\pi p n}2 \right)} \right\} \approx {\rm e}^{n \left( \ln2 - \frac q 2 \ln\frac{\pi n}4 \right)} \ . \hspace{2cm} (13) $$

Finally, we can combine this with $(8)$ and $(9)$ and obtain, to leading order for large $n$, $$ 2^{n} \Big\langle P(Ms=0) \Big\rangle_M \approx 1 + \sqrt{\frac{2}{\pi n}} \ {\rm e}^{n \left( \ln2 - \frac q 2 \ln\frac{\pi n}4 \right)} = 1 + 2^{n-\frac1 2} \left( \frac{\pi n}4 \right)^{-\frac{qn+1} 2} \ . \hspace{2cm} (14) $$ As long as the exponent in $(13)$ is negative, i.e. $q > \frac{\ln4}{\ln\frac{\pi n}4}$ (or $m > n \frac{\ln4}{\ln\frac{\pi n}4} $), the first term ($=1$) in $(14)$ dominates, which means that only the trivial solution $s=0$ of $Ms=0$ has non-negligible probability. Otherwise, the second term dominates, quantifying how the number of solutions increases once $m$ becomes small enough compared to $n$.
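To make the threshold discussion concrete, one can locate the solutions of $(12)$ numerically. The simple grid search below (all names mine) recovers both the maximal solvable $q \approx 0.56$ and, for a smaller $q$, the two solutions corresponding to the local minimum and the local maximum of $(11)$:

```python
import numpy as np

# Eq. (12) rearranged: q = 2*p*ln((1-p)/p); for q > 0, solutions lie in (0, 1/2)
p = np.linspace(1e-6, 0.5 - 1e-6, 1_000_000)
q_of_p = 2 * p * np.log((1 - p) / p)

# largest q for which Eq. (12) is solvable ("about 0.56" above)
i = np.argmax(q_of_p)
print(q_of_p[i], p[i])  # ~0.557, attained near p ~ 0.218

# for smaller q there are two solutions; sign changes of q_of_p - q locate them
q = 0.3
roots = p[np.nonzero(np.diff(np.sign(q_of_p - q)))[0]]
print(roots)  # ~0.05 (local minimum of (11)) and ~0.41 (local maximum)
```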