Problem about expectation of maximum partial sum

[Edited to add Sage plots of the asymptotic distributions for $n=2,3,4,5,6,7$ and $n=10,20,30,40$]

There are $m-1 \choose n-1$ "compositions" of $m$ into $n$ positive parts $X_i$, because they correspond bijectively with choices of $n-1$ partial sums out of $\lbrace 1, 2, \ldots, m-1\rbrace$. By symmetry, each $X_i$ has expectation $m/n$. For large $n$ and much larger $m$, the sequence of normalized partial sums $$ S_i = \sum_{h=1}^i Y_h = \sum_{h=1}^i (X_h - \mathbb{E}(X_h)) = \Bigl(\sum_{h=1}^i X_h \Bigr) - \frac{i}{n} m $$ looks like a scaled "Brownian bridge", i.e. ${\rm BB}(t) = B(t) - B(1)t$ where $B(t)$ is Brownian motion on $[0,1]$. The distribution of $\max_t {\rm BB}(t)$ is known thanks to a reflection trick, and thus so is its expectation. If I did this right, for $\sigma > 0$ the probability that $S_\max := \max_i S(i)$ exceeds $\sigma$ approaches $\exp(-2n(\sigma/m))^2$, which would make $$ \mathbb{E}(S_\max) \sim \int_0^\infty e^{-2n(\sigma/m)^2} d\sigma = \sqrt{\frac\pi{8n}} \cdot m. $$ Since the question at hand asks only for the expectation of $S_\max$, not subtler features of its distribution, the estimate should be reasonably good for moderately large $n$ and $m$. As Aaron Meyerowitz noted, for fixed $n$ we have $$ \mathbb{E}(S_\max) = c_n m + O(1) $$ as $m \rightarrow \infty$; computation of the constants $c_n$ for $n \leq 40$ suggests that $n^{1/2} c_n \rightarrow \sqrt{\pi/8}$ but not very quickly: $n^{1/2} (\sqrt{\pi/8} - n^{1/2} c_n)$ seems to be converging to about $0.66$, and while $\sqrt{\pi/8} = 0.626657\ldots$, the computed $n^{1/2} c_n$ first exceeds $0.5$ only at $n = 27$. The formulas below may give a start towards proving this behavior.

Getting good estimates for finite values of $n$ seems unwieldy because computing the probability that $\max_i S(i) \leq \sigma$ requires counting "compositions" that satisfy $X_1 + \ldots + X_i \leq (i/n) m + \sigma$ for each $\sigma$, which gives rise to an intimidating inclusion-exclusion. Fortunately the answers to this mathoverflow query, which appeared just yesterday, give for any increasing integer sequence $0 < a_1 < a_2 < \ldots < a_N$ a nice determinant formula for the number of $N$-tuples $(x_1,\ldots,x_N)$ of integer sequences satisfying $0 < x_1 < \ldots < x_N$ and $x_i \leq a_i$ for each $i \leq N$. In our setting, we compute the number of "compositions" with $\max_i S(i) \leq \sigma$ by taking $N = n-1$, $x_i = \sum_{h=1}^i X_h$, and each $a_i = \min (m-1, \lfloor (i/n) m + \sigma \rfloor)$. The determinant has order $N$, with $(i,j)$ entry equal $a_i + j - i \choose j - i + 1$ [which is $1$ on the first subdiagonal $i = j+1$, and $0$ on the triangle $i > j+1$ under this subdiagonal]. This makes it feasible to compute exactly the distribution of $S_\max$ for moderately large $n$ and any $m$, and also the asymptotic distribution as $m \rightarrow \infty$.

For given $n$, the asymptotic distribution of $S_\max / m$ is piecewise polynomial, but takes a while to look like the limiting form proportional to $\sigma \exp(-2n(\sigma/m)^2) d\sigma$; notably it is discontinuous at $\sigma = 1/n$ with a substantial jump downwards. Here are Sage plots showing these distributions in red/orange/green/blue/purple/black for $n=2$ through $n=7$:

_{(source: harvard.edu)}

(The total area is $(n-1)/n$, not $1$, because $1/n$ of the measure is concentrated at $\sigma = 0$). Likewise for $n=10,20,30,40$, plotting only $0 < \sigma < 0.6$ because the density is too small to be seen for $\sigma \geq 0.6$:

_{(source: harvard.edu)}

(after much more computing, most of it for the $40 \times 40$ determinants; at $n=40$ the graph finally crests after the downwards plunge at $1/n$).

Finally, a tabulation of the values of $c_n$ for $n = 1, 2, \ldots, 40$. For each $n$ I give $n^n c_n$, which is an integer except for $n=2$ when $c_2 = 1/8$. This value, as well as each of $c_3 = 4/3^3$, $c_4 = 39/4^4$, and $c_5 = 472 / 5^5$, agrees with A.Meyerowitz's calculations.

[1, 0]
[2, 1/2]
[3, 4]
[4, 39]
[5, 472]
[6, 6900]
[7, 118716]
[8, 2354072]
[9, 52911216]
[10, 1330107840]
[11, 36991592500]
[12, 1127914077312]
[13, 37420777559496]
[14, 1342183358856704]
[15, 51756244887797100]
[16, 2135359495833676800]
[17, 93864296121282210784]
[18, 4379542774345464496128]
[19, 216178594161376244063076]
[20, 11255374377126199463936000]
[21, 616463053079082019376239800]
[22, 35432440194603405959506034688]
[23, 2132478311049609494982190450204]
[24, 134116927093400952330702474706944]
[25, 8798258310785305861553627142930000]
[26, 601017143689398400881632598032384000]
[27, 42684847106441394367226307718311565716]
[28, 3147222817221577402622824207375266742272]
[29, 240582153893938356991908848927905622445736]
[30, 19043079550271688145972837306025115648000000]
[31, 1558981284930199828739239320361571788041021900]
[32, 131855889346498739861328689280063889600321421312]
[33, 11509801310013312943392059948175963018705195688896]
[34, 1035923337749909421439098617643978496876513679376384]
[35, 96047358406199526246797502389944873251740366086562500]
[36, 9165799239775410749318809193562746250766254788837376000]
[37, 899564153243436548625817312320806272340082850282897445784]
[38, 90727638906463992065814103522957255273344987915765288009728]
[39, 9396779387234810402125876063643842517670905874506382252419196]
[40, 998740925886849063603687252012149942602287367747692134400000000]

If I understand the problem correctly then $\mathbb{E}(X_i)=\frac{m}{n}$ So that the $Y_i$ are values from $0-\frac mn,1-\frac{m}{n},\dots,m-\frac mn$ which add to $0$.

If we would instead compose $m-n$ as an ordered sum of $n$ non-negative integers this would have the effect of lowering the $X_i$ and expectation by one each but would give the same $Y_i$ and expectation for the maximum of the $S_i.$ That way of setting up the problem might lead to neater expressions.

It seems to me that the expected maximum would asymptotically be $$c_nm+e_r+O(1/m)$$ where $e_r$ depends on the congruence class of $m$ mod $n.$ The fuzzy reasoning is that if we were to take a composition of $m$ and triple all the entries we would get a composition of $m'=3m$ with thrice the expected maximum. To get an arbitrary composition of $ 3m$ we might move some of the parts up or down by 1, but that would not have a big effect on the ratio of the expected maximum to $m'$. Beyond that I'll just say that the calculations below support this supposition with $$c_2=\frac{1}{8},c_3=\frac{4}{27},c_4=\frac{39}{256},c_5=\frac{472}{3125}.$$

More specifically, calculations strongly suggest for each $0 \le r \le n-1$ there is a polynomial $p_r(q)$ of degree $n$ such that for $m=qn+r$ $$\mathbb{E}(S_{\max})=\frac{p_r(q)}{(m-1)(m-2)\dots(m-n)}.$$ In all cases the leading coefficient is the same, namely $n^{n}c_n.$ Assuming that that is the case, one can find the coefficients of $p_r(q)$ and hence $c_n$ by computing $\mathbb{E}(S_{\max})$ for $m=r,n+r,2n+r,\dots,n^2+r.$ And going a bit further provides a check. Eventually this would be too hard a calculation, but the obvious naive strategy works well for a while (I did $n \le 4$ and $m \le 70$ with Maple.)

later N. Elkies' great answer explains (and establishes) these facts and much more, but I leave this amended naive approach for what it is worth.

For $n=2$ the expected value of the maximum is $\frac{m-1}{8}$ or $\frac{m-1}{8}-\frac{1}{8(m-1)}$ according as $m$ is odd or even. In the odd case, for $1 \le X_1 \le \frac{m-1}{2}$ the maximum is $S_2=0.$ And for $\frac{m+1}{2} \le X_1 \le m-1$ the maximum is $S_1=X_1-\frac{m}{2}$ with expected value $\frac{1/2+(m-2)/2}{2}=\frac{m-1}{4}.$ When $m$ is even there is also the composition into 2 equal parts with $S_1=S_2=0$ and it slightly lowers the answer. After some arithmetic the expected maximum turns out to be as stated.

For $n=3$ the expected maximum appears to be about $\frac{4}{27}m$ and exactly $\frac{2(q-1)^2(2q-3)}{(m-1)(m-2)},\frac{2(q-1)q(2q-3)}{(m-1)(m-2)}$ or $\frac{2q^2(2q-3)}{(m-1)(m-2)}$ according as $m=3q-1,m=3q$ or $m=3q+1.$ The sequence A200067 from the OEIS may be related.

In the case that $n=4$ the expected maximum appears to be $\frac{39}{256}m+O(1)$ In the case $m=4q$ the expected maximum appears to be exactly $\frac{3(13q^2-13q+3)q(q-1)}{(m-1)(m-2)(m-3)}.$ When $m=4q-1$ or $m=4q+1$ the term q(q-1) is replaced by $(q-1)^2$ or $q^2$ respectively. This is similar to the $n=3$ case above. For $m=4q+2$ it comes out to be $\frac{3q^2(13q^2+2)}{(m-1)(m-2)(m-3)}$

For $n=5$ similar considerations give an expected maximum of $\frac{472}{3125}m+O(1)$

I can't comment , but based on Noam D. Elkies's numbers and OEIS A000435 it looks as if

$$c_n = \frac{(n-1)!}{2 n^n} \sum_{k=0}^{n-2} \frac{n^k}{k!}.$$

Problem about expectation of maximum partial sum

Tags:

Co.Combinatorics

Pr.Probability

Related

Recent Posts