Asymptotics of a Bernoulli-number-like function

[Revised and expanded to give the answer for all $k>1$ and incorporate further terms of an asymptotic expansion as $n \rightarrow \infty$]

Fix $k>1$, and write $a_1=f(1,k)=1$ and $$ a_n = f(n,k) = \frac1{1-q^{-n}} \sum_{r=1}^{n-1} {n \choose r} (1/k)^{n-r} (1/q)^r a_r \phantom{for}(n>1), $$ where $q := k/(k-1)$, so $(1/k) + (1/q) = 1$. Set $$ a_\infty := \frac1{k \log q}. $$ For example, if $k=2$ then $a_\infty = 1 / \log 4 = 0.72134752\ldots$, which $a_n$ seems to approach for large $n$, and likewise for $k=6$ (the dice-throwing case) with $a_\infty = 1/(6 \log 1.2) = 0.9141358\ldots$. Indeed as $n \rightarrow \infty$ we have "$a_n \rightarrow a_\infty$ on average", in the sense that (for instance) $\sum_{n=1}^N (a_n/n) \sim a_\infty \phantom. \sum_{n=1}^N (1/n)$ as $N \rightarrow \infty$. But, as suggested by earlier posted answers to Tim Chow's question, $a_n$ does not converge, though it stays quite close to $a_\infty$: we have $$ a_n = a_\infty + \epsilon^{\phantom.}_0(\log_q n) + O(1/n) $$ as $n \rightarrow \infty$, where $\epsilon^{\phantom.}_0$ is a smooth function of period $1$ whose average over ${\bf R} / {\bf Z}$ vanishes but is not identically zero; for large $k$ (already $k=2$ is large enough), $\epsilon^{\phantom.}_0$ is a nearly perfect sine wave with a tiny amplitude $\exp(-\pi^2 k + O(\log k))$, namely $$ \frac2{k\log q}\left|\phantom.\Gamma\bigl(1 + \frac{2\pi i}{\log q}\bigr)\right| \phantom.=\phantom. \frac2{k \log q} \left[\frac{(2\pi^2/ \log q)}{\sinh(2\pi^2/ \log q)}\right]^{1/2}. $$ For example, for $k=2$ the amplitude is $7.130117\ldots \cdot 10^{-6}$, in accordance with numerical observation (see previously posted answers and the plot below). For $k=6$ the amplitude is only $8.3206735\ldots \cdot 10^{-23}$ so one must compute well beyond the usual "double precision" to see the oscillations.
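For readers who want to reproduce these numbers, here is a minimal sketch in Python of the recurrence as stated above (the cutoff $n=200$ is my own choice, not from the original):

```python
import math

def f(n, k):
    """a_n = f(n, k) from the recurrence above, with a_1 = 1."""
    q = k / (k - 1)
    a = [0.0, 1.0]                      # a[0] is a placeholder; a[1] = 1
    for m in range(2, n + 1):
        s = sum(math.comb(m, r) * (1 / k) ** (m - r) * (1 / q) ** r * a[r]
                for r in range(1, m))
        a.append(s / (1 - q ** (-m)))
    return a[n]

a_inf = 1 / (2 * math.log(2))           # k = 2: a_inf = 1/log 4 = 0.72134752...
print(f(200, 2) - a_inf)                # residual on the order of 1e-5
```

For $k=2$ the residual at $n=200$ is already dominated by the tiny oscillation $\epsilon^{\phantom.}_0$, which is why plots of $a_n$ look constant at double precision.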

More precisely, there is an asymptotic expansion $$ a_n \sim a_\infty + \epsilon^{\phantom.}_0(\log_q n) + n^{-1} \epsilon^{\phantom.}_1(\log_q n) + n^{-2} \epsilon^{\phantom.}_2(\log_q n) + n^{-3} \epsilon^{\phantom.}_3(\log_q n) + \cdots, $$ where each $\epsilon^{\phantom.}_j$ is a smooth function of period $1$ whose average over ${\bf R} / {\bf Z}$ vanishes, and — while the series need not converge — truncating it before the term $n^{-j} \epsilon^{\phantom.}_j(\log_q n)$ yields an approximation good to within $O(n^{-j})$. The first few $\epsilon^{\phantom.}_j$ still have exponentially small amplitudes, but larger than that of $\epsilon^{\phantom.}_0$ by a factor $\sim C_j k^{2j}$ for some $C_j > 0$; for instance, the amplitude of $\epsilon^{\phantom.}_1$ exceeds that of $\epsilon^{\phantom.}_0$ by a factor of about $2(\pi / \log q)^2 \sim 2 \pi^2 k^2$. So $a_n$ must be computed up to a somewhat large multiple of $k^2$ before it becomes experimentally plausible that the residual oscillation $a_n - a_\infty$ won't tend to zero in the limit as $n \rightarrow \infty$.

Here's a plot that shows $a_n$ for $k=2$ (so also $q=2$) and $2^6 \leq n \leq 2^{13}$, and compares it with the periodic approximation $a_\infty + \epsilon^{\phantom.}_0(\log_q n)$ and the refined approximation $a_\infty + \sum_{j=0}^2 n^{-j} \epsilon^{\phantom.}_j(\log_q n)$. (See http://math.harvard.edu/~elkies/mo11255+.pdf for the original PDF plot, which can be "zoomed in" to view details.) The horizontal coordinate is $\log_2 n$; the vertical coordinate is centered at $a_\infty = 1/(2 \log 2)$, showing also the lines $a_\infty \pm 2|c_1|$, where $c_1$ is the first Fourier coefficient introduced below; black cross-hairs, eventually merging visually into a continuous curve, show the numerical values of $a_n$; and the red and green contours show the smooth approximations.

To obtain this asymptotic expansion, we start by generalizing R. Barton's formula from $k=2$ to arbitrary $k>1$: $$ a_n = \frac1k \sum_{r=0}^\infty \phantom. n q^{-r} (1-q^{-r})^{n-1}. $$ [The proof is the same, but note the exponent $n$ has been corrected to $n-1$, since we want $n-1$ players eliminated at the $r$-th step, not all $n$; this does not affect the limiting behavior $a_\infty+\epsilon^{\phantom.}_0(\log_q n)$, but is needed to get $\epsilon^{\phantom.}_m$ right for $m>0$.] We would like to approximate the sum by an integral, which can be evaluated by the change of variable $q^{-r} = z$: $$ \frac1k \int_{r=0}^\infty \phantom. n q^{-r} (1-q^{-r})^{n-1} \, dr = \frac1{k \log q} \int_0^1 \phantom. n (1-z)^{n-1} \, dz = \left[-a_\infty(1-z)^n\right]_{z=0}^1 = a_\infty. $$ But it takes some effort to get at the error in this approximation.
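The generalized sum formula is easy to check numerically; a sketch (the truncation `r_max = 400` is my choice, safe because the summand eventually decays like $n q^{-r}$):

```python
import math

def a_sum(n, k, r_max=400):
    """a_n via the generalized Barton formula: (1/k) sum_r n q^-r (1 - q^-r)^(n-1)."""
    q = k / (k - 1)
    return sum(n * q ** (-r) * (1 - q ** (-r)) ** (n - 1)
               for r in range(r_max)) / k

print(a_sum(1000, 2))    # hovers near 1/(2 log 2) = 0.72134752...
print(a_sum(1000, 6))    # hovers near 1/(6 log 1.2) = 0.9141358...
```

Unlike the recurrence, this formula costs only $O(\log n)$ effective terms per value of $n$, so it is the practical way to plot $a_n$ for large $n$.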

We start by comparing $(1-q^{-r})^{n-1}$ with $\exp(-nq^{-r})$: $$ \begin{eqnarray} (1-q^{-r})^{n-1} &=& \exp(-nq^{-r}) \cdot \exp \phantom. [nq^{-r} + (n-1) \log(1-q^{-r})] \cr &=& \exp(-nq^{-r}) \cdot \exp \left[q^{-r} - (n-1) \left( \frac{q^{-2r}}2 + \frac{q^{-3r}}3 + \frac{q^{-4r}}4 + \cdots \right) \right]. \end{eqnarray} $$ The next two steps require justification (as R.Barton noted for the corresponding steps at the end of his analysis), but the justification should be straightforward. Expand the second factor in powers of $u := nq^{-r}$, and collect like powers of $n$, obtaining $$ \exp(-nq^{-r}) \cdot \left( 1 - \frac{u^2-2u}{2n} + \frac{3u^4-20u^3+24u^2}{24n^2} - \frac{u^6-14u^5+52u^4-48u^3}{48n^3} + - \cdots \right). $$ Each term $n^{-j} \epsilon_j(\log_q(n))$ ($j=0,1,2,3,\ldots$) will arise from the $n^{-j}$ term in this expansion.

We start with the main term, for $j=0$, which is the only one that does not decay with $n$. Define $$ \varphi_0(x) := q^x \exp(-q^x), $$ which as Reid observed decays rapidly both as $x \rightarrow \infty$ and as $x \rightarrow -\infty$. Our zeroth-order approximation to $a_n$ is $$ \frac1k \sum_{r=0}^\infty \phantom. \varphi_0(\log_q(n)-r), $$ which as $n \rightarrow \infty$ rapidly nears $$ \frac1k \sum_{r=-\infty}^\infty \varphi_0(\log_q(n)-r). $$ For $k=q=2$, this is equivalent to Reid's formula for $R(n)$, even though he wrote it in terms of the fractional part of $\log_2(n)$, because the sum is clearly invariant under translation of $\log_q(n)$ by integers.
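In code, this zeroth-order approximation is a rapidly convergent two-sided sum; a sketch (the truncation `R = 200` is my choice):

```python
import math

def S0(t, k, R=200):
    """(1/k) * sum over r in Z of phi_0(t - r), with phi_0(x) = q^x exp(-q^x)."""
    q = k / (k - 1)
    return sum(q ** (t - r) * math.exp(-(q ** (t - r)))
               for r in range(-R, R + 1)) / k

a_inf = 1 / (2 * math.log(2))
print(S0(0.25, 2) - a_inf)        # bounded by the ~7.13e-6 amplitude of eps_0
print(S0(0.25, 2) - S0(1.25, 2))  # essentially 0: the sum has period 1 in t
```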

We next apply Poisson summation. Since $\sum_{r \in {\bf Z}} \phantom. \varphi_0(t+r)$ is a smooth ${\bf Z}$-periodic function of $t$, it has a Fourier expansion $$ \sum_{m\in{\bf Z}} \phantom. c_m e^{2\pi i m t} $$ where $$ c_m = \int_0^1 \left[ \sum_{r \in {\bf Z}} \phantom. \varphi_0(t+r) \right] \phantom. e^{-2\pi i m t} dt = \int_{-\infty}^\infty \varphi_0(t) \phantom. e^{-2\pi i m t} dt = \hat\varphi_0(-m). $$ Changing the variable of integration from $t$ to $q^t$ lets us recognize the Fourier transform $\hat\varphi_0$ as $1/\log(q)$ times a Gamma integral: $$ \hat\varphi_0(y) = \frac1{\log q} \Gamma\Bigl(1 + \frac{2 \pi i y} {\log q}\Bigr). $$ This gives us the coefficients $c_m$ in closed form. The constant coefficient $c_0 = 1 / (\log q)$ can again be interpreted as the approximation of the Riemann sum $\sum_{r \in {\bf Z}} \phantom. \varphi_0(t+r)$ by an integral; the oscillating terms $c_m e^{2\pi i m t}$ for $m \neq 0$ are the corrections to this approximation, and are small due to the exponential decay of the Gamma function on vertical strips — indeed we can compute the magnitude $|c_m|$ in elementary closed form using the formula $|\Gamma(1+i\tau)| = (\pi\tau / \sinh(\pi\tau))^{1/2}$. So we have $$ \frac1k \sum_{r \in \bf Z} \phantom. \varphi_0(\log_q(n)-r) = \frac1k \sum_{m \in \bf Z} \phantom. \hat\varphi_0(-m) e^{2\pi i m \log_q(n)} = a_\infty + \epsilon_0(\log_q(n)) $$ where $a_\infty = c_0 / k = 1 / (k \log q)$ as above, and $\epsilon^{\phantom.}_0$, defined by $$ \epsilon^{\phantom.}_0(t) = \frac1k \left[ \sum_{r\in\bf Z} \phantom. \varphi_0(t+r) \right] - a_\infty, $$ has the Fourier series $$ \epsilon^{\phantom.}_0(t) = \frac1k \sum_{m \neq 0} \hat\varphi_0(-m) e^{2\pi i m t}. $$ Taking $m = \pm 1$ recovers the amplitude $2|c_1|/k$ exhibited above; the $m = \pm 2$ and further terms yield faster but tinier oscillations, e.g. for $k=2$ the $m=\pm 2$ terms oscillate twice as fast but with amplitude only $6.6033857\ldots \cdot 10^{-12}$.
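Putting the pieces together gives the amplitude $2|c_1|/k$ quoted at the top; a quick evaluation (the function name is mine):

```python
import math

def eps0_amplitude(k):
    """2|c_1|/k = (2/(k log q)) * sqrt((2 pi^2/log q) / sinh(2 pi^2/log q))."""
    logq = math.log(k / (k - 1))
    x = 2 * math.pi ** 2 / logq     # this is pi*tau with tau = 2*pi/log q
    return (2 / (k * logq)) * math.sqrt(x / math.sinh(x))

print(eps0_amplitude(2))   # 7.130117...e-6
print(eps0_amplitude(6))   # 8.3206735...e-23
```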

The functions $\epsilon^{\phantom.}_j$ appearing in the further terms $n^{-j} \epsilon^{\phantom.}_j(\log_q(n))$ of the asymptotic expansion of $a_n$ are defined similarly by $$ \epsilon^{\phantom.}_j(t) = \frac1k \sum_{r\in\bf Z} \phantom. \varphi_j(t+r), $$ where $$ \varphi_j(x) = P_j(q^x) \varphi_0(x) = P_j(q^x) q^x \exp(-q^x) $$ and $P_j$ is the coefficient of $n^{-j}$ in our power series $$ (1-q^{-r})^{n-1} = \exp(-nq^{-r}) \phantom. \sum_{j=0}^\infty \frac{P_j(nq^{-r})}{n^j}. $$ Thus $ P_0(u)=1, \phantom+ P_1(u) = -(u^2-2u)/2, \phantom+ P_2(u) = (3u^4-20u^3+24u^2)/24 $, etc. Again we apply Poisson summation to expand $\epsilon^{\phantom.}_j(\log_q(n))$ in a Fourier series: $$ \epsilon^{\phantom.}_j(t) = \frac1k \sum_{m \in \bf Z} \hat\varphi_j(-m) e^{2\pi i m t}, $$ and evaluate the Fourier transform $\hat\varphi_j$ by integrating with respect to $q^t$. This yields a linear combination of Gamma integrals evaluated at $1 + (2\pi i y / \log q) + j'$ for integers $j' \in [j,2j]$, giving $\hat\varphi_j$ as a degree-$2j$ polynomial multiple of $\hat\varphi_0$. The first case is $$ \begin{eqnarray*} \hat\varphi_1(y) &=& \frac1{\log q} \left[ \Gamma\Bigl(2 + \frac{2 \pi i y} {\log q}\Bigr) - \frac12 \Gamma\Bigl(3 + \frac{2 \pi i y} {\log q}\Bigr) \right] \\ &=& \frac1{\log q} \frac{\pi y}{\log q} \left(\frac{2 \pi y}{\log q} - i\right) \phantom. \Gamma\Bigl(1 + \frac{2 \pi i y} {\log q}\Bigr) \\ &=& \frac{\pi y}{\log q} \left(\frac{2 \pi y}{\log q} - i\right) \phantom. \hat\varphi_0(y). \end{eqnarray*} $$ Note that $\hat\varphi_1(0) = 0$, so the constant coefficient of the Fourier series for $\epsilon^{\phantom.}_1(t)$ vanishes; this is indeed true of $\epsilon^{\phantom.}_j(t)$ for each $j>0$, because $\hat\varphi_j(0) = \int_{-\infty}^\infty \varphi_j(x) \phantom. dx$ is the $n^{-j}$ coefficient of a power series in $n^{-1}$ that we've already identified with the constant $a_\infty$.
Hence (as can also be seen in the plot above) none of the decaying corrections $n^{-j} \epsilon^{\phantom.}_j(\log_q n)$ biases the average of $a_n$ away from $a_\infty$, even when $n$ is small enough that those corrections are a substantial fraction of the residual oscillation $\epsilon_0(\log_q n)$. This leaves $\hat\varphi_j(\mp1) e^{\pm 2 \pi i t} / k$ as the leading terms in the expansion of each $\epsilon^{\phantom.}_j(t)$, so we see as promised that $\epsilon^{\phantom.}_j$ is still exponentially small but with an extra factor whose leading term is a multiple of $(2\pi / \log q)^{2j}$.
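The simplification of $\hat\varphi_1$ above boils down to the identity $\Gamma(2+s) - \frac12\Gamma(3+s) = -\frac{s}{2}(1+s)\,\Gamma(1+s)$ (apply $\Gamma(z+1)=z\,\Gamma(z)$ twice), here with $s = 2\pi i y/\log q$. Since the identity is analytic in $s$, a spot check at real arguments is indicative; the standard library's `math.gamma` only accepts real inputs:

```python
import math

def lhs(s):
    return math.gamma(2 + s) - 0.5 * math.gamma(3 + s)

def rhs(s):
    # Gamma(2+s) = (1+s)Gamma(1+s) and Gamma(3+s) = (2+s)(1+s)Gamma(1+s)
    return -(s / 2) * (1 + s) * math.gamma(1 + s)

for s in (0.1, 0.5, 1.7, 3.2):
    assert math.isclose(lhs(s), rhs(s), rel_tol=1e-12)
print("identity holds at the sample points")
```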


I suspect the limit doesn't exist, believe it or not. It seems like it should! But plotting $f(n,2)$ for $n$ from, say, $30$ to $2000$, it looks like we have

$$ f(n) = C + \omega(n) + o(1) $$

where $C \approx 0.721347$ and $\omega(n)$ is a certain function which is "log-periodic" in the sense that $\omega(n) = \omega(2n)$. $\omega(n)$ never gets bigger than about $8 \times 10^{-6}$, so you have to look pretty closely to see it.

This sort of very small log-periodic fluctuation appears in certain questions from the mathematics of analysis of algorithms. For example, the mean length of the longest run in a random binary word of length $n$ is $\log_2 n + C + P(\log_2 n) + O(n^{-1/2} \log^2 n)$, where $P$ is a continuous periodic function of period 1; see Flajolet and Sedgewick, Analytic Combinatorics, Proposition V.1. I've also seen similar results for problems involving binary trees.

Something similar seems to be true for $f(n,3)$, except that the log-periodic fluctuations appear to be on the order of $10^{-9}$ and the fluctuations are "faster"; if we let $\omega$ denote the log-periodic function as above, then we have something like $\omega(n) = \omega(1.5 n)$ (I write 1.5, not $3/2$, because I am not at all confident that the relevant constant is $3/2$; I'm not doing the computations with enough precision to be sure.)


If my computation is correct, then $f(n, 2)$ should be roughly

$$\frac12 \sum_{k \in \mathbb{Z}} 2^{k+s} e^{-2^{k+s}}$$

where $s$ is the fractional part of $\log_2 n$. (Note the terms of the sum decay rapidly both as $k \to \infty$ and as $k \to -\infty$.)


OK, here is the argument.

For $n \ge 2$, $f(n, 2)$ is equal to the average value of $f(|S|, 2)$ over all subsets $S$ of $\{1, \ldots, n\}$ (including $S = \{1, \ldots, n\}$ itself). So we may imagine a process where we start with $\{1, \ldots, n\}$ and at each step replace our current subset by one of its subsets chosen uniformly at random. Then $f(n, 2)$ is the probability that, at the last time before we hit the empty set, our set contained just one element.

Consider what happens to each element of $\{1, \ldots, n\}$ in this process. At each stage, if it is still in our subset, we keep it with probability 1/2 and throw it out with probability 1/2. So, the probability that it remains after $r$ steps is $2^{-r}$. Thus we have $n$ independent variables $X_1, \ldots, X_n$, each with this geometric distribution, and what we seek is the probability that among them there is a unique maximum.
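This unique-maximum reformulation is easy to test by simulation; a minimal sketch (the sample sizes, seed, and function names are my own choices):

```python
import random

def lifetime(rng):
    """Rounds survived by one element: P(X >= r) = 2^-r."""
    r = 0
    while rng.random() < 0.5:
        r += 1
    return r

def unique_max_prob(n, trials=20_000, seed=1):
    """Monte Carlo estimate of P(the n lifetimes have a unique maximum)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [lifetime(rng) for _ in range(n)]
        m = max(xs)
        hits += xs.count(m) == 1
    return hits / trials

print(unique_max_prob(50))   # lands near 1/log 4 = 0.7213...
```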

We may express this probability as a sum over the value of this maximum r. The probability of this occurring for any given r is

$$n(2^{-r} - 2^{-(r+1)})(1 - 2^{-r})^n = \frac 12 n 2^{-r} (1 - 2^{-r})^n.$$

So, $$f(n,2) = \sum_{r \ge 0} \frac 12 n 2^{-r} (1 - 2^{-r})^n.$$
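The sum can be evaluated by direct truncation; a sketch (the truncation `r_max = 200` and the function name are my choices; the exponent $n$ follows this answer as written, and the later correction of $n$ to $n-1$ changes the value only at the $O(1/n)$ level):

```python
import math

def f_approx(n, r_max=200):
    """Evaluate sum_{r >= 0} (1/2) n 2^-r (1 - 2^-r)^n by direct truncation."""
    return sum(0.5 * n * 2.0 ** (-r) * (1.0 - 2.0 ** (-r)) ** n
               for r in range(r_max))

print(f_approx(1000))   # within O(1/n) of 1/log 4 = 0.72134752...
```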

Write $r = \log_2 n + u$. Then the $r$th summand becomes

$$\frac 12 2^{-u} (1 - 2^{-u}/n)^n.$$

From here the derivation is not entirely rigorous, but as $n$ increases, this value is roughly

$$\frac 12 2^{-u} e^{-2^{-u}}.$$

Now $u$ varies over values of the form $r - \log_2 n$ for $r$ a nonnegative integer, so $-u$ ranges over values of the form $k + s$, with $s$ the fractional part of $\log_2 n$, for integers $-\infty < k \le \lfloor \log_2 n \rfloor$. As $n$ goes to $\infty$, we obtain the entire sum shown at the top.