Probability of tossing a fair coin with at least $k$ consecutive heads

We can derive an explicit formula of the probability $P(N,k)$ based upon the Goulden-Jackson Cluster Method.

We consider the set of words $\mathcal{V}^{\star}$ of length $N\geq 0$ built from an alphabet $$\mathcal{V}=\{H,T\}$$ and the bad word $\underbrace{HH\ldots H}_{k \text{ elements }}=:H^k$ which is not allowed to be part of the words we are looking for. We derive a function $f_k(s)$ with the coefficient of $s^N$ being the number of wanted words of length $N$.

The wanted probability $P(N,k)$ can then be written as \begin{align*} P(N,k)=1-\frac{1}{2^N}[s^N]f_k(s) \end{align*}

According to the paper (p.7) from Goulden and Jackson the generating function $f_k(s)$ is \begin{align*} f_k(s)=\frac{1}{1-ds-\text{weight}(\mathcal{C})}\tag{1} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and with $\mathcal{C}$ the weight-numerator with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[H^k]) \end{align*} We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[H^k])&=-\frac{s^k}{1+s+\cdots+s^{k-1}}=-\frac{s^k(1-s)}{1-s^k} \end{align*}

We obtain the generating function $f(s)$ for the words built from $\{H,T\}$ which don't contain the substring $H^k$ \begin{align*} f_k(s)&=\frac{1}{1-2s+\frac{s^k(1-s)}{1-s^k}}\\ &=\frac{1-s^k}{1-2s+s^{k+1}}\tag{2}\\ \end{align*}

$$ $$

Note: For $k=2$ we obtain \begin{align*} f_2(s)&=\frac{1-s^2}{1-2s+s^{3}}\\ &=1+2s+3s^2+5s^3+8s^4+13s^5+21s^6+34s^7+\mathcal{O}(s^8) \end{align*}

The coefficients of $f_2(s)$ are a shifted variant of the Fibonacci numbers stored as A000045 in OEIS.

Note: For $k=3$ we obtain \begin{align*} f_3(s)&=\frac{1-s^3}{1-2s+s^{4}}\\ &=1+2s+4s^2+7s^3+13s^4+24s^5+44s^6+81s^7+\mathcal{O}(s^8) \end{align*}

The coefficients of $f_3(s)$ are a shifted variant of the so-called Tribonacci numbers stored as A000073 in OEIS.

$$ $$

We use the series representation of $f_k(s)$ in (2) to derive an explicit formula of the coefficients.

\begin{align*} [s^N]f(s)&=[s^N](1-s^k)\sum_{m=0}^{\infty}(2s-s^{k+1})^m\\ &=[s^N](1-s^k)\sum_{m=0}^{\infty}s^m(2-s^k)^m\\ &=[s^N](1-s^k)\sum_{m=0}^{\infty}s^m\sum_{j=0}^m\binom{m}{j}(-1)^js^{kj}2^{m-j}\\ &=([s^N]-[s^{N-k}])\sum_{m=0}^{\infty}s^m\sum_{j=0}^m\binom{m}{j}(-1)^js^{kj}2^{m-j}\tag{3}\\ &=\sum_{m=0}^{N}([s^{N-m}]-[s^{N-k-m}])\sum_{j=0}^m\binom{m}{j}(-1)^js^{kj}2^{m-j}\tag{4}\\ &=\sum_{{m=0}\atop{m\equiv N(\bmod k)}}^{N}\binom{m}{\frac{N-m}{k}}(-1)^{\frac{N-m}{k}}2^{m-\frac{N-m}{k}}\\ &\qquad-\sum_{{m=0}\atop{m\equiv N(\bmod k)}}^{N-k}\binom{m}{\frac{N-m}{k}-1}(-1)^{\frac{N-m}{k}-1}2^{m-\frac{N-m}{k}+1}\\ &=\sum_{{m=0}\atop{m\equiv N(\bmod k)}}^{k-1}\binom{m}{\frac{N-m}{k}}(-1)^{\frac{N-m}{k}}2^{m-\frac{N-m}{k}}\tag{5}\\ &\qquad+\sum_{{m=k}\atop{m\equiv N(\bmod k)}}^{N-k} \left(\binom{m}{\frac{N-m}{k}}-\frac{1}{2^k}\binom{m-k}{\frac{N-m}{k}}\right)(-1)^{\frac{N-m}{k}}2^{m-\frac{N-m}{k}}\\ \end{align*}

Comment:

  • In (3) we use the linearity of the coefficient of operator and $[s^N]s^mf(s)=[s^{N-m}]f(s)$

  • In (4) we change the limit of the left hand sum from $\infty$ to $N$ according to the maximum coefficient $[s^N]$. According to the factors $s^{kj}$ we consider in the following only summands with $m\equiv N(\bmod k)$

  • In (5) we reorganise the sums from the line above by extracting from the left hand sum the first summand and shifting in the right hand side the index by one and putting both sums together.

We conclude: An explicit representation of the probability $P(N,k)$ $(n\geq 0)$ is according to (5) \begin{align*} P(N,k)&=1-\sum_{{m=0}\atop{m\equiv N(\bmod k)}}^{k-1}\binom{m}{\frac{N-m}{k}}(-1)^{\frac{N-m}{k}}2^{-\frac{(k+1)(N-m)}{k}}\\ &\qquad-\sum_{{m=k}\atop{m\equiv N(\bmod k)}}^{N-k} \left(\binom{m}{\frac{N-m}{k}}-\frac{1}{2^k}\binom{m-k}{\frac{N-m}{k}}\right)(-1)^{\frac{N-m}{k}}2^{-\frac{(k+1)(N-m)}{k}} \end{align*}


Your recurrence relation is correct. I don't think you can do much better than that for general $k$, but you can find a closed form for specific values of $k$. For the first non-trivial value of $k$, the recurrence relation is

$$ p_n=p_{n-1}+(1-p_{n-3})/8\;. $$

With $p_n=1+\lambda^n$, the characteristic equation becomes $\lambda^3-\lambda^2+1/8=0$. One solution, $\lambda=1/2$, can be guessed, and then factoring yields $(\lambda-1/2)(\lambda^2-\lambda/2-1/4)$ with the further solutions $\lambda=(1\pm\sqrt5)/4$. Thus the general solution is

$$p_n=1+c_1\left(\frac12\right)^n+c_2\left(\frac{1+\sqrt5}4\right)^n+c_3\left(\frac{1-\sqrt5}4\right)^n\;.$$

The initial conditions $p_0=0$, $p_1=0$, $p_2=1/4$ determine $c_1=0$, $c_2=-(1+3/\sqrt5)/2$ and $c_3=-(1-3/\sqrt5)/2$, so the probability is

$$ \begin{align} p_n &=1-\frac{1+3/\sqrt5}2\left(\frac{1+\sqrt5}4\right)^n-\frac{1-3/\sqrt5}2\left(\frac{1-\sqrt5}4\right)^n\\ &=1-\frac4{\sqrt5}\left(\left(\frac{1+\sqrt5}4\right)^{n+2}-\left(\frac{1-\sqrt5}4\right)^{n+2}\right)\;. \end{align} $$

Thus, for large $n$ the probability approaches $1$ geometrically with ratio $(1+\sqrt5)/4\approx0.809$.

Tags:

Probability