Convergence of probabilistic power tower $e^{\pm e^{\pm e^{...}}}$
Marcus Luebke's answer is correct and he fully deserves the bounty. I think there is a clearer way to prove this and there's more to be said about the limiting distribution, so I'm posting my solution. As an aside, this is by far the most fun I have ever had answering a question on this site - thank you for asking such an interesting one!
Theorem 1: The power tower $e^{\epsilon_1 e^{\epsilon_2 e^{\epsilon_3 e^{\cdots}}}}$ converges for every sequence $(\epsilon_n)_{n\in \mathbb{N}}\in\{-1,1\}^{\mathbb{N}}$ such that $\epsilon_n = -1$ for at least one $n$.
Proof: As you note, $e^{-e^{-e^{-e^\cdots}}}$ converges. Hence if only finitely many of the $\epsilon_n$ equal $1$, the tail beyond the last $1$ converges, and the whole tower (finitely many exponentials applied to that tail) converges. If only finitely many equal $-1$, then the tower ends in a tail $e^{e^{e^\cdots}} = \infty$, so at the last $-1$ you get $e^{-e^{e^\cdots}} = 0$, and the remaining finite tower below it clearly converges.
If there are infinitely many $1$'s and $-1$'s, it's a little trickier to prove convergence. In this case it suffices to show that the tower converges when $\epsilon_1 = -1$: if $\epsilon_1 = 1$, the tower is just finitely many exponentials applied to the subtower starting at the first $-1$. We choose a subsequence $n_k$ such that $n_1=1$ and for all $k>1$, $\epsilon_{n_k} = -1 = -\epsilon_{n_k - 1}$, and furthermore $n_{k} - n_{k-1} \ge 2$. Such a subsequence exists by the assumption that $\epsilon_n$ changes sign infinitely often. Now define$$ f_k(x) = e^{\epsilon_{n_{k}}e^{\cdots^{e^{\epsilon_{n_{k+1}-1} x}}}}. $$ Then the $(n_{k+1}-1)$th partial tower is $$ f_1(f_2(\cdots f_k(e)\cdots)). $$ Now observe the following properties of $f_k:$
- $f_k(x)$ is non-negative and monotonic in $x$.
- $\sup_{x\in\mathbb{R}} f_k(x) \le 1$.
- $\inf_{x\in\mathbb{R}} f_k(x) \ge 0$
- For all $x\in\mathbb{R}$, we can bound the derivative $|f_k'(x)| \le \frac1e$. (See footnote (1) for proof of this fact)
Let $g_k = f_1\circ f_2 \circ f_3 \circ \cdots \circ f_k$. Then observe that if $N > n_{k+1}$ there exists $x\in\mathbb{R}$ such that $$ e^{\epsilon_1 e^{\epsilon_2 e^{\cdots^{\epsilon_{N}e}}}} = g_k(x), $$ so it suffices to show that the range of $g_k$ shrinks to a single point. The bound on the derivative of $f_k$ implies $|g_k'(x)|\le e^{-k}$ for all $x\in\mathbb{R}$, which is almost good enough (uniform convergence of the derivatives to $0$ implies that if $g_k$ converges to a function at all, it converges uniformly to a constant function).
Notice that for each $k$, $g_k$ is either an increasing function on $\mathbb{R}$ or a decreasing function on $\mathbb{R}$. Let $M_k = \sup\limits_{x\in\mathbb{R}} g_k(x)$ and $m_k = \inf\limits_{x\in\mathbb{R}} g_k(x)$. Clearly both are finite for all $k\ge 1$. For $k$ such that $g_k$ is increasing:$$ M_{k+1} = \sup\limits_{x\in\mathbb{R}} g_{k+1}(x) = \sup\limits_{x\in\mathbb{R}} g_{k}(f_{k+1}(x))= g_k\left(\sup\limits_{x\in\mathbb{R}}f_{k+1}(x)\right) \le g_k(1) $$ and similarly $$ m_{k+1} =\inf\limits_{x\in\mathbb{R}} g_{k+1}(x) = \inf\limits_{x\in\mathbb{R}} g_{k}(f_{k+1}(x)) = g_k\left(\inf\limits_{x\in\mathbb{R}}f_{k+1}(x)\right) \ge g_k(0). $$ These imply that there exist $x_k,y_k\in [0,1]$ such that $M_{k+1} = g_k(x_k)$ and $m_{k+1} = g_k(y_k)$. For $k$ such that $g_k$ is decreasing we similarly have \begin{eqnarray} m_{k+1} \ge g_k(1)\\ M_{k+1} \le g_k(0) \end{eqnarray} and so we can again pick $x_k,y_k\in[0,1]$ such that $M_{k+1} = g_k(x_k)$ and $m_{k+1} = g_k(y_k)$. Since $|g'_k(x)| \le e^{-k}$, we have $|g_k(x_k) - g_k(y_k)| \le e^{-k}|x_k - y_k| \le e^{-k}$, so the above conditions for $M_{k+1}$ and $m_{k+1}$ imply $M_{k+1} - m_{k+1} \le e^{-k}$ for all $k$. Furthermore, we have \begin{eqnarray} M_{k+1} &=& g_k(x_k) \le M_k\\ m_{k+1} &=& g_k(y_k) \ge m_k \end{eqnarray} hence $M_k$ decreases and $m_k$ increases. This, together with the fact $M_k\ge m_k$, implies both $M_k$ and $m_k$ converge to a limit, but we also know $M_k - m_k \le e^{-(k-1)} \rightarrow 0$, so they converge to the same limit, implying $g_k(x)$ converges uniformly to a constant function, which in turn implies convergence of the power tower. $\blacksquare$
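Here is a quick numerical illustration of this collapse (not part of the proof; the helper `tower`, the clamp at $700$, the random seed, and the depth $40$ are all arbitrary choices for the demo): plugging two wildly different values into the top of a long partial tower yields essentially the same number, which is exactly the statement that the range of $g_k$ has already shrunk to a tiny interval.

```python
import math
import random

def tower(signs, seed):
    """Evaluate e^{s1*e^{s2*...*e^{sN*seed}}} from the inside out.

    Arguments above ~700 would overflow math.exp, so they are treated as +infinity,
    which is how a divergent all-plus tail behaves anyway; exp(-inf) is just 0.0.
    """
    value = seed
    for s in reversed(signs):
        x = s * value
        value = math.inf if x > 700 else math.exp(x)
    return value

random.seed(0)
signs = [random.choice([1, -1]) for _ in range(40)]   # one sample of epsilon_1..epsilon_40

# Two very different "tail" values give (essentially) the same result.
print(tower(signs, 0.0))
print(tower(signs, 50.0))
```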
This implies that the probabilistic power tower converges almost surely, as long as a $-1$ almost surely occurs somewhere in the sequence $\epsilon_1,\epsilon_2,\epsilon_3,\dots$ But which numbers are possible limits of such a tower? It turns out that every nonnegative number is, as shown by the following result:
Theorem 2: For all $x\in [0,\infty)$ there exists a sequence of $\epsilon_n$'s such that the corresponding power tower converges to $x$ in the limit.
Proof: Let $x\in [0,\infty)$. Define $x_n(t)$ recursively by \begin{eqnarray} x_0(t) &=& t \\ x_{n+1}(t) &=&\begin{cases} x_n(e^{t}) & x_n(0) < x, x_n'(t)>0 \\ x_n(-e^{t}) & x_n(0) \ge x, x_n'(t)>0\\ x_n(-e^{t}) & x_n(0) < x, x_n'(t) < 0\\ x_n(e^t) & x_n(0) \ge x, x_n'(t) < 0\end{cases} \end{eqnarray} By the proof of Theorem 1, $x_n(t)$ converges uniformly to a limit. Note that $x_n(1)$ is a power tower of type $e^{\pm e^{\pm e^{\cdots^{\pm e}}}}$, and also $x_n(t)$ is either strictly increasing or strictly decreasing on its whole domain. We would like to show $x_n(t)\rightarrow x$, which will prove existence. Let $A_n = \sup x_n(t)$ and $B_n = \inf x_n(t)$. Observe that $A_0 \ge x \ge B_0$.
Claim: For all $n$, $x\in [B_n,A_n]$. We prove this by induction. Assuming it holds for $n$, we have four cases.
- If $x_n(0) < x$ and $x_n'(t)>0$: Then $x_{n+1}(t) = x_n(e^t)$ so $A_{n+1} = x_n(\infty) = A_n \ge x$ and $B_{n+1} = x_n(0) < x$.
- If $x_n(0) \ge x$ and $x_n'(t)>0$: Then $x_{n+1}(t) = x_n(-e^{t})$ so $A_{n+1} = x_n(0) \ge x$ and $B_{n+1} = x_n(-\infty) = B_n \le x$.
- If $x_n(0) < x$ and $x_n'(t) < 0$: Then $x_{n+1}(t) = x_n(-e^{t})$ so $A_{n+1}=x_n(-\infty) = A_n \ge x$ and $B_{n+1} = x_n(0) < x$.
- If $x_n(0) \ge x$ and $x_n'(t) < 0$: Then $x_{n+1}(t) = x_n(e^t)$ so $A_{n+1}=x_n(0) \ge x$ and $B_{n+1} = x_n(\infty) = B_n \le x$.
Since $x$ lies in every interval $[B_n,A_n]$ and, by Theorem 1, the $x_n$ converge uniformly to a constant onto which these nested intervals shrink, that constant must be $x$. This shows that $x_n(t) \rightarrow x$ for all $t$. In particular $x_n(1)\rightarrow x$, so there exists $\epsilon\in\{-1,1\}^\mathbb{N}$ such that $$ x = e^{\epsilon_1 e^{\epsilon_2 e^{\epsilon_3 e^{\cdots}}}} $$ $\blacksquare$
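The greedy choice in this proof is also easy to simulate. Below is a minimal sketch (assuming $x>0$; the helper names, the depth of $60$ terms, the overflow clamp, and the target $\pi$ are my own choices for the demo): it picks the signs exactly as in the four cases above and then evaluates the resulting finite tower.

```python
import math

def eval_tower(signs, t):
    """e^{s1*e^{s2*...*e^{sn*t}}}, evaluated from the inside out (clamped to avoid overflow)."""
    v = t
    for s in reversed(signs):
        x = s * v
        v = math.inf if x > 700 else math.exp(x)
    return v

def signs_for(x, n_terms=60):
    """Greedy sign choices from the proof of Theorem 2 (assumes x > 0, so the first
    choice is always e^t and the limit has the form e^{eps_1 e^{eps_2 ...}})."""
    signs = []                                    # signs[k] is epsilon_{k+1}
    for _ in range(n_terms):
        increasing = signs.count(-1) % 2 == 0     # is the current x_n increasing in t?
        below = eval_tower(signs, 1.0) < x        # eval_tower(signs, 1.0) equals x_n(0)
        signs.append(1 if below == increasing else -1)
    return signs

target = math.pi
eps = signs_for(target)
print(eval_tower(eps, math.e))   # the finite tower e^{+-e^{...+-e}}; should be very close to pi
```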
The representation is not in fact unique (as I claimed in a previous version). A countable set of numbers has two possible representations. For example:$$ 1 = e^{e^{-e^{e^{e^{e^\cdots}}}}} = e^{-e^{-e^{e^{e^{e^\cdots}}}}} = e^0 $$ and similarly any number expressible as a finite-height power tower of the form $e^{\pm e^{\pm e^{\pm e^{\cdots ^{\pm e}}}}}$ has exactly two representations. It is not hard to show that for all numbers not of this form, the representation as an infinite power tower $e^{\pm e^\cdots}$ is unique.
For the rest of this answer, I'll examine your particular distribution where $\epsilon_n$ are i.i.d. with $\epsilon_n =\begin{cases}1 & \mbox{ w.p. } p \\ -1& \mbox{ otherwise}\end{cases}$, where $0<p<1$. In your notation,$$ P_\epsilon = e^{\epsilon_1 e^{\epsilon_2 e^{\cdots}}} $$ is the limiting distribution of the power tower.
Theorem 3: $$ \mathbb{E} P_\epsilon = \infty $$
Proof: The probability that $(\epsilon_1,\cdots,\epsilon_n) = (1,\cdots,1)$ is $p^n$. Hence we have probability $p^n$ of observing $$P_\epsilon = \exp^n\left(e^{\epsilon_{n+1}e^{\cdots}}\right)$$ where $\exp^n$ denotes the $n$th iterate of $\exp$. We therefore have at least a $p^n$ probability of observing $$ P_\epsilon \ge \exp^n(0) = {^{n-1}e}, $$ hence, using the fact that $P_\epsilon \ge 0$, we have for all $n$: $$ \mathbb{E} P_\epsilon \ge {p^n} \left({^{n-1}e}\right). $$ Since tetration grows far faster than the factor $p^n$ decays, the right-hand side is unbounded in $n$, so $\mathbb{E}P_\epsilon = \infty$.$\blacksquare$
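To make the divergence concrete, here is the bound evaluated for $p=\tfrac12$ at the first couple of interesting values of $n$ (just arithmetic, nothing beyond the inequality above): $$ p^4\,{}^{3}e=\frac{e^{e^{e}}}{16}\approx 2.4\times10^{5},\qquad p^5\,{}^{4}e=\frac{e^{e^{e^{e}}}}{32}\approx e^{3.8\times 10^{6}}, $$ and it only gets more extreme from there.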
Since $P_\epsilon$ exhibits tetrational growth, it makes sense to look instead at the expectation of its super-logarithm. To avoid the question of how to interpolate tetration at non-integer heights, we can look at $\lfloor\mathrm{slog}(P_\epsilon)\rfloor$, which is well-defined and does not depend on the choice of interpolation:
Theorem 4: $$ \mathbb{E}\lfloor\mathrm{slog}( P_\epsilon)\rfloor = \frac{2p-1}{1-p} $$ Proof: Let $u$ be a random sign, independent of $P_\epsilon$ and with the same distribution as each $\epsilon_n$, so that $P_\epsilon \stackrel{d}{=} e^{u P_\epsilon}$. Observe that for any $x<0$, $\mathrm{slog}(x) \in (-2,-1)$. Hence:\begin{eqnarray} \mathbb{E}\lfloor\mathrm{slog}( P_\epsilon)\rfloor &=& \mathbb{E}\lfloor\mathrm{slog}( e^{uP_\epsilon})\rfloor \\&=& 1+\mathbb{E}\lfloor\mathrm{slog}(u P_\epsilon)\rfloor\\ &=& 1+p\mathbb{E}\lfloor\mathrm{slog}( P_\epsilon)\rfloor+(1-p)\mathbb{E}\lfloor\mathrm{slog}( -P_\epsilon)\rfloor\\ &=& 1+p\mathbb{E}\lfloor\mathrm{slog}( P_\epsilon)\rfloor +(1-p)(-2)\\ &=& 2p-1 + p\mathbb{E}\lfloor\mathrm{slog}( P_\epsilon)\rfloor \end{eqnarray} Solving for $\mathbb{E}\lfloor\mathrm{slog}( P_\epsilon)\rfloor$ gives the desired result.$\blacksquare$
In fact, $\lfloor\mathrm{slog}(P_\epsilon)\rfloor = N - 2$, where $N$ is the index of the first $-1$ in $\epsilon_1,\epsilon_2,\dots$; in other words, $\lfloor\mathrm{slog}(P_\epsilon)\rfloor$ is a geometric random variable (with success probability $1-p$) shifted down by $2$.
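A quick Monte Carlo sanity check of Theorem 4, using the observation just made so that no tetration ever has to be evaluated (the choice $p=0.6$, the seed and the sample size are arbitrary):

```python
import random

def floor_slog_sample(p, rng):
    """floor(slog(P)) for one sampled sign sequence: (index of the first -1) minus 2."""
    n = 1
    while rng.random() < p:     # epsilon_n = +1 with probability p
        n += 1
    return n - 2

rng = random.Random(1)
p = 0.6
samples = [floor_slog_sample(p, rng) for _ in range(200_000)]
print(sum(samples) / len(samples))   # Monte Carlo estimate
print((2 * p - 1) / (1 - p))         # Theorem 4 predicts 0.5 for p = 0.6
```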
I'll conclude by looking at some other properties of the limiting distribution:
Recall that $P_\epsilon \stackrel{d}{=} e^{u P_\epsilon}$ where $u$ is independent of $P_\epsilon$ and has the same distribution as each $\epsilon_j$. This gives us a way to compute the distribution function of $P_\epsilon$. Here I'm only going to consider the case $p=\frac12$, but all of this can be generalized in a straightforward way.
The median of $P_\epsilon$ is $1$; $$ P(P_\epsilon \le 1) = \frac12 $$ because $P_\epsilon \le 1$ if and only if $\epsilon_1 = -1$.
In fact, we can compute explicit values of $P(P_\epsilon\le t)$ for many other values of $t$. Note that any $x$ can be represented by at most two distinct sequences of $\epsilon_n$ (and any fixed sequence has probability $0$), so $P_\epsilon$ has no atoms. We can solve semi-explicitly for the distribution function of $P_\epsilon$ because $P_\epsilon$ has the same distribution as $e^{u P_\epsilon}$. This gives us $$ P(P_\epsilon \le t) = P(e^{u P_\epsilon} \le t) = \frac12P(e^{P_\epsilon} \le t) + \frac12 P(e^{-P_\epsilon} \le t) = \frac12 P(P_\epsilon \le \log t) + \frac12 P(-P_\epsilon \le \log t) = \frac12 P(P_\epsilon \le \log t) + \frac12 P(P_\epsilon \ge -\log t). $$ Letting $F(t) = P(P_\epsilon\le t)$ be the distribution function, we therefore have $$ F(t) = \frac12 F(\log t) + \frac12(1-F(-\log t)) = \frac12\left(1 + F(\log t) - F(-\log t)\right). $$ Since $F(t) = 0$ for $t\le 0$, this is equivalent to $$ F(t) = \frac12 + \frac12\mathrm{sgn}(\log t) F(|\log t|) $$ as long as $t\ne 1$.

This allows us to explicitly compute some values of $F$. For instance we can find $F(\Omega)$, where $\Omega\approx 0.56714...$ is the Omega constant (so $|\log \Omega| = \Omega$): $F(\Omega) = \frac12-\frac12F(\Omega)$ implies $F(\Omega) = \frac13$. The formula also allows us to compute $F(t)$ effectively by iterating $t\to |\log t|$. Letting $L_n(t)$ be defined recursively by $L_{n+1}(t) = L_n(|\log t|)$, with $L_1(t) = \log t$, one can see:$$ F(t) = \frac12 + \frac{\mathrm{sgn}({L_1(t)})}4 +\frac{\mathrm{sgn}({L_1(t)})\mathrm{sgn}({L_2(t)})}8 + \cdots + \frac{\mathrm{sgn}(L_1(t))\cdots\mathrm{sgn}(L_m(t))}{2^{m}}F(|L_m(t)|) = \sum_{n=0}^\infty \frac {\prod_{k=1}^n \mathrm{sgn}(L_k(t))}{2^{n+1}}. $$ The dynamics of $L_n(t)$ are pretty chaotic. Since $|\log t|$ has periodic points of period $3$, Sharkovskii's theorem implies that it has points of all periods.(2)
The value of $F$ may be computed explicitly for any periodic point. For example, take the 2-periodic pair $(a,b) = (0.269874...,1.30980...)$. Then we have \begin{eqnarray} F(a) = \frac12 - \frac12 F(b)\\ F(b) = \frac12 + \frac12 F(a) \end{eqnarray} which has solution $F(a) = \frac15$ and $F(b) = \frac35$.
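Here is a small numerical sketch of the series above for $p=\tfrac12$ (my own truncated re-implementation; the cutoff of $40$ terms and the hard-coded constants are arbitrary choices for the demo). It reproduces $F(\Omega)=\tfrac13$, the values at the $2$-periodic pair, and the median.

```python
import math

def F(t, terms=40):
    """Approximate F(t) = P(P_eps <= t) for p = 1/2 via the truncated sign series."""
    if t <= 0:
        return 0.0
    total, coeff = 0.5, 0.5            # total starts at the n = 0 term; coeff holds the latest term
    for _ in range(terms):
        if t == 1.0:                   # log t = 0: every remaining term vanishes
            break
        t = math.log(t)                # t is now L_{n+1} of the original argument
        coeff *= 0.5 * (1.0 if t > 0 else -1.0)
        total += coeff
        t = abs(t)
    return total

omega = 0.5671432904097838             # Omega constant, e^{-Omega} = Omega
print(F(omega))                        # ~ 1/3
print(F(0.269874), F(1.30980))         # ~ 1/5 and ~ 3/5 (the 2-periodic pair)
print(F(1.0))                          # exactly 1/2, the median
```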
Here's a graph of the function (computed by summing the first 20 terms of the series):
It looks weird and fractal-ish. In fact:
Theorem 5: $F(t)$ is non-differentiable on a dense set.
Proof: First we show $F$ is not differentiable at $t=0$:$$ \lim_{h\rightarrow 0^+} \frac{F(h)-F(0)}{h} = \lim_{h\rightarrow 0^+} \frac{F(h)}{h} = \lim_{h\rightarrow 0^+} \frac{1-F(-\log h)}{2h} = \lim_{x\rightarrow\infty} \frac{1-F(x)}{2e^{-x}} $$ If this limit converged, then $1-F(x)$ would have a finite integral by the comparison test; but since $\int_0^\infty \left(1-F(x)\right) dx= \mathbb E P_\epsilon = \infty $, this is impossible. Hence $F$ is not differentiable at $0$. This implies $F$ is not differentiable at any point of the form $$ e^{\pm e^{\pm e^{\cdots^{\pm e}}}} $$ because all such points reach $0$ after finitely many applications of $t\to |\log(t)|$, and hence the recursion formula implies $F$ is non-differentiable at those points. By Theorem 2 above, the set of such points is dense in $[0,\infty)$, so $F$ is non-differentiable on a dense subset of $[0,\infty)$. $\blacksquare$
Footnotes:
(1) $f_k(x)$ is always a composition of at least two exponential functions $x\to e^{\pm x}$ such that the outermost is $e^{-x}$ and the innermost is $e^x$, i.e. $f_k$ has the form $$ f_k(x) = e^{- e^{\pm e^{\pm e^{\cdots ^{\pm e^x}}}}} $$ One can see that $f_k$ is a composition of functions each of the form $e^{-x}$ or $$ e^{- e^{e^{\cdots^x}}} = e^{-\exp^j(x)}, $$ where $\exp^j(x)$ denotes $j$ iterations of $\exp(x)$ for some $j\ge 1$. Furthermore, since the innermost factor is of the latter form, all of the instances of $e^{-x}$ are evaluated at positive arguments. Since $\left|\frac{d}{dx} e^{-x}\right|\le 1$ for positive arguments, it suffices to show that $\left|\frac{d}{dx} e^{-\exp^j(x)}\right| \le \frac1e$ for all $x\in\mathbb{R}$ and $j\ge 1$. For $j=1$, we have $\left|\frac{d}{dx} e^{-\exp(x)}\right| = e^x e^{-e^x}$, which is maximized at $x=0$, where it equals $\frac1e$. For higher values of $j$ the maximum only decreases: we have $$ \left|\frac{d}{dx} e^{-\exp^j(x)}\right| = e^x e^{e^x} \cdots \exp^j(x)\, e^{-\exp^j(x)} =\exp\left(x + e^x + \cdots + \exp^{j-1}(x) - \exp^j(x)\right). $$ By observing that for all $t\in\mathbb{R}$ we have $2e^t - e^{e^t} < 0$, we must have $$ x + e^x + \cdots + \exp^{j-1}(x) - \exp^j(x) = x + e^x + \cdots + \exp^{j-2}(x) -\exp^{j-1}(x) + 2\exp^{j-1}(x) - \exp^j(x) < x + e^x + \cdots + \exp^{j-2}(x) -\exp^{j-1}(x), $$ hence the sequence of maximum derivatives is decreasing in $j$; in particular, all of them are bounded by $\frac1e$ as claimed.
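As a numerical sanity check of this bound (not a proof; the grid $[-4,1.5]$ and the cutoff $j\le 3$ are arbitrary choices that keep every intermediate float finite), a simple grid search shows the maxima decreasing with $j$ and never exceeding $1/e\approx 0.368$:

```python
import math

def deriv_mag(x, j):
    """|d/dx exp(-exp^j(x))| = exp(x + e^x + ... + exp^{j-1}(x) - exp^j(x))."""
    exponent, term = 0.0, x
    for _ in range(j):                 # accumulate exp^0(x) .. exp^{j-1}(x)
        exponent += term
        term = math.exp(term)          # ends as exp^j(x); finite on the grid below for j <= 3
    return math.exp(exponent - term)   # underflows harmlessly to 0.0 when term is huge

for j in (1, 2, 3):
    grid = [-4 + 0.001 * i for i in range(5501)]       # x in [-4, 1.5]
    print(j, max(deriv_mag(x, j) for x in grid))
# maxima: about 0.3679 (= 1/e) for j = 1, about 0.19 for j = 2, smaller still for j = 3
```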
(2) Sharkovskii's theorem is not strictly applicable to $|\log t|$ as a real-valued function, since it takes an infinite value at $t=0$. However, it is continuous as a function from $[0,\infty]$ (with the extended-real topology) to itself, so the theorem applies.
Um... I'm not sure if I'm missing something, but I believe that if any of the $\epsilon_n=-1$ for $n\ge 1$, then the value of the expression converges. Allow me to explain why.
Before I start, this is my first bounty, so I'm not really sure what the rules and standards of proof are. I know I could've gone into more detail, but I don't think it would've helped with understanding the proof, and I think the lemmas I skipped over would be fairly obvious/intuitive and tedious to prove. I'm open to feedback, but I think my response answers the question well.
Conventions:
- $\overline{\mathbb{R}}=[-\infty,\infty]=\mathbb{R}\cup\{-\infty, \infty\}$
- $[x,y]=\{z\in \overline{\mathbb{R}} \mid x\le z\le y\}$
- $[x] = [x,x] = \{x\}$. If $a=[x]$ for some $x$, we say that $a$ determines a unique value.
- $\mu([a,b])=b-a$ is the standard measure on $\overline{\mathbb{R}}$.
This means that $\exp([a,b])=[e^a,e^b]$ and $-[a,b]=[-b,-a]$, where $e^{-\infty}=0$ and $e^{\infty}=\infty$.
We are studying the "reverse" sequence of sets $a_{n-1}=\exp(\epsilon_na_n)$, where $\epsilon=(\epsilon_1,\epsilon_2,\ldots)$ and $\epsilon_n\in\{-1,1\}$ for $n\ge 1$. My understanding of convergence is this: start from $a_{\omega}=\overline{\mathbb{R}}$ at some (in the limit, transfinite) depth $\omega$; the expression converges when $a_0$ ends up containing a single value.
There are three cases: $\epsilon$ contains no $-1$s, $\epsilon$ contains finitely many $-1$s, and $\epsilon$ contains infinitely many $-1$s. If $\epsilon$ contains no $-1$s, then $a_0=\exp(\exp(\exp(\cdots)))$. It is pretty clear that $a_0=[\infty]$. However, as I will prove, this is the only case where $a_0$ diverges in any fashion.
If $\epsilon$ contains a finite number of $-1$s, then let $n$ be the last number such that $\epsilon_n=-1$. We know that $a_n=\exp(\exp(\exp(\cdots)))=[\infty]$, so $a_{n-1}=\exp(\epsilon_na_n)=\exp(-[\infty])=[0]$. Then, $a_0$ is a finite sequence of exponentials away from $a_{n-1}$, so $a_0$ is also finite and unique.
Now, let's consider the case where there are infinitely many $-1$s. Let $\omega$ be some transfinite number with $\epsilon_{\omega}=-1$. By "transfinite" I mean that we truncate the tower at depth $\omega$, compute $a_0$ from $a_\omega$ downward, and then take the limit as $\omega\to\infty$. (Yeah, it's not great notation, but I'm sure you can figure out what I mean.)
We know that $a_{\omega+1}\subseteq[-\infty,\infty]$. Assuming the worst case scenario, let $a_{\omega+1}=[-\infty,\infty]$. Then, $\epsilon_{\omega+1}a_{\omega+1}=\pm[-\infty,\infty]=[-\infty,\infty]$, and $a_{\omega}=\exp(\epsilon_{\omega+1}a_{\omega+1})=\exp([-\infty,\infty])=[0,\infty]$. This means that $a_{\omega-1}=\exp(\epsilon_{\omega} a_{\omega})=\exp(-[0,\infty])=[0,1]$.
Let $\Omega=(\Omega_0,\Omega_1,\ldots)$ be the sequence of indices less than or equal to $\omega$, in descending order, such that $\epsilon_{\Omega_i}=-1$. Also, let's define a function $E:\{1,2,\ldots\}\times [0,\infty]\to [0,1]$ by $E(1,x)=e^{-x}$ and $E(d+1,x)=E(d,e^x)$ for $d\ge 1$. Then it is easy to see that $a_{\Omega_{i+1}-1}=\exp(-\exp(\exp(\cdots\exp(a_{\Omega_i-1}))))=E(\Omega_{i}-\Omega_{i+1},a_{\Omega_i-1})$. I claim that not only is $\mu(a_{\Omega_{i+1}-1})<\mu(a_{\Omega_i-1})$, but the decrease is large enough to ensure that $\lim_{i\to\infty}\mu(a_{\Omega_{i}-1})=0$, whatever the gaps $d=\Omega_{i}-\Omega_{i+1}$ are.
This is because $E(d,x)$ is Lipschitz continuous on $[0,1]$ for every $d$, with Lipschitz constant $K=1$. In fact, for $d > 1$ the constant is much smaller than $1$; only for $d=1$ does $e^{-x}$ attain slope $-1$, and it does so only at the single endpoint $x=0$. Because the extreme slope is attained at just that one boundary point, we still have the strict inequality $|E(d,x)-E(d,y)|<|x-y|$ for all distinct $x,y\in[0,1]$. This means that $\mu(E(d,a))<\mu(a)$ for any interval $a\subseteq[0,1]$ of positive length. Furthermore, because $E$ is a smooth function, there can be no "catch-points" where the decay of the size of $a$ would stall. Therefore, after an infinite number of passes through $E$, the size of $a$ approaches $0$.
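To see the shrinkage concretely, here is a tiny numerical sketch (the particular gap pattern fed to $E$ is an arbitrary choice for the demo): start from the worst case $[0,1]$ and push the interval through $E(d,\cdot)$ a few times; the printed widths drop toward $0$.

```python
import math

def E(d, x):
    """E(1, x) = e^{-x};  E(d + 1, x) = E(d, e^x);  equivalently E(d, x) = exp(-exp^{d-1}(x))."""
    for _ in range(d - 1):
        x = math.exp(x)
    return math.exp(-x)

interval = (0.0, 1.0)                                     # worst case for the first a_{Omega_0 - 1}
for step, d in enumerate([1, 2, 1, 3, 1, 2], start=1):    # arbitrary gaps Omega_i - Omega_{i+1}
    lo, hi = interval
    # E(d, .) is decreasing in x, so the image of [lo, hi] is [E(d, hi), E(d, lo)].
    interval = (E(d, hi), E(d, lo))
    print(step, d, interval[1] - interval[0])             # widths shrink toward 0
```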
This means that if we start from progressively larger values of $\omega$, the computed set $a_0$ gets smaller and smaller, so that in the limit $a_0$ has size $0$ and hence contains a single value. Therefore the expression $\exp(\epsilon_1\exp(\epsilon_2\cdots))$ has a unique, well-defined value, pinned down by the sets $a_0$ as $\omega$ approaches $\infty$, whenever there are infinitely many $-1$s in $\epsilon$.
We see that regardless of whether there are finitely many (at least one) or infinitely many $-1$s in $\epsilon$, the expression converges.