Why could Mertens not prove the prime number theorem?

Here is a heuristic that I plan to make into a blog post some day. Suppose that there were only finitely many primes with first digit $9$. Is your estimate good enough to see that?

To be more precise, suppose that there were no primes between $9 \times 10^k$ and $10^{k+1}$ for all sufficiently large $k$. And suppose that the number of primes between $a$ and $b$, for $10^k \leq a < b \leq 9 \times 10^k$ is $\approx \frac{\log 10}{\log 9} \int_{a}^b dt/\log t$ (when $a$ is not too close to $b$). We'll see later where the fraction $\log 10/\log 9$ comes from.

The first thing to note is that this would violate the prime number theorem. In this scenario, we would have $\pi(9 \times 10^k) = \pi(10 \times 10^k)$ for $k$ large. But the prime number theorem says that $$\pi(10\times 10^k) - \pi(9 \times 10^k) \sim \frac{10 \times 10^k}{k \log 10 + \log {10}} - \frac{9 \times 10^k}{k \log 10 + \log {9}} \sim \frac{10^k}{k \log 10} \to \infty.$$ So proving the prime number theorem involves disproving this ridiculous scenario.

Now, let's see that the scenario is consistent with $\sum_{p \leq N} 1/p = \log \log N + M + O(1/\log N)$. The sum over the primes between $10^k$ and $10^{k+1}$ would be roughly $$\frac{\log 10}{\log 9} \int_{10^k}^{9 \times 10^k} \frac{dt}{t \log t} = \frac{\log 10}{\log 9} \left( \log \log (9 \times 10^k) - \log \log 10^k \right)$$ $$=\frac{\log 10}{\log 9} \left( \log( k \log 10+\log 9) - \log (k \log 10) \right) = \frac{\log 10}{\log 9} \left( \log \left( 1+\frac{\log 9}{k \log 10} \right) \right) $$ $$ = \frac{\log 10}{\log 9} \frac{\log 9}{k \log 10} + O(1/k^2)= \frac{1}{k} + O(1/k^2)$$

So $$\sum_{p \leq 10^{n+1}} \frac{1}{p} = \sum_{j=1}^n \left( \frac{1}{j} + O(1/j^2) \right)=$$ $$\log n + B + O(1/n) = \log \log 10^{n+1} + C + O(1/\log 10^{n+1}).$$ Very important exercise left for you: If you redo this computation for $\sum_{p \leq 9 \times 10^k} 1/p$, you get $\log \log (9 \times 10^k) + C + O(1/\log(9 \times 10^k))$ for the same constant $C$. The point is that $\log \log 10^{k+1} - \log \log (9 \times 10^k) = O(1/\log 10^k)$, so this estitmate is consistent with $\sum_{p \leq N} 1/p$ not growing at all between $9 \times 10^k$ and $10 \times 10^k$.

This trick is useful for refuting other simple approaches to the PNT. For example, the "primes hate to start with $9$ scenario" is also consistent with $\sum \log p/p^s = 1/(s-1) + O(1)$ as $s \to 1^{+}$, so that is also not enough to prove PNT.


Let me supplement David Speyer's nice response by elaborating on his original comment and Greg Martin's comment. Let us write $$ \sum_{p\leq x}\frac{1}{p}=\ln\ln x+M+R(x), $$ then we have, using Riemann-Stieltjes integrals, $$ F(x):=\sum_{x < p \leq 2x} \frac{\ln p}{p}=\int_x^{2x}\ln t\ d(\ln\ln t+M) + \int_x^{2x} \ln(t) dR(t) $$ $$ = \int_x^{2x} \frac{dt}{t} + [R(t)\ln t]_x^{2x} - \int_x^{2x} \frac{R(t)}{t} dt = \ln 2 + O( \ln x \sup_{x < t \leq 2x} R(t) ). $$ If $\hat F(s)$ denotes the Mellin transform of $dF(x)$, then with the notation $$ S(x):=\sum_{p \leq x} \frac{\ln p}{p} $$ we have $$ \hat F(s) = \int_{1-}^\infty x^{-s}dS(2x) - \int_{2-}^\infty x^{-s}dS(x) = (2^s-1)\sum_p \frac{\ln p}{p^{s+1}}, \quad \Re s>0. $$ In particular, if $\zeta(s)$ has a zero on $\Re s=\sigma\geq\frac{1}{2}$, then $\hat F(s)$ has a pole on $\Re s=\sigma -1$. Note that on the real segment $s\geq-\frac{1}{2}$, the only singularity of $\hat F(s)$ is $s=-\frac{1}{2}$, coming from the difference between $\sum (\ln p)p^{-s}$ and $\sum\Lambda(n)n^{-s}$. Hence by a well-known principle (see Theorem 11.8 in Bateman-Diamond: Analytic number theory), for a certain $c\in\mathbb{R}$ we infer the two-sided estimates $$ F(x)-\ln 2-c x^{-1/2} = \Omega_\pm(x^{\sigma-1}). $$ This implies, by our initial calculation, $$ R(x) = \Omega(x^{\sigma-1}/\ln x). $$ Here we can take $\sigma=\frac{1}{2}$. In the unlikely case that RH fails, we can choose a larger $\sigma$ and replace $\Omega$ by $\Omega_\pm$. Probably, with more work, we could replace $\Omega$ by $\Omega_\pm$ for $\sigma=\frac{1}{2}$ as well.

Added. Here is a closely related arXiv preprint.


Because the scale is too small in Mertens's theorem, and the prime number theorem as well as the Riemann hypothesis are hidden by the $O(1/\log{X})$ notation.

Indeed, the former amounts to strengthening this term to $o(1/\log{X})$; the latter - to $O(1/\sqrt{X})$.

[Incidentally, one could go for equivalent statements to a still smaller scale. Of course, an estimate like $\sum_{p < X} 1/p^2 = \mathrm{const} + O(1/X)$ does not say anything at all about the primes. But one could, if one wished, express the prime number theorem by elaborating on the $O(1/X)$ term. ]

To elaborate on this a bit, let me go to a slightly bigger scale where the prime number theorem begins to emerge outside of $o(1)$. This is also more natural; indeed, it was how Mertens's theorem was proved.

By partial summation, Mertens's estimate is equivalent to $\sum_{p < X} (\log{p})/(p-1) = \log{X} + O(1)$; or, if one prefers, $\sum_{n < X} \Lambda(n)/n = \log{X} + O(1)$. The prime number theorem however is the statement that the $O(1)$ term converges to a constant: $\sum_{n < X} \Lambda(n)/n = \log{X} - \gamma + o(1)$. Indeed, the related bound $\sum_{n < X} \Big( \frac{1}{n} - \frac{1}{X} \Big) \Lambda(n) = \log{X} - 1 - \gamma + o(1)$, another form of the prime number theorem, is what de la Vallee Poussin actually obtained in his original paper. Here $\gamma = 0.57\ldots$ is Euler's constant, but this is of no importance for us, see the next paragraph. Also the $\log{X}$ logarithmic pole corresponds to the pole of $\zeta(s)$ at $s=1$, whereas the $o(1)$ term expresses there being no zeros with $\mathrm{Re}(s) = 1$. The Riemann hypothesis is the correspondingly stronger bound $O(1/\sqrt{X})$ on the $o(1)$ oscillating term. At this scale, in contrast to $\psi(X) = X + O\big( \sqrt{X} (\log{X})^2 \big)$ or $\pi(X) = \mathrm{Li}(X) + O(\sqrt{X}\log{X})$, a logarithmic factor in addition to the square root is not required, as $\sum_{\rho} 1 / |\rho|^2 < \infty$ over the zeros.

Here finally is how to deduce the more usual form $\psi(X) \sim X$ of the prime number theorem from the refinement $S(X) := \sum_{p < X} \Lambda(n)/n = \log{X} -\gamma + o(1)$ of Mertens's theorem: Summing by parts, $\psi(X) = XS(X) - \int_1^X S(t) \, dt = X(\log{X} - \gamma + o(1)) - \big( \int_1^X \log{t} \, dt - \gamma X + o(X) \big) = X + o(X)$.


Added (December, 2017). I came upon an observation giving also a 'trivial' proof of the reverse elementary implication of the two purely qualitative forms, multiplicative and logarithmic, of the prime number theorem: $\psi(X) \sim X \Leftrightarrow S(X) = \log{X} - \gamma + o(1)$. The following seems to have been missed in the literature on elementary methods which, at this point, seem all to quote a somewhat more involved Tauberian theorem of Axer; cf. section 8.1.1 of Montgomery and Vaughan's book (Multiplicative Number Theory: I) or, for a more general setting, chapter 14 of Diamond and Zhang's recent book on Beurling Generalized Numbers (really this paper of theirs). The simpler argument below also extends easily to number fields, supplying a particularly easy proof of the 'elementary equivalence' of Landau's prime ideal theorem and number field sharp Mertens. Incidentally, as I happen to recall, this addresses a slightly curious point that had come up in the comments to this answer of Eric Naslund. Remembering also my answer here, I figured it may be worth to record the following observation as an addendum to it, sticking for simplicity to the rational case assumed in this question.

A proof of $\psi(X) \sim X \Rightarrow S(X) = \log{X} - \gamma + o(1)$. For simplicity, let me stick to $\mathbb{Q}$. The case of a number field $K$ has the same result with $\gamma$ generalized as the 'Euler-Kronecker invariant' $\gamma_K$.

The key is to observe that the formula $$ X^{-1} \log{X!} = \sum_{n \leq X/T} \frac{\Lambda(n)}{n} + \sum_{m \leq T} \frac{1}{X} \Big( \psi\Big( \frac{X}{m} \Big) - \psi\Big( \frac{X}{T} \Big) \Big) + O(1/T) $$ holds uniformly in the two parameters $X, T \geq 1$, with an absolute implied coefficient. It interpolates between Mertens's estimate (case $T = 1$) and Chebyshev's convolution formula $\log{X!} = \sum_m \psi(X/m)$ (case $T = \infty$). But the general formula also follows, after a moment of reflection, from Chebyshev's argument with the prime factorization of $X!$. Divide the moduli into the ranges $n \leq X/T$ and $n > X/T$. The total contribution of the latter are exactly accounted for by the second sum. For a small modulus $n \leq X/T$, the contribution via the prime factorization is $X^{-1} \lfloor X/n \rfloor \Lambda(n) = \frac{\Lambda(n)}{n} + O\Big(\frac{\Lambda(n)}{X}\Big)$, neglecting the fractional part. The $O(1/T)$ term then comes from summing these for $n \leq X/T$, and using Chebyshev's estimate $\sum_{n \leq Y} \Lambda(n) \ll Y$. (In the number field generalization, the latter estimates extend as lattice point counts.)

Now, by Stirling's asymptotic, the qualitative $\psi(X) \sim X \Rightarrow S(X) = \log{X} - \gamma + o(1)$ implication is immediate from the observed formula upon first letting $X \to \infty$ and then $T \to \infty$.