Probabilistic interpretation of prime number theorem

First of all, I assume you understand that this is meant to be a nonrigorous argument, so there will be a limit to how rigorous I can make my answer.

The intuition here is that $n$ is prime if and only if it is not divisible by any prime $<n$. So we "should" have $$f(n) \approx \prod_{p < n} \left( 1-1/p \right).$$ Similarly $$f(n+1) \approx \prod_{p<n+1} \left( 1-1/p \right) = \prod_{p < n} \left( 1-1/p \right) \cdot \left\{ \begin{matrix} \left( 1-1/n \right) \ \mbox{if}\ n\ \mbox{is prime} \\ 1 \ \mbox{if}\ n\ \mbox{is not prime} \end{matrix} \right.$$ $$\approx f(n) \cdot \left\{ \begin{matrix} \left( 1-1/n \right) \ \mbox{if}\ n\ \mbox{is prime} \\ 1 \ \mbox{if}\ n\ \mbox{is not prime} \end{matrix} \right. .$$ Since $n$ is prime with "probability $f(n)$", we interpolate between the two cases next to the brace by writing: $$f(n+1) \approx f(n) \left( 1-f(n)/n \right).$$ One might argue that it would be better to interpolate with a factor of $(1-1/n)^{f(n)}$, but this will make no difference in the asymptopics as $(1-1/n)^{f(n)} = 1-f(n)/n+O(1/n^2)$.

This argument is famously fishy, because it gives the right answer, but the intermediate point is wrong! The actual asymptopics of $\prod_{p<n} \left( 1-1/p \right)$ do not look like $1/\log n$, but like $e^{-\gamma} /\log n$. I've never seen a good intuitive explanation for why we get the wrong estimate for $\prod_{p<n} \left( 1-1/p \right)$, but the right estimate for the density of the primes.


You might read "Statistical Independence in Probability, Analysis, and Number Theory" by Mark Kac.

Meanwhile, Harald Cramer did a very good job of investigating the consequences of your idea with $f(x) = \frac{1}{\log x}$, which seemed fine, until in 1985 Helmut Maier showed that it gave incorrect estimates for the number of primes in short intervals. The precise nature of the word "incorrect" here is worth a look at the Maiers article, or later summaries.


I came up with this argument before (and I'm sure many other people have, independently). Start like David Speyer does, above, until you get to the equation

$$ f(n+1) \approx f(n) (1 - f(n)/n) $$

Therefore, you have

$$ f(n+1) - f(n) \approx -f(n)^2/n$$

and if we figure that $f$ is a ``nice'' function, then $f(n+1) - f(n) \approx f^\prime(n)$. So $f$ is approximately a solution to

$$ f^\prime(x) \approx -f(x)^2/x $$

where I'm calling the variable $x$ now, instead of $n$, to emphasize the change from discrete to continuous. It's easy to check that $f(x) = 1/\log x$ solves this differential equation.