How often are two iid variables close?

If $\epsilon > \tfrac{1}{n}$, split $[0,1]$ (recall that $X$ and $Y$ take values in $[0,1]$) into the $n$ intervals $I_i = [\tfrac{i-1}{n}, \tfrac{i}{n})$ for $1 \leqslant i < n$ and $I_n = [\tfrac{n-1}{n}, 1]$. The events $\{X, Y \in I_i\}$ are disjoint, and each of them implies $|X-Y| \leqslant \tfrac{1}{n} < \epsilon$, so $$ \mathbb{P}(|X-Y|<\epsilon) \geqslant \sum_{i=1}^n \mathbb{P}(X, Y \in I_i) = \sum_{i=1}^n (\mathbb{P}(X \in I_i))^2 . $$ By Cauchy-Schwarz, since the probabilities $\mathbb{P}(X \in I_i)$ sum to $1$, $$ \mathbb{P}(|X-Y|<\epsilon) \geqslant \frac{1}{n} \biggl(\sum_{i=1}^n \mathbb{P}(X \in I_i)\biggr)^2 = \frac{1}{n} \, . $$ Taking $n = \lfloor 1/\epsilon \rfloor + 1$, the least $n$ with $\tfrac{1}{n} < \epsilon$, we have $n \leqslant \tfrac{1}{\epsilon} + 1 = \tfrac{\epsilon+1}{\epsilon}$, and we obtain $$ \mathbb{P}(|X-Y|<\epsilon) \geqslant \frac{\epsilon}{\epsilon + 1} \, . $$ This leads to the desired result with $c = 1$.
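For concreteness, here is the bound at one (made-up) value: take $\epsilon = 0.3$. Then $n = \lfloor 1/0.3 \rfloor + 1 = 4$, with $\tfrac{1}{4} < 0.3$, and the argument gives $$ \mathbb{P}(|X-Y| < 0.3) \geqslant \tfrac{1}{4} = 0.25 \geqslant \tfrac{0.3}{1.3} \approx 0.231 \, . $$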


Mateusz Kwaśnicki's guess, that the optimal value of $c$ is 2, is correct! In fact:

Theorem: Suppose $X,Y$ are i.i.d. random variables on $\mathbb{R}$. Then the limit $\lim_{\varepsilon \rightarrow 0} \varepsilon^{-1} \mathbb{P}[|X - Y| < \varepsilon]$ exists. It is $+\infty$ unless $X$ has a density function $f$ satisfying $||f||_2^2 = \int f^2\ \text{d}x < \infty$, in which case the limit is equal to $2 ||f||_2^2$.

Restricting to distributions on $[0,1]$, Cauchy-Schwarz says $$ 1 = \int_0^1 f\ \text{d}x \le \left ( \int_0^1 f^2\ \text{d}x \right )^{1/2} \left ( \int_0^1 1\ \text{d}x \right )^{1/2} = ||f||_2, $$ with equality iff $f = 1$ a.e. Combined with the theorem, this shows that the constant $c = 2$ is valid for every distribution on $[0,1]$, and it is sharp only for the uniform distribution.
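(A numerical aside, not part of the argument.) A quick Monte Carlo sketch of the theorem's prediction $\varepsilon^{-1} \mathbb{P}[|X-Y| < \varepsilon] \to 2||f||_2^2$, for the uniform density (predicted limit $2$) and for $f(x) = 2x$ on $[0,1]$ (where $||f||_2^2 = 4/3$, predicted limit $8/3$):

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps = 10**7, 1e-3

# (name, sampler, predicted limit 2 * ||f||_2^2)
cases = [
    ("uniform on [0,1]", lambda n: rng.random(n), 2.0),
    ("f(x) = 2x on [0,1]", lambda n: np.sqrt(rng.random(n)), 8 / 3),  # inverse-CDF sampling: F(x) = x^2
]

for name, sampler, limit in cases:
    x, y = sampler(N), sampler(N)
    est = np.mean(np.abs(x - y) < eps) / eps  # estimates eps^{-1} P[|X-Y| < eps]
    print(f"{name}: estimate {est:.3f}, predicted {limit:.3f}")
```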

Here are a few examples where the limit is infinite:

Example 1: If $X$ has an atom, say $\mathbb{P}[X = a] = p > 0$, then $\mathbb{P}[|X-Y| < \varepsilon] \ge \mathbb{P}[X = Y = a] = p^2$ for every $\varepsilon$, so $\mathbb{P}[|X-Y| < \varepsilon]$ is bounded away from 0. So the limit in question diverges like $\varepsilon^{-1}$.

Example 2: The Cantor ternary function is the CDF of a probability distribution on $[0,1]$ which is nonatomic but not absolutely continuous. A random sample from this distribution is given by $\sum_{k \ge 1} c_k \tfrac{2}{3^k}$ where $\{c_k\}$ are i.i.d. Bernoulli($\tfrac{1}{2}$) random variables. (Thought of in terms of the "repeatedly take out the middle third" construction of the Cantor set, the $c_k$'s are the choices of left versus right third.) If $\varepsilon = 3^{-n}$, then, up to an event of probability 0, $|X-Y| < \varepsilon$ iff the digit sequences of $X$ and $Y$ agree in the first $n$ places. Thus $$ \mathbb{P}[|X-Y| < \varepsilon] = 2^{-n} = \varepsilon^{\log_3 2}. $$ So the limit in question diverges like $\varepsilon^{-1+\log_3 2} \approx \varepsilon^{-0.37}$.
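(Again a numerical aside, my own check rather than part of the argument.) A simulation of the Cantor distribution confirming $\mathbb{P}[|X-Y| < 3^{-n}] = 2^{-n}$; the series is truncated at $K = 40$ ternary digits, far below the resolution $3^{-n}$ used here:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10**6, 40  # number of samples, ternary digits kept

def cantor_sample(n):
    """Sample X = sum_k c_k * 2/3^k with c_k i.i.d. Bernoulli(1/2), truncated at K digits."""
    c = rng.integers(0, 2, size=(n, K), dtype=np.int8)
    return c @ (2.0 / 3.0 ** np.arange(1, K + 1))

x, y = cantor_sample(N), cantor_sample(N)
for n in range(1, 8):
    p = np.mean(np.abs(x - y) < 3.0**-n)
    print(f"n={n}: P ~ {p:.5f}, predicted 2^-n = {2.0**-n:.5f}")
```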

Example 3: The power law $f(x) = \tfrac{1}{2\sqrt{x}}$ on $(0,1]$ is an example of a density function which is not in $L^2$. In this case we can evaluate $\mathbb{P}[|X-Y| < \varepsilon]$ exactly (well, if you believe in inverse hyperbolic trig functions). The asymptotic behavior is $$ \mathbb{P}[|X-Y| < \varepsilon] = \tfrac{1}{2} (-\log \varepsilon) \varepsilon + (\tfrac{1}{2} + \log 2) \varepsilon + O(\varepsilon^2). $$ So the limit in question diverges logarithmically.
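(One more numerical aside.) Since the CDF here is $F(x) = \sqrt{x}$, inverse-CDF sampling gives $X = U^2$ with $U$ uniform, and one can watch the logarithmic divergence against the leading terms above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**7
x, y = rng.random(N)**2, rng.random(N)**2  # CDF sqrt(x) => X = U^2

for eps in [1e-1, 1e-2, 1e-3]:
    est = np.mean(np.abs(x - y) < eps) / eps
    pred = 0.5 * (-np.log(eps)) + 0.5 + np.log(2)  # leading asymptotic terms
    print(f"eps={eps:g}: eps^-1 * P ~ {est:.3f}, asymptotic ~ {pred:.3f}")
```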

Mateusz's slick argument worked on the block diagonal, a union of $n$ squares of width $1/n$. This covers about half of the area of the strip $|X-Y| < \tfrac{1}{n}$, which is why the resulting constant is half of optimal. There's probably a hands-on way to extend it, but you start using words like "convolution", and I found it easier to reason in Fourier space. This question is asking for a bound on the density of $X-Y$ at 0. This random variable has nonnegative Fourier transform (characteristic function, if you prefer), and that condition alone is enough to guarantee positive density (shameless self-citation: Lemma 3.1 in this paper). In general, when it makes sense, the density at 0 of a random variable is equal to the integral of its Fourier transform. So our job is to make that concrete and translate it back to a statement about PDFs.

Convention: The Fourier transform (characteristic function) of a random variable $X$ is the function $t \mapsto \mathbb{E}[e^{2 \pi \mathrm{i} X t}]$. The Fourier transform of a function $f$ is $t \mapsto \hat f(t) = \int e^{2 \pi \mathrm{i} x t} f(x)\ \text{d}x$. If $X$ is absolutely continuous (has a density function), then its Fourier transform is equal to the Fourier transform of its density function. With this convention the Plancherel identity reads $\int f \bar g\ \text{d}x = \int \hat f \overline{\hat g}\ \text{d}t$ (no normalization constant).
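To make the "density at 0" heuristic explicit in this convention (a restatement of the strategy above, not a new ingredient): Fourier inversion reads $g(x) = \int \hat g(t) e^{-2 \pi \mathrm{i} x t}\ \text{d}t$ whenever, say, $g$ and $\hat g$ are both integrable, so at the origin $$ g(0) = \int \hat g(t)\ \text{d}t. $$ Applied to $X - Y$, whose Fourier transform is $|\psi|^2$ (computed at the start of the proof below), this says the density of $X-Y$ at 0 "should be" $\int |\psi(t)|^2\ \text{d}t = ||\psi||_2^2$; the proof makes this rigorous.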

Lemma: Let $\psi$ denote the Fourier transform of $X$. Then $\psi \in L^2$ iff $X$ is absolutely continuous and its density function $f$ is in $L^2$. Moreover, if this holds, then $||\psi||_2 = ||f||_2$.

Proof of Lemma: The Fourier transform is an isometry of the space $L^2$. The ($\Leftarrow$) direction is immediate: if $X$ has a density function in $L^2$, then certainly its Fourier transform is in $L^2$. For ($\Rightarrow$), if $\psi$ is in $L^2$ then it is the Fourier transform of some function $f \in L^2$. But then $f$ and the distribution of $X$ have the same Fourier transform, hence are equal as tempered distributions; so $X$ is absolutely continuous with density function $f$. $\square$

(This is probably a basic result in some probability textbook?)

Proof of Theorem: Let $\psi$ denote the Fourier transform of $X$. Then the Fourier transform of $X-Y$ is $t \mapsto \psi(t) \psi(-t) = \psi(t) \overline{\psi(t)} = |\psi(t)|^2$. Let $T_\varepsilon$ denote the "triangle filter" $T_\varepsilon(x) = \varepsilon^{-1} (1 - |x|/\varepsilon)$ for $|x| \le \varepsilon$ and $T_\varepsilon(x) = 0$ otherwise. It is a standard calculation that $\hat T_\varepsilon(t) = \operatorname{sinc}^2(\pi t \varepsilon)$. (Here $\operatorname{sinc} x = \tfrac{\sin x}{x}$.) This is integrable, so we can compute the expected value of $T_\varepsilon$ using the Fourier transform. We find $$ \varepsilon^{-1} \mathbb{P}[|X-Y| < \varepsilon] \ge \mathbb{E}[T_\varepsilon(X-Y)] = \int |\psi(t)|^2 \operatorname{sinc}^2(\pi t \varepsilon)\ \text{d}t. $$ The integrand is nonnegative and, as $\varepsilon \to 0$, it converges (pointwise) to its (pointwise) supremum $|\psi(t)|^2$. So, by an appropriate incantation of the monotone convergence theorem (the one in this comment applies exactly; alternatively, Fatou's lemma gives $\liminf \ge ||\psi||_2^2$, while the bound $\operatorname{sinc}^2 \le 1$ gives $\limsup \le ||\psi||_2^2$), the integral converges to $\int |\psi(t)|^2\ \text{d}t = ||\psi||_2^2$. If this is infinite, then also $\lim_{\varepsilon \to 0} \varepsilon^{-1} \mathbb{P}[|X-Y| < \varepsilon] = +\infty$.
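As an aside, here is one way to carry out the "standard calculation" quoted above: $T_\varepsilon$ is the autoconvolution of a box of half the width, namely $B_{\varepsilon/2}$ in the notation of the next paragraph (so $B_{\varepsilon/2}(x) = \varepsilon^{-1}$ for $|x| < \varepsilon/2$). Indeed $T_\varepsilon = B_{\varepsilon/2} * B_{\varepsilon/2}$, and $\hat B_a(t) = \int_{-a}^{a} (2a)^{-1} e^{2 \pi \mathrm{i} x t}\ \text{d}x = \operatorname{sinc}(2 \pi a t)$, so $$ \hat T_\varepsilon(t) = \bigl( \hat B_{\varepsilon/2}(t) \bigr)^2 = \operatorname{sinc}^2(\pi t \varepsilon). $$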

Suppose now that $||\psi||_2^2 < \infty$, i.e., $\psi \in L^2$. Consider the "box filter" $B_\varepsilon$ defined by $B_\varepsilon(x) = (2 \varepsilon)^{-1}$ for $|x| < \varepsilon$ and $B_\varepsilon(x) = 0$ otherwise. This has $\hat B_\varepsilon(t) = \operatorname{sinc}(2 \pi t \varepsilon)$. This is bounded and $|\psi|^2$ is integrable. So we can again compute expected values in Fourier space: $$ (2\varepsilon)^{-1} \mathbb{P}[|X-Y| < \varepsilon] = \mathbb{E}[B_\varepsilon(X-Y)] = \int |\psi(t)|^2 \operatorname{sinc}(2 \pi t \varepsilon)\ \text{d}t. $$ The integrand converges pointwise to $|\psi(t)|^2$. We don't have nonnegativity this time, but now we know integrability of the bounding function $|\psi|^2$. So we can apply the dominated convergence theorem, concluding that $$ \lim_{\varepsilon \to 0} (2 \varepsilon)^{-1} \mathbb{P}[|X-Y| < \varepsilon] = \int |\psi(t)|^2\ \text{d}t = ||\psi||_2^2. $$ The lemma completes the proof. $\square$
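As a concrete sanity check of the theorem (my own addition): for the uniform distribution on $[0,1]$ we have $f = 1$, so $||f||_2^2 = 1$, and $|\psi(t)|^2 = \operatorname{sinc}^2(\pi t)$, whose integral is indeed 1. On the other hand, directly, $\mathbb{P}[|X-Y| < \varepsilon] = 1 - (1 - \varepsilon)^2 = 2\varepsilon - \varepsilon^2$, so $\varepsilon^{-1} \mathbb{P}[|X-Y| < \varepsilon] \to 2 = 2 ||f||_2^2$, as the theorem predicts.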