What upper bounds are known for the diameter of the minimum spanning tree of $n$ uniformly random points in $[0,1]^2$?
Added: What can we expect from the Euclidean MST
After mulling over this question for a few days, I think it's inconsistent with the stated facts that the diameter of the Euclidean MST is $\Theta(\sqrt n)$. Here's why.
Let's first interpret the theorem mentioned in the question: that the diameter of the Delaunay triangulation for $P$ has expected diameter $\Theta(\sqrt n)$. For each $n$, the Delaunay triangulations for $P$ define a probability measure on the space of metrics on $n$-element subsets of the unit square. We can interpolate this metric to a metric on the unit square by making each triangle of the Delaunay triangulation an equilateral triangle, and rescale it by $1/\sqrt n$ to get a measure on the space of metrics on the square. As $n$ goes to infinity, these metrics have uniformly bounded $\epsilon$ complexity (=minimum number of balls of size $\epsilon$ needed to cover). The set of metrics of $\epsilon$-complexity bounded by some fixed function of $\epsilon$ is compact in the Gromov-Hausdorff topology, so the space of probability measures on these metric spaces is compact in the weak topology, so there exists at least a subsequence of $n$ such that the scaled Delaunay metrics converge to a measure on metric spaces.
The metric spaces are a.s. Lipschitz equivalent to the standard metric on the unit square, using the theorem about diameter (which can be used to deduce that the Delaunay distance between two random elements $p, q \in P$ is $O(\sqrt n)$). These metrics are path-metric spaces. The shortest paths are rectifiable arcs in the Euclidean sense as well as in the particular metric, since the two metrics are Lipschitz equivalent. A rectifiable arc in the plane has a tangent line almost everywhere, so there are rescaled limits where almost all limiting Delaunay geodesics are straight lines.
Proposition: The rescaled Delaunay metrics converge a.s. to a constant multiple of the Euclidean metric in the plane.
That the metric of the plane is the actual limit, not just a limit point, follows from considerations parallel to the fact that a rescaled limit of a Lipschitz function is a.s. linear. If there were any significant probability of significant deviation at any stage, these would accumulate and prevent the large-scale map from being Lipschitz.
Now consider the Euclidean MST for a Delaunay triangulation as a metric tree. Suppose the trees have probable diameter is $\Theta(\sqrt n)$. For each pair of points $x$ and $y$ in the unit square and for each $P$, let $x_P$ be the point of $P$ closest to the $x$, let $y_P$ be closest to $y$, and let $\gamma_P$ be the path in the MST connecting $x_P$ to $y_P$, parametrized by $1/\sqrt(n)$ times the combinatorial Delaunay distance. In the limit, the measure on $P$'s would converge to a probaility measure on rectifiable paths from $x$ to $y$. But more than that, we would get a measurable map from the square cross the square that takes each pair of points to a rectifiable path between them, with arclength less than some constant times their distance.
This is not topologically compatible with the tree-like condition: near the boundary between two large branches of a tree, the tree distance between points on one branch to points on the other branch is a large multiple of the distance in the plane.
To test my understanding, I ran simulations (using Mathematica's Computational Geometry package and Combinatorica package). Here are two Euclidean MST's, the first with 1500 random points in the unit square, the second with 10,000. You can plainly see that paths connecting pairs of vertices are not converging to rectifiable, Lipschitz paths.
As a further test, I calculated the diameter of the Euclidean MST's for 25 uniform pseudo-random distributions of $2^k$ points, for each $k$ ranging from 2 through 8. The sample mean diameters are {2.64, 5.8, 10.32, 18.32, 29.88, 49.12, 78.88}. The best linear fit to to the log of the diameter as a function of $k$ is $.03 + .55 k$. The sample standard deviations for the log of the diameter were nearly constant, all about .15. (that is, the diameters fluctuate on a multiplicative scale, typically about $\pm 16 \%$). It's curious that $Exp[.55] = 1.733$, close to $\sqrt 3 = 1.732$. This suggests a power law that asymptotic diameter $\approx n^{]log(\sqrt 3 / 2)}$, but this could be coincidence, since the numerical experiment was modest in size. Here's the plot of log of mean sample diameter vs. $k$:
The actual asymptotic behavior of diameter seems intricately tied with percolation, a subject that I do not understand very well. That is: you can think of building up the MST by successively adding edges of length $t$ if they join points on different connected components. This gives an increasing union of equivalence relations on elements of $P$, consisting of clumps that can be connected by steps not greater than $t$. One would expect that for $t$ greater than some critical distance $t(n)$ that is approximately a constant time $1/\sqrt n$, large-scale clusters are likely. Paths between points in the a clusters must stay in the cluster. The geometry of the increasing clusters should enable one to estimate the Hausdorff dimension of tree geodesics, which should in turn give an exponent of growth for the diameter of the Euclidean MST.
The Mathematica notebook containing these computations is here. The code in the notebook itself is brief, since algorithms for Delaunay triangulations and minimum spanning trees and graph diameter are provided in packages in the Mathematica distribution. It took me more effort to make it work than it should have, because functions in these auxiliary packages are poorly documented. Here are some simple ideas for better showing what's happening, and analyzing further:
- Draw the Delaunay triangulation in a different color, along with the tree
- Indicate the increasing clumps accessible with changing step size by edge thickness and/or color coding edges of the spanning tree. One idea is to use random colors for short edges within clumps, then use a saturated average color (with average weighted by clump size) when new edges make clumps collide and merge. Delaunay triangles could also be color coded to indicate clumping.
- Make a 3-dimensional plot of graph distance from a randomly selected member of $P$ to other elements of the tree, using the
TriangularSurfacePlot
function from theComputationalGeometry
package. Also: try showing distance from basepoint by color coding, perhaps - Do bigger experiments, and make plots showing the actual data: e.g. make a
ListPlot[]
of the log of the actual diameter for trees of sizes something likeFloor[Exp[ Range[2,6,.01]]]
, but start with a much more modest range and aim for a more ambitious range. - Draw contour plots or 3-D plots of Delaunay graph distance from a randomly selected vertex. How quickly do the contour lines begin to look like circles as the size of $P$ increases?
The following is an earlier answer, which essentially shows that Delaunay "arc length" converges to a multiple of Euclidean arc length. Missing point: how much might larger detours around radars shorten the paths?
Geometry of the Delaunay triangulation.
Let's look at this problem on the surface of a sphere, instead of the unit square. The Delaunay triangulation of a set $P$ on the sphere is invariant under Moebius transformations (easy to verify by the definition in terms of circles), and it is equivalent to the triangulation of the convex hull of $P$. A set in the plane can be transformed to the sphere by stereographic projection. Its Dealaunay triangulation is equivalent to a subset of the polyhedron, namely the part on the far side from the center of projection. The problems, for the uniform distribution on the square and the uniform distribution on $S^2$, are equivalent, but the spherical setting has more symmetry.
In the projective ball model of hyperbolic space, the convex hull inherits a metric as a hyperbolic surface of finite area. I want to describe a related metric. Imagine there is a radar installation at each point of $P$, and that a smuggler wants to fly a small plane below the horizon of any radar. They need to slow down when they're flying lower, at a speed proportional to the distance to $P$, so the metric will be 1/minimum distance to $P$ times spherical arc length. Near an element of $P$, this metric is asymptotically that of a cylinder of circumference $2 \pi$, but slightly smaller near where it is attached because we're rescaling the spherical metric, rather than the Euclidean metric.
Define a "thick" part of $S^2 \setminus P$ to be the set $Q$ obtained by removing a disk around each element of $P$ whose radius is 1/3 the distance to its nearest neighbor.
[Inessential mathematical sidenote: this metric is comparable to the Poincaré metric in Q, except in situations where $P$ is confined to a tiny disk on the sphere. The definition could be modified for that situation, but it's a distraction. The smuggling metric is also comparable to the hyperbolic metric on the image of $Q$ projected to the nearest point in the convex hull of $P$.]
Claim: the smuggling diameter is comparable to the diameter of the Delaunay triangulation.
Proof of Claim: The boundary circles of $Q$ all have comparable circumference in the smuggling metric, slightly less than $2 \pi$. The Delaunay triangles all have diameter bounded above and below in the smuggling metric. This gives an easy way to transform a path in the 1-skeleton of the Delaunay triangulation to a path in $Q$ without increasing length more than a bounded factor. Conversely, for a path in $Q$, the rate of near-encounters with elements of $P$ per smuggling distance is bounded. Therefore, you can do a simplicial approximation, pushing the path to the 1-skeleton with number of edges less than a bounded multiple of its length.
Let's now consider a variation of the problem: A smuggler wants to go between point $X$ on the sphere and point $Y$ on the sphere, in the presence of $N$ uniformly randomly distributed radar installations. The smuggler is conservative, and decides to never exceed spherical speed $N^{-.5}$. This will be less than speed 1 in the smuggling metric except when there is a radar within distance $N^{-.5}$. The density of the set of points is $4 \pi / N$ and the strip has area about $d(X,Y) * 2 N^{-.5}$, so the expected number of radars in the strip is $\Theta(\sqrt N)$. The distribution (Poisson) has standard deviation proportional to square root of expectation, so for moderately large $N$ it is extremely unlikely that there will be more than twice the expected number within the strip: enough that even that the probability that the $\sup$ number over all $X$ and $Y$ tends rapidly to 0 with $N$.
For each radar dish that is within the strip, the smuggler merely takes a detour around the corresponding boundary circle of $Q$, adding only a constant amount of time to the trip. This gives a path in the smuggling metric of $S^2 \setminus P$ with length $\Theta(\sqrt N)$. We want to know the length of a path in $Q$. This is now easy, since there is a distance-decreasing retraction from $S^2 \setminus P \rightarrow Q$: each cylinder can be mapped to its boundary without increasing lengths.
Therefore, the expected maximum diameter of the Delaunay triangulation is $O(\sqrt N)$
This post should really have pictures ... maybe later, or maybe someone else can supply them.
A colleague, Nicolas Broutin, who is not on MO, pointed me to a recent "preliminary report" of Gábor Pete, about joint work with Christophe Garban and Oded Schramm, on the scaling limit of the MST. In the references of that note I found a paper of Aizenmann et.al. which contains the following information about a suitably defined (subsequential) scaling limit of the MST (I won't get into the precise definition of the scaling limit as it's not particularly important for this answer):
The branches of all trees in $\mathcal{F}(\omega)$ are random curves $\mathcal{C}$ with Hausdorff dimensions bounded above and below: $d_{\mathrm{min}} \leq \mathrm{dim}~\mathcal{C} \leq d_{\mathrm{max}}$, for some non-random $1 < d_{\mathrm{min}} < d_{\mathrm{max}} < 2$.
A couple comments are in order.
First, the paper in fact shows that this result holds for any of three models: uniform spanning tree on the lattice (the limit is as the lattice spacing goes to zero); minimum spanning tree on the lattice with iid uniform bond weights (this is the one most naturally connected with percolation); and third, Euclidean MST on a Poisson point process with intensity $r$ (where the rescaled limit is as $r \to \infty$ appropriately). The last of these is the one most directly connected with my question.
Second, the reason for "all trees" is because the scaling limit is in fact described in terms of a graded collection of finite trees spanning any finite collection of points in one-point compactification of $\mathbb{R}^2$, and satisfying certain consistency conditions. In the paper they show that the collection also turns out to "describe a single spanning tree" in the sense that for sets of points in $\mathbb{R}^2$ (i.e. not including the point at infinity), the tree in $\mathcal{F}(\omega)$ spanning them stays within some finite region in $\mathbb{R}^2$. (Edit: this explanation of "all trees" replaces an earlier false explanation which was based on an overhasty reading of the results.)
Third, given the answer and comments of Bill Thurston, this suggests that the diameter should indeed scale like $n^{\alpha}$ for some $1/2 < \alpha < 1$. (Though, inspired by Peter Shor's comment, I will also mention that it doesn't a priori rule out scaling behaviour of the form $n^{\alpha}\log^{\beta} n$.)