How do I group near points with GPS positions?
One quick and dirty way uses a recursive spherical subdivision. Beginning with a triangulation of the earth's surface, recursively split each triangle from a vertex across to the middle of its longest side. (Ideally you will split the triangle into two equal-diameter parts or equal-area parts, but because those involve some fiddly calculation, I just split the sides exactly in half: this causes the various triangles eventually to vary a little in size, but that does not seem critical for this application.)
Of course you will maintain this subdivision in a data structure that allows quickly identifying the triangle in which any arbitrary point lies. A binary tree (based on the recursive calls) works nicely: each time a triangle is split, the tree is split at that triangle's node. Data concerning the splitting plane are retained, so that you can quickly determine on which side of the plane any arbitrary point lies: that will determine whether to travel left or right down the tree.
(Did I say splitting "plane"? Yes--if model the earth's surface as a sphere and use geocentric (x,y,z) coordinates, then most of our calculations take place in three dimensions, where sides of triangles are pieces of intersections of the sphere with planes through its origin. This makes the calculations fast and easy.)
I will illustrate by showing the procedure on one octant of a sphere; the other seven octants are processed in the same manner. Such an octant is a 90-90-90 triangle. In my graphics I will draw Euclidean triangles spanning the same corners: they don't look very good until they get small, but they can be easily and quickly drawn. Here is the Euclidean triangle corresponding to the octant: it is the start of the procedure.
As all sides have equal length, one is chosen randomly as the "longest" and subdivided:
Repeat this for each of the new triangles:
After n steps we will have 2^n triangles. Here is the situation after 10 steps, showing 1024 triangles in the octant (and 8192 on the sphere overall):
As a further illustration, I generated a random point within this octant and traveled the subdivision tree until the triangle's longest side reached less than 0.05 radians. The (cartesian) triangles are shown with the probe point in red.
Incidentally, to narrow a point's location to one degree of latitude (approximately), you would note that this is about 1/60 radian and so covers around (1/60)^2 / (Pi/2) = 1/6000 of the total surface. Since each subdivision approximately halves the triangle size, about 13 to 14 subdivisions of the octant will do the trick. That's not much computation--as we will see below--making it efficient not to store the tree at all, but to perform the subdivision on the fly. At the beginning you would note which octant the point lies in--that is determined by the signs of its three coordinates, which can be recorded as a three-digit binary number--and at each step you want to remember whether the point lies in the left (0) or right (1) of the triangle. That gives another 14-digit binary number. The full code for the illustrated point happens to be 111.10111110111100. You may use these codes to group arbitrary points.
(Generally, when two codes are close as actual binary numbers, the corresponding points are close; however, points can still be close and have remarkably different codes. Consider two points one meter apart separated the Equator, for instance: their codes must differ even before the binary point, because they are in different octants. This kind of thing is unavoidable with any fixed partitioning of space.)
I used Mathematica 8 to implement this: you may take it as-is or as pseudocode for an implementation in your favorite programming environment.
Determine what side of the plane 0-a-b point p lies on:
side[p_, {a_, b_}] := If[Det[{p, a, b}] >= 0, left, right];
Refine triangle a-b-c based on point p.
refine[p_, {a_, b_, c_}] := Block[{sides, x, y, z, m},
sides = Norm /@ {b - c, c - a, a - b} // N;
{x, y, z} = RotateLeft[{a, b, c}, First[Position[sides, Max[sides]]] - 1];
m = Normalize[Mean[{y, z}]];
If[side[p, {x, m}] === right, {y, m, x}, {x, m, z}]
]
The last figure was drawn by displaying the octant and, on top of that, by rendering the following list as a set of polygons:
p = Normalize@RandomReal[NormalDistribution[0, 1], 3] (* Random point *)
{a, b, c} = IdentityMatrix[3] . DiagonalMatrix[Sign[p]] // N (* First octant *)
NestWhileList[refine[p, #] &, {a, b, c}, Norm[#[[1]] - #[[2]]] >= 0.05 &, 1, 16]
NestWhileList
repeatedly applies an operation (refine
) while a condition pertains (the triangle is large) or until a maximum operation count has been reached (16).
To display the full triangulation of the octant, I began with the first octant and iterated the refinement ten times. This begins with a slight modification of refine
:
split[{a_, b_, c_}] := Module[{sides, x, y, z, m},
sides = Norm /@ {b - c, c - a, a - b} // N;
{x, y, z} = RotateLeft[{a, b, c}, First[Position[sides, Max[sides]]] - 1];
m = Normalize[Mean[{y, z}]];
{{y, m, x}, {x, m, z}}
]
The difference is that split
returns both halves of its input triangle rather than the one in which a given point lies. The full triangulation is obtained by iterating this:
triangles = NestList[Flatten[split /@ #, 1] &, {IdentityMatrix[3] // N}, 10];
To check, I computed a measure of every triangle's size and looked at the range. (This "size" is proportional to the pyramid-shaped figure subtended by each triangle and the sphere's center; for small triangles like these, this size is essentially proportional to its spherical area.)
Through[{Min, Max}[Map[Round[Det[#], 0.00001] &, triangles[[10]] // N, {1}]]]
{0.00523, 0.00739}
Thus the sizes vary up or down by about 25% from their average: that seems reasonable for achieving an approximately uniform way to group points.
In scanning over this code, you will notice no trigonometry: the only place it will be needed, if at all, will be in converting back and forth between spherical and cartesian coordinates. The code also does not project the earth's surface onto any map, thereby avoiding the attendant distortions. Otherwise, it only uses averaging (Mean
), the Pythagorean Theorem (Norm
) and a 3 by 3 determinant (Det
) to do all the work. (There are some simple list-manipulation commands like RotateLeft
and Flatten
, too, along with a search for the longest side of each triangle.)