Speeding up Nearest with multiple distance thresholds
One idea is to embed the target distance information into the data as a 3rd dimension. To do this, we need to also augment the nearest point set with a 3rd dimension. So, let $t$ be the target distance, $z$ be the augmented 3rd dimension of the target point set, $o$ be the constant used for the nearest point set, and suppose $m$ is the maximal distance. Then using $p$ for the nearest point set and $q$ for the target point set, we have:
$$(p-q_i)\cdot(p-q_i) + (o-z_i)^2 < m^2$$
or
$$(p-q_i)\cdot(p-q_i) < m^2 - (o-z_i)^2$$
and so, we want to choose $z_i$ such that $m^2 - (o - z_i)^2 = t_i^2$. If we make the natural choice $o=0$, this means that:
$$z_i = \sqrt{m^2-t_i^2}$$
Let's try this out on your example:
SeedRandom[1]
pts1 = RandomPoint[Disk[], 5000];
pts2 = RandomPoint[Disk[], 5000];
I will use the following random distance target:
t = RandomReal[{.05, .15}, 5000];
Your "slow" approach gives:
nf = Nearest[pts1 -> Automatic];
res = MapThread[
nf[#, {Infinity, #2}]&,
{pts2, t}
]; //AbsoluteTiming
{0.031681, Null}
Now, for the augmented approach. The maximal target is $m=.15$. Then, the augmented point sets are:
a1 = PadRight[pts1, {Automatic, 3}, 0];
a2 = PadRight[pts2, {Automatic, 3}, Transpose @ {Sqrt[.15^2 - t^2]}];
Using the augmented point sets:
anf = Nearest[a1 -> Automatic];
ares = anf[a2, {Infinity, .15}]; //AbsoluteTiming
res === ares
{8.06355, Null}
True
We get the same result, but the timing is horrendous. I think the reason is that the KD-tree created by Nearest
doesn't work well when all of the points have the same 3rd dimension. One way to fix things up then is to perturb the 3rd dimension slightly so that they aren't all the same. For instance:
a1 = PadRight[pts1, {Automatic, 3}, RandomReal[{0, 10^-10}, {50, 1}]];
anf = Nearest[a1 -> Automatic];
ares = anf[a2, {Infinity, .15}]; //AbsoluteTiming
res === ares
{0.008258, Null}
True
There may be edge cases where the perturbation changes the result.
I wrote an interface to the nanoflann $k$-d tree library to see if I can make things go faster.
My conclusions from this exercise are:
NearestFunction
is already very fast, and its overhead is probably as small as technically possible in the Wolfram Language. However, this overhead is still considerably larger than the time taken to find the point nearest to a single center.If I use a LibraryLink function that searches for the neighbours of a single point using nanoflann, the performance will not be better than with
NearestFunction
. The bottleneck seems to be the LibraryFunction call overhead.The only way to make things go faster is to eliminate repeated calls to a LibraryFunction, and retrieve the neighbours of multiple points (with different distances) with a single function call
Testing and benchmarking the nanoflann-based search
Compiling and loading the library
I used LTemplate for this project. Put the C++ code from the end of this post into the same directory with nanoflann.hpp
, then proceed as follows.
<<LTemplate`
SetDirectory["/to/wherever/your/source/files/are"]
Define the class interface:
template = LClass["Flann2D",
{
LFun["setPoints", {{Real, 2, "Constant"}}, "Void"],
LFun["query", {{Real, 1, "Constant"}, Real}, {Integer, 1}],
LFun["queryMultiple", {{Real, 2, "Constant"}, {Real, 1, "Constant"}}, {Integer, 1}]
}
];
Compile and load the template:
CompileTemplate[template]
LoadTemplate[tem]
Set up the benchmark problem
I use a sub-task of a computation of a Gabriel graph for benchmarking.
We take random points in a unit circle. The neighbour search will be done on these.
pts = RandomPoint[Disk[], 10000];
For the centres of the neighbour search we use the midpoints of the edges of the Delaunay triangulation of pts
.
mesh = DelaunayMesh[pts];
centres = PropertyValue[{mesh, 1}, MeshCellCentroid];
The distance thresholds will be just slightly smaller than the half-edge-lengths of the triangulation
dists = PropertyValue[{mesh, 1}, MeshCellMeasure] 0.5 (1 - 10^Internal`$EqualTolerance $MachineEpsilon);
We build the $k$-d tree data structures both with Nearest
and nanoflann:
nf = Nearest[pts -> Automatic];
flann = Make["Flann2D"];
flann@"setPoints"[pts]
Finally, we need a helper function for unpacking the results of the library function that queries multiple points at the same time. This function produces a list of integer lists. There is no direct way to transfer such data efficiently with LibraryLink. Thus we encode it into a single integer list where: (1) the first integer is the number of sublists (2) followed by the lengths of sublists (3) followed by the flattened contents of sublists. Here's the unpacker function for this format:
unpack[packed_] :=
With[{len = First[packed]},
Internal`PartitionRagged[packed[[len + 2 ;; All]], packed[[2 ;; len + 1]]]
]
Run the benchmark
Query centres one-by-one with Nearest:
res1 = MapThread[nf[#1, {Infinity, #2}] &, {centres, dists}]; // RepeatedTiming
(* {0.045, Null} *)
Querying points one-by-one is slightly slower with the nanoflann-based library.
res2 = MapThread[flann@"query"[#1, #2] &, {centres, dists}]; // RepeatedTiming
(* {0.056, Null} *)
Query all points at once, but do not unpack the result:
flann@"queryMultiple"[centres, dists]; // RepeatedTiming
(* {0.0056, Null} *)
Also unpack the result, to get the desired format:
res3 = unpack[flann@"queryMultiple"[centres, dists]]; // RepeatedTiming
(* {0.0072, Null} *)
Check that the results are the same for all methods. Note that both nanoflann and Nearest order results based on distance from the query centre. This is why the results are identical even in ordering.
res1 === res2 === res3
(* True *)
We were able to achieve a ~6x speedup with the library, but only because we query all centres with a single function call. The bulk of the time seems to be spent in the function call overhead, not in the actual neighbour lookup.
The C++ code
// Flann2D.h
#include "nanoflann.hpp"
#include <LTemplate.h>
#include <vector>
using namespace nanoflann;
class PointSet {
mma::RealMatrixRef pts;
public:
PointSet(mma::RealMatrixRef pts) : pts(pts.clone()) { }
~PointSet() {
pts.free();
}
size_t kdtree_get_point_count() const { return pts.rows(); }
double kdtree_get_pt(const size_t idx, const size_t dim) const {
if (dim == 0)
return pts(idx,0);
else
return pts(idx,1);
}
template <class BBOX>
bool kdtree_get_bbox(BBOX& /* bb */) const { return false; }
};
class Flann2D {
typedef KDTreeSingleIndexAdaptor<
L2_Simple_Adaptor<double, PointSet>,
PointSet,
2 /* dim */
> kd_tree_t;
PointSet *ps = nullptr;
kd_tree_t *kdtree = nullptr;
public:
~Flann2D() {
delete kdtree;
delete ps;
}
void setPoints(mma::RealMatrixRef pts) {
if (pts.cols() != 2)
throw mma::LibraryError("setPoints: input must be two-dimensional");
delete kdtree;
delete ps;
ps = new PointSet(pts);
kdtree = new kd_tree_t(2, *ps, KDTreeSingleIndexAdaptorParams(10));
kdtree->buildIndex();
}
mma::IntTensorRef query(mma::RealTensorRef pt, double d) {
if (pt.size() != 2)
throw mma::LibraryError("query: query point must be two-dimensional");
if (d <= 0)
throw mma::LibraryError("query: query distance must be positive");
std::vector<std::pair<size_t, double>> ret_matches;
SearchParams params;
kdtree->radiusSearch(pt.data(), d*d, ret_matches, params);
auto res = mma::makeVector<mint>(ret_matches.size());
for (mint i=0; i < ret_matches.size(); ++i)
res[i] = ret_matches[i].first + 1;
return res;
}
mma::IntTensorRef queryMultiple(mma::RealMatrixRef pts, mma::RealTensorRef ds) {
if (pts.cols() != 2)
throw mma::LibraryError("queryMultiple: query points must be two-dimensional");
std::vector<mint> results;
std::vector<mint> sizes;
SearchParams params;
std::vector<std::pair<size_t, double>> ret_matches;
for (mint i=0; i < pts.rows(); ++i) {
double d = ds[i];
if (d <= 0)
throw mma::LibraryError("queryMultiple: query distances must be positive");
kdtree->radiusSearch(&pts(i,0), d*d, ret_matches, params);
sizes.push_back(ret_matches.size());
for (const auto &el : ret_matches)
results.push_back(el.first + 1);
}
auto res = mma::makeVector<mint>(1 + sizes.size() + results.size());
mint i=0;
res[i++] = sizes.size();
for (const auto &el : sizes)
res[i++] = el;
for (const auto &el : results)
res[i++] = el;
return res;
}
};