KDTree for longitude/latitude

A binary search tree cannot handle the wraparound of the polar representation by design. You might need to transform the coordinates to a 3D cartesian space and then apply your favorite search algorithm, e.g., kD-Tree, Octree etc.

Alternatively, if you could limit the input range of coordinates to a small region on the surface, you could apply an appropriate map projection to this region, i.e., one that does not distort the shape of your area too much, and apply a standard binary search tree on these no-wrap-around cartesian map coordinates.


I believe that the BallTree from scikit-learn with the Haversine metric should do the trick for you.

As an example:

from sklearn.neighbors import BallTree
import numpy as np
import pandas as pd

cities = pd.DataFrame(data={
    'name': [...],
    'lat': [...],
    'lon': [...]
})

query_lats = [...]
query_lons = [...]

bt = BallTree(np.deg2rad(cities[['lat', 'lon']].values), metric='haversine')
distances, indices = bt.query(np.deg2rad(np.c_[query_lats, query_lons]))

nearest_cities = cities['name'].iloc[indices]

Note this returns distances assuming a sphere of radius 1 - to get the distances on the earth multiply by radius = 6371km

see:

  • https://jakevdp.github.io/blog/2013/04/29/benchmarking-nearest-neighbor-searches-in-python/
  • https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.BallTree.html
  • https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.haversine_distances.html#sklearn.metrics.pairwise.haversine_distances