Incremental Nearest Neighbor Algorithm in Python
I think the problem with incremental construction of a KD-tree or KNN-tree is, as you've alluded to in a comment, that the tree will eventually become unbalanced and you can't do simple tree rotation to fix balance problems and keep consistency. At the minimum, the re-balancing task is not trivial and one would definitely not want to do it at each insertion. Often, one will choose to build a tree with a batch method, insert a bunch of new points and allow the tree to become unbalanced up to a point, and then re-balance it.
A very similar thing to do is to build the data structure in batch for M points, use it for M' points, and then re-build the data structure in batch with M+M' points. Since re-balancing is not normal, fast algorithm we are familiar with for trees, rebuilding is not necessarily slow in comparison and in some cases can be faster (depending on how the sequence of the points entering your incremental algorithm).
That being said, the amount of code you write, debugging difficulty, and the ease of others' understanding of your code can be significantly smaller if you take the rebuild approach. If you do so, you can use a batch method and keep an external list of points not yet inserted into the tree. A brute force approach can be used to ensure none of these is closer than the ones in the tree.
Some links to Python implementations/discussions are below, but I haven't found any that explicitly claim to be incremental. Good luck.
http://www.scipy.org/Cookbook/KDTree
http://cgi.di.uoa.gr/~compgeom/pycgalvisual/kdppython.shtml
http://sites.google.com/site/mikescoderama/Home/kd-tree-knn
http://en.wikipedia.org/wiki/Kd-tree
Note: My comments here apply to high-dimensional spaces. If you're working in 2D or 3D, what I've said may not be appropriate. (If you working in very high dimensional spaces, use brute force or approximate nearest neighbor.)
This is way late, but for posterity:
There is actually a technique for converting batch-processed algorithms like KD-Tree into incremental algorithms: it's called a static-to-dynamic transformation.
To generate an incremental variant of a KD-Tree, you store a set of trees instead of just one tree. When there are N elements in your nearest-neighbor structure, your structure will have a tree for each "1" bit in the binary representation of N. Moreover, if tree T_i corresponds to the i-th bit of N, then tree T_i contains 2^i elements.
So, if you have 11 elements in your structure, then N = 11, or 1011 in binary, and therefore you have three trees - T_3, T_1, and T_0 - with 8 elements, 2 elements, and 1 element, respectively.
Now, let's insert an element e into our structure. After insertion, we'll have 12 elements, or 1100 in binary. Comparing the new and the previous binary string, we see that T_3 doesn't change, we have a new tree T_2 with 4 elements, and trees T_1 and T_0 get deleted. We construct the new tree T_2 by doing a batch insertion of e along with all the elements in the trees "below" T_2, which are T_1 and T_0.
In this way, we create an incremental point query structure from a static base structure. There is, however, an asymptotic slowdown in "incrementalizing" static structures like this in the form of an extra log(N) factor:
- inserting N elements in structure: O(N log(N) log(n))
- nearest neighbor query for structure with N elements: O(log(n) log(n))