Quick nearest neighbor search in the 150-dimensional space
PostgreSQL 9.6 using cube
First install the cube extension
CREATE EXTENSION cube;
Now we will create some n-dimensional space with 100,000 points in 50 dimensions. In addition we'll add a GIST index.
CREATE TEMP TABLE space_nd
AS
SELECT i, cube(array_agg(random()::float)) AS c
FROM generate_series(1,1e5) AS i
CROSS JOIN LATERAL generate_series(1,50)
AS x
GROUP BY i;
CREATE INDEX ON space_nd USING gist ( c );
ANALYZE space_nd;
Now we will generate a single point and use the <->
operater to find the nearest point using Eucledian distance.
WITH points AS (
SELECT cube(array_agg(random()::float)) AS c
FROM generate_series(1,50)
AS x
)
SELECT i,
pg_typeof(space_nd.c),
pg_typeof(points.c),
cube_distance(space_nd.c, points.c)
FROM space_nd
CROSS JOIN points
ORDER BY space_nd.c <-> points.c
LIMIT 5;
PostgreSQL 9.6+ supports other distance operators over cube
. All of which can use the GIST index we created. Namely,
a <-> b float8 Euclidean distance between a and b.
a <#> b float8 Taxicab (L-1 metric) distance between a and b.
a <=> b float8 Chebyshev (L-inf metric) distance between a and b.
That said there is one caveat,
To make it harder for people to break things, there is a limit of 100 on the number of dimensions of cubes. This is set in cubedata.h if you need something bigger.
You ask for 150 dimensions. That may present a minor complication.
Consider performing dimension reduction first (eg. Principle Component Analysis).
Then your are doing NN in small number of dimensions with higher performance.
You can use Pl/R to perform PCA inside postgres if needed.