Quick nearest neighbor search in the 150-dimensional space

PostgreSQL 9.6 using `cube`

First install the cube extension

CREATE EXTENSION cube;

Now we will create some n-dimensional space with 100,000 points in 50 dimensions. In addition we'll add a GIST index.

CREATE TEMP TABLE space_nd
AS
  SELECT i, cube(array_agg(random()::float)) AS c
  FROM generate_series(1,1e5) AS i
  CROSS JOIN LATERAL generate_series(1,50)
    AS x
  GROUP BY i;

CREATE INDEX ON space_nd USING gist ( c );
ANALYZE space_nd;

Now we will generate a single point and use the <-> operater to find the nearest point using Eucledian distance.

WITH points AS (
  SELECT cube(array_agg(random()::float)) AS c
  FROM generate_series(1,50)
    AS x
)
SELECT i,
  pg_typeof(space_nd.c),
  pg_typeof(points.c),
  cube_distance(space_nd.c, points.c)
FROM space_nd
CROSS JOIN points
ORDER BY space_nd.c <-> points.c
LIMIT 5;

PostgreSQL 9.6+ supports other distance operators over cube. All of which can use the GIST index we created. Namely,

a <-> b float8  Euclidean distance between a and b.
a <#> b float8  Taxicab (L-1 metric) distance between a and b.
a <=> b float8  Chebyshev (L-inf metric) distance between a and b.

That said there is one caveat,

To make it harder for people to break things, there is a limit of 100 on the number of dimensions of cubes. This is set in cubedata.h if you need something bigger.

You ask for 150 dimensions. That may present a minor complication.

Consider performing dimension reduction first (eg. Principle Component Analysis).

Then your are doing NN in small number of dimensions with higher performance.

You can use Pl/R to perform PCA inside postgres if needed.

Quick nearest neighbor search in the 150-dimensional space

PostgreSQL 9.6 using `cube`

Tags:

Performance

Database Design

Nearest Neighbor

Related

Recent Posts

Quick nearest neighbor search in the 150-dimensional space

PostgreSQL 9.6 using cube

Tags:

Performance

Database Design

Nearest Neighbor

Related

PostgreSQL 9.6 using `cube`