PostGIS nearest points with ST_Distance, kNN
You are nearly there. There is a little trick which is to use Postgres's distinct operator, which will return the first match of each combination -- as you are ordering by ST_Distance, effectively it will return the closest point from each senal to each port.
SELECT
DISTINCT ON (senal.id) senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY") as dist
FROM traffic_signs As senal, entrance_halls As port
ORDER BY senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY");
If you know that the minimum distance in each case is no more than some amount x, (and you have a spatial index on your tables), you can speed this up by putting a WHERE ST_DWithin(port."GEOMETRY", senal."GEOMETRY", distance)
, eg, if all the minumum distances are known to be no more than 10km, then:
SELECT
DISTINCT ON (senal.id) senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY") as dist
FROM traffic_signs As senal, entrance_halls As port
WHERE ST_DWithin(port."GEOMETRY", senal."GEOMETRY", 10000)
ORDER BY senal.id, port.id, ST_Distance(port."GEOMETRY", senal."GEOMETRY");
Obviously, this needs to be used with caution, as if the minimum distance is greater, you will simply get no row for that combination of senal and port.
Note: The order by order must match the distinct on order, which makes sense, as distinct is taking the first distinct group based on some ordering.
It is assumed that you have a spatial index on both tables.
EDIT 1. There is another option, which is to use Postgres's <-> and <#> operators, (center point and bounding box distance calculations, respectively) which make more efficient use of the spatial index and don't require the ST_DWithin hack to avoid n^2 comparisons. There is a good blog article explaining how they work. The general thing to note is that these two operators work in the ORDER BY clause.
SELECT senal.id,
(SELECT port.id
FROM entrance_halls as port
ORDER BY senal.geom <#> port.geom LIMIT 1)
FROM traffic_signs as senal;
EDIT 2. As this question has received a lot of attention and k-nearest neighbours (kNN) is generally a hard problem (in terms of algorithmic run-time) in GIS, it seems worthwhile to expand somewhat on the original scope of this question.
The standard way for find the x nearest neighbours of one object is to use a LATERAL JOIN (conceptually similar to a for each loop). Borrowing shamelessly from dbaston's answer, you would do something like:
SELECT
signs.id,
closest_port.id,
closest_port.dist
FROM traffic_signs
CROSS JOIN LATERAL
(SELECT
id,
ST_Distance(ports.geom, signs.geom) as dist
FROM ports
ORDER BY signs.geom <-> ports.geom
LIMIT 1
) AS closest_port
So, if you want to find the nearest 10 ports, ordered by distance, you simply have to change the LIMIT clause in the lateral sub-query. This is much harder to do without LATERAL JOINS and involves using ARRAY type logic. While this approach works well, it can be sped up enormously if you know you only have to search out to a given distance. In this instance, you can use ST_DWithin(signs.geom, ports.geom, 1000) in the subquery, which because of the way indexing works with the <-> operator -- one of the geometries should be a constant, rather than a column reference -- may be much faster. So, for example, to get the 3 nearest ports, within 10km, you could write something like the following.
SELECT
signs.id,
closest_port.id,
closest_port.dist
FROM traffic_signs
CROSS JOIN LATERAL
(SELECT
id,
ST_Distance(ports.geom, signs.geom) as dist
FROM ports
WHERE ST_DWithin(ports.geom, signs.geom, 10000)
ORDER BY ST_Distance(ports.geom, signs.geom)
LIMIT 3
) AS closest_port;
As always, usage will vary depending on your data distribution and queries, so EXPLAIN is your best friend.
Finally, there is a minor gotcha, if using LEFT instead of CROSS JOIN LATERAL in that you have to add ON TRUE after the lateral queries alias, eg,
SELECT
signs.id,
closest_port.id,
closest_port.dist
FROM traffic_signs
LEFT JOIN LATERAL
(SELECT
id,
ST_Distance(ports.geom, signs.geom) as dist
FROM ports
ORDER BY signs.geom <-> ports.geom
LIMIT 1
) AS closest_port
ON TRUE;
This can be done with a LATERAL JOIN
in PostgreSQL 9.3+:
SELECT
signs.id,
closest_port.id,
closest_port.dist
FROM traffic_signs
CROSS JOIN LATERAL
(SELECT
id,
ST_Distance(ports.geom, signs.geom) as dist
FROM ports
ORDER BY signs.geom <-> ports.geom
LIMIT 1) AS closest_port
The aproach with cross-join doesn't use indexes and requires a lot of memory. So you basically have two choices. Pre 9.3 you'd use a correlated subquery. 9.3+ you can use a LATERAL JOIN
.
KNN GIST with a Lateral twist Coming soon to a database near you
(exact queries to follow soon)