Query takes very long time
You mention your query has difficulties for the case that are no neighbours - this is due to the 'where' statement, which excludes observations beyond 1000 units of your coordinate system. In the documentation of st_dwithin it states the following concerning distance: units are in those of spatial reference.
This brings me to a second point: 1000 units in WGS84 would be very far. You should use a projected coordinate system with st_transform. For the Netherlands this would be EPSG 28992.
To know which points are within 1000 meters you can execute following query:
SELECT
postcode, huisnummer, huisletter, huisnummertoevoeging,
ST_X(ST_Transform(geolocatie,28992) :: geometry) as latitude, -- assuming your geometry is in 4326
ST_Y(ST_Transform(geolocatie,28992) :: geometry) as longitude,
ST_Distance(
ST_Transform(geolocatie,28992),
ST_Transform(ST_SetSRID(ST_Point(4.3387478, 51.9808117),4326),28992)) as afstand ,
ST_DWithin(
ST_Transform(geolocatie,28992) ,
ST_Transform(ST_SetSRID(ST_Point(4.3387478, 51.9808117),4326),28992),1000) as within_distance
FROM inspire
ORDER BY afstand
Performance wise, this can be improved by adding a geometry to the table in advance with the project geometry and adding an index.
-- Add the column
ALTER TABLE inspire ADD COLUMN geom_28992 geometry;
-- Set the column
UPDATE inspire SET geom_28992 = st_transform(geom,28992);
-- Create an index
CREATE INDEX idx_geom_28992_inspire ON inspire USING GIST(geom_28992);
For st_dwithin to work efficiently both geometries need to be indexed: the index of reference geometry needs to be in a table as well, with index.
-- Reference geom
CREATE TABLE reference_geom AS
SELECT ST_Transform(ST_SetSRID(ST_Point(4.3387478, 51.9808117),4326),28992) geom_28992;
-- Create an index
CREATE INDEX idx_geom_28992_reference_geom ON reference_geom USING GIST(geom_28992);
Finally, you can execute the query
SELECT postcode, huisnummer, huisletter, huisnummertoevoeging,
st_distance(i.geom_28992,g.geom_28992) distance,
true::boolean within_distance
FROM inspire i
INNER JOIN reference_geom g ON st_dwithin(i.geom_28992,g.geom_28992,1000)
UNION ALL
SELECT postcode, huisnummer, huisletter, huisnummertoevoeging,
null AS distance,
false::boolean within_distance
FROM inspire i,reference_geom
WHERE st_dwithin(i.geom_28992,g.geom_28992,1000) = FALSE
For more interesting applications have a look at nearest neighbour problems in Postgis, for instance here and here.
You do have a solid query right there, and I can´t seem to reproduce your issue (as per the title), assuming a correctly created and indexed geography column.
If I were to improve things here, I´d use
SELECT
postcode, huisnummer, huisletter, huisnummertoevoeging,
ST_X(geolocatie::geometry) AS latitude,
ST_Y(geolocatie::geometry) AS longitude,
ST_Distance(geolocatie, ST_SetSRID(ST_MakePoint(4.3387478, 51.9808117), 4326)::geography) AS afstand
FROM inspire
ORDER BY geolocatie <-> ST_SetSRID(ST_MakePoint(4.3387478, 51.9808117), 4326)::geography
LIMIT 5;
The KNN operator <->
as the ORDER BY
parameter is ultimate in performance when used properly; I would even forgo a limiting filter, especially as ST_DWithin
, in my experience, adds overhead and denies the planner the much desired index (only) scans more often than not.
I wouldn´t recommend this without proper transformation (as, e.g. @Michael proposed), but an index forcing alternative filter option would be
SELECT
postcode, huisnummer, huisletter, huisnummertoevoeging,
ST_X(geolocatie::geometry) AS latitude,
ST_Y(geolocatie::geometry) AS longitude,
ST_Distance(geolocatie, ST_SetSRID(ST_MakePoint(4.3387478, 51.9808117), 4326)::geography) AS afstand
FROM inspire
WHERE geolocatie::geography && ST_Expand(ST_SetSRID(ST_MakePoint(4.3387478, 51.9808117), 4326), 0.01)
ORDER BY geolocatie <-> ST_SetSRID(ST_MakePoint(4.3387478, 51.9808117), 4326)::geography
LIMIT 5;
i.e. using the bbox intersection operator &&
directly; major drawback here is the dependency on the geometry type (and thus the CRS units) for ST_Expand
, that leaves you with a dangerous 0.01
degrees (~1.1km
, but only at the equator) to be used as distance parameter without a meter based projection.
Without your adding details to the question, issue tracking will be difficult; to start with, the EXPLAIN ANALYZE <your_query>
for both plans is almost mandatory for query performance related questions, as well as your PostgreSQl/PostGIS versions.