What is the precision of SELECT DISTINCT on PostGIS geometry column?
I'm surprised it's quite so coarse, but there it is. It's not DISTINCT, per se, it's the '=' operator, which is defined for geometry as 'equality of the index keys' which means practically 'equality of the 32-bit bounding boxes'.
You can see the same effect just using '=' directly,
select 'POINT (0.000000001 0.000000001)'::geometry = 'POINT (0.000000001 0.000001)'::geometry;
select 'POINT (0.000000001 0.000000001)'::geometry = 'POINT (0.000000001 0.00001)'::geometry;
Making '=' behave "intuitively" would unfortunately involve either a huge computational loss (doing explicit ST_Equals() evaluation for the operator call) or some substantial new complicated code (storing hash values for larger geometries, doing exact tests on the fly for smaller ones, choosing the right code path on the fly, etc)
And of course now lots of applications / users have internalized the existing behaviour, such as it is, so "improving" it would be a downgrade for many folks. You can do an "exact" distinct by calculating your set on ST_AsBinary(geom) instead, which will do exact equality testing on the bytea outputs.
Given Paul Ramsey's excellent explanation of why the next question is what can be done about it. How do you SELECT DISTINCT
on geometry fields and have it perform as expected?
In Paul's answer, I proposed using SELECT MAX(geom) FROM the_table GROUP BY ST_AsBinary(geom);
but MAX()
is slow, apparently requiring a table scan.
Instead, I found this to be faster:
SELECT DISTINCT ON (ST_AsBinary(geom)) geom FROM the_table;
Just an update, for PostGIS 2.4, SELECT DISTINCT
works correctly for the points data in the OP:
CREATE TEMP TABLE test (geom geometry);
CREATE TABLE
user=> INSERT INTO test
user-> VALUES
user-> (St_GeomFromText('POINT (0.1 0.1)')),
user-> (St_GeomFromText('POINT (0.001 0.001)')),
user-> (St_GeomFromText('POINT (0.0001 0.0001)')),
user-> (St_GeomFromText('POINT (0.00001 0.00001)')),
user-> (St_GeomFromText('POINT (0.000001 0.000001)')),
user-> (St_GeomFromText('POINT (0.0000001 0.0000001)')),
user-> (St_GeomFromText('POINT (0.00000001 0.00000001)')),
user-> (St_GeomFromText('POINT (0.000000001 0.000000001)'));
INSERT 0 8
user=>
user=> SELECT COUNT(*) FROM (SELECT DISTINCT geom FROM test) AS test;
count
-------
8
(1 row)
And
user=> select 'POINT (0.000000001 0.000000001)'::geometry = 'POINT (0.000000001 0.000001)'::geometry;
?column?
----------
f
(1 row)
user=>
user=> select 'POINT (0.000000001 0.000000001)'::geometry = 'POINT (0.000000001 0.00001)'::geometry;
?column?
----------
f
(1 row)