How to detect duplicates and keep only one?
After pondering this all day, I think you are much closer to the answer than you think.
Your initial query showed all the points matching one station within a certain distance, but excluded the initial point. This also excluded all points with no duplicates in the vicinity. You did this with the WHERE
clause of t1.gid != t2.gid
.
Suppose you removed that from the where clause, and added the SELECT DISTINCT ON
notation, as follows:
CREATE TABLE allunique AS
SELECT DISTINCT ON (t1.desc_string) t1.*
FROM ciss_poi AS t1
JOIN ciss_poi AS t2
ON ST_Dwithin(t1.geom,t2.geom,(0.5/111.111))
WHERE t1.desc_string=t2.desc_string
ORDER BY t1.desc_string, t1.gid
Here is the help for the DISTINCT clause: SELECT DISTINCT
When you use this with the ON
addition, and name the field, it groups by that selected field, and then returns the first record. The record that is returned can be controlled by using the ORDER BY
parameter.
In this case, if we return the all the joined records within a certain distance, including the initial record, then order by the gid
, it should return just the single one out of a group, but also the individual records with nothing in the vicinity.