SQL Query - Delete duplicates if more than 3 dups?
with cte as (
select row_number() over (partition by dupcol1, dupcol2 order by ID) as rn
from table)
delete from cte
where rn > 2; -- or >3 etc
The query manufactures a row number for each record, partitioned by (dupcol1, dupcol2) and ordered by ID. In effect, this row number counts the 'duplicates' that share the same dupcol1 and dupcol2 and assigns them the numbers 1, 2, 3, ..., N in ID order. If you want to keep just 2 'duplicates', you need to delete those that were assigned the numbers 3, 4, ..., N, and that is the part taken care of by the DELETE ... WHERE rn > 2.
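Before running a delete like this, it can help to look at the rows it would remove. The following is a minimal sketch of such a preview, assuming the same columns as above; my_table is a placeholder for the real table name, and nothing is modified by this query.

```sql
-- Preview only: lists the rows a "where rn > 2" delete would remove.
-- my_table is a placeholder; ID, dupcol1 and dupcol2 are the columns used in the answer.
with cte as (
    select ID, dupcol1, dupcol2,
           row_number() over (partition by dupcol1, dupcol2 order by ID) as rn
    from my_table
)
select ID, dupcol1, dupcol2, rn
from cte
where rn > 2          -- rows numbered 3, 4, ..., N within each group
order by dupcol1, dupcol2, rn;
```

For example, if four rows with IDs 10, 11, 12 and 13 share the same dupcol1 and dupcol2, they get rn 1 to 4, and the preview returns the rows with IDs 12 and 13.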
Using this method you can change the ORDER BY to suit your preferred order (e.g. ORDER BY ID DESC), so that the latest row has rn=1, the next to latest has rn=2, and so on. The rest stays the same: the DELETE will remove only the oldest rows, as they have the highest row numbers.
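A sketch of that variant, assuming a larger ID means a newer row and that the engine allows deleting through a CTE (SQL Server does; other engines may need the delete rewritten against the base table). my_table is again a placeholder name.

```sql
-- Keep the two newest rows per (dupcol1, dupcol2) group and delete the older ones.
-- Assumes ID increases over time; my_table is a placeholder.
with cte as (
    select row_number() over (partition by dupcol1, dupcol2 order by ID desc) as rn
    from my_table
)
delete from cte
where rn > 2;         -- the newest row has rn = 1, the next newest rn = 2
```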
Compared with the approach in this closely related question, CTEs and row_number() become the simpler option as the condition grows more complex. Performance may still be a problem if no suitable index exists to support the partitioning and ordering.
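One candidate for such an index is sketched below. It is only an assumption about what would help: the index name and my_table are placeholders, and whether the optimizer actually uses it depends on the engine and the data, so check the execution plan.

```sql
-- Hypothetical supporting index: partition columns first, then the ordering column.
-- This matches "partition by dupcol1, dupcol2 order by ID", which may let the engine
-- read rows in the required order instead of sorting the whole table.
create index ix_my_table_dups
    on my_table (dupcol1, dupcol2, ID);
```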
HAVING is your friend:
select id, count(*) as cnt from table group by id having count(*) > 2
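Note that a HAVING query only finds the values that occur more than twice; it does not delete anything. A sketch adapted to the question's columns, assuming duplicates are defined by dupcol1 and dupcol2 and using my_table as a placeholder name:

```sql
-- List each (dupcol1, dupcol2) combination that occurs more than twice.
-- count(*) is repeated in HAVING because many engines reject a select alias there.
select dupcol1, dupcol2, count(*) as cnt
from my_table
group by dupcol1, dupcol2
having count(*) > 2
order by cnt desc;
```

The result can serve as a check of which groups a row_number()-based delete would touch.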