How do I get the T-SQL code to find duplicates?
Well, if entire rows are duplicated in your table, then at the very least the table has no primary key set up; otherwise the primary key value would differ between the rows.
However, here's how to build a SQL query that finds duplicates over a set of columns:
SELECT col1, col2, col3, col4
FROM table
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1
This finds every combination of values in columns col1-col4 that occurs more than once, and returns one result row per such combination.
For instance, in the following table, rows 2 and 3 would be duplicates:
PK  col1  col2  col3  col4  col5
1   1     2     3     4     6
2   1     3     4     7     7
3   1     3     4     7     10
4   2     3     1     4     5
The two rows share the same values in columns col1-col4 and thus, by that SQL, are considered duplicates, even though PK and col5 differ. Expand the list of columns to contain all the columns you wish to compare.
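To make this concrete, here is a minimal sketch that loads the sample rows above into a table variable and runs the query against it. The name @Sample and the extra Occurrences column are only for illustration and are not part of the original query.

```sql
-- Hypothetical table variable holding the four sample rows from the table above.
DECLARE @Sample TABLE (
    PK   INT PRIMARY KEY,
    col1 INT, col2 INT, col3 INT, col4 INT, col5 INT
);

-- One INSERT per row, so the script also runs on SQL Server 2005.
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (1, 1, 2, 3, 4, 6);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (2, 1, 3, 4, 7, 7);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (3, 1, 3, 4, 7, 10);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (4, 2, 3, 1, 4, 5);

-- Same GROUP BY / HAVING pattern, plus a count of how often each combination occurs.
SELECT col1, col2, col3, col4, COUNT(*) AS Occurrences
FROM @Sample
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1;
-- Returns one row: col1 = 1, col2 = 3, col3 = 4, col4 = 7, Occurrences = 2
```

Note that the result identifies the duplicated combination, not the individual rows (PK 2 and 3) that carry it.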
If you're using SQL Server 2005+, you can use the following code to see all the rows along with other columns:
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
FROM table
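If you only want to see the rows that belong to a duplicated group, one possible variation is to count the rows per partition with COUNT(*) OVER and filter on that count. This is a sketch rather than part of the original answer; @Sample, counted and GroupSize are made-up names, and the data is the sample table from earlier.

```sql
-- Hypothetical table variable with the sample rows used earlier.
DECLARE @Sample TABLE (
    PK   INT PRIMARY KEY,
    col1 INT, col2 INT, col3 INT, col4 INT, col5 INT
);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (1, 1, 2, 3, 4, 6);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (2, 1, 3, 4, 7, 7);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (3, 1, 3, 4, 7, 10);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (4, 2, 3, 1, 4, 5);

-- GroupSize is the number of rows sharing the same col1-col4 values.
WITH counted AS (
    SELECT *,
           COUNT(*) OVER (PARTITION BY col1, col2, col3, col4) AS GroupSize
    FROM @Sample
)
SELECT *
FROM counted
WHERE GroupSize > 1      -- keep only rows whose combination occurs more than once
ORDER BY col1, col2, col3, col4, PK;
-- Returns the rows with PK 2 and PK 3, each with GroupSize = 2
```

Unlike the GROUP BY query, this returns every column of every affected row, including PK and col5.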
You can also delete (or otherwise work with) duplicates using the ROW_NUMBER technique:
WITH cte AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
FROM table
)
DELETE FROM cte WHERE DuplicateRowNumber > 1
ROW_NUMBER is extremely powerful and there is much you can do with it. See the Books Online (BOL) article on it at http://msdn.microsoft.com/en-us/library/ms186734.aspx
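Because ORDER BY (SELECT 0) leaves it unspecified which row in each group gets number 1, you may prefer to control which copy survives. The sketch below assumes the table has a PK column, as in the sample data, and keeps the row with the lowest PK. The temp table #DemoRows is hypothetical, and the transaction is just one way to inspect the result before making it permanent.

```sql
-- Hypothetical temp table with the sample rows used earlier.
CREATE TABLE #DemoRows (
    PK   INT PRIMARY KEY,
    col1 INT, col2 INT, col3 INT, col4 INT, col5 INT
);
INSERT INTO #DemoRows (PK, col1, col2, col3, col4, col5) VALUES (1, 1, 2, 3, 4, 6);
INSERT INTO #DemoRows (PK, col1, col2, col3, col4, col5) VALUES (2, 1, 3, 4, 7, 7);
INSERT INTO #DemoRows (PK, col1, col2, col3, col4, col5) VALUES (3, 1, 3, 4, 7, 10);
INSERT INTO #DemoRows (PK, col1, col2, col3, col4, col5) VALUES (4, 2, 3, 1, 4, 5);

BEGIN TRANSACTION;

-- ORDER BY PK makes the numbering deterministic: the lowest PK in each group gets 1.
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY PK) AS DuplicateRowNumber
    FROM #DemoRows
)
DELETE FROM cte WHERE DuplicateRowNumber > 1;   -- removes the row with PK 3

-- Inspect what is left before deciding; rows with PK 1, 2 and 4 remain.
SELECT * FROM #DemoRows ORDER BY PK;

COMMIT TRANSACTION;   -- or ROLLBACK TRANSACTION to undo the delete

DROP TABLE #DemoRows;
```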
I found this solution when I needed to dump entire rows with one or more duplicate fields but didn't want to type every field name in the table:
SELECT * FROM db WHERE col IN
(SELECT col FROM db GROUP BY col HAVING COUNT(*) > 1)
ORDER BY col
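The IN form above compares a single column. If the duplicate check has to span several columns, one possible adaptation is to join back to the grouped result instead. This is a sketch under assumptions: @Sample and the aliases t and d are made up, the data is the sample table from earlier in the thread, and rows with NULL in a compared column will not match under the = join.

```sql
-- Hypothetical table variable with the sample rows used earlier in the thread.
DECLARE @Sample TABLE (
    PK   INT PRIMARY KEY,
    col1 INT, col2 INT, col3 INT, col4 INT, col5 INT
);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (1, 1, 2, 3, 4, 6);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (2, 1, 3, 4, 7, 7);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (3, 1, 3, 4, 7, 10);
INSERT INTO @Sample (PK, col1, col2, col3, col4, col5) VALUES (4, 2, 3, 1, 4, 5);

-- d lists each (col1, col2) pair that occurs more than once;
-- joining back to the table returns the full rows without naming every column.
SELECT t.*
FROM @Sample AS t
INNER JOIN (
    SELECT col1, col2
    FROM @Sample
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
) AS d
    ON t.col1 = d.col1
   AND t.col2 = d.col2
ORDER BY t.col1, t.col2, t.PK;
-- Returns the rows with PK 2 and PK 3
```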