Get top first record from duplicate records having no unique identity

Sometimes you can use the CROSS APPLY operator like this:

select distinct result.* from data d
cross apply (select top 1 * from data where data.Id = d.Id) result

In this query I need to pick only the first of many duplicates that naturally happen to occur in my data. It works on SQL Server 2005+ databases.


The answer depends on specifically what you mean by the "top 1000 distinct" records.

If you mean that you want to return at most 1000 distinct records, regardless of how many duplicates are in the table, then write this:

SELECT DISTINCT TOP 1000 id, uname, tel
FROM Users
ORDER BY <sort_columns>

If you only want to search the first 1000 rows in the table, and potentially return much fewer than 1000 distinct rows, then you would write it with a subquery or CTE, like this:

SELECT DISTINCT *
FROM
(
    SELECT TOP 1000 id, uname, tel
    FROM Users
    ORDER BY <sort_columns>
) u

The ORDER BY is of course optional if you don't care about which records you return.


Find all products that has been ordered 1 or more times... (kind of duplicate records)

SELECT DISTINCT * from [order_items] where productid in 
(SELECT productid 
  FROM [order_items]
  group by productid 
  having COUNT(*)>0)
order by productid 

To select the last inserted of those...

SELECT DISTINCT productid, MAX(id) OVER (PARTITION BY productid) AS LastRowId from [order_items] where productid in 
(SELECT productid 
  FROM [order_items]
  group by productid 
  having COUNT(*)>0)
order by productid