Are these two queries the same - GROUP BY vs. DISTINCT?
Your example #2 had me scratching me head for a while - I thought to myself: "You can't DISTINCT
a single column, what would that mean?" - until I realised what is going on.
When you have
SELECT DISTINCT(t.ItemNumber)
you are not, despite appearances, actually asking for distinct values of t.ItemNumber
! Your example #2 actually gets parsed the same as
SELECT DISTINCT
(t.ItemNumber)
,
(SELECT TOP 1 ItemDescription
FROM Transactions
WHERE ItemNumber = t.ItemNumber
ORDER BY DateCreated DESC) AS ItemDescription
FROM Transactions t
with syntactically-correct but superfluous parentheses around t.ItemNumber
. It is to the result-set as a whole that DISTINCT
applies.
In this case, since your GROUP BY
groups by the column that actually varies, you get the same results. I'm actually slightly surprised that SQL Server doesn't (in the GROUP BY
example) insist that the subqueried column is mentioned in the GROUP BY
list.
Same results but the second one seems to have a more expensive sort step to apply the DISTINCT
on my quick test.
Both were beaten out of sight by ROW_NUMBER
though...
with T as
(
SELECT ItemNumber,
ItemDescription,
ROW_NUMBER() OVER ( PARTITION BY ItemNumber ORDER BY DateCreated DESC) AS RN
FROM Transactions
)
SELECT * FROM T
WHERE RN=1
edit ...which in turn was thumped by Joe's solution on my test setup.
Test Setup
CREATE TABLE Transactions
(
ItemNumber INT not null,
ItemDescription VARCHAR(50) not null,
DateCreated DATETIME not null
)
INSERT INTO Transactions
SELECT
number, NEWID(),DATEADD(day, cast(rand(CAST(newid() as varbinary))*10000
as int),getdate())
FROM master.dbo.spt_values
ALTER TABLE dbo.Transactions ADD CONSTRAINT
PK_Transactions PRIMARY KEY CLUSTERED
(ItemNumber,DateCreated)
If you're running at least 2005 and can use a CTE, this is a little cleaner IMHO.
EDIT: As pointed out in Martin's answer, this also performs much better.
;with cteMaxDate as (
select t.ItemNumber, max(DateCreated) as MaxDate
from Transactions t
group by t.ItemNumber
)
SELECT t.ItemNumber, t.ItemDescription
FROM cteMaxDate md
inner join Transactions t
on md.ItemNumber = t.ItemNumber
and md.MaxDate = t.DateCreated