Does SQL Server Cache Aggregate Results When Duplicated Across Columns?
SQL Server only calculates the COUNT once. You can see this by looking at the properties of the execution plan for:
create table Orders(order_id int, total int, discount int)
SELECT
COUNT(order_id) AS num_orders
, SUM(total) / COUNT(order_id) as avg_total
, SUM(discount) / COUNT(order_id) AS avg_discount
FROM Orders
The stream aggregate (1) has the following defined values:
[Expr1008] = Scalar Operator(COUNT([tempdb].[dbo].[Orders].[order_id])),
[Expr1009] = Scalar Operator(COUNT_BIG([tempdb].[dbo].[Orders].[total])),
[Expr1010] = Scalar Operator(SUM([tempdb].[dbo].[Orders].[total])),
[Expr1011] = Scalar Operator(COUNT_BIG([tempdb].[dbo].[Orders].[discount])),
[Expr1012] = Scalar Operator(SUM([tempdb].[dbo].[Orders].[discount]))
Expr1008 is the calculation of the COUNT that you ask about. There are also COUNT_BIG aggregates for the other two columns. These are needed because, for example, the correct result for SUM(total) should be NULL rather than 0 if COUNT(total) is 0.
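That NULL-when-empty rule is easy to see in isolation (using the Orders table from the example above):

```sql
-- SUM over zero rows yields NULL, not 0, which is why the plan needs the
-- per-column COUNT_BIG aggregates to detect that case
SELECT SUM(total) AS sum_total   -- NULL
FROM Orders
WHERE 1 = 0;
```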
This is carried out by the next compute scalar along (2), which also converts the COUNT result (Expr1008) from bigint to int and labels it Expr1003:
[Expr1003] = Scalar Operator(CONVERT_IMPLICIT(int,[Expr1008],0)),
[Expr1004] = Scalar Operator(CASE WHEN [Expr1009]=(0) THEN NULL ELSE [Expr1010] END),
[Expr1005] = Scalar Operator(CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012] END)
Finally, the leftmost compute scalar (3) uses Expr1003 in the division operations...
[Expr1006] = Scalar Operator([Expr1004]/[Expr1003]),
[Expr1007] = Scalar Operator([Expr1005]/[Expr1003])
... and outputs columns Expr1003, Expr1006, and Expr1007 as the final result.
PS: AVG has been supported for much longer than SQL Server 2008; it has most likely been available since the beginning. However, it does not have the same semantics as your rewrite in the presence of NULLs anyway. I assume order_id is the primary key and therefore not nullable, but for a table with 10 orders and only two NOT NULL total values of 2 and 4, AVG(total) would be 3, whereas SUM(total) / COUNT(order_id) would be 0.6 (or 0 once integer division is taken into account).
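The difference is easy to reproduce. A minimal script (table name and data are made up to match the numbers above):

```sql
CREATE TABLE OrdersDemo (order_id INT PRIMARY KEY, total INT);

-- 10 orders, only two of which have a non-NULL total (2 and 4)
INSERT INTO OrdersDemo (order_id, total)
SELECT n, CASE WHEN n <= 2 THEN n * 2 ELSE NULL END
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS v(n);

SELECT AVG(total)                   AS avg_total,      -- 3 (NULLs ignored: 6 / 2)
       SUM(total) / COUNT(order_id) AS sum_over_count  -- 0 (integer division: 6 / 10)
FROM OrdersDemo;
```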
It should be calculated only once. Here's a little test showing that even when the aggregate calls a UDF, it is not re-executed for every expression in which it appears.
use tempdb
go
create or alter function slow(@ms int)
returns int
as
begin
    -- a modest buffer to hash; the loop below is time-bounded regardless of its size
    declare @buf varbinary(max) = cast(replicate(cast('x' as nvarchar(max)), 8000) as varbinary(max))
    declare @start datetime2 = sysdatetime()
    -- burn CPU until @ms milliseconds have elapsed
    while datediff(ms, @start, sysdatetime()) < @ms
    begin
        -- 'SHA2_256' is the valid algorithm name (not 'sha_256'); it returns 32 bytes
        declare @hash binary(32) = hashbytes('sha2_256', @buf)
    end
    return 42;
end
go
set statistics time on
go
select max(dbo.slow(10)) a
from sys.objects

-- if the aggregate were re-evaluated for each expression, this would take roughly 4x as long
select max(dbo.slow(10)) a, sin(max(dbo.slow(10))) b, max(dbo.slow(10))/500 c, count(*) + max(dbo.slow(10)) d
from sys.objects
You are trying to optimize for things that aren't necessary to optimize. Consider these queries:
SET STATISTICS IO ON;
SELECT SUM(OrderQty)/COUNT(SalesOrderID) FROM SalesLT.SalesOrderDetail; -- scans once
SELECT COUNT(SalesOrderID),
SUM(OrderQty)/COUNT(SalesOrderID),
SUM(ProductID)/COUNT(SalesOrderID),
SUM(UnitPriceDiscount)/COUNT(SalesOrderID),
MIN(OrderQty)/COUNT(SalesOrderID),
MAX(OrderQty)/COUNT(SalesOrderID)
FROM SalesLT.SalesOrderDetail; -- still only scans once
DECLARE @order_count AS INT
SELECT @order_count = COUNT(SalesOrderID)
FROM SalesLT.SalesOrderDetail; -- scans once
SELECT @order_count,
SUM(OrderQty)/@order_count,
SUM(ProductID)/@order_count,
SUM(UnitPriceDiscount)/@order_count,
MIN(OrderQty)/@order_count,
MAX(OrderQty)/@order_count
FROM SalesLT.SalesOrderDetail; -- has to scan again
Scanning the entire table once just to get the count, and then scanning it again to use that count in a second query, is definitely more trouble than it's worth. This is obvious in both the plans and the STATISTICS IO output, even on a very small table:
-- first query - 7 reads
Table 'SalesOrderDetail'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- second query - still just 7 reads
Table 'SalesOrderDetail'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- variable + last two queries - 11 reads
Table 'SalesOrderDetail'. Scan count 1, logical reads 4, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
If you're going to be reusing the count in multiple queries, in queries against different tables, or in queries where you won't otherwise have to scan (joins, filtered queries, etc.), then it's potentially a different question. But you shouldn't be concerned that the count will be computed more than once during a single scan; as others have demonstrated, it won't. To justify pulling the count out of the query, you'd need one of those other scenarios.
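If you do land in one of those scenarios, capturing the count once and reusing it can pay off. A sketch against the same sample schema (the filter value and the second table are chosen for illustration):

```sql
DECLARE @order_count INT;

SELECT @order_count = COUNT(SalesOrderID)
FROM SalesLT.SalesOrderDetail;          -- one scan, done once

-- Reused in a filtered query that wouldn't otherwise scan the whole table...
SELECT SUM(OrderQty) / @order_count
FROM SalesLT.SalesOrderDetail
WHERE ProductID = 707;                  -- hypothetical filter

-- ...and in a query against a different table entirely
SELECT COUNT(*) * 1.0 / @order_count
FROM SalesLT.SalesOrderHeader;
```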