Does SQL Server Cache Aggregate Results When Duplicated Across Columns?
SQL Server only calculates the COUNT once. You can see this by looking at the properties of the execution plan for:
create table Orders(order_id int, total int, discount int)
SELECT
COUNT(order_id) AS num_orders
, SUM(total) / COUNT(order_id) as avg_total
, SUM(discount) / COUNT(order_id) AS avg_discount
FROM Orders
The stream aggregate (1) has the following defined values:
[Expr1008] = Scalar Operator(COUNT([tempdb].[dbo].[Orders].[order_id])),
[Expr1009] = Scalar Operator(COUNT_BIG([tempdb].[dbo].[Orders].[total])),
[Expr1010] = Scalar Operator(SUM([tempdb].[dbo].[Orders].[total])),
[Expr1011] = Scalar Operator(COUNT_BIG([tempdb].[dbo].[Orders].[discount])),
[Expr1012] = Scalar Operator(SUM([tempdb].[dbo].[Orders].[discount]))
Expr1008 is the calculation of the COUNT that you ask about. There are also COUNT_BIG aggregates for the other two columns. These are needed because, for example, the correct result for SUM(total) should be NULL rather than 0 if COUNT(total) is 0.
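That NULL-when-empty rule is easy to see in isolation (using the Orders table from the example above):

```sql
-- SUM over zero rows yields NULL, not 0, which is why the plan needs the
-- per-column COUNT_BIG aggregates to detect that case
SELECT SUM(total) AS sum_total   -- NULL
FROM Orders
WHERE 1 = 0;
```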
This is carried out by the next compute scalar along (2), which also converts the COUNT result (Expr1008) from bigint to int and labels it Expr1003:
[Expr1003] = Scalar Operator(CONVERT_IMPLICIT(int,[Expr1008],0)),
[Expr1004] = Scalar Operator(CASE WHEN [Expr1009]=(0) THEN NULL ELSE [Expr1010] END),
[Expr1005] = Scalar Operator(CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012] END)
Finally, the leftmost compute scalar (3) uses Expr1003 in the division operations...
[Expr1006] = Scalar Operator([Expr1004]/[Expr1003]),
[Expr1007] = Scalar Operator([Expr1005]/[Expr1003])
... and outputs columns Expr1003, Expr1006, and Expr1007 as the final result.
PS: AVG has been supported for much longer than SQL Server 2008; it has most likely been available since the beginning. However, it does not have the same semantics as your rewrite in the presence of NULLs anyway. I assume order_id is the primary key and therefore not nullable, but for a table with 10 orders and only two NOT NULL total values of 2 and 4, AVG(total) would be 3, whereas SUM(total) / COUNT(order_id) would be 0.6 (or 0 once integer division is taken into account).
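The difference is easy to reproduce. A minimal script (table name and data are made up to match the numbers above):

```sql
CREATE TABLE OrdersDemo (order_id INT PRIMARY KEY, total INT);

-- 10 orders, only two of which have a non-NULL total (2 and 4)
INSERT INTO OrdersDemo (order_id, total)
SELECT n, CASE WHEN n <= 2 THEN n * 2 ELSE NULL END
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS v(n);

SELECT AVG(total)                   AS avg_total,      -- 3 (NULLs ignored: 6 / 2)
       SUM(total) / COUNT(order_id) AS sum_over_count  -- 0 (integer division: 6 / 10)
FROM OrdersDemo;
```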
It should be calculated only once. Here's a little test showing that even when the aggregate calls a UDF, it is not re-executed for every expression in which it appears.
use tempdb
go
create or alter function slow(@ms int)
returns int
as
begin
    -- a modest buffer to hash; the loop below is time-bounded regardless of its size
    declare @buf varbinary(max) = cast(replicate(cast('x' as nvarchar(max)), 8000) as varbinary(max))
    declare @start datetime2 = sysdatetime()
    -- burn CPU until @ms milliseconds have elapsed
    while datediff(ms, @start, sysdatetime()) < @ms
    begin
        -- 'SHA2_256' is the valid algorithm name (not 'sha_256'); it returns 32 bytes
        declare @hash binary(32) = hashbytes('sha2_256', @buf)
    end
    return 42;
end
go
set statistics time on
go
select max(dbo.slow(10)) a
from sys.objects

-- if the aggregate were re-evaluated for each expression, this would take roughly 4x as long
select max(dbo.slow(10)) a, sin(max(dbo.slow(10))) b, max(dbo.slow(10))/500 c, count(*) + max(dbo.slow(10)) d
from sys.objects
You are trying to optimize for things that aren't necessary to optimize. Consider these queries:
SET STATISTICS IO ON;
SELECT SUM(OrderQty)/COUNT(SalesOrderID) FROM SalesLT.SalesOrderDetail; -- scans once
SELECT COUNT(SalesOrderID),
SUM(OrderQty)/COUNT(SalesOrderID),
SUM(ProductID)/COUNT(SalesOrderID),
SUM(UnitPriceDiscount)/COUNT(SalesOrderID),
MIN(OrderQty)/COUNT(SalesOrderID),
MAX(OrderQty)/COUNT(SalesOrderID)
FROM SalesLT.SalesOrderDetail; -- still only scans once
DECLARE @order_count AS INT
SELECT @order_count = COUNT(SalesOrderID)
FROM SalesLT.SalesOrderDetail; -- scans once
SELECT @order_count,
SUM(OrderQty)/@order_count,
SUM(ProductID)/@order_count,
SUM(UnitPriceDiscount)/@order_count,
MIN(OrderQty)/@order_count,
MAX(OrderQty)/@order_count
FROM SalesLT.SalesOrderDetail; -- has to scan again
Scanning the entire table once just to get the count, and then scanning it again to use that count in a second query, is definitely more trouble than it's worth. This is obvious in both the plans and the STATISTICS IO output, even on a very small table:
-- first query - 7 reads
Table 'SalesOrderDetail'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- second query - still just 7 reads
Table 'SalesOrderDetail'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- variable + last two queries - 11 reads
Table 'SalesOrderDetail'. Scan count 1, logical reads 4, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
If you're going to be reusing the count in multiple queries, in queries against different tables, or in queries where you won't otherwise have to scan (joins, filtered queries, etc.), then it's potentially a different question. But you shouldn't be concerned that the count will be computed more than once during a single scan; as others have demonstrated, it won't. To justify pulling the count out of the query, you'd need one of those other scenarios.
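If you do land in one of those scenarios, capturing the count once and reusing it can pay off. A sketch against the same sample schema (the filter value and the second table are chosen for illustration):

```sql
DECLARE @order_count INT;

SELECT @order_count = COUNT(SalesOrderID)
FROM SalesLT.SalesOrderDetail;          -- one scan, done once

-- Reused in a filtered query that wouldn't otherwise scan the whole table...
SELECT SUM(OrderQty) / @order_count
FROM SalesLT.SalesOrderDetail
WHERE ProductID = 707;                  -- hypothetical filter

-- ...and in a query against a different table entirely
SELECT COUNT(*) * 1.0 / @order_count
FROM SalesLT.SalesOrderHeader;
```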