Sort spills to tempdb due to varchar(max)
There are going to be several possible workarounds here.
You can manually adjust the memory grant, though I probably wouldn't go that route.
You can also use a CTE and TOP to push the sort lower, before grabbing the max length column. It will look something like below.
WITH CTE AS (
SELECT TOP 1000000000 r.ID, s.ID AS ID2, s.typeID
FROM Resources r
inner join Settings s on resourceid=r.id
where resourceUID=@UID
ORDER BY s.typeID
)
SELECT c.ID, ca.remark
FROM CTE c
CROSS APPLY (SELECT remark FROM dbo.Settings s WHERE s.id = c.ID2) ca(remark)
ORDER BY c.typeID
Proof-of-concept dbfiddle here. Sample data would still be appreciated!
If you want to read an excellent analysis by Paul White, read here.
Why are the spills only happening when [remark] is selected?
The spill is happening when you include that column because you don't get a big enough memory grant for the large string data being sorted.
You don't get a big enough memory grant because the actual number of rows is 10x more than the estimated number of rows (1,302 actual vs 126 estimated).
Why is the estimate off? Why does SQL Server think there's only one row in dbo.Settings with a resourceid
of 38?
It could be a statistics issue, which you can check by running DBCC SHOW_STATISTICS('dbo.Settings', 'IX_Test')
and see the counts for that histogram step. But the execution plan seems to indicate that the stats are as complete and up-to-date as they could be.
Since stats aren't helping, your best bet is probably a query rewrite - which Forrest has covered in his answer.
To me it appears that the where
clause in the query is giving the issue, and is the cause of the low estimates, even if OPTION(RECOMPILE)
is used.
I created some test data, and in the end came up with two solutions, storing the ID
field from resources
in either a variable (if it is always unique) or a temp table, if we can have more than one ID
's.
Base test records
SET NOCOUNT ON
DECLARE @i int= 1;
WHILE @i <= 10000
BEGIN
INSERT INTO [dbo].[Settings]([resourceId],[typeID],remark)
VALUES(@i,@i,'KEPT THESE VALUES OUT BECAUSE IT WOULD CLUTTER THE EXAMPLES, VALUES OVER 8000 Chars entered here'); -- 23254 character length on each value
INSERT INTO [dbo].[Resources](resourceUID)
VALUES(@i);
SET @i += 1;
END
Insert the 'Seek' values, as to get to the same approximate resultset as OP (1300 records)
INSERT INTO [dbo].[Settings]([resourceId],[typeID],remark)
VALUES(38,38,'KEPT THESE VALUES OUT BECAUSE IT WOULD CLUTTER THE EXAMPLES, VALUES OVER 8000 Chars entered here')
GO 1300
Change compat & Update statistics to match OP
ALTER DATABASE StackOverflow SET COMPATIBILITY_LEVEL = 120;
UPDATE STATISTICS settings WITH FULLSCAN;
UPDATE STATISTICS resources WITH FULLSCAN;
Original query
exec sp_executesql N'
select r.id
FROM Resources r
inner join Settings on resourceid=r.id
where resourceUID=@UID
ORDER BY typeID',
N'@UID int',
@UID=38
My estimates are even worse, with one estimated row, while 1300 are returned.
And like OP stated, it does not matter if I add OPTION(RECOMPILE)
An important thing to note, is that when we get rid of the where clause the estimates are 100% correct, which is expected since we are using all data in both tables.
I forced the indexes just to make sure we use the same ones as in the previous query, to prove the point
exec sp_executesql N'
select r.id,remark
FROM Resources r with(index([IX_UID]))
inner join Settings WITH(INDEX([IX_Test]))
on resourceid=r.id
ORDER BY typeID',
N'@UID int',
@UID=38
As expected, good estimates.
So, what could we change to get better estimates but still seek on our values?
IF @UID is unique, as in the example OP gave, we could put the single id
that was returned from resources
in a variable, then seek on that variable with an OPTION(RECOMPILE)
DECLARE @UID int =38 , @RID int;
SELECT @RID=r.id from
Resources r where resourceUID = @UID;
SELECT @uid, remark
from Settings
where resourceId = @uid
Order by typeID
OPTION(RECOMPILE);
Which gives 100% accurate estimates
But what if there are multiple resourceUID's in resources?
add some test data
INSERT INTO Resources(ResourceUID)
VALUES (38);
go 50
This could be resolved with a temp table
CREATE TABLE #RID (id int)
DECLARE @UID int =38
INSERT INTO #RID
SELECT r.id
from
Resources r where resourceUID = @UID
SELECT @uid, remark
from Settings s
INNER JOIN #RID r
ON r.id =s.resourceId
Order by typeID
OPTION(RECOMPILE)
DROP TABLE #RID
Again with accurate estimates.
This was done with my own dataset,YMMV.
Written with sp_executesql
With a variable
exec sp_executesql N'
DECLARE @RID int;
SELECT @RID=r.id from
Resources r where resourceUID = @UID;
SELECT @uid, remark
from Settings
where resourceId = @uid
Order by typeID
OPTION(RECOMPILE);',
N'@UID int',
@UID=38
With a temp table
exec sp_executesql N'
CREATE TABLE #RID (id int)
INSERT INTO #RID
SELECT r.id
from
Resources r where resourceUID = @UID
SELECT @uid, remark
from Settings s
INNER JOIN #RID r
ON r.id =s.resourceId
Order by typeID
OPTION(RECOMPILE)
DROP TABLE #RID',
N'@UID int',
@UID=38
Still 100% correct estimates on my test