How SQL Server caches data in memory
Test Data
CREATE TABLE dbo.Table1( A INT IDENTITY(1,1) PRIMARY KEY NOT NULL
,B varchar(255),C int,D int,E int);
INSERT INTO dbo.Table1 WITH(TABLOCK)
(B,C,D,E)
SELECT TOP(1000000) 'Some Value ' + CAST((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) % 400) as varchar(255))-- 400 different values
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;
CREATE INDEX IX_B
On dbo.Table1(B);
CREATE INDEX IX_C
On dbo.Table1(C);
Q1
When we do "select top 100 * from Table1" - whole clustered index (table) is being read from disk to memory, even if we need just 100 rows ? or just 100 rows (their data pages) are read from disk to memory ?
Only 100 rows from the clustered index are read. As a result only their data pages are cached into memory.
Example
In my case it reads downward from the clustered index, 1 - 100 values for the column A
.
SET STATISTICS IO, TIME ON;
select top 100 * from dbo.Table1
The reads:
Table 'Table1'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
4 logical reads = 4 pages
Why are we still seeing a scan count = 1
, but only 4
logical reads you might ask?
Because it is executing a range scan:
to satisfy the top operator.
Q2
Same with nonclustered index, when we do "select top 100 * from Table1 where column B = 'some value'", whole nonclustered index + clustered index gets loaded into memory ? or just 100 rows from nonclustered index + 100 rows from clustered index ?
Example 1
SET STATISTICS IO, TIME ON;
select top 100 * from dbo.Table1
WHERE B='Some Value 200'
You might expect for the nonclustered Index to be used here, but in fact the clustered index is still used:
With after clearing the cache 287
logical reads & 2528
read ahead reads.
Table 'Table1'. Scan count 1, logical reads 287, physical reads 1, read-ahead reads 2528, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
We are caching these read ahead reads after reading them from disk to then read 287
pages from memory.
The read-ahead mechanism is SQL Server’s feature which brings data pages into the buffer cache even before the data is requested by the query. Source
If we checked the cached pages:
cached_pages_count objectname indexname indexid
2536 Table1 PK__Table1__3BD019AE039DA497 1
We see this also shown.
So in this case we are caching some more pages to satisfy the query faster, since we are reading more rows to apply a residual predicate on them:
But we are only reading these pages from the clustered index.
Example 2
You could change the query in the hope of utilizing the nonclustered index:
select top 100 B from dbo.Table1
WHERE B='Some Value 200';
Which it does:
This again gives a small range scan:
Table 'Table1'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Only the nonclustered index pages are loaded into memory here.
Example 3
We might force a key lookup to satisfy query from example 1 by using an index hint:
SET STATISTICS IO, TIME ON;
select top 100 *
from dbo.Table1
WITH(INDEX(IX_B))
WHERE B='Some Value 200';
This again results in some read ahead reads and more logical reads than example1:
Table 'Table1'. Scan count 1, logical reads 684, physical reads 3, read-ahead reads 465, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Since we are doing a key lookup from the nonclustered to the clustered index we will be caching pages from both indexes:
even if we need just 100 rows ? or just 100 rows (their data pages) are read from disk to memory ?
This could be due to Logical Processing order of SQL query as the TOP
clause would be considered at last in the logical sequence. for more details... So it would read less pages when filter applied in WHERE
clause to return only 100 rows not by applying TOP 100
.