Why is my index not being used in a SELECT TOP?
If I let the server decide which index to use, it picks IX_MachineryId, and it takes up to a minute.
That index is not partitioned, so the optimizer recognizes it can be used to provide the ordering specified in the query without sorting. As a non-unique nonclustered index, it also has the keys of the clustered index as subkeys, so the index can be used to seek on MachineryId and the DateRecorded range.
The index does not include OperationalSeconds, so the plan has to look that value up per row in the (partitioned) clustered index in order to test OperationalSeconds > 0.
The optimizer estimates that one row will need to be read from the nonclustered index and looked up to satisfy the TOP (1). This calculation is based on the row goal (find one row quickly), and assumes a uniform distribution of values.
From the actual plan, we can see the estimate of 1 row is inaccurate. In fact, 19,039 rows have to be processed to discover that no rows satisfy the query conditions. This is the worst case for a row goal optimization: 1 row estimated, all rows actually needed.
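As a rough simplification (not the optimizer's exact costing formula), a row goal of k rows combined with an assumed fraction s of seek rows that pass the residual predicate gives an expected number of rows read of about:

$$\text{rows read} \approx \min\left(n,\ \frac{k}{s}\right)$$

Here k = 1 for TOP (1), n is the number of index rows in the MachineryId and DateRecorded seek range, and s is the fraction of those rows assumed to satisfy OperationalSeconds > 0. Under this simplification, an estimate of 1 row corresponds to s close to 1. In this execution no row qualified, so all n = 19,039 rows were read.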
You can disable row goals with trace flag 4138. This would most likely result in SQL Server choosing a different plan, possibly the one you forced. In any case, the index IX_MachineryId could be improved by including OperationalSeconds, which would remove the per-row lookup into the clustered index.
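A minimal sketch of both suggestions follows. The original query text is not shown, so the query below is a hypothetical reconstruction: the @MachineryId parameter name, its integer type, the selected column and the exact range comparisons are assumptions. The USE HINT ('DISABLE_OPTIMIZER_ROWGOAL') hint is documented as equivalent to trace flag 4138 and is available from SQL Server 2016 SP1; OPTION (QUERYTRACEON 4138) is the trace flag form, which typically requires elevated permissions. The index change assumes IX_MachineryId is currently keyed on MachineryId alone and stored unpartitioned on the PRIMARY filegroup.

```sql
-- Hypothetical shape of the original query: parameter names and the
-- integer type of MachineryId are assumptions.
DECLARE
    @MachineryId integer = 1,
    @From datetime = '20160101',   -- datetime, matching the DateRecorded column
    @To datetime = '20160201';

SELECT TOP (1)
    MR.DateRecorded
FROM dbo.MachineryReading AS MR
WHERE
    MR.MachineryId = @MachineryId
    AND MR.DateRecorded >= @From
    AND MR.DateRecorded < @To
    AND MR.OperationalSeconds > 0
ORDER BY
    MR.DateRecorded
-- Disable row goals for this statement only (equivalent to trace flag 4138).
-- Older form: OPTION (QUERYTRACEON 4138);
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));

-- Cover the OperationalSeconds test so no clustered index lookup is needed.
-- Assumes the existing index is keyed on MachineryId only and lives
-- unpartitioned on PRIMARY; adjust to match the real definition.
CREATE NONCLUSTERED INDEX IX_MachineryId
ON dbo.MachineryReading (MachineryId)
INCLUDE (OperationalSeconds)
WITH (DROP_EXISTING = ON)
ON [PRIMARY];
```

Comparing the actual plans before and after the index change should show the per-row Key Lookup into the clustered index disappearing.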
It is quite unusual to have non-aligned nonclustered indexes (indexes partitioned differently from the base table, or not partitioned at all).
That really suggests to me that I have made the index right, and the server is just making a bad decision. Why?
As usual, the optimizer is selecting the cheapest plan it considers.
The estimated cost of the IX_MachineryId plan is 0.01 cost units, based on the (incorrect) row goal assumption that one row will be tested and returned.
The estimated cost of the IX_MachineryId_DateRecorded plan is much higher, at 0.27 units, mostly because it expects to read 5,515 rows from the index, sort them, and return the one that sorts lowest (by DateRecorded).
This index is partitioned, and cannot return rows in DateRecorded order directly (see later). It can seek on MachineryId and the DateRecorded range within each partition, but a Sort is required.
If this index were not partitioned, a sort would not be required, and it would be very similar to the other (unpartitioned) index with the extra included column. An unpartitioned filtered index would be slightly more efficient still.
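For comparison, an unpartitioned filtered index could be created as below. This is a sketch: the index name is hypothetical, and ON [PRIMARY] assumes the non-aligned index should live on the PRIMARY filegroup. Keep in mind that a non-aligned index can block partition switching (ALTER TABLE ... SWITCH) on the table.

```sql
-- Hypothetical index name. Specifying a filegroup instead of the partition
-- scheme makes this index non-aligned (unpartitioned); PRIMARY is assumed.
CREATE NONCLUSTERED INDEX IX_MachineryId_DateRecorded_np
ON dbo.MachineryReading (MachineryId, DateRecorded)
WHERE OperationalSeconds > 0   -- only rows that can satisfy the query are stored
ON [PRIMARY];
```

Because the index is not partitioned, a seek on MachineryId and the DateRecorded range returns rows already in DateRecorded order, and the filter removes the need to test OperationalSeconds per row.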
You should update the source query so that the data types of the @From and @To parameters match the DateRecorded column (datetime). At the moment, the type mismatch forces SQL Server to compute a dynamic seek range at runtime (using the Merge Interval operator and its subtree):
<ScalarOperator ScalarString="GetRangeWithMismatchedTypes([@From],NULL,(22))">
<ScalarOperator ScalarString="GetRangeWithMismatchedTypes([@To],NULL,(22))">
This conversion prevents the optimizer from reasoning correctly about the relationship between ascending partition IDs (covering a range of DateRecorded values in ascending order) and the inequality predicates on DateRecorded.
The partition ID is an implicit leading key for a partitioned index. Normally, the optimizer can see that ordering by partition ID (where ascending IDs map to ascending, disjoint values of DateRecorded) then DateRecorded is the same as ordering by DateRecorded alone (given that MachineryId is constant). This chain of reasoning is broken by the type conversion.
Demo
A simple partitioned table and index:
CREATE PARTITION FUNCTION PF (datetime)
AS RANGE LEFT FOR VALUES ('20160101', '20160201', '20160301');
CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY]);
CREATE TABLE dbo.T (c1 integer NOT NULL, c2 datetime NOT NULL) ON PS (c2);
CREATE INDEX i ON dbo.T (c1, c2) ON PS (c2);
INSERT dbo.T (c1, c2)
VALUES (1, '20160101'), (1, '20160201'), (1, '20160301');
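To see the mapping the optimizer relies on (ascending partition numbers covering ascending, disjoint ranges of c2), the $PARTITION function can be applied to the demo table. This query is an addition for illustration, not part of the original demo.

```sql
-- Report the partition number each demo row is stored in.
-- With RANGE LEFT, each boundary value belongs to the partition on its left.
SELECT
    T.c2,
    partition_number = $PARTITION.PF(T.c2)
FROM dbo.T AS T
ORDER BY
    T.c2;
```

The three rows should be reported in partitions 1, 2 and 3 respectively; partition 4 (values after 20160301) is empty.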
Query with matched types
-- Types match (datetime)
DECLARE
@From datetime = '20010101',
@To datetime = '20090101';
-- Seek with no sort
SELECT T2.c2
FROM dbo.T AS T2
WHERE T2.c1 = 1
AND T2.c2 >= @From
AND T2.c2 < @To
ORDER BY
T2.c2;
Query with mismatched types
-- Mismatched types (datetime2 vs datetime)
DECLARE
@From datetime2 = '20010101',
@To datetime2 = '20090101';
-- Merge Interval and Sort
SELECT T2.c2
FROM dbo.T AS T2
WHERE T2.c1 = 1
AND T2.c2 >= @From
AND T2.c2 < @To
ORDER BY
T2.c2;
The index seems quite good for the query, and I'm not sure why the optimizer does not choose it (statistics? the partitioning? an Azure limitation? I really have no idea).
But a filtered index would be even better for this specific query, if the > 0 is a fixed value and doesn't change from one query execution to another:
CREATE NONCLUSTERED INDEX IX_MachineryId_DateRecorded_filtered
ON dbo.MachineryReading
(MachineryId, DateRecorded)
WHERE (OperationalSeconds > 0) ;
There are two differences between your existing index, where OperationalSeconds is the third column, and the filtered index:
First, the filtered index is smaller, both in width (it is narrower) and in number of rows. This makes it more efficient in general, as SQL Server needs less space to keep it in memory.
Second, and more subtle but more important for this query, it contains only rows that match the filter used in the query. This can matter a great deal, depending on the values in that third column.
For example, a specific set of parameters for MachineryId and DateRecorded may yield 1000 rows. If all or almost all of these rows match the (OperationalSeconds > 0) filter, both indexes will behave well. But if very few rows match the filter (or only the last one, or none at all), the non-filtered index will have to go through many or all of those 1000 rows until it finds a match. The filtered index, on the other hand, needs only one seek to find a matching row (or to return 0 rows), because it stores only rows that match the filter.
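To judge which case applies to your data, a quick check like the following compares the number of rows in a MachineryId and DateRecorded range with the number that pass the filter. It is a sketch: the parameter names, the integer type of MachineryId and the sample values are hypothetical.

```sql
-- Hypothetical parameter values; substitute a real MachineryId and date range.
DECLARE
    @MachineryId integer = 1,
    @From datetime = '20160101',
    @To datetime = '20160201';

-- candidate_rows: rows the (MachineryId, DateRecorded) seek would touch.
-- matching_rows: how many of those rows pass OperationalSeconds > 0.
SELECT
    candidate_rows = COUNT_BIG(*),
    matching_rows = COUNT_BIG(CASE WHEN MR.OperationalSeconds > 0 THEN 1 END)
FROM dbo.MachineryReading AS MR
WHERE
    MR.MachineryId = @MachineryId
    AND MR.DateRecorded >= @From
    AND MR.DateRecorded < @To;
```

When matching_rows is a small fraction of candidate_rows (or zero), the filtered index has the greatest advantage.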