Is it possible to force the optimizer to eliminate irrelevant tables in this partitioned view?
Enable trace flag 4199.
I also had to issue:
UPDATE STATISTICS dbo.ObservationDates
WITH ROWCOUNT = 73049;
to get the plans shown below. Statistics for this table were missing from the upload. The 73,049 figure came from the Table Cardinality information in the Plan Explorer attachment. I used SQL Server 2014 SP1 CU4 (build 12.0.4436) with two logical processors, maximum memory set to 2048 MB, and no trace flags aside from 4199.
You should then get an execution plan that features dynamic partition elimination:
select
od.[Year],
AvgValue = avg(ObservationValue)
from dbo.v_Observation o
join dbo.ObservationDates od
on o.ObservationDateKey = od.DateKey
where
od.[Year] >= 2000 and od.[Year] < 2006
group by
od.[Year]
option (querytraceon 4199);
Plan fragment:
This might look worse, but the Filters are all start-up filters. An example predicate is:
Per iteration of the loop, the start-up predicate is tested, and only if it returns true is the Clustered Index Seek below it executed. Hence, dynamic partition elimination.
This is perhaps not quite as efficient as static elimination, especially if the plan is parallel.
You may need to try hints like MAXDOP 1
, FAST 1
or FORCESEEK
on the view to get the same plan. Optimizer costing choices with partitioned views (like partitioned tables) can be tricky.
The point is you need a plan that features start-up filters to get dynamic partition elimination with partitioned views.
Queries with embedded USE PLAN
hints: (via gist.github.com):
- Serial plan
- Parallel plan
My observation has always been that you must specify the value (or range of values) for the partition column explicitly in the query in order to get "table elimination" in a partitioned view. This is based on experience using partitioned views in production from SQL Server 2000 through SQL Server 2014.
SQL Server doesn't have a concept of a loop join operator in which the engine can dynamically aim the seek directly at the proper table on the inner side of the loop based on the value of the row on the outer side of the loop. However, as Paul's answer explains, there is the possibility of a plan with start-up filters in order to dynamically skip irrelevant tables on the inner side of the loop in constant time (as opposed to logarithmic by actually performing the seek).
Note that for partitioned tables, however, this type of seek (to a specific partition) is supported.
If you are fixed on using partitioned views, another option is to split your query into multiple queries, such as:
-- Gather than the min/max values for the partition column
DECLARE @minDateKey INT,
@maxDateKey INT
SELECT @minDateKey = MIN(DateKey),
@maxDateKey = MAX(DateKey)
FROM dbo.ObservationDates od
WHERE od.[Year] >= 2000 and od.[Year] < 2006
-- Since I have a stats-only copy of the database, simulate having run the query above
-- (You can comment this out since you have the actual data.)
SELECT @minDateKey = 20000101, @maxDateKey = 20051231
-- Adjust the query to use the min/max values of the partition column
-- rather than filtering on a different column in the dimension table
select
od.[Year],
AvgValue = avg(ObservationValue)
from dbo.v_Observation o
join dbo.ObservationDates od
on o.ObservationDateKey = od.DateKey
WHERE od.DateKey >= @minDateKey AND od.DateKey <= @maxDateKey
group by od.[Year]
-- Must use OPTION RECOMPILE; otherwise the plan will touch all tables because it
-- must do so in order to be valid for all values of the parameters!
OPTION (RECOMPILE)
This yields the following plan. There is now an extra query that hits the dimension table, but the the query over the (presumably much larger) fact table is optimized.