Create a plan guide to cache (lazy spool) CTE result
Is there ANY way to make the result come up with exactly 3 distinct guids and no more? I'm hoping to be able to better answer questions in future by including plan guides with CTE-type queries that are referenced multiple times to overcome some SQL Server CTE quirks.
Not today. Non-recursive common table expressions (CTEs) are treated as in-line view definitions and expanded into the logical query tree at each place they are referenced (just like regular view definitions are) before optimization. The logical tree for your query is:
LogOp_OrderByCOL: Union1007 ASC COL: Union1015 ASC
    LogOp_Project COL: Union1006 COL: Union1007 COL: Union1014 COL: Union1015
        LogOp_Join
            LogOp_ViewAnchor
                LogOp_UnionAll
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
            LogOp_ViewAnchor
                LogOp_UnionAll
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
Notice the two View Anchors and the six calls to the intrinsic function newid before optimization gets started. Nevertheless, many people consider that the optimizer ought to be able to identify that the expanded sub-trees were originally a single referenced object and simplify accordingly. There have also been several Connect requests to allow explicit materialization of a CTE or derived table.
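For readers who want to inspect the expanded tree themselves, the sketch below shows one way it can be printed. It relies on undocumented trace flags (3604 to send diagnostic output to the client, 8605 to print the converted logical tree), so treat the flag numbers and their behavior as assumptions that may differ between versions. QUERYTRACEON normally requires sysadmin rights, so try this only on a test instance.

```sql
-- Sketch only: trace flags 3604 and 8605 are undocumented and unsupported.
-- The tree text is printed to the Messages tab, not the result grid.
with cte(guid,other) as (
 select newid(),1 union all
 select newid(),2 union all
 select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other
OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 8605);
```

The output should contain one LogOp_ViewAnchor per CTE reference, which is the duplication described above.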
A more general implementation would have the optimizer consider materializing arbitrary common expressions to improve performance (CASE with a subquery is another example where problems can occur today). Microsoft Research published a paper (PDF) on that back in 2007, though it remains unimplemented to date. For the time being, we are limited to explicit materialization using things like table variables and temporary tables.
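As a minimal sketch of explicit materialization, the rows can be written to a temporary table once and then referenced twice. The table name #cte is hypothetical, chosen only to mirror the CTE in the question.

```sql
-- NEWID() is evaluated only while the rows are inserted,
-- so the table holds exactly three guids.
CREATE TABLE #cte (guid uniqueidentifier NOT NULL, other int NOT NULL);

INSERT INTO #cte (guid, other)
SELECT NEWID(), 1 UNION ALL
SELECT NEWID(), 2 UNION ALL
SELECT NEWID(), 3;

-- Both sides of the cross join read the same stored rows.
SELECT a.guid, a.other, b.guid AS guidb, b.other AS otherb
FROM #cte AS a
CROSS JOIN #cte AS b
ORDER BY a.other, b.other;

DROP TABLE #cte;
```

The result has nine rows but only three distinct guid values, because both references read stored rows instead of re-evaluating the expression.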
SQLKiwi has mentioned drawing up plans in SSIS, is there a way or useful tool to assist in laying out a good plan for SQL Server?
This was just wishful thinking on my part, and went well beyond the idea of modifying plan guides. It is possible, in principle, to write a tool to manipulate show plan XML directly, but without specific optimizer instrumentation using the tool would likely be a frustrating experience for the user (and the developer come to think of it).
In the particular context of this question, such a tool would still be unable to materialize the CTE contents in a way that could be used by multiple consumers (to feed both inputs to the cross join in this case). The optimizer and execution engine do support multi-consumer spools, but only for specific purposes, none of which could be made to apply to this particular example.
While I'm not certain, I have a fairly strong hunch that the RelOps can be followed (Nested Loop, Lazy Spool) even if the query is not exactly the same as the plan - for instance if you added 4 and 5 to the CTE, it still continues to use the same plan (seemingly - tested on SQL Server 2012 RTM Express).
There is a reasonable amount of flexibility here. The broad shape of the XML plan is used to guide the search for a final plan (though many attributes are ignored completely e.g. partitioning type on exchanges) and the normal search rules are considerably relaxed as well. For example, early pruning of alternatives based on cost considerations is disabled, the explicit introduction of cross joins is allowed, and scalar operations are ignored.
There are too many details to go into in depth, but the placement of Filters and Compute Scalars cannot be forced, and predicates of the form column = value are generalized so a plan containing X = 1 or X = @X can be applied to a query containing X = 502 or X = @Y. This particular flexibility can help greatly in finding a natural plan to force.
In the specific example, constant Union All can always be implemented as a Constant Scan; the number of inputs to the Union All does not matter.
There is no way (in SQL Server versions up to 2012) to re-use a single spool for both occurrences of the CTE. Details can be found in SQLKiwi's answer. Further below are two ways to materialize the CTE twice, which is unavoidable given the nature of the query. Both options result in a net distinct guid count of 6.
The link from Martin's comment to Quassnoi's site on a blog about plan guiding a CTE was partial inspiration for this question. It describes a way to materialize a CTE for the purpose of a correlated subquery, which is referenced only once although the correlation can cause it to be evaluated multiple times. That does not apply to the query in the question.
Option 1 - Plan Guide
Taking hints from SQLKiwi's answer, I have pared down the guide to a bare minimum that will still do the job, e.g. the ConstantScan nodes only list 2 scalar operators which can sufficiently expand to any number.
;with cte(guid,other) as (
select newid(),1 union all
select newid(),2 union all
select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other
OPTION(USE PLAN
N'<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="11.0.2100.60" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="1600" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" StatementSubTreeCost="0.0444433" StatementText="with cte(guid,other) as (
 select newid(),1 union all
 select newid(),2 union all
 select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other;
" StatementType="SELECT" QueryHash="0x43D93EF17C8E55DD" QueryPlanHash="0xF8E3B336792D84" RetrievedFromCache="true">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan NonParallelPlanReason="EstimatedDOPIsOne" CachedPlanSize="96" CompileTime="13" CompileCPU="13" CompileMemory="1152">
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="157240" EstimatedPagesCached="1420" EstimatedAvailableDegreeOfParallelism="1" />
<RelOp AvgRowSize="47" EstimateCPU="0.006688" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1600" LogicalOp="Inner Join" NodeId="0" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.0444433">
<OutputList>
<ColumnReference Column="Union1163" />
</OutputList>
<Warnings NoJoinPredicate="true" />
<NestedLoops Optimized="false">
<RelOp AvgRowSize="27" EstimateCPU="0.000432115" EstimateIO="0.0112613" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Sort" NodeId="1" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.0117335">
<OutputList>
<ColumnReference Column="Union1080" />
<ColumnReference Column="Union1081" />
</OutputList>
<MemoryFractions Input="0" Output="0" />
<Sort Distinct="false">
<OrderBy>
<OrderByColumn Ascending="true">
<ColumnReference Column="Union1081" />
</OrderByColumn>
</OrderBy>
<RelOp AvgRowSize="27" EstimateCPU="4.0157E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Constant Scan" NodeId="2" Parallel="false" PhysicalOp="Constant Scan" EstimatedTotalSubtreeCost="4.0157E-05">
<OutputList>
<ColumnReference Column="Union1080" />
<ColumnReference Column="Union1081" />
</OutputList>
<ConstantScan>
<Values>
<Row>
<ScalarOperator ScalarString="newid()">
<Intrinsic FunctionName="newid" />
</ScalarOperator>
<ScalarOperator ScalarString="(1)">
<Const ConstValue="(1)" />
</ScalarOperator>
</Row>
<Row>
<ScalarOperator ScalarString="newid()">
<Intrinsic FunctionName="newid" />
</ScalarOperator>
<ScalarOperator ScalarString="(2)">
<Const ConstValue="(2)" />
</ScalarOperator>
</Row>
</Values>
</ConstantScan>
</RelOp>
</Sort>
</RelOp>
<RelOp AvgRowSize="27" EstimateCPU="0.0001074" EstimateIO="0.01" EstimateRebinds="0" EstimateRewinds="39" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Lazy Spool" NodeId="83" Parallel="false" PhysicalOp="Table Spool" EstimatedTotalSubtreeCost="0.0260217">
<OutputList>
<ColumnReference Column="Union1162" />
<ColumnReference Column="Union1163" />
</OutputList>
<Spool>
<RelOp AvgRowSize="27" EstimateCPU="0.000432115" EstimateIO="0.0112613" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Sort" NodeId="84" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.0117335">
<OutputList>
<ColumnReference Column="Union1162" />
<ColumnReference Column="Union1163" />
</OutputList>
<MemoryFractions Input="0" Output="0" />
<Sort Distinct="false">
<OrderBy>
<OrderByColumn Ascending="true">
<ColumnReference Column="Union1163" />
</OrderByColumn>
</OrderBy>
<RelOp AvgRowSize="27" EstimateCPU="4.0157E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="40" LogicalOp="Constant Scan" NodeId="85" Parallel="false" PhysicalOp="Constant Scan" EstimatedTotalSubtreeCost="4.0157E-05">
<OutputList>
<ColumnReference Column="Union1162" />
<ColumnReference Column="Union1163" />
</OutputList>
<ConstantScan>
<Values>
<Row>
<ScalarOperator ScalarString="newid()">
<Intrinsic FunctionName="newid" />
</ScalarOperator>
<ScalarOperator ScalarString="(1)">
<Const ConstValue="(1)" />
</ScalarOperator>
</Row>
<Row>
<ScalarOperator ScalarString="newid()">
<Intrinsic FunctionName="newid" />
</ScalarOperator>
<ScalarOperator ScalarString="(2)">
<Const ConstValue="(2)" />
</ScalarOperator>
</Row>
</Values>
</ConstantScan>
</RelOp>
</Sort>
</RelOp>
</Spool>
</RelOp>
</NestedLoops>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>'
);
Option 2 - Remote Scan
By increasing the expense of the query and introducing a Remote Scan, the result is materialized.
with cte(guid,other) as (
select *
from OPENQUERY([TESTSQL\V2012], '
select newid(),1 union all
select newid(),2 union all
select newid(),3') x)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other;
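The query above assumes a linked server named [TESTSQL\V2012] already exists. As a rough sketch for trying this elsewhere, a loopback linked server pointing back at the local instance can be created as follows. The name LOOPBACK and the provider SQLNCLI are assumptions; newer installations may need a different provider name (for example MSOLEDBSQL), and the default login mapping is assumed to be sufficient.

```sql
-- Hypothetical loopback linked server for testing on the local instance.
DECLARE @self sysname = @@SERVERNAME;

EXEC sys.sp_addlinkedserver
    @server     = N'LOOPBACK',   -- assumed name, not from the original answer
    @srvproduct = N'',
    @provider   = N'SQLNCLI',    -- assumption: adjust to the installed provider
    @datasrc    = @self;

-- The remote query runs as its own statement, so its rows reach the local
-- plan through a Remote Scan operator.
SELECT x.guid, x.other
FROM OPENQUERY([LOOPBACK], '
    select newid() as guid, 1 as other union all
    select newid(), 2 union all
    select newid(), 3') AS x;

-- Clean up when finished.
EXEC sys.sp_dropserver @server = N'LOOPBACK', @droplogins = 'droplogins';
```

Substituting [LOOPBACK] for [TESTSQL\V2012] in the Option 2 query should then let it run against the local instance.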
In all seriousness you can't cut up xml execution plans from scratch. Creating them using SSIS is science fiction. Yes it's all XML, but they are from different universes. Looking at Paul's blog on that topic, he's saying "much in the way SSIS allows ..." so possibly you've misunderstood? I don't think he's saying "use SSIS to create plans" but rather "wouldn't it be great to be able to create plans using a drag and drop interface like SSIS". Maybe, for a very simple query, you could just about manage this, but it's a stretch, possibly even a waste of time. Busy work you might say.
If I'm creating a plan for a USE PLAN hint or plan guide, I have a couple of approaches. For example, I might remove records from tables (e.g. on a copy of the db) to influence the stats and encourage the optimizer to make a different decision. I've also used table variables in place of all the tables in the query, so the optimizer thinks every table contains 1 record; then, in the generated plan, I replace all the table variables with the original table names and swap it in as the plan. Another option would be to use the WITH STATS_STREAM option of UPDATE STATISTICS to spoof statistics, which is the method used when cloning statistics-only copies of databases, e.g.
UPDATE STATISTICS
[dbo].[yourTable]([PK_yourTable])
WITH
STATS_STREAM = 0x0100etc,
ROWCOUNT = 10000,
PAGECOUNT = 93
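The STATS_STREAM value in the statement above has to come from somewhere. One way to obtain it, sketched below, is the undocumented WITH STATS_STREAM option of DBCC SHOW_STATISTICS. Because it is undocumented, its behavior should be treated as an assumption and used on test systems only. The table dbo.StatsDemo and its constraint name are hypothetical.

```sql
-- Hypothetical demo table with a primary key that owns a statistics object.
CREATE TABLE dbo.StatsDemo
(
    id      int       NOT NULL CONSTRAINT PK_StatsDemo PRIMARY KEY,
    payload char(100) NOT NULL
);

-- Load some sample rows so the statistics have something to describe.
INSERT INTO dbo.StatsDemo (id, payload)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
FROM sys.all_objects;

UPDATE STATISTICS dbo.StatsDemo (PK_StatsDemo) WITH FULLSCAN;

-- Undocumented option: returns the statistics as a binary stream together
-- with row and page counts, which can be pasted into UPDATE STATISTICS.
DBCC SHOW_STATISTICS (N'dbo.StatsDemo', PK_StatsDemo) WITH STATS_STREAM;

DROP TABLE dbo.StatsDemo;
```

The binary value returned can then be supplied as STATS_STREAM on a table with the same definition in another database.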
I have spent some time tinkering with xml execution plans in the past and I have found that in the end, SQL just goes "I'm not using that" and runs the query how it wants anyway.
For your specific example, I'm sure you're aware you could use set rowcount 3 or TOP 3 in the query to get that result, but I guess that is not your point. The correct answer would really be: use a temp table. I would upvote that : ) An incorrect answer would be "spend hours, even days, cutting up your own custom XML execution plan where you attempt to trick the optimizer into doing a lazy spool for the CTE, which might not even work anyway, would look clever but would also be impossible to maintain".
Not trying to be unconstructive there, just my opinion - hope that helps.