Optimising plans with XML readers
The reason for the performance difference lies in how scalar expressions are handled in the execution engine. In this case, the expression of interest is:
[Expr1000] = CONVERT(xml,DM_XE_SESSION_TARGETS.[target_data],0)
This expression label is defined by a Compute Scalar operator (node 11 in the serial plan, node 13 in the parallel plan). Compute Scalar operators are different from other operators (SQL Server 2005 onward) in that the expression(s) they define are not necessarily evaluated at the position they appear in the visible execution plan; evaluation can be deferred until the result of the computation is required by a later operator.
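As a minimal sketch (my illustration, not taken from the plans under discussion), the fragment that gives rise to Expr1000 is just this conversion applied to the target data:

```sql
-- The Compute Scalar only *defines* this expression; the expensive
-- string-to-XML conversion runs when a later operator first asks for
-- the value, which may be far from where the operator appears in the plan.
SELECT CONVERT(xml, st.target_data, 0) AS target_xml
FROM sys.dm_xe_session_targets AS st;
```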
In the present query, the target_data string is typically large, making the conversion from string to XML expensive. In slow plans, the string-to-XML conversion is performed every time a later operator that requires the result of Expr1000 is rebound.
Rebinding occurs on the inner side of a nested loops join when a correlated parameter (outer reference) changes. Expr1000 is an outer reference for most of the nested loops joins in this execution plan. The expression is referenced multiple times by several XML Readers, both Stream Aggregates, and by a start-up Filter. Depending on the size of the XML, the number of times the string is converted to XML can easily run into the millions.
The call stacks below show examples of the target_data string being converted to XML (ConvertStringToXMLForES, where ES is the Expression Service):
Start-up Filter
XML Reader (TVF Stream internally)
Stream Aggregate
Converting the string to XML each time any of these operators rebinds explains the performance difference observed with the nested loops plans, regardless of whether parallelism is used. It just so happens that the optimizer chooses a hash join when the MAXDOP 1 hint is specified. If MAXDOP 1, LOOP JOIN is specified, performance is poor just as with the default parallel plan (where the optimizer chooses nested loops).
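To see this for yourself, the hints can be combined. The following is a sketch of the question's query with MAXDOP 1, LOOP JOIN added; expect it to be slow for the reasons above:

```sql
SELECT XEventData.XEvent.value('(data/value)[1]', 'varchar(max)') AS DeadlockGraph
FROM (SELECT CAST(target_data AS xml) AS TargetData
      FROM sys.dm_xe_sessions AS s
      JOIN sys.dm_xe_session_targets AS st
          ON s.[address] = st.event_session_address
      WHERE s.[name] = 'system_health') AS Data
CROSS APPLY TargetData.nodes('//RingBufferTarget/event') AS XEventData (XEvent)
WHERE XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report'
-- Restrict the optimizer to serial nested loops joins:
OPTION (MAXDOP 1, LOOP JOIN);
```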
How much performance improves with a hash join depends on whether Expr1000 appears on the build or probe side of the operator. The following query locates the expression on the probe side:
SELECT CAST (
REPLACE (
REPLACE (
XEventData.XEvent.value ('(data/value)[1]', 'varchar(max)'),
'<victim-list>', '<deadlock><victim-list>'),
'<process-list>', '</victim-list><process-list>')
AS XML) AS DeadlockGraph
FROM (SELECT CAST (target_data AS XML) AS TargetData
FROM sys.dm_xe_sessions s
INNER HASH JOIN sys.dm_xe_session_targets st ON s.address = st.event_session_address
WHERE [name] = 'system_health') AS Data
CROSS APPLY TargetData.nodes ('//RingBufferTarget/event') AS XEventData (XEvent)
WHERE XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report';
I have reversed the written order of the joins from the version shown in the question because join hints (INNER HASH JOIN above) also force the join order for the whole query, just as if FORCE ORDER had been specified. The reversal is necessary to ensure Expr1000 appears on the probe side. The interesting part of the execution plan is:
With the expression defined on the probe side, the value is cached:
Evaluation of Expr1000 is still deferred until the first operator needs the value (the start-up filter in the stack trace above), but the computed value is cached (CValHashCachedSwitch) and reused for later calls by the XML Readers and Stream Aggregates. The stack trace below shows an example of the cached value being reused by an XML Reader.
When the join order is forced such that the definition of Expr1000 occurs on the build side of the hash join, the situation is different:
SELECT CAST (
REPLACE (
REPLACE (
XEventData.XEvent.value ('(data/value)[1]', 'varchar(max)'),
'<victim-list>', '<deadlock><victim-list>'),
'<process-list>', '</victim-list><process-list>')
AS XML) AS DeadlockGraph
FROM (SELECT CAST (target_data AS XML) AS TargetData
FROM sys.dm_xe_session_targets st
INNER HASH JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
WHERE [name] = 'system_health') AS Data
CROSS APPLY TargetData.nodes ('//RingBufferTarget/event') AS XEventData (XEvent)
WHERE XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report';
A hash join reads its build input completely to construct a hash table before it starts probing for matches. As a result, we have to store all the values, not just the one per thread being worked on from the probe side of the plan. The hash join therefore uses a tempdb work table to store the XML data, and every access to the result of Expr1000 by later operators requires an expensive trip to tempdb:
The following shows more details of the slow access path:
If a merge join is forced, the input rows are sorted (a blocking operation, just like consuming the build input of a hash join), resulting in a similar arrangement: slow access through a tempdb sort-optimized worktable is required because of the size of the data.
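One way to confirm the worktable traffic (my suggestion, not part of the original analysis) is to run the build-side query under SET STATISTICS IO ON and look for the Worktable line in the Messages pane:

```sql
SET STATISTICS IO ON;

-- Build-side hash join version from above; the Worktable reads reported
-- afterwards reflect the tempdb trips described in the text.
SELECT XEventData.XEvent.value('(data/value)[1]', 'varchar(max)') AS DeadlockGraph
FROM (SELECT CAST(target_data AS xml) AS TargetData
      FROM sys.dm_xe_session_targets AS st
      INNER HASH JOIN sys.dm_xe_sessions AS s
          ON s.[address] = st.event_session_address
      WHERE s.[name] = 'system_health') AS Data
CROSS APPLY TargetData.nodes('//RingBufferTarget/event') AS XEventData (XEvent)
WHERE XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report';

SET STATISTICS IO OFF;
-- Messages pane shows a line like:
--   Table 'Worktable'. Scan count ..., logical reads ...
```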
Plans that manipulate large data items can be problematic for all sorts of reasons that are not apparent from the execution plan. Using a hash join (with the expression on the correct input) is not a good solution either: it relies on undocumented internal behaviour, with no guarantees it will work the same way next week, or on a slightly different query.
The message is that XML manipulation can be tricky to optimize today. Writing the XML to a variable or temporary table before shredding it is a much more robust workaround than anything shown above. One way to do this is:
DECLARE @data xml =
CONVERT
(
xml,
(
SELECT TOP (1)
dxst.target_data
FROM sys.dm_xe_sessions AS dxs
JOIN sys.dm_xe_session_targets AS dxst ON
dxst.event_session_address = dxs.[address]
WHERE
dxs.name = N'system_health'
AND dxst.target_name = N'ring_buffer'
)
);

SELECT XEventData.XEvent.value('(data/value)[1]', 'varchar(max)')
FROM @data.nodes ('./RingBufferTarget/event[@name eq "xml_deadlock_report"]') AS XEventData (XEvent);
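A temporary-table variant of the same idea might look like the following (a sketch; the #target_data table and column names are my own choices):

```sql
CREATE TABLE #target_data (target_data xml NOT NULL);

-- Materialize the XML once, before any shredding takes place.
INSERT #target_data (target_data)
SELECT CONVERT(xml, dxst.target_data)
FROM sys.dm_xe_sessions AS dxs
JOIN sys.dm_xe_session_targets AS dxst
    ON dxst.event_session_address = dxs.[address]
WHERE dxs.[name] = N'system_health'
  AND dxst.target_name = N'ring_buffer';

SELECT XEventData.XEvent.value('(data/value)[1]', 'varchar(max)') AS DeadlockGraph
FROM #target_data AS td
CROSS APPLY td.target_data.nodes
    ('./RingBufferTarget/event[@name eq "xml_deadlock_report"]') AS XEventData (XEvent);

DROP TABLE #target_data;
```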
Finally, I just want to add Martin's very nice graphic from the comments below:
That is the code from my article originally posted here:
http://www.sqlservercentral.com/articles/deadlock/65658/
If you read the comments there, you will find a couple of alternatives that don't have the performance problems you are experiencing: one uses a modification of that original query, and the other uses a variable to hold the XML before processing it, which works out better (see my comments on page 2). XML from the DMVs can be slow to process, as can parsing XML from the DMF for the file target, which is often better accomplished by reading the data into a temp table first and then processing it. XML processing in SQL is slow compared to using things like .NET or SQLCLR.
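For the file target mentioned above, the read-into-a-temp-table approach might be sketched like this (the wildcard path and temp table name are assumptions; adjust to your environment):

```sql
-- Pull the raw event XML into a temp table first...
SELECT CONVERT(xml, event_data) AS event_xml
INTO #xe_events
FROM sys.fn_xe_file_target_read_file(N'system_health*.xel', NULL, NULL, NULL);

-- ...then shred it, so each document is converted only once.
SELECT e.event_xml.value('(event/data/value)[1]', 'varchar(max)') AS DeadlockGraph
FROM #xe_events AS e
WHERE e.event_xml.value('(event/@name)[1]', 'varchar(4000)') = 'xml_deadlock_report';

DROP TABLE #xe_events;
```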