cross apply xml query performs exponentially worse as xml document grows
What makes the cross apply query perform so poorly on this simple XML document, and perform exponentially slower as the dataset grows?
It Is the use of the parent axis to get the attribute ID from the item node.
It is this part of the query plan that is problematic.
Notice the 423 rows coming out of the lower Table-valued function.
Adding just one more item node with three field nodes gives you this.
732 rows returned.
What if we double the nodes from the first query to a total of 6 item nodes?
We are up to a whopping 1602 row returned.
The figure 18 in the top function is all field nodes in your XML. We have here 6 items with three fields in each item. Those 18 nodes are used in a nested loops join against the other function so 18 executions returning 1602 rows gives that it is returning 89 rows per iteration. That just happens to be the exact number of nodes in the entire XML. Well it is actually one more than all the visible nodes. I don't know why. You can use this query to check the total number of nodes in your XML.
select count(*)
from @XML.nodes('//*, //@*, //*/text()') as T(X)
So the algorithm used by SQL Server to get the value when you use the parent axis ..
in a values function is that it first finds all the nodes you are shredding on, 18 in the last case. For each of those nodes it shreds and returns the entire XML document and checks in the filter operator for the node you actually want. There you have your exponential growth.
Instead of using the parent axis you should use one extra cross apply. First shred on item and then on field.
select I.X.value('@name', 'varchar(5)') as item_name,
F.X.value('@id', 'uniqueidentifier') as field_id,
F.X.value('@type', 'int') as field_type,
F.X.value('text()[1]', 'nvarchar(15)') as field_value
from #temp as T
cross apply T.x.nodes('/data/item') as I(X)
cross apply I.X.nodes('field') as F(X)
I also changed how you access the text value of the field. Using .
will make SQL Server go look for child nodes to field
and concatenate those values in the result. You have no child values so the result is the same but it is a good thing to avoid to have that part in the query plan (the UDX operator).
The query plan does not have the issue with the parent axis if you are using an XML index but you will still benefit from changing how you fetch the field value.
Adding an XML index did the trick. Now the 6,500 records that took 20 minutes to run takes < 4 seconds.
create table #temp (id int primary key, x xml)
create primary xml index idx_x on #temp (x)
insert into #temp (id, x)
values (1, '
<data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
<item name="1">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
</item>
<item name="2">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
</item>
<item name="3">
<field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.883962036959074</field>
<field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.781459993268713</field>
<field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.228442351572923</field>
</item>
</data>
')
select c.value('(../@name)','varchar(5)') as item_name
,c.value('(@id)','uniqueidentifier') as field_id
,c.value('(@type)','int') as field_type
,c.value('(.)','nvarchar(15)') as field_value
from #temp cross apply
#temp.x.nodes('/data/item/field') as y(c)
drop table #temp