What does "@*|node()" in an XSLT apply-templates select mean?
The XPath expression @* | node() selects the union of attribute nodes (@*) and all other types of XML nodes (node()).
It is a shorthand for attribute::* | child::node().
In XSLT, XPath is relative to the context node and the default selection axis is the child axis, so the expression
- selects all attributes and immediate children of the context node (when used as a select="..." expression, for example in <xsl:apply-templates>)
- matches all attribute- and other nodes regardless of context (when used as a match="..." expression in <xsl:template>)
Note that there is a difference between selecting nodes and matching them: the context node only matters for selection.
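To make the select/match distinction concrete, here is a minimal sketch. It assumes a hypothetical input whose document element is <item>, for example <item id="1">text<sub/></item>; the names item, sub and copied are made up for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- select: evaluated relative to the context node (the current <item>),
       so this picks up the attributes and immediate children of that <item> only. -->
  <xsl:template match="item">
    <copied>
      <xsl:apply-templates select="@* | node()"/>
    </copied>
  </xsl:template>

  <!-- match: a pattern tested against whichever node is currently being processed.
       It has lower default priority than match="item", so it handles everything else. -->
  <xsl:template match="@* | node()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

The same expression appears twice, but only the select use depends on where the processor currently is in the tree.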
Imagine the following node is the context node:
<xml attr="value">[
]<child />[
]<!-- comment -->[
]<child>
<descendant />
</child>[
]</xml>
The expression node() will select not only both <child> nodes, but also the four whitespace-only text nodes (marked with [ and ] for the sake of visibility) and the comment. The <descendant> is not selected.
A special characteristic of XML is that attribute nodes are not children of the elements they belong to (although the parent of an attribute is the element it belongs to).
This asymmetric relationship makes it necessary to select them separately, hence the @*. It matches any attribute node belonging to the context node, so the attr="value" will be selected as well.
The | is the XPath union operator. It creates a single node-set from two separate node-sets.
<xsl:apply-templates> then finds the appropriate <xsl:template> for every selected node and runs it for that node. This is the template matching part I mentioned above.
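If you want to see for yourself what gets selected, a small diagnostic stylesheet along these lines can help. It is only a sketch: it assumes the sample above is the whole input document (with the [ and ] markers removed) and that the processor does not strip whitespace-only text nodes while parsing.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/xml">
    <!-- Visit everything that @* | node() selects from the <xml> element. -->
    <xsl:for-each select="@* | node()">
      <xsl:choose>
        <!-- XPath 1.0 idiom: the node is an attribute if it is a member of ../@* -->
        <xsl:when test="count(. | ../@*) = count(../@*)">attribute: <xsl:value-of select="name()"/></xsl:when>
        <xsl:when test="self::*">element: <xsl:value-of select="name()"/></xsl:when>
        <xsl:when test="self::comment()">comment</xsl:when>
        <xsl:when test="self::text()">text node</xsl:when>
        <xsl:otherwise>other</xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Under those assumptions the report should list one attribute, two elements, one comment and four text nodes, and never the <descendant> element.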
To add to Tomalak's excellent answer:
Most often one would see <xsl:apply-templates select="@*|node()"/> used in a template like this one:
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>
This is known as the identity rule or "identity template".
One of the most fundamental and powerful XSLT design patterns is the use and overriding of the identity rule.
If a transformation consists only of the identity rule, the result of the transformation is the source XML document itself -- this is why the template is known as "the identity rule".
Why is this result produced?
The short answer is: because of the XSLT processing model.
A more detailed explanation must start from the top:
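Used as a match pattern, node() matches any element, text node, comment or processing instruction. It does not match attributes (hence the separate @*), nor the document (root) node, which is processed by the built-in template rule.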
We can imagine the "leaf" nodes of any document tree -- these are the nodes that cannot have children themselves, such as a text node, comment or processing instruction. An empty element should also be considered a leaf node.
The identity rule is initially selected for execution (applied) against all child nodes of the document node (these are the single top element and any comment or processing-instruction siblings it could have). The matched node is shallow-copied and, if it is a non-element leaf node, the <xsl:apply-templates select="node()|@*"/> instruction doesn't select any nodes or attributes.
If the matched node is an element, it is shallow-copied, then the <xsl:apply-templates select="node()|@*"/> instruction causes the same template (as there isn't any other template in the transformation code) to be applied to each of its attributes and each of its child nodes.
This is the recursion that drives the processing of every node of the XML document until leaf nodes or attributes are reached, at which point <xsl:apply-templates select="node()|@*"/> selects no child or attribute nodes.
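As for the "overriding" part of the pattern mentioned above, a minimal sketch could look like this. The element name obsolete is hypothetical; substitute whatever you want removed from your own document.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The identity rule: copies every node it is applied to, then recurses. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: a name test has a higher default priority than node(),
       so this empty template wins for <obsolete> elements and outputs nothing,
       which drops the element together with its whole subtree. -->
  <xsl:template match="obsolete"/>

</xsl:stylesheet>
```

Everything else still falls through to the identity rule, so the output is the source document minus the overridden elements.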
Congratulations to @Tomalak for the first correct answer. The tick should be going on his answer. I'm just going to add some clarifications to his answer.
Note One
... @* | node() selects the union of ...
The | operator does not just return the union of the two operands; it also sorts the result in document order and removes duplicates. The de-duplication is not relevant here because there are no duplicates to remove, but the sorting is worth noting. A more correct version would be ...
... @* | node() selects the union, sorted in document order, of ...
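A quick way to convince yourself of the ordering is a sketch like the following, which can be run against any document whose top element has at least one attribute and one child. The operands are deliberately written children-first.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/*">
    <!-- Written as node() | @*, yet the nodes are still visited in document order. -->
    <xsl:for-each select="node() | @*">
      <xsl:value-of select="position()"/>
      <xsl:text>: </xsl:text>
      <!-- name() is empty for text nodes and comments. -->
      <xsl:value-of select="name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The attributes should be listed before the children regardless of the operand order, because in document order attribute nodes come after their parent element and before its children. The relative order among the attributes themselves is implementation-dependent.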
Note Two
... and all other types of XML child nodes (node())
This is broadly true, but is misleading. When most people read "XML child nodes", they think child nodes in the DOM sense. But this is not what is being selected. Only XDM nodes are selected. For an illustration take a look at the following document.
<?xml version="1.0" encoding="ISO-8859-1"?>
<root-element my-attrib="myattrib-vaue" xmlns:hi="www.abc.com"><child-element />
abc&apos;def
</root-element>
Now suppose the context item is the 'root-element'. A reader of Tomalak's answer is asked the question: what is selected by "@*|node()"? For those thinking in terms of the DOM, Tomalak's answer implies that six things are selected:
- The my-attrib attribute
- The namespace attribute (which is a true attribute in DOM)
- The child-element node
- The 'abc' bit
- The entity reference
- The 'def' bit.
But this is not actually true in XSLT. What is actually selected is ...
- The my-attrib attribute
- The child-element node
- The XDM text node, being the concatenation of 3 DOM text nodes, like: "abc'def"
So a more accurate statement would be ...
The XPath expression @* | node() selects the union, sorted in document order, of the attribute nodes of the context item and the child nodes of the context item in the sense of the XDM. The XDM ignores some node types that exist in the DOM, such as entity definitions, and contiguous DOM text nodes are concatenated into one XDM text node.
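The counts can be checked with a sketch like this one, assuming the sample document above is the input:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/root-element">
    <!-- xmlns:hi is a namespace declaration, not an attribute, so @* does not see it. -->
    <xsl:text>attributes: </xsl:text>
    <xsl:value-of select="count(@*)"/>
    <!-- The child element plus one merged text node. -->
    <xsl:text>&#10;child nodes: </xsl:text>
    <xsl:value-of select="count(node())"/>
    <!-- Namespace nodes are only reachable through the namespace axis. -->
    <xsl:text>&#10;namespace nodes: </xsl:text>
    <xsl:value-of select="count(namespace::*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

A conforming XSLT 1.0 processor should report one attribute and two child nodes, which matches the three items listed above. The namespace declaration shows up only in the namespace axis count, never in @*.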