Does a double slash in an XPath predicate work the same as in the path itself?
In XPath 1.0, // is an abbreviation for /descendant-or-self::node()/, so your first path is
/descendant-or-self::node()/div[descendant::table/descendant::td[4]]
while the second is rather different:
/descendant-or-self::node()/div[/descendant-or-self::node()/table/descendant-or-self::node()/td[4]]
The major difference is that inside your first predicate you look down for descendants relative to the div element, while in the second predicate you look down for descendants from the root node / (also called the document node).
You might want //div[.//table//td[4]] for the second path expression to come closer to the first one.
[edit] Here is a sample:
<html>
  <body>
    <div>
      <table>
        <tbody>
          <tr>
            <td>1</td>
          </tr>
          <tr>
            <td>2</td>
          </tr>
          <tr>
            <td>3</td>
          </tr>
          <tr>
            <td>4</td>
          </tr>
        </tbody>
      </table>
    </div>
  </body>
</html>
With that sample the path //div[descendant::table/descendant::td[4]] selects the div element, as it has a table child which has a fourth td descendant.
However, with //div[.//table//td[4]] we look for
//div[./descendant-or-self::node()/table/descendant-or-self::node()/td[4]]
which is short for
//div[./descendant-or-self::node()/table/descendant-or-self::node()/child::td[4]]
and there is no element having a fourth td child element.
I hope that explains the difference. If you use //div[.//table/descendant::td[4]] then you should get the same result as with your original form.
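To check this against the sample, here is a rough sketch in Python. The standard library's ElementTree does not support the descendant:: axis or nested paths inside predicates, so the two predicates are emulated with plain tree traversal; the helper names (has_fourth_td_descendant, has_fourth_td_child_somewhere, select_divs) are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The sample from above: one div, one table, four rows with one cell each.
SAMPLE = """
<html><body><div><table><tbody>
  <tr><td>1</td></tr>
  <tr><td>2</td></tr>
  <tr><td>3</td></tr>
  <tr><td>4</td></tr>
</tbody></table></div></body></html>
"""


def has_fourth_td_descendant(table):
    """Emulates descendant::td[4] evaluated with one table as context."""
    return len(list(table.iter("td"))) >= 4


def has_fourth_td_child_somewhere(table):
    """Emulates .//td[4]: the table or one of its descendants has a fourth td child."""
    return any(len(node.findall("td")) >= 4 for node in table.iter())


def select_divs(root, table_test):
    """Emulates //div[.//table...]: keep each div with a descendant table passing table_test."""
    return [div for div in root.iter("div")
            if any(table_test(t) for t in div.iter("table"))]


root = ET.fromstring(SAMPLE)
with_descendant_axis = select_divs(root, has_fourth_td_descendant)
with_double_slash = select_divs(root, has_fourth_td_child_somewhere)
print(len(with_descendant_axis), len(with_double_slash))  # 1 0
```

The first count is 1 because the table has four td descendants; the second is 0 because every tr has only a single td child.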
There's an important note in the W3C document on XPath 1.0 (W3C Recommendation 16 November 1999):
XML Path Language (XPath) Version 1.0
2 Location Paths
2.5 Abbreviated Syntax
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
A similar note appears in the document on XPath 3.1 (W3C Recommendation 21 March 2017):
XML Path Language (XPath) 3.1
3 Expressions
3.3 Path Expressions
3.3.5 Abbreviated Syntax
NOTE: The path expression //para[1] does not mean the same as the path expression /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their respective parents.
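The difference described in these notes can be seen with a small sketch using Python's standard ElementTree, whose limited XPath subset does handle .//para[1]. The document below is invented for the illustration, and /descendant::para[1] is emulated by taking the first element from iter(), since ElementTree has no descendant:: axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two sections with two paragraphs each.
doc = ET.fromstring(
    "<doc>"
    "<section><para>a</para><para>b</para></section>"
    "<section><para>c</para><para>d</para></section>"
    "</doc>"
)

# //para[1]: every para that is the first para child of its parent.
first_in_each_parent = [p.text for p in doc.findall(".//para[1]")]

# /descendant::para[1]: the first para in document order, taken once.
first_descendant = [p.text for p in list(doc.iter("para"))[:1]]

print(first_in_each_parent)  # ['a', 'c']
print(first_descendant)      # ['a']
```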
That means the double slash is not a shortcut for the descendant axis. It stands for /descendant-or-self::node()/, so the step to the right of // (including its predicates) is evaluated again for each node that this step selects, i.e. for the current context node and each of its descendants.
So the exact meaning of the predicate in this path
//div[ descendant::table/descendant::td[4] ]
is:
- build a sequence of all <table> nodes descendant to the current <div>,
- for every such <table> build a sequence of all its descendant <td> elements and keep the fourth item of that sequence (the [4] applies to each table's cells separately),
- combine the cells kept for the individual tables into a single sequence; the predicate is true if that sequence is not empty.
Finally the path returns all <div> elements in the document which contain a table with at least four data cells among its descendants (including cells in nested tables, of course). And since there are tables in the document which have 4 cells or more, the whole expression selects their respective <div> ancestors.
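A minimal sketch of that predicate in Python, emulated with tree traversal because the standard ElementTree has no descendant:: axis. The document and the helper name fourth_td_descendants are invented for the illustration; the first div reaches four cells only because cells of a nested table count as descendants of the outer table.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, not the one from the question.
DOC = """
<body>
  <div id="nested">
    <table>
      <tr><td>1</td><td>2</td></tr>
      <tr><td><table><tr><td>3</td></tr></table></td></tr>
    </table>
  </div>
  <div id="small">
    <table><tr><td>1</td><td>2</td><td>3</td></tr></table>
  </div>
</body>
"""


def fourth_td_descendants(div):
    """Emulates descendant::table/descendant::td[4] with div as the context node."""
    hits = []
    for table in div.iter("table"):
        cells = list(table.iter("td"))  # descendant::td, in document order
        if len(cells) >= 4:
            hits.append(cells[3])       # [4] keeps the fourth cell of this table
    return hits


root = ET.fromstring(DOC)
# A div is selected when the emulated predicate yields at least one node.
selected = [div.get("id") for div in root.iter("div") if fourth_td_descendants(div)]
print(selected)  # ['nested']
```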
On the other hand the predicate in
//div[ //table//td[4] ]
means:
- scan the whole document tree for <table> elements (more precisely, test the root node and each of its descendants for a <table> child),
- for every table found scan its subtree for elements having a fourth <td> subelement (i.e. test if the table or any of its descendants has at least four <td> children).
Please note that the predicate subexpression does not depend on the context node. It is a global path, resolving to some sequence of nodes (possibly empty), thus the predicate's boolean value depends only on the document's structure. If it is true, the whole path returns a sequence of all <div> elements in the document, otherwise the empty sequence.
So the predicate would be true iff some table, or some element inside a table, had at least 4 <td> children.
And as far as I can see all <tr> rows contain two or three cells: there is no element with 4 or more <td> children, so the predicate subexpression returns an empty sequence, the predicate is false and every <div> gets filtered out. The result is nothing (the empty sequence).
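The all-or-nothing behaviour of the absolute predicate can be sketched in Python as follows. This is an emulation with plain tree traversal, not a real XPath evaluation, and the two tiny documents and the name select_divs_global are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def select_divs_global(root):
    """Emulates //div[ //table//td[4] ]: the predicate ignores the current div."""
    # //table//td[4]: some table, or an element inside one, has a fourth td child.
    predicate = any(
        len(node.findall("td")) >= 4
        for table in root.iter("table")
        for node in table.iter()
    )
    divs = list(root.iter("div"))
    # The same boolean is applied to every div, so the result is all divs or none.
    return divs if predicate else []


# Hypothetical inputs: rows with three cells versus a row with four cells.
narrow = ET.fromstring(
    "<body><div><table><tr><td/><td/><td/></tr></table></div><div/></body>"
)
wide = ET.fromstring(
    "<body><div><table><tr><td/><td/><td/><td/></tr></table></div><div/></body>"
)
print(len(select_divs_global(narrow)), len(select_divs_global(wide)))  # 0 2
```

Note that in the second document even the empty div is returned, because the predicate never looks at the div it is filtering.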