How to use XPath contains() here?
This is a new answer to an old question about a common misconception about contains()
in XPath...
Summary: contains()
means contains a substring, not contains a node.
Detailed Explanation
This XPath is often misinterpreted:
//ul[contains(li, 'Model')]
Wrong interpretation:
Select those ul
elements that contain an li
element with Model
in it.
This is wrong because
contains(x,y)
expectsx
to be a string, and- the XPath rule for converting multiple elements to a string is this:
A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.
Right interpretation: Select those ul
elements whose first li
child has a string-value that contains a Model
substring.
Examples
XML
<r>
<ul id="one">
<li>Model A</li>
<li>Foo</li>
</ul>
<ul id="two">
<li>Foo</li>
<li>Model A</li>
</ul>
</r>
XPaths
//ul[contains(li, 'Model')]
selects theone
ul
element.
Note: The two
ul
element is not selected because the string-value of the first li
child
of the two
ul
is Foo
, which does not contain the Model
substring.
//ul[li[contains(.,'Model')]]
selects theone
andtwo
ul
elements.
Note: Both ul
elements are selected because contains()
is applied to each li
individually. (Thus, the tricky multiple-element-to-string conversion rule is avoided.) Both ul
elements do have an li
child whose string value contains the Model
substring -- position of the li
element no longer matters.
See also
- Testing text() nodes vs string values in XPath
I already gave my +1 to Jeff Yates' solution.
Here is a quick explanation why your approach does not work. This:
//ul[@class='featureList' and contains(li, 'Model')]
encounters a limitation of the contains()
function (or any other string function in XPath, for that matter).
The first argument is supposed to be a string. If you feed it a node list (giving it "li
" does that), a conversion to string must take place. But this conversion is done for the first node in the list only.
In your case the first node in the list is <li><b>Type:</b> Clip Fan</li>
(converted to a string: "Type: Clip Fan
") which means that this:
//ul[@class='featureList' and contains(li, 'Type')]
would actually select a node!
You are only looking at the first li
child in the query you have instead of looking for any li
child element that may contain the text, 'Model'
. What you need is a query like the following:
//ul[@class='featureList' and ./li[contains(.,'Model')]]
This query will give you the elements that have a class
of featureList
with one or more li
children that contain the text, 'Model'
.