How to use XPath contains() here?

This is a new answer to an old question about a common misconception about contains() in XPath...

Summary: contains() means contains a substring, not contains a node.

Detailed Explanation

This XPath is often misinterpreted:

//ul[contains(li, 'Model')]

Wrong interpretation: Select those ul elements that contain an li element with Model in it.

This is wrong because

  1. contains(x,y) expects x to be a string, and
  2. the XPath rule for converting multiple elements to a string is this:

A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.

Right interpretation: Select those ul elements whose first li child has a string-value that contains a Model substring.

Examples

XML

<r>
  <ul id="one">
    <li>Model A</li>
    <li>Foo</li>
  </ul>
  <ul id="two">
    <li>Foo</li>
    <li>Model A</li>
  </ul>
</r> 

XPaths

  • //ul[contains(li, 'Model')] selects the one ul element.

Note: The two ul element is not selected because the string-value of the first li child of the two ul is Foo, which does not contain the Model substring.

  • //ul[li[contains(.,'Model')]] selects the one and two ul elements.

Note: Both ul elements are selected because contains() is applied to each li individually. (Thus, the tricky multiple-element-to-string conversion rule is avoided.) Both ul elements do have an li child whose string value contains the Model substring -- position of the li element no longer matters.

See also

  • Testing text() nodes vs string values in XPath

I already gave my +1 to Jeff Yates' solution.

Here is a quick explanation why your approach does not work. This:

//ul[@class='featureList' and contains(li, 'Model')]

encounters a limitation of the contains() function (or any other string function in XPath, for that matter).

The first argument is supposed to be a string. If you feed it a node list (giving it "li" does that), a conversion to string must take place. But this conversion is done for the first node in the list only.

In your case the first node in the list is <li><b>Type:</b> Clip Fan</li> (converted to a string: "Type: Clip Fan") which means that this:

//ul[@class='featureList' and contains(li, 'Type')]

would actually select a node!


You are only looking at the first li child in the query you have instead of looking for any li child element that may contain the text, 'Model'. What you need is a query like the following:

//ul[@class='featureList' and ./li[contains(.,'Model')]]

This query will give you the elements that have a class of featureList with one or more li children that contain the text, 'Model'.

Tags:

Xml

Xpath