Difference between text() and string()

Most of the time, if you want the content of an element node X, you can refer to it as ".", if it's the context node, or as "X" if it's a child of the context node. For example:

<xsl:if test="X = 'abcd'">...

or

<xsl:value-of select="."/>

In both cases, because the context demands a string, the string() function is applied automatically. (That's a slight simplification: if you're running schema-aware XSLT 2.0, the rules are a little more complicated.)

Using "string()" here is unnecessary, because it's done automatically; and using text() is a mistake (one that seems to be increasingly common, encouraged by some bad tutorials on the web). Using ./text() or X/text() in this situation gives you all the text node children of the element. Often the element has one text node child whose string value happens to be the same as the string value of the element, but your code fails if someone adds a comment or processing instruction, because the value is then split into multiple text nodes. It also fails if the element is one (say "title") that allows mixed content: string(title) and title/text() are going to give the same answer until you hit an article with the title:

<title>On the wetness of H<sub>2</sub>O</title>
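Listing the nodes makes the difference concrete. Python's standard library has no full XPath engine, so the sketch below only mimics the two expressions with xml.dom.minidom. The helpers text_children() and string_value() are hypothetical stand-ins for X/text() and string(X), and the second sample element (an X containing a comment) is made up to illustrate the comment case.

```python
from xml.dom import minidom


def text_children(element):
    """Mimic the XPath step text(): the text-node children only."""
    return [n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE]


def string_value(node):
    """Mimic string() on an element: all descendant text nodes, in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    # Comments and processing instructions have no children, so they add nothing.
    return "".join(string_value(child) for child in node.childNodes)


title = minidom.parseString(
    "<title>On the wetness of H<sub>2</sub>O</title>"
).documentElement
# Assumed example: a comment splits the content into two text nodes.
commented = minidom.parseString("<X>ab<!-- note -->cd</X>").documentElement

print(text_children(title))      # ['On the wetness of H', 'O']
print(string_value(title))       # On the wetness of H2O
print(text_children(commented))  # ['ab', 'cd']
print(string_value(commented))   # abcd
```

Note that title/text() never sees the "2", because that text node is a child of sub, not of title. In XSLT 1.0 the damage is worse still: xsl:value-of outputs only the first node of the selected set, so select="title/text()" would print just "On the wetness of H".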

Can someone explain the difference between the text() and string() functions?

I. text() isn't a function but a node test.

It is used to select all text-node children of the context node.

So, if the context node is an element named x, then text() selects all text-node children of x.

Other examples:

/a/b/c/text()

selects all text-node children of any c element that is a child of any b element that is a child of the top element a.
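To see which nodes such a path picks up, here is a rough Python sketch that walks the same steps by hand with xml.dom.minidom (the standard library cannot evaluate text() in a path itself). The sample document and the helper names child_elements() and select_a_b_c_text() are assumptions made for illustration only.

```python
from xml.dom import minidom

# Hypothetical sample document, not taken from the question.
DOC = (
    "<a>"
    "<b><c>one<!-- split -->two</c><c>three</c></b>"
    "<b><c>four</c><e>skipped</e></b>"
    "<c>not under b</c>"
    "</a>"
)


def child_elements(node, name):
    """Element children of node with the given tag name (one child:: step)."""
    return [
        n for n in node.childNodes
        if n.nodeType == n.ELEMENT_NODE and n.tagName == name
    ]


def select_a_b_c_text(document):
    """Hand-written equivalent of /a/b/c/text() for this sketch."""
    root = document.documentElement
    if root.tagName != "a":
        return []
    return [
        t.data
        for b in child_elements(root, "b")
        for c in child_elements(b, "c")
        for t in c.childNodes
        if t.nodeType == t.TEXT_NODE
    ]


selected = select_a_b_c_text(minidom.parseString(DOC))
print(selected)  # ['one', 'two', 'three', 'four']
```

The result is a list of separate text nodes, not one string: the first c contributes two nodes because of the comment, while the e element and the c that sits directly under a are not reached by the path.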

II. The string() function

By definition string(exprSelectingASingleNode) returns the string value of the node.

The string value of an element is the concatenation of all of its text-node descendants -- in document order.

Therefore, if in the following XML document:

<a>
  <b>2</b>
  <c>3
    <d>4</d>
  </c>
  5
</a>

string(/a) returns (without the surrounding quotes):

"
  2
  3
    4

  5
"

As we see, the string value reflects three white-space-only text-nodes, which we typically fail to notice and account for.
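The count of three can be checked by walking the tree. The following is a minimal Python sketch using xml.dom.minidom, which keeps whitespace-only text nodes by default; text_nodes() is a hypothetical helper, and the concatenation only mimics what string(/a) computes.

```python
from xml.dom import minidom

# The document from the answer, written with explicit line breaks.
XML = (
    "<a>\n"
    "  <b>2</b>\n"
    "  <c>3\n"
    "    <d>4</d>\n"
    "  </c>\n"
    "  5\n"
    "</a>"
)


def text_nodes(node):
    """Yield every descendant text node in document order."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        else:
            yield from text_nodes(child)


a = minidom.parseString(XML).documentElement

# Stand-in for string(/a): concatenate all descendant text nodes.
value = "".join(t.data for t in text_nodes(a))

# Text nodes made up only of XML whitespace characters.
whitespace_only = [t.data for t in text_nodes(a) if t.data.strip(" \t\r\n") == ""]

print(repr(value))           # '\n  2\n  3\n    4\n  \n  5\n'
print(len(whitespace_only))  # 3
```

The three whitespace-only nodes are the ones after the a start tag, between the b end tag and the c start tag, and after the d end tag. The text around "5" is not counted because it contains a non-whitespace character.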

Some XML parsers have the option of stripping off white-space-only text nodes. If the above document were parsed with the white-space-only text nodes stripped off, then the same function:

string(/a)

now returns:

"23
    4
  5
"
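The stripped result can be reproduced with a small Python sketch. Since xml.dom.minidom has no strip option of its own, the hypothetical helper strip_whitespace_only() removes those nodes after parsing, which is assumed here to be equivalent to what a stripping parser would deliver; string_value() again only mimics string().

```python
from xml.dom import minidom

XML = (
    "<a>\n"
    "  <b>2</b>\n"
    "  <c>3\n"
    "    <d>4</d>\n"
    "  </c>\n"
    "  5\n"
    "</a>"
)


def strip_whitespace_only(node):
    """Remove descendant text nodes that contain only XML whitespace."""
    for child in list(node.childNodes):  # copy: we mutate while iterating
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(" \t\r\n") == "":
                node.removeChild(child)
        else:
            strip_whitespace_only(child)


def string_value(node):
    """Concatenate all descendant text nodes in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


a = minidom.parseString(XML).documentElement
before = string_value(a)
strip_whitespace_only(a)
after = string_value(a)

print(repr(before))  # '\n  2\n  3\n    4\n  \n  5\n'
print(repr(after))   # '23\n    4\n  5\n'
```

Only whole text nodes are dropped. The whitespace inside "3" plus its trailing line break, and around "5", survives because those nodes also contain non-whitespace characters.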
