Difference between .string and .text in BeautifulSoup
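Calling .string on a Tag object returns a NavigableString object (or None, as explained below). On the other hand, .text gets all the child strings and returns them concatenated using the given separator (an empty separator for .text itself; get_text() lets you choose one). The return type of .text is a plain unicode string (str in Python 3).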
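The snippet below shows the concatenation in action. It is a minimal sketch that assumes the beautifulsoup4 package is installed and uses the built-in "html.parser". .text is simply get_text() called with its defaults, so get_text() is used here to show the separator and strip options.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td>even <p>more text</p></td>", "html.parser")
td = soup.td

# .text joins every descendant string with an empty separator
print(td.text)                                  # even more text
# get_text() accepts an explicit separator placed between the pieces
print(td.get_text(separator="|"))               # even |more text
# strip=True trims whitespace from each piece before joining
print(td.get_text(separator="|", strip=True))   # even|more text
```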
From the documentation: "A NavigableString is just like a Python Unicode string, except that it also supports some of the features described in Navigating the tree and Searching the tree."
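To make the type difference concrete, here is a small sketch (assuming beautifulsoup4 is installed, parsing with "html.parser"). The object from .string is a str subclass that still belongs to the tree, while .text is a freshly built plain str with no link back to the document.

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup("<td>Some Table Data</td>", "html.parser")
td = soup.td

s = td.string   # the NavigableString node stored in the parse tree
t = td.text     # a new plain str built from the descendant strings

print(isinstance(s, NavigableString))  # True
print(isinstance(s, str))              # True: NavigableString subclasses str
print(type(t) is str)                  # True
print(s == t)                          # True: same characters

# Only the NavigableString can navigate back into the tree
print(s.parent.name)                   # td
print(hasattr(t, "parent"))            # False
```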
From the documentation on .string, we can see that if the HTML is like this:
<td>Some Table Data</td>
<td></td>
then .string on the second td will return None, but .text will return an empty string, which is still a unicode (str) object.
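A quick way to see this, and why it matters in practice, is the sketch below (assuming beautifulsoup4 with "html.parser"). Because .string can be None, calling a string method on it raises AttributeError for the empty cell, whereas .text always gives back a str.

```python
from bs4 import BeautifulSoup

html = "<td>Some Table Data</td><td></td>"
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("td")

print(first.string)         # Some Table Data
print(second.string)        # None: the tag has no children at all
print(repr(second.text))    # '': an empty str, not None

# A str method on .string fails when .string is None
try:
    second.string.strip()
except AttributeError:
    print("AttributeError: .string is None for the empty cell")

# The same call on .text is safe
print(repr(second.text.strip()))  # ''
```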
To summarize:
string
- A convenience property of a tag, used to get the single string within this tag.
- If the tag has a single string child, the return value is that string.
- If the tag has no children or more than one child, the return value is None.
- If the tag has one child tag, the return value is the 'string' attribute of the child tag, recursively.
text
- Gets all the child strings and returns them concatenated using the given separator.
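The three .string rules above can be written as a short recursive function. single_string is a hypothetical helper that only restates the documented rules; it is not part of BeautifulSoup, and the library's own code may differ in details. The loop at the end checks it against the real .string property on a few cells (assuming beautifulsoup4 with "html.parser").

```python
from bs4 import BeautifulSoup, NavigableString


def single_string(tag):
    """Hypothetical restatement of the documented .string rules."""
    # No children, or more than one child: ambiguous, so None
    if len(tag.contents) != 1:
        return None
    child = tag.contents[0]
    # Exactly one child and it is a string: return that string
    if isinstance(child, NavigableString):
        return child
    # Exactly one child and it is a tag: ask that tag, recursively
    return single_string(child)


html = (
    "<td>some text</td>"
    "<td></td>"
    "<td><p>more text</p></td>"
    "<td>even <p>more text</p></td>"
)
soup = BeautifulSoup(html, "html.parser")

for td in soup.find_all("td"):
    # Should agree with BeautifulSoup's own .string for these cells
    print(single_string(td) == td.string)  # True for each cell
```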
If the HTML is like this:
<td>some text</td>
<td></td>
<td><p>more text</p></td>
<td>even <p>more text</p></td>
.string on the four td elements will return:
some text
None
more text
None
.text will give results like this:
some text
'' (an empty string)
more text
even more text
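Both result lists can be reproduced with the sketch below (assuming beautifulsoup4 with "html.parser"). repr() is used for .text so that the empty string from the second cell is visible instead of printing as a blank line.

```python
from bs4 import BeautifulSoup

html = (
    "<td>some text</td>"
    "<td></td>"
    "<td><p>more text</p></td>"
    "<td>even <p>more text</p></td>"
)
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

strings = [td.string for td in cells]  # NavigableString or None per cell
texts = [td.text for td in cells]      # always a str per cell

for s, t in zip(strings, texts):
    print(s, "|", repr(t))
# some text | 'some text'
# None | ''
# more text | 'more text'
# None | 'even more text'
```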
If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None:
For example:
<td>sometext<p>sometext</p></td>
Here td.string returns None (whose type is NoneType) because the td contains text as well as another p tag. But td.text will give: sometextsometext
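The sketch below (assuming beautifulsoup4 with "html.parser") shows where the ambiguity comes from: the td has two children, so .string gives up, while the .strings generator yields every piece that .text concatenates. Passing a separator to get_text() keeps the two pieces from running together.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td>sometext<p>sometext</p></td>", "html.parser")
td = soup.td

print(len(td.contents))               # 2: a string and a <p> tag
print(td.string)                      # None
print([str(s) for s in td.strings])   # ['sometext', 'sometext']
print(td.text)                        # sometextsometext
print(td.get_text(" "))               # sometext sometext
```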