How to get text of an element in Selenium WebDriver, without including child element text?
Here's a general solution:
def get_text_excluding_children(driver, element):
return driver.execute_script("""
return jQuery(arguments[0]).contents().filter(function() {
return this.nodeType == Node.TEXT_NODE;
}).text();
""", element)
The element passed to the function can be something obtained from the find_element...()
methods (i.e. it can be a WebElement
object).
Or if you don't have jQuery or don't want to use it you can replace the body of the function above above with this:
return self.driver.execute_script("""
var parent = arguments[0];
var child = parent.firstChild;
var ret = "";
while(child) {
if (child.nodeType === Node.TEXT_NODE)
ret += child.textContent;
child = child.nextSibling;
}
return ret;
""", element)
I'm actually using this code in a test suite.
In the HTML which you have shared:
<div id="a">This is some
<div id="b">text</div>
</div>
The text This is some
is within a text node. To depict the text node in a structured way:
<div id="a">
This is some
<div id="b">text</div>
</div>
This Usecase
To extract and print the text This is some
from the text node using Selenium's python client you have 2 ways as follows:
Using
splitlines()
: You can identify the parent element i.e.<div id="a">
, extract theinnerHTML
and then usesplitlines()
as follows:using xpath:
print(driver.find_element_by_xpath("//div[@id='a']").get_attribute("innerHTML").splitlines()[0])
using xpath:
print(driver.find_element_by_css_selector("div#a").get_attribute("innerHTML").splitlines()[0])
Using
execute_script()
: You can also use theexecute_script()
method which can synchronously execute JavaScript in the current window/frame as follows:using xpath and firstChild:
parent_element = driver.find_element_by_xpath("//div[@id='a']") print(driver.execute_script('return arguments[0].firstChild.textContent;', parent_element).strip())
using xpath and childNodes[n]:
parent_element = driver.find_element_by_xpath("//div[@id='a']") print(driver.execute_script('return arguments[0].childNodes[1].textContent;', parent_element).strip())
def get_true_text(tag):
children = tag.find_elements_by_xpath('*')
original_text = tag.text
for child in children:
original_text = original_text.replace(child.text, '', 1)
return original_text