Can Xpath expressions access shadow-root elements?
Most web scrapers, including Scrapy, don't support the Shadow DOM, so you will not be able to access elements in shadow trees at all.
And even if a web scraper did support the Shadow DOM, XPath is not supported at all. Only selectors are supported to some extent, as documented in the CSS Scoping spec.
One way to scrape pages containing shadow DOMs with tools that don't work with shadow DOM API is to recursively iterate over shadow DOM elements and replace them with their HTML code:
// Returns HTML of given shadow DOM.
const getShadowDomHtml = (shadowRoot) => {
let shadowHTML = '';
for (let el of shadowRoot.childNodes) {
shadowHTML += el.nodeValue || el.outerHTML;
}
return shadowHTML;
};
// Recursively replaces shadow DOMs with their HTML.
const replaceShadowDomsWithHtml = (rootElement) => {
for (let el of rootElement.querySelectorAll('*')) {
if (el.shadowRoot) {
replaceShadowDomsWithHtml(el.shadowRoot)
el.innerHTML += getShadowDomHtml(el.shadowRoot);
}
}
};
replaceShadowDomsWithHtml(document.body);
If you are scraping using a full browser (Chrome with Puppeteer, PhantomJS, etc.) then just inject this script to the page. Important is to execute this after the whole page is rendered because it possibly breaks the JS code of shadow DOM components.
Check full article I wrote on this topic: https://kb.apify.com/tips-and-tricks/how-to-scrape-pages-with-shadow-dom