How to use XPath in Chrome headless + Puppeteer evaluate()?
$x() is not a standard JavaScript method for selecting elements by XPath. It is only a helper provided by the Chrome DevTools console. The documentation states:
Note: This API is only available from within the console itself. You cannot access the Command Line API from scripts on the page.
Code passed to page.evaluate() is treated here as a "script on the page", so $x() is not available inside it.
You have two options:
- Use document.evaluate(). Here is an example of selecting an element (the featured article) inside page.evaluate():
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });
  const text = await page.evaluate(() => {
    // $x() is not part of standard JavaScript;
    // it is only a convenience helper in the Chrome DevTools console,
    // so use document.evaluate() here instead
    const featureArticle = document
      .evaluate(
        '//*[@id="mp-tfa"]',
        document,
        null,
        XPathResult.FIRST_ORDERED_NODE_TYPE,
        null
      )
      .singleNodeValue;
    return featureArticle.textContent;
  });
  console.log(text);
  await browser.close();
})();
- Select the element with Puppeteer's page.$x() and pass it to page.evaluate(). This example achieves the same result as the first example:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });
  // await page.$x() resolves to an array of ElementHandles;
  // we are only interested in the first element
  const featureArticle = (await page.$x('//*[@id="mp-tfa"]'))[0];
  // the same as:
  // const featureArticle = await page.$('#mp-tfa');
  const text = await page.evaluate(el => {
    // do what you want with featureArticle in page.evaluate
    return el.textContent;
  }, featureArticle);
  console.log(text);
  await browser.close();
})();
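One caveat for the second option: as far as I know, page.$x() has been removed from recent Puppeteer releases (v22 and later, to my knowledge), where XPath expressions are instead passed to the regular selector methods with an "xpath/" prefix. The sketch below shows that variant under that assumption. textByXPath is a hypothetical helper name, not part of Puppeteer, and page is expected to be a Puppeteer Page.

```javascript
// Hypothetical helper: resolves to the text of the first node matching
// `xpath`, or null when nothing matches. `page` is a Puppeteer Page.
async function textByXPath(page, xpath) {
  // Assumption: the installed Puppeteer version understands the "xpath/"
  // selector prefix; older versions would use page.$x(xpath) instead.
  const handle = await page.$(`xpath/${xpath}`);
  if (handle === null) {
    return null;
  }
  try {
    // The ElementHandle is passed to the page context as `el`.
    return await page.evaluate(el => el.textContent, handle);
  } finally {
    // Release the reference that the handle keeps alive in the browser.
    await handle.dispose();
  }
}
```

With the page from the example above, this would be called as const text = await textByXPath(page, '//*[@id="mp-tfa"]'); the null check also covers the case where the element is missing, which the [0] indexing above does not.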
There is also a related question on how to inject a $x() helper function into your scripts.
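A helper of that kind can be sketched on top of document.evaluate(). This is only a minimal stand-in, not the DevTools implementation: the name xpathAll is hypothetical, and it assumes a browser context where document and XPathResult exist.

```javascript
// Hypothetical stand-in for the DevTools $x() helper, built on the standard
// document.evaluate() API. Returns a plain array of the matching nodes.
function xpathAll(expression, contextNode = document) {
  // An element's ownerDocument is its document; a Document's is null,
  // in which case the context node is already the document.
  const doc = contextNode.ownerDocument || contextNode;
  const snapshot = doc.evaluate(
    expression,
    contextNode,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  // Copy the snapshot into an array so callers can use map(), filter(), etc.
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

Functions passed to page.evaluate() are serialized and run in the page, so they cannot see helpers defined on the Node.js side. The helper therefore has to be declared inside the page.evaluate() callback itself, or installed in the page beforehand, for instance with page.evaluateOnNewDocument() or page.addScriptTag().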