Programmatically capturing AJAX traffic with headless Chrome
I finally found how to do what I wanted. It can be done with chrome-remote-interface (CRI) and Node.js. I'm attaching the minimal code required.
const CDP = require('chrome-remote-interface');

(async function () {
  // You need to have a Chrome instance open with remote debugging enabled,
  // i.e. chrome --remote-debugging-port=9222
  const protocol = await CDP({port: 9222});
  const {Page, Network} = protocol;
  await Page.enable();
  await Network.enable(); // needed to call Network.getResponseBody below
  Page.navigate({url: 'http://localhost/'}); // your URL

  const onDataReceived = async (e) => {
    try {
      const response = await Network.getResponseBody({requestId: e.requestId});
      if (typeof response.body === 'string') {
        console.log(response.body);
      }
    } catch (ex) {
      console.log(ex.message);
    }
  };

  protocol.on('Network.dataReceived', onDataReceived);
})();
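Network.dataReceived can fire several times for the same request (once per chunk), and the snippet above logs every resource type, not only AJAX calls. A possible refinement, sketched here under assumptions, is to remember which request IDs were reported as XHR or Fetch in Network.responseReceived and read the body only on Network.loadingFinished. The helper names createAjaxTracker and decodeBody are hypothetical and not part of CRI; the wiring at the bottom assumes the protocol and Network objects from the code above.

```javascript
// Hypothetical helper: remembers which request IDs belong to AJAX calls.
function createAjaxTracker() {
  // 'XHR' and 'Fetch' are Network.ResourceType values in the DevTools Protocol.
  const ajaxTypes = new Set(['XHR', 'Fetch']);
  const pending = new Map(); // requestId -> URL

  return {
    // Pass the params of each Network.responseReceived event.
    onResponseReceived({requestId, type, response}) {
      if (ajaxTypes.has(type)) {
        pending.set(requestId, response.url);
      }
    },
    // Pass the params of each Network.loadingFinished event.
    // Returns the URL for an AJAX request, or undefined for anything else.
    takeFinished({requestId}) {
      const url = pending.get(requestId);
      pending.delete(requestId);
      return url;
    },
  };
}

// Hypothetical helper: turns a Network.getResponseBody result into text.
function decodeBody({body, base64Encoded}) {
  return base64Encoded ? Buffer.from(body, 'base64').toString('utf8') : body;
}

// Wiring sketch (assumes `protocol` and `Network` from the snippet above):
// const tracker = createAjaxTracker();
// protocol.on('Network.responseReceived', (e) => tracker.onResponseReceived(e));
// protocol.on('Network.loadingFinished', async (e) => {
//   if (tracker.takeFinished(e) === undefined) return;
//   const result = await Network.getResponseBody({requestId: e.requestId});
//   console.log(decodeBody(result));
// });
```

Keeping the tracking logic separate from the CDP client makes it easy to try out with plain objects before pointing it at a real browser.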
Update
As @Alejandro pointed out in the comment, resourceType is a function and the return value is lowercased:
page.on('request', request => {
  if (request.resourceType() === 'xhr') {
    // do something
  }
});
Original answer
Puppeteer's API makes this really easy:
page.on('request', request => {
  if (request.resourceType === 'XHR') {
    // do something
  }
});
You can also intercept requests with setRequestInterception, but it's not needed in this example if you're not going to modify the requests.
There's an example of intercepting image requests that you can adapt.
resourceTypes are defined here.
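For reference, interception might look roughly like the sketch below, which aborts image requests and lets everything else continue. The handler name blockImages is hypothetical; the usage lines assume page is an already opened Puppeteer page. Once interception is enabled, every request has to be continued, aborted, or responded to, otherwise it stalls.

```javascript
// Hypothetical handler: abort image requests, let all other requests through.
function blockImages(request) {
  if (request.resourceType() === 'image') {
    return request.abort();
  }
  return request.continue();
}

// Usage sketch (assumes `page` is an open Puppeteer page):
// await page.setRequestInterception(true);
// page.on('request', blockImages);
```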
Puppeteer's listeners can help you capture XHR responses via the response and request events.
You should first check whether request.resourceType() is xhr or fetch.
page.on('response', response => {
  const isXhr = ['xhr', 'fetch'].includes(response.request().resourceType());
  if (isXhr) {
    console.log(response.url());
    response.text().then(console.log);
  }
});
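If you want to keep the captured responses instead of just logging them, one option is to push them into an array. The sketch below is only an illustration: isAjaxResponse and recordAjax are hypothetical helper names, and the usage lines assume page is an open Puppeteer page. It also guards response.text(), since some responses (redirects, for example) have no readable body.

```javascript
const AJAX_TYPES = ['xhr', 'fetch'];

// Hypothetical helper: true when the response belongs to an XHR or fetch request.
function isAjaxResponse(response) {
  return AJAX_TYPES.includes(response.request().resourceType());
}

// Hypothetical helper: stores URL, status and body of AJAX responses in `sink`.
async function recordAjax(response, sink) {
  if (!isAjaxResponse(response)) return;
  let body = null;
  try {
    body = await response.text();
  } catch (err) {
    // Body is unavailable for this response; keep `body` as null.
  }
  sink.push({url: response.url(), status: response.status(), body});
}

// Usage sketch (assumes `page` is an open Puppeteer page):
// const captured = [];
// page.on('response', (response) => recordAjax(response, captured));
```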