Capture AJAX response with selenium and python

I once intercepted some ajax calls injecting javascript to the page using selenium. The bad side of the history is that selenium could sometimes be, let's say "fragile". So for no reason I got selenium exceptions while doing this injection.

Anyway, my idea was intercept the XHR calls, and set its response to a new dom element created by me that I could manipulate from selenium. In the condition for the interception you can even use the url that made the request in order to just intercept the one that you actually want (self._url)

btw, I got the idea from intercept all ajax calls?

Maybe this helps.

browser.execute_script("""
(function(XHR) {
  "use strict";

  var element = document.createElement('div');
  element.id = "interceptedResponse";
  element.appendChild(document.createTextNode(""));
  document.body.appendChild(element);

  var open = XHR.prototype.open;
  var send = XHR.prototype.send;

  XHR.prototype.open = function(method, url, async, user, pass) {
    this._url = url; // want to track the url requested
    open.call(this, method, url, async, user, pass);
  };

  XHR.prototype.send = function(data) {
    var self = this;
    var oldOnReadyStateChange;
    var url = this._url;

    function onReadyStateChange() {
      if(self.status === 200 && self.readyState == 4 /* complete */) {
        document.getElementById("interceptedResponse").innerHTML +=
          '{"data":' + self.responseText + '}*****';
      }
      if(oldOnReadyStateChange) {
        oldOnReadyStateChange();
      }
    }

    if(this.addEventListener) {
      this.addEventListener("readystatechange", onReadyStateChange,
        false);
    } else {
      oldOnReadyStateChange = this.onreadystatechange;
      this.onreadystatechange = onReadyStateChange;
    }
    send.call(this, data);
  }
})(XMLHttpRequest);
""")

I've come up to this page when trying to catch XHR content based on AJAX requests. And I eventually found this package

from seleniumwire import webdriver  # Import from seleniumwire
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to the Google home page
driver.get('https://www.google.com')

# Access requests via the `requests` attribute
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )

this package allow to get the content response from any request, such as json :

https://www.google.com/ 200 text/html; charset=UTF-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.png 200 image/png
https://consent.google.com/status?continue=https://www.google.com&pc=s&timestamp=1531511954&gl=GB 204 text/html; charset=utf-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png 200 image/png
https://ssl.gstatic.com/gb/images/i2_2ec824b0.png 200 image/png
https://www.google.com/gen_204?s=webaft&t=aft&atyp=csi&ei=kgRJW7DBONKTlwTK77wQ&rt=wsrt.366,aft.58,prt.58 204 text/html; charset=UTF-8
..

I was unable to capture AJAX response with selenium but here is what works, although without selenium:

1- Find out the XML request by monitoring the network analyzing tools in your browser

2= Once you've identified the request, regenerate it using Python's requests or urllib2 modules. I personally recommend requests because of its additional features, most important to me was requests.Session.

You can find plenty of help and relevant posts regarding these two steps.

Hope it will help someone someday.

Capture AJAX response with selenium and python

Tags:

Python

Web Scraping

Selenium

Related

Recent Posts