Network capturing with Selenium/PhantomJS
I am using a proxy for this
from selenium import webdriver
from browsermobproxy import Server
server = Server(environment.b_mob_proxy_path)
server.start()
proxy = server.create_proxy()
service_args = ["--proxy-server=%s" % proxy.proxy]
driver = webdriver.PhantomJS(service_args=service_args)
proxy.new_har()
driver.get('url_to_open')
print proxy.har # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
the 'har' (http archive format) has a lot of other information about the requests and responses, it's very useful to me
installing on Linux:
pip install browsermob-proxy
I use a solution without a proxy server for this. I modified the selenium source code according to the link bellow in order to add the executePhantomJS function.
https://github.com/SeleniumHQ/selenium/pull/2331/files
Then I execute the following script after getting the phantomJS driver:
from selenium.webdriver import PhantomJS
driver = PhantomJS()
script = """
var page = this;
page.onResourceRequested = function (req) {
console.log('requested: ' + JSON.stringify(req, undefined, 4));
};
page.onResourceReceived = function (res) {
console.log('received: ' + JSON.stringify(res, undefined, 4));
};
"""
driver.execute_phantomjs(script)
driver.get("http://ariya.github.com/js/random/")
driver.quit()
Then all the requests are logged in the console (usually the ghostdriver.log file)
If anyone here is looking for a pure Selenium/Python solution, the following snippet might help. It uses Chrome to log all requests and, as an example, prints all json requests with their corresponding response.
from time import sleep
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
# make chrome log requests
capabilities = DesiredCapabilities.CHROME
capabilities["loggingPrefs"] = {"performance": "ALL"} # chromedriver < ~75
# capabilities["goog:loggingPrefs"] = {"performance": "ALL"} # chromedriver 75+
driver = webdriver.Chrome(
desired_capabilities=capabilities, executable_path="./chromedriver"
)
# fetch a site that does xhr requests
driver.get("https://sitewithajaxorsomething.com")
sleep(5) # wait for the requests to take place
# extract requests from logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]
def log_filter(log_):
return (
# is an actual response
log_["method"] == "Network.responseReceived"
# and json
and "json" in log_["params"]["response"]["mimeType"]
)
for log in filter(log_filter, logs):
request_id = log["params"]["requestId"]
resp_url = log["params"]["response"]["url"]
print(f"Caught {resp_url}")
print(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}))