Scrapy Shell and Scrapy Splash

Just wrap the URL you want to open in the shell in a call to the Splash HTTP API.

So you would want something like:

scrapy shell 'http://localhost:8050/render.html?url=http://example.com/page-with-javascript.html&timeout=10&wait=0.5'

where:

  • localhost:8050 is the host and port where your Splash service is running
  • url is the URL you want to crawl; don't forget to urlquote it, otherwise any & or ? in it will be read as part of the Splash query string
  • render.html is one of the available HTTP API endpoints; in this case it returns the rendered HTML page
  • timeout is the maximum time, in seconds, allowed for the render
  • wait is the time, in seconds, to wait for JavaScript to execute before reading/saving the HTML
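The urlquoting step is easy to get wrong by hand, so here is a minimal sketch of building that URL with Python's standard library. The helper name splash_render_url and its defaults are my own, chosen only for illustration; it assumes Splash is listening on localhost:8050 as in the command above.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash="http://localhost:8050", timeout=10, wait=0.5):
    """Return a render.html URL for Splash with target_url safely urlquoted."""
    # urlencode percent-encodes every value, so '&', '?' and '=' inside
    # target_url cannot be mistaken for Splash's own query parameters.
    params = {"url": target_url, "timeout": timeout, "wait": wait}
    return f"{splash}/render.html?{urlencode(params)}"


if __name__ == "__main__":
    # Print a URL that can be pasted after `scrapy shell`.
    print(splash_render_url("http://example.com/search?q=a&page=2"))
```

Put the printed URL in single quotes on the command line so your shell does not interpret the & characters.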

Alternatively, run scrapy shell without arguments inside a Scrapy project that is already configured for scrapy-splash, create a request with req = scrapy_splash.SplashRequest(url, ...), and call fetch(req).
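A rough sketch of that second approach, assuming the scrapy-splash package is installed and the project's settings already point at your Splash instance. The helper name make_splash_request is hypothetical; the wait and timeout values simply mirror the command-line example.

```python
def make_splash_request(url, wait=0.5, timeout=10):
    """Build a SplashRequest that renders url through the render.html endpoint."""
    # Imported inside the function so that defining the helper does not
    # itself require scrapy-splash; the import happens on first call.
    from scrapy_splash import SplashRequest

    return SplashRequest(
        url,
        endpoint="render.html",
        args={"wait": wait, "timeout": timeout},
    )


# Inside `scrapy shell` (started from the project directory), after
# pasting the helper above:
#   req = make_splash_request("http://example.com/page-with-javascript.html")
#   fetch(req)
#   response.css("title::text").get()
```

With this approach you pass the plain target URL; no manual urlquoting is needed because the request is routed to Splash by the project's middleware.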