Scrapy Shell and Scrapy Splash
Just wrap the URL you want to open in the shell in a call to the Splash HTTP API.
So you would want something like:
scrapy shell 'http://localhost:8050/render.html?url=http://example.com/page-with-javascript.html&timeout=10&wait=0.5'
where:
localhost:port is where your Splash service is running
url is the URL you want to crawl, and don't forget to urlquote it!
render.html is one of the possible HTTP API endpoints; it returns the rendered HTML page in this case
timeout is the time in seconds before the request times out
wait is the time in seconds to wait for JavaScript to execute before reading/saving the HTML.
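Since the target URL has to be urlquoted, it can be easier to build the Splash URL in Python than by hand. The following is only a minimal sketch: the helper name splash_render_url and its defaults are made up for illustration, and it assumes Splash is listening on localhost:8050 as in the command above.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050",
                      timeout=10, wait=0.5):
    """Build a render.html URL for Splash (hypothetical helper).

    urlencode() percent-encodes the target URL, so any '&' or '?'
    inside it cannot be mistaken for Splash's own parameters.
    """
    query = urlencode({"url": target_url, "timeout": timeout, "wait": wait})
    return f"{splash_base}/render.html?{query}"


# Print a URL that can be passed (in single quotes) to `scrapy shell`.
print(splash_render_url("http://example.com/page-with-javascript.html"))
```

The printed URL can then be pasted into scrapy shell '...'. Quoting matters most when the page URL carries its own query string; otherwise Splash would read those parameters as its own.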
You can run scrapy shell without arguments inside a configured Scrapy project, then create req = scrapy_splash.SplashRequest(url, ...) and call fetch(req).
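"Configured" here presumably means that the project's settings.py already enables scrapy-splash. As a rough sketch, the settings below follow the scrapy-splash README as I remember it; the Splash address is an assumption matching the localhost:8050 example above, so check the values against the README for your installed version.

```python
# settings.py (fragment): scrapy-splash wiring, per its README.

# Assumed address of the running Splash service.
SPLASH_URL = "http://localhost:8050"

# Middlewares that route SplashRequest objects through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Keeps duplicate Splash arguments from being stored repeatedly.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Dupefilter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With something like this in place, inside scrapy shell you would import scrapy_splash, build req = scrapy_splash.SplashRequest(url, args={'wait': 0.5}) and call fetch(req); the rendered page is then available as response.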