How to get the original start_url in Scrapy (before redirect)

This gave me the original referer URL, i.e. the page whose response yielded the request now being scraped. If that request was yielded directly from the callback of one of my start_urls, the referer is that start URL:

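# Scrapy's RefererMiddleware sets this header on requests generated from
# another response; requests made straight from start_urls have no Referer.
referer = response.request.headers.get('Referer')
referer_url = referer.decode('utf-8') if referer else None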

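You can find what you need in response.request.meta['redirect_urls']. The first entry in that list is the URL that was originally requested. The key is only set when the request was actually redirected, so check that it exists before indexing into it.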

Quote from the docs:

The urls which the request goes through (while being redirected) can be found in the redirect_urls Request.meta key.
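
As a rough sketch of how that could be used: the helper below returns the first entry of redirect_urls when it is present and falls back to response.url otherwise. The name original_url and the SimpleNamespace stand-ins are made up for illustration and are not part of Scrapy. In a real spider you would pass the response object your callback receives.

```python
from types import SimpleNamespace


def original_url(response):
    """Return the URL that was first requested, before any redirects.

    Hypothetical helper: expects an object with `.meta` (dict) and `.url`,
    which is what a Scrapy response exposes inside a spider callback.
    """
    redirect_urls = response.meta.get("redirect_urls")
    # redirect_urls lists the URLs in the order they were requested,
    # so the first entry is the one the crawl started from.
    return redirect_urls[0] if redirect_urls else response.url


# Stand-in objects for illustration only (not real Scrapy responses).
redirected = SimpleNamespace(
    url="https://example.com/new",
    meta={"redirect_urls": ["https://example.com/old"]},
)
direct = SimpleNamespace(url="https://example.com/page", meta={})
```

Inside a callback such as parse(self, response), calling original_url(response) should then give the start URL even if the response came back from a different address.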

Hope that helps.

Tags: Python, Web Scraping, Redirect, Scrapy
