python requests http response 500 (site can be reached in browser)
One thing that differs from the browser's request is the User-Agent header; you can change it with requests like this:
import requests
url = 'https://www.zomato.com/praha/caf%C3%A9-a-restaurant-z%C3%A1ti%C5%A1%C3%AD-kunratice-praha-4/daily-menu'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.status_code)  # should be 200 if the User-Agent was the problem
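To see why the User-Agent matters, you can compare what requests sends by default with the overridden value. This is a minimal sketch: example.com is a placeholder URL, and `Session.prepare_request` only builds the request (merging the session's default headers with the per-request ones) without sending anything over the network.

```python
import requests

# The User-Agent requests uses by default, of the form "python-requests/<version>".
default_ua = requests.utils.default_user_agent()

# The browser-like User-Agent from the answer above.
browser_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36")

session = requests.Session()
url = "https://example.com/"  # placeholder; nothing is actually sent

# Build both requests without sending them, so the merged headers can be inspected.
plain = session.prepare_request(requests.Request("GET", url))
spoofed = session.prepare_request(
    requests.Request("GET", url, headers={"User-Agent": browser_ua})
)

print(plain.headers["User-Agent"])    # prints something like python-requests/<version>
print(spoofed.headers["User-Agent"])  # prints the browser-like string
```

A server that rejects obvious scripts can tell these two apart from the User-Agent alone.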
Edit: Some web applications will also check the Origin and/or the Referer headers (for example, for AJAX requests); you can set these in the same way as User-Agent:
headers = {
'Origin': 'http://example.com',
'Referer': 'http://example.com/some_page'
}
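A sketch of combining these with the User-Agent, assuming you want the browser-like User-Agent on every request while Origin and Referer may vary per call: put the former on a `requests.Session` and pass the latter per request. The endpoint path below is a hypothetical placeholder, and `prepare_request` again only builds the request so the merged headers can be checked.

```python
import requests

session = requests.Session()
# Persist the browser-like User-Agent for every request made with this session.
session.headers.update({
    "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36"),
})

# Origin/Referer values from the answer above; the endpoint is a hypothetical placeholder.
url = "http://example.com/some_ajax_endpoint"
extra = {
    "Origin": "http://example.com",
    "Referer": "http://example.com/some_page",
}

# Per-request headers are merged over the session's headers; nothing is sent here.
prepared = session.prepare_request(requests.Request("GET", url, headers=extra))
for name in ("User-Agent", "Origin", "Referer"):
    print(f"{name}: {prepared.headers[name]}")

# To send it for real: session.get(url, headers=extra, timeout=10)
```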
Remember that you are setting these headers essentially to bypass checks, so please be a good netizen and don't abuse other people's resources.
The User-Agent, as well as other header fields, could be causing your problem.
When I came across this error, I inspected a regular request made by a browser using Wireshark, and it turned out the server expected header fields other than just the User-Agent.
After I emulated the browser's headers in Python requests, the server stopped throwing errors.
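If you copy the browser's request headers as plain text (from Wireshark or from the browser's developer tools), a small helper can turn the `Name: value` lines into a dict for `requests.get(url, headers=...)`. This is a hypothetical sketch: `parse_raw_headers` and the sample text are invented for illustration, and you may want to drop headers such as Host or Cookie, which requests fills in itself or which are tied to one browser session.

```python
from typing import Dict


def parse_raw_headers(raw: str) -> Dict[str, str]:
    """Turn copied 'Name: value' header lines into a dict usable as requests' headers=."""
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, the request line (e.g. "GET /path HTTP/1.1"), and
        # HTTP/2 pseudo-headers such as ":authority:" shown by some tools.
        if not line or line.startswith(":") or ":" not in line:
            continue
        # Split on the first colon only, so values like URLs stay intact.
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return headers


# Invented sample of a captured browser request.
raw = """GET /some_page HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.90 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.8
Referer: http://example.com/
"""

headers = parse_raw_headers(raw)
for name, value in headers.items():
    print(f"{name}: {value}")
```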
But Wait! There's More!
The answers above did help me along the way, but I had to add still more fields to my headers before certain sites would let me in with python requests. Learning to use Wireshark (suggested above) was a good new skill for me, but I found an easier way.
Open the developer tools (in Chrome, right-click the page and click Inspect), go to the Network tab (reload the page if the list is empty), select one of the entries in the Name column on the left, and then, under Headers, expand Request Headers. This gives you the complete list of headers your browser sends to the server.
I added the headers I thought were most likely needed one at a time, testing after each, until my errors went away. Then I reduced that set to the smallest one that still worked. In my case, my headers contained only User-Agent (added earlier to deal with other code issues), and I only needed to add the Accept-Language header to get a few other sites working. See the picture below as a guide to the text above.
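The add-one-at-a-time, then-reduce procedure described above can be sketched as a small search. This is a hypothetical helper, not the answerer's code: `works` is any function you supply that returns True when a request with the given headers succeeds, for example `lambda h: requests.get(url, headers=h, timeout=10).status_code == 200`. Each call to `works` hits the server, so keep the candidate list short. The reduction is greedy: it returns a set from which no single header can be removed, not necessarily the smallest set overall.

```python
from typing import Callable, Dict

Headers = Dict[str, str]


def find_minimal_headers(
    base: Headers,
    candidates: Headers,
    works: Callable[[Headers], bool],
) -> Headers:
    """Add candidate headers one at a time until `works` passes, then drop unneeded ones."""
    current = dict(base)

    # Phase 1: add candidates in order (most likely first) until the request succeeds.
    for name, value in candidates.items():
        if works(current):
            break
        current[name] = value
    if not works(current):
        raise ValueError("the request still fails with every candidate header added")

    # Phase 2: try removing each added header; keep the removal if it still works.
    for name in [n for n in current if n not in base]:
        trial = {k: v for k, v in current.items() if k != name}
        if works(trial):
            current = trial
    return current


# Stand-in for a real site that only accepts requests carrying Accept-Language.
result = find_minimal_headers(
    base={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ..."},
    candidates={
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate",
    },
    works=lambda h: "Accept-Language" in h,
)
print(result)
```

With this stand-in, Accept is added and then removed again in phase 2, leaving only User-Agent and Accept-Language, which mirrors the outcome described above.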
I hope this process helps others find ways to eliminate unwanted return codes from python requests where possible.