Python - Request being blocked by Cloudflare
You might want to try this:
import cloudscraper
scraper = cloudscraper.create_scraper() # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper() # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."
It does not require Node.js as a dependency. All credit goes to the cloudscraper page on PyPI.
This happens because the site sits behind Cloudflare's anti-bot page ("I'm Under Attack Mode", or IUAM).
Bypassing this check on your own is quite difficult, since Cloudflare changes its techniques periodically. Currently, it checks whether the client supports JavaScript, which can be spoofed.
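Before reaching for a bypass library, it can help to confirm that the blocked response really is the challenge page and not some other error. Below is a rough heuristic sketch. The function name is hypothetical, and the markers it looks for (status 403 or 503, a `Server: cloudflare` header, the "Just a moment" / "Checking your browser" text) are assumptions based on commonly observed challenge pages, which Cloudflare may change at any time.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristic guess: does this response look like a Cloudflare challenge page?

    The markers below are assumptions, not a documented contract.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = normalized.get("server", "").lower().startswith("cloudflare")

    # Challenge pages are usually served with an error status, not 200.
    blocked_status = status_code in (403, 503)

    # Text fragments often seen on the interstitial page.
    markers = ("Just a moment", "Checking your browser")
    has_challenge_text = any(marker in body for marker in markers)

    return served_by_cloudflare and blocked_status and has_challenge_text
```

With requests, this could be called as `looks_like_cloudflare_challenge(r.status_code, r.headers, r.text)`.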
I would recommend using the cfscrape module to bypass this.
To install it, use pip install cfscrape. You'll also need to install Node.js.
You can pass a requests session into create_scraper() like so:
import cfscrape
import requests
session = requests.Session()
session.headers = ...
scraper = cfscrape.create_scraper(sess=session)
I had the same problem because they put Cloudflare in front of the API. I solved it this way:
import cloudscraper
import json
scraper = cloudscraper.create_scraper()
r = scraper.get("MY API").text
y = json.loads(r)
print(y)
curl and hx avoid this problem. But how?
I found that they use HTTP/2 by default, whereas the requests library only supports HTTP/1.1.
So, for testing, I installed httpx with h2 (a Python library that adds HTTP/2 support), and it works if I run: httpx --http2 'https://some.url'.
So, the solution is to use a library that supports HTTP/2, for example httpx with h2.
It's not a complete solution, since it won't help solve Cloudflare's anti-bot ("I'm Under Attack Mode", or IUAM) challenge.