Python requests arguments / dealing with API pagination
Read last_page and make a GET request for each page in the range:

import requests

r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs").json()
num_pages = r_sanfran['last_page']

for page in range(2, num_pages + 1):
    r_sanfran = requests.get("https://api.angel.co/1/tags/1664/jobs", params={'page': page}).json()
    print(r_sanfran['page'])
    # TODO: extract the data
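If you want to collect the results from every page in one place, a minimal sketch of the same loop could accumulate them into a list. Here the 'jobs' key is an assumption about the payload shape, not something the API is documented to return:

import requests

base_url = "https://api.angel.co/1/tags/1664/jobs"
all_records = []

first_page = requests.get(base_url).json()
all_records.extend(first_page.get('jobs', []))  # 'jobs' key is assumed; adjust to the real payload

for page in range(2, first_page['last_page'] + 1):
    resp = requests.get(base_url, params={'page': page}).json()
    all_records.extend(resp.get('jobs', []))

print(len(all_records))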
I came across a scenario where the API didn't return page numbers but rather a min/max value. I created this, and I think it will work for both situations. It keeps advancing the page value until it reaches the end, at which point the while loop stops.
import requests

# url and headers are defined elsewhere for the target API
max_version = [1]
while len(max_version) > 0:
    r = requests.get(url, headers=headers, params={"page": max_version[0]}).json()
    next_page = r['page']
    if next_page is not None:
        max_version[0] = next_page
        # process data here...
    else:
        max_version.clear()  # empty list stops the while loop
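The single-element list is really just a mutable flag for the loop. The same cursor-style pagination can be written with a plain variable and break; a minimal sketch, assuming the response exposes the next cursor under a 'page' key as above:

import requests

def fetch_all_pages(url, headers):
    page = 1
    while True:
        r = requests.get(url, headers=headers, params={"page": page}).json()
        # process r here...
        next_page = r.get('page')  # assumed cursor field, as in the answer above
        if next_page is None:
            break  # no further cursor, so we have reached the end
        page = next_page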
Further improving on @dh762's answer, you can use a while loop and make all the requests inside it, without needing two yield statements.
E.g.:
import requests

session = requests.Session()

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs"
    currP = 1
    totalP = 2  # assume there is a 2nd page; it gets overwritten if not
    while currP <= totalP:
        page = session.get(url, params={'page': currP}).json()
        totalP = page['last_page']
        currP += 1
        yield page

for page in get_jobs():
    pass  # TODO: process the page
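If each page carries its records under a list key, the generator can be flattened into one lazy stream of records. A minimal sketch, assuming a hypothetical 'jobs' key in each page:

from itertools import chain

# chain the (assumed) 'jobs' list of every page into a single lazy iterable
jobs = chain.from_iterable(page.get('jobs', []) for page in get_jobs())

for job in jobs:
    print(job)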
Improving on @alecxe's answer: if you use a Python generator and a requests HTTP session, you can improve performance and resource usage when querying many pages or very large pages.
import requests

session = requests.Session()

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs"
    first_page = session.get(url).json()
    yield first_page

    num_pages = first_page['last_page']
    for page in range(2, num_pages + 1):
        next_page = session.get(url, params={'page': page}).json()
        yield next_page

for page in get_jobs():
    pass  # TODO: process the page
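Since the module-level session stays open for as long as the program runs, one option is to scope it with a context manager so pooled connections are released once iteration finishes. A minimal sketch of the same generator restructured that way:

import requests

def get_jobs():
    url = "https://api.angel.co/1/tags/1664/jobs"
    # requests.Session supports the context-manager protocol, so the
    # connections are closed when the with-block is left
    with requests.Session() as session:
        first_page = session.get(url).json()
        yield first_page
        for page in range(2, first_page['last_page'] + 1):
            yield session.get(url, params={'page': page}).json()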