Python - save requests or BeautifulSoup object locally
Since name.content is just HTML, you can dump it to a file and read it back later.
Usually the bottleneck is not the parsing but the network latency of making the request in the first place.
from bs4 import BeautifulSoup
import requests

url = 'https://google.com'
name = requests.get(url)

# name.content is bytes, so open the file in binary mode
with open("/tmp/A.html", "wb") as f:
    f.write(name.content)

# read it back in
with open("/tmp/A.html") as f:
    soup = BeautifulSoup(f, "html.parser")

# do something with soup
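If the point is to avoid re-downloading the page on every run, you can wrap the same idea in a small cache-if-missing helper. This is just a sketch of one way to do it; the helper name get_html and the cache path are made up for illustration.

import os
import requests
from bs4 import BeautifulSoup

def get_html(url, cache_path):
    # hypothetical helper: fetch the page only if no cached copy exists yet
    if not os.path.exists(cache_path):
        resp = requests.get(url)
        with open(cache_path, "wb") as f:
            f.write(resp.content)
    with open(cache_path, "rb") as f:
        return f.read()

html = get_html('https://google.com', '/tmp/A.html')
soup = BeautifulSoup(html, "html.parser")

Delete the cached file whenever you want a fresh copy.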
Here is some anecdotal evidence that the bottleneck really is the network.
from bs4 import BeautifulSoup
import requests
import time

url = 'https://google.com'

t1 = time.perf_counter()
name = requests.get(url)
t2 = time.perf_counter()
soup = BeautifulSoup(name.content, "html.parser")
t3 = time.perf_counter()

# time spent on the request vs. time spent parsing
print(t2 - t1, t3 - t2)
Output, from running on a Thinkpad X1 Carbon with a fast campus network:
0.11 0.02
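A single measurement is noisy, so if you want slightly stronger evidence you can average over a few repetitions. A quick sketch along the same lines; the repetition count of 5 is arbitrary.

from bs4 import BeautifulSoup
import requests
import time

url = 'https://google.com'
fetch_total = parse_total = 0.0
runs = 5  # arbitrary repetition count

for _ in range(runs):
    t1 = time.perf_counter()
    resp = requests.get(url)
    t2 = time.perf_counter()
    BeautifulSoup(resp.content, "html.parser")
    t3 = time.perf_counter()
    fetch_total += t2 - t1
    parse_total += t3 - t2

# average seconds per request vs. per parse
print(fetch_total / runs, parse_total / runs)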