Download file from web in Python 3
I hope I understood the question correctly: how do you download a file from a server when the URL is stored in a string?
I download files and save them locally using the code below:
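import requests
url = 'https://www.python.org/static/img/python-logo.png'
fileName = r'D:\Python\dwnldPythonLogo.png'  # raw string keeps the backslashes literal
# stream=True avoids loading the whole body into memory before iterating
req = requests.get(url, stream=True)
req.raise_for_status()  # fail early on HTTP errors such as 404
with open(fileName, 'wb') as file:
    # write the body in chunks of up to 100000 bytes
    for chunk in req.iter_content(100000):
        file.write(chunk)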
I use the requests package whenever I need to make HTTP requests, because its API is very easy to get started with.
First, install requests:
$ pip install requests
Then the code:
from requests import get  # to make the GET request
def download(url, file_name):
    # make the request first so a failed download does not leave an empty file
    response = get(url)
    response.raise_for_status()
    # open in binary mode and write the whole body
    with open(file_name, "wb") as file:
        file.write(response.content)
If you want to read the contents of a web page into a variable, just read the response of urllib.request.urlopen:
import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read() # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
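The hard-coded 'utf-8' above is a guess about the page's encoding. A minimal sketch of one alternative, assuming the server declares a charset in its Content-Type header: read that charset from the response headers and only fall back to a default when none is given. The function name `fetch_text` and the `fallback_encoding` parameter are made up for this illustration.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Return the body of `url` as a `str` (hypothetical helper).

    The charset is taken from the Content-Type header when the server
    declares one; otherwise `fallback_encoding` is assumed.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()  # a `bytes` object
        # get_content_charset() returns None when no charset is declared
        charset = response.headers.get_content_charset() or fallback_encoding
    return data.decode(charset)
```

Calling `fetch_text('http://example.com/')` would return the page as a `str`. This is still only suitable for text; binary data should stay as `bytes`.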
The easiest way to download and save a file is to use the urllib.request.urlretrieve function:
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)
import urllib.request
...
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)
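urlretrieve also accepts an optional `reporthook` callable, which it calls with the block number, the block size and the total size. Here is a small sketch of using it to track progress. The helper name `make_progress_hook` and the idea of collecting progress in a list are assumptions made for this example; a real script might print a progress bar instead.

```python
import urllib.request


def make_progress_hook(log):
    """Return a reporthook that appends (bytes_so_far, total_size) to `log`.

    Hypothetical helper for illustration only.
    """
    def hook(block_num, block_size, total_size):
        done = block_num * block_size
        # The last block is usually shorter than block_size, so the product
        # can overshoot; cap it when the total size is known (> 0).
        if total_size > 0:
            done = min(done, total_size)
        log.append((done, total_size))
    return hook
```

Usage would look like `urllib.request.urlretrieve(url, file_name, reporthook=make_progress_hook(progress))` with `progress = []` defined beforehand. The total size is only known when the server sends a Content-Length header.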
But keep in mind that urlretrieve is part of the legacy interface carried over from Python 2's urllib, and the documentation notes that it might become deprecated in the future.
So the more robust way to do this is to use the urllib.request.urlopen function, which returns a file-like object representing the HTTP response, and copy it to a real file using shutil.copyfileobj.
import urllib.request
import shutil
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
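If you use this pattern often, it may be worth wrapping it in a function. The sketch below is one possible shape, not a definitive implementation: the function name `download`, the `timeout` default and the '.part' suffix are all assumptions. It writes to a temporary name first so that an interrupted download does not leave a truncated file under the final name.

```python
import os
import shutil
import urllib.request


def download(url, file_name, timeout=30):
    """Stream `url` to `file_name` via a temporary '.part' file (hypothetical helper)."""
    part_name = file_name + ".part"  # assumed naming convention
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                open(part_name, "wb") as out_file:
            # copyfileobj copies in chunks, so memory use stays small
            shutil.copyfileobj(response, out_file)
        # move the finished file into place only after a complete copy
        os.replace(part_name, file_name)
    except BaseException:
        # clean up the partial file, then re-raise the original error
        if os.path.exists(part_name):
            os.remove(part_name)
        raise
    return file_name
```

With this in place, `download(url, file_name)` either produces a complete file or raises an exception such as urllib.error.URLError.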
If the copyfileobj approach seems too complicated, you can keep things simpler by reading the whole download into a bytes object and then writing it to a file. But this works well only for small files, because the entire body is held in memory.
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read() # a `bytes` object
    out_file.write(data)
It is possible to decompress .gz (and maybe other formats of) compressed data on the fly. gzip.GzipFile reads its input sequentially and has accepted unseekable file objects since Python 3.2, so the HTTP server does not need to support random access to the file.
import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64) # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.
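As an example of "anything shown above", here is a sketch that combines the gzip wrapper with shutil.copyfileobj to save the decompressed contents straight to disk. The function name `download_and_gunzip` is made up for this illustration, and it assumes the URL points at a plain gzip file rather than a .tar.gz archive that still needs unpacking.

```python
import gzip
import shutil
import urllib.request


def download_and_gunzip(url, file_name):
    """Decompress the .gz file at `url` into `file_name` while downloading.

    Hypothetical helper for illustration only.
    """
    with urllib.request.urlopen(url) as response, \
            gzip.GzipFile(fileobj=response, mode="rb") as uncompressed, \
            open(file_name, "wb") as out_file:
        # Data flows response -> gzip decoder -> file in chunks, so neither
        # the compressed nor the decompressed data is held fully in memory.
        shutil.copyfileobj(uncompressed, out_file)
```

For instance, `download_and_gunzip('http://example.com/something.gz', 'something')` would write the decompressed data to a local file named `something`.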