Get file size using python-requests, while only getting the header
Send a HEAD request:
>>> import requests
>>> response = requests.head('http://example.com')
>>> response.headers
{'connection': 'close',
'content-encoding': 'gzip',
'content-length': '606',
'content-type': 'text/html; charset=UTF-8',
'date': 'Fri, 11 Jan 2013 02:32:34 GMT',
'last-modified': 'Fri, 04 Jan 2013 01:17:22 GMT',
'server': 'Apache/2.2.3 (CentOS)',
'vary': 'Accept-Encoding'}
A HEAD request is like a GET request that only downloads the headers. Note that it's up to the server to actually honor your HEAD request. Some servers will only respond to GET requests, so you'll have to send a GET request and just close the connection instead of downloading the body. Other times, the server just never specifies the total size of the file.
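The fallback described above could be sketched roughly as follows. The helper names parse_content_length and remote_size, the timeout value, and the choice to follow redirects on the HEAD request are assumptions made for illustration, not part of the original answer.

```python
import requests


def parse_content_length(headers):
    """Return Content-Length as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def remote_size(url, timeout=10):
    """Try HEAD first; fall back to a streamed GET whose body is never read."""
    try:
        # requests.head does not follow redirects unless asked to.
        head = requests.head(url, allow_redirects=True, timeout=timeout)
        if head.ok:
            size = parse_content_length(head.headers)
            if size is not None:
                return size
    except requests.RequestException:
        pass  # The server may reject HEAD; try GET below.

    # stream=True defers the body; leaving the with-block closes the connection.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        # May still be None if the server never reports a size.
        return parse_content_length(response.headers)
```

Calling remote_size('http://example.com') would return an integer when the server reports a size, and None otherwise.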
Use requests.get(url, stream=True).headers['Content-length'].
stream=True means that when the function returns, only the response headers have been downloaded; the response body has not.
Both requests.get and requests.head can get you the headers, but there are advantages to using get:
- get is more flexible: if you want to download the response body after inspecting the length, you can simply access the content property or use an iterator, which downloads the content in chunks.
- The HTTP specification says the headers in response to a "HEAD request SHOULD be identical to the information sent in response to a GET request", but that is not always the case.
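To illustrate the first point, here is a minimal sketch of inspecting the length and only then streaming the body in chunks. The names should_download and download_if_small, the 10 MiB limit, and the chunk size are hypothetical choices for this example.

```python
import requests


def should_download(headers, max_bytes):
    """True when Content-Length is present, numeric, and within max_bytes."""
    value = headers.get("Content-Length")
    return value is not None and value.isdigit() and int(value) <= max_bytes


def download_if_small(url, path, max_bytes=10 * 1024 * 1024, timeout=10):
    """Inspect the headers first, then stream the body to disk in chunks."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        if not should_download(response.headers, max_bytes):
            # Leaving the with-block closes the connection without reading the body.
            return False
        with open(path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=8192):
                fh.write(chunk)
        return True
```

With a plain requests.head call the same decision would need a second request to fetch the body; here the one streamed GET serves both purposes.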
Here is an example of getting the length of an MIT open course video:
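import requests
MitOpenCourseUrl = "http://www.archive.org/download/MIT6.006F11/MIT6_006F11_lec01_300k.mp4"
# HEAD request: requests does not follow redirects here by default
resHead = requests.head(MitOpenCourseUrl)
# GET request: stream=True fetches the headers and defers the body
resGet = requests.get(MitOpenCourseUrl, stream=True)
resHead.headers['Content-length']  # output 169
resGet.headers['Content-length']  # output 121291539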
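One plausible reason for the mismatch: requests.head does not follow redirects by default, while requests.get does, so the HEAD response may describe a redirect rather than the video itself. That is an assumption about this particular URL, not something confirmed in the answer. A small sketch for checking it, using the hypothetical helpers summarize and compare_head:

```python
import requests


def summarize(response):
    """Return (status code, redirect target or None, Content-Length or None)."""
    return (
        response.status_code,
        response.headers.get("Location"),
        response.headers.get("Content-Length"),
    )


def compare_head(url, timeout=10):
    """Issue HEAD twice: once without and once with redirect following."""
    direct = requests.head(url, timeout=timeout)  # redirects are not followed
    followed = requests.head(url, allow_redirects=True, timeout=timeout)
    return summarize(direct), summarize(followed)
```

If the first tuple from compare_head(MitOpenCourseUrl) shows a 3xx status and a Location header, the small Content-length belongs to the redirect response, and passing allow_redirects=True to requests.head should bring it in line with the GET result.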