python: check if url to jpg exists
Looks like http://www.fakedomain.com/fakeImage.jpg
automatically redirected to http://www.fakedomain.com/index.html
without any error.
Redirecting for 301 and 302 responses are automatically done without giving any response back to user.
Please take a look HTTPRedirectHandler, you might need to subclass it to handle that.
Here is the one sample from Dive Into Python:
http://diveintopython3.ep.io/http-web-services.html#redirects
>>> import httplib
>>>
>>> def exists(site, path):
... conn = httplib.HTTPConnection(site)
... conn.request('HEAD', path)
... response = conn.getresponse()
... conn.close()
... return response.status == 200
...
>>> exists('http://www.fakedomain.com', '/fakeImage.jpg')
False
If the status is anything other than a 200, the resource doesn't exist at the URL. This doesn't mean that it's gone altogether. If the server returns a 301 or 302, this means that the resource still exists, but at a different URL. To alter the function to handle this case, the status check line just needs to be changed to return response.status in (200, 301, 302)
.
thanks for all the responses everyone, ended up using the following:
try:
f = urllib2.urlopen(urllib2.Request(url))
deadLinkFound = False
except:
deadLinkFound = True
The code below is equivalent to tikiboy's answer, but using a high-level and easy-to-use requests library.
import requests
def exists(path):
r = requests.head(path)
return r.status_code == requests.codes.ok
print exists('http://www.fakedomain.com/fakeImage.jpg')
The requests.codes.ok
equals 200
, so you can substitute the exact status code if you wish.
requests.head
may throw an exception if server doesn't respond, so you might want to add a try-except construct.
Also if you want to include codes 301
and 302
, consider code 303
too, especially if you dereference URIs that denote resources in Linked Data. A URI may represent a person, but you can't download a person, so the server will redirect you to a page that describes this person using 303 redirect.