In Python, how do I use urllib to see if a website is 404 or 200?
The getcode() method (Added in python2.6) returns the HTTP status code that was sent with the response, or None if the URL is no HTTP URL.
>>> a=urllib.urlopen('http://www.google.com/asdfsf')
>>> a.getcode()
404
>>> a=urllib.urlopen('http://www.google.com/')
>>> a.getcode()
200
You can use urllib2 as well:
import urllib2
req = urllib2.Request('http://www.python.org/fish.html')
try:
resp = urllib2.urlopen(req)
except urllib2.HTTPError as e:
if e.code == 404:
# do something...
else:
# ...
except urllib2.URLError as e:
# Not an HTTP-specific error (e.g. connection refused)
# ...
else:
# 200
body = resp.read()
Note that HTTPError
is a subclass of URLError
which stores the HTTP status code.
For Python 3:
import urllib.request, urllib.error
url = 'http://www.google.com/asdfsf'
try:
conn = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
# Return code error (e.g. 404, 501, ...)
# ...
print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
# Not an HTTP-specific error (e.g. connection refused)
# ...
print('URLError: {}'.format(e.reason))
else:
# 200
# ...
print('good')