How can I check if a URL exists with Django’s validators?
Edit: Please note that this no longer works in Django 1.5 or later.
I assume you want to check whether the file actually exists, not just whether there is an object (which is a simple if statement).
First, I will recommend always looking through Django's source code because you will find some great code that you could use :)
I assume you want to do this within a template. There is no built-in template tag to validate a URL, but you could essentially use the URLValidator class within a template tag to test it. Simply:
from django.core.validators import URLValidator
from django.core.exceptions import ValidationError

# verify_exists only works on old Django versions (see the edit note above)
validate = URLValidator(verify_exists=True)
try:
    validate('http://www.somelink.com/to/my.pdf')
except ValidationError as e:
    print(e)
The URLValidator class raises a ValidationError when it can't open the link. It uses urllib2 to actually open the request, so it's not just doing basic regex checking (though it does that too).
You can plop this into a custom template tag (the Django docs explain how to create one), and off you go.
Hope that is a start for you.
Problem
The URLValidator from django.core.validators says that www.google.ro is invalid. That is wrong from my point of view, or at least not sufficient.
How to solve it?
The clue is to look at the source code for models.URLField. You will see that it uses forms.URLField as its form field, which does more than the URLValidator above: it fills in a missing scheme (such as http://) before validating.
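To illustrate the kind of extra normalization a form field can apply before the validator runs, here is a minimal sketch using only the Python 3 standard library. The helper name add_default_scheme and the default of http are assumptions made for this example; it is a simplified stand-in, not Django's actual code.

```python
from urllib.parse import urlsplit, urlunsplit


def add_default_scheme(url, default_scheme="http"):
    """Return url with a scheme filled in when the user left it out."""
    parts = list(urlsplit(url.strip()))
    if not parts[0]:
        # No scheme such as "http" was given, so assume the default.
        parts[0] = default_scheme
    if not parts[1]:
        # "www.google.ro" parses as a path, not a host: move it into the
        # host position and re-split so any trailing path is separated out.
        parts[1], parts[2] = parts[2], ""
        parts = list(urlsplit(urlunsplit(parts)))
    return urlunsplit(parts)
```

With this, add_default_scheme("www.google.ro") gives "http://www.google.ro", which a strict validator would then accept, while a URL that already has a scheme is returned unchanged.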
Solution
If I want to validate a URL like http://www.google.com or like www.google.ro, I would do the following:
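from django.core.exceptions import ValidationError
from django.forms import URLField

def validate_url(url):
    """Return True if Django's form-level URLField accepts the URL."""
    url_form_field = URLField()
    try:
        # clean() normalizes the value and raises ValidationError if it is invalid
        url_form_field.clean(url)
    except ValidationError:
        return False
    return True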
I found this useful. Maybe it helps someone else.
Anything based on the verify_exists parameter to django.core.validators.URLValidator will stop working with Django 1.5. The documentation helpfully says nothing about this, but the source code reveals that using that mechanism in 1.4 (the latest stable version) leads to a DeprecationWarning (you'll see it has been removed completely in the development version):
if self.verify_exists:
    import warnings
    warnings.warn(
        "The URLField verify_exists argument has intractable security "
        "and performance issues. Accordingly, it has been deprecated.",
        DeprecationWarning
    )
There are also some odd quirks with this method related to the fact that it uses a HEAD request to check URLs. That is bandwidth-efficient, sure, but some sites (like Amazon) respond to HEAD with an error where the equivalent GET would have been fine, and this leads to false negative results from the validator.
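One way around the HEAD quirk is to fall back to GET when the server answers HEAD with an error status. The following is a minimal sketch using the Python 3 standard library (urllib.request rather than urllib2); the function name url_responds and the 5-second default timeout are assumptions for illustration, not part of Django.

```python
import urllib.error
import urllib.request


def url_responds(url, timeout=5.0):
    """Return True if the URL answers HEAD or, failing that, GET without an error status."""
    for method in ("HEAD", "GET"):
        try:
            request = urllib.request.Request(url, method=method)
            with urllib.request.urlopen(request, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # The server replied with a 4xx/5xx status; try the next method.
            continue
        except (urllib.error.URLError, OSError, ValueError):
            # Malformed URL, DNS failure, refused connection, or timeout:
            # retrying with another method would not help.
            return False
    return False
```

The explicit timeout bounds how long a single attempt can block, and the GET fallback only runs when the server actually answered HEAD with an error.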
I would also (a lot has changed in two years) recommend against doing anything with urllib2 in a template. This is completely the wrong part of the request/response cycle to be triggering potentially long-running operations: consider what happens if the URL does exist, but a DNS problem causes urllib2 to take 10 seconds to work that out. BAM! Instant 10 extra seconds on your page load.
I would say the current best practice for making possibly-long-running tasks like this asynchronous (and thus not blocking page load) is using django-celery; there's a basic tutorial which covers using pycurl to check a website, or you could look into how Simon Willison implemented celery tasks (slides 32-41) for a similar purpose on Lanyrd.
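To show the general shape of moving the check off the page-load path, here is a minimal in-process sketch using the Python 3 standard library. It is not Celery and is only a stand-in for illustration: a real task queue runs in separate worker processes and survives restarts, while this keeps results in memory. The names schedule_check and last_known_status, and the check callable that is passed in, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Small shared pool; the size is an arbitrary choice for this sketch.
_executor = ThreadPoolExecutor(max_workers=4)
# Most recent result per URL; lost when the process exits.
_results = {}


def _run_and_record(url, check):
    """Run check(url) in a worker thread and remember the outcome."""
    try:
        ok = bool(check(url))
    except Exception:
        # Treat any failure inside the check as "URL not confirmed".
        ok = False
    _results[url] = ok
    return ok


def schedule_check(url, check):
    """Queue the check in the background and return immediately with a Future."""
    return _executor.submit(_run_and_record, url, check)


def last_known_status(url):
    """Return True/False from the latest finished check, or None if none has finished."""
    return _results.get(url)
```

A view or template would then only call last_known_status(url), which returns at once, while schedule_check(url, check) is triggered elsewhere, for example when the URL is first saved.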