Why is referer checking needed for Django to prevent CSRF
First of all, thanks for the interesting question. I did not know about the details of CSRF before and had to look up the answer to your question myself, but I think I know the correct explanation for Django's behavior now.
The Django developers are treating HTTP and HTTPS requests differently because users expect different things from insecure and secure web services. More specifically, if a web page is using transport layer security, users expect to be protected against man-in-the-middle attacks: they trust that even if someone sat directly between them and the remote server and intercepted every single message, that person couldn't make any use of the information. Note that this is not expected of plain HTTP connections.
Now consider the following scenario, quoted from a Django dev's post here :
- user browses to http://example.com/
- a MITM modifies the page that is returned, so that it has a POST form which targets https://example.com/detonate-bomb/. The MITM has to include a CSRF token, but that's not a problem because he can invent one and send a CSRF cookie to match.
- the POST form is submitted by javascript from the user's browser and so includes the CSRF cookie, a matching CSRF token and the user's session cookie, and so will be accepted.
I did not instantly understand this attack myself, so I'm gonna try to explain the details. Note first that we are looking at a page that displays forms over plain connections but submits data via SSL/TLS. Part of the problem, as I understand it, is that the cookie and hidden form value (aka "the CSRF token") are only compared against each other, not against any value that is stored server-side. This makes it easy for the attacker to supply their victim with a cookie-token-combination that will be accepted by the server - remember, the page displaying the form is not secured, so Set-Cookie headers and the contents of the form itself can be spoofed. Once the manipulated form is submitted (via injected JS, for example), the server sees a perfectly valid request.
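The cookie-versus-token comparison described above can be sketched in a few lines. This is only an illustration of the "compare the two values against each other" idea, not Django's actual middleware code; the function name `double_submit_ok` and the token value are made up for the example.

```python
import hmac


def double_submit_ok(cookie_token: str, form_token: str) -> bool:
    """Accept the request if the cookie and the form field carry the same value.

    Hypothetical helper: nothing here consults server-side state, which is
    exactly the weakness the attack relies on.
    """
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison, so the token is not leaked through timing.
    return hmac.compare_digest(cookie_token, form_token)


# A MITM on the plain-HTTP page invents a value, plants it as the cookie
# via a spoofed Set-Cookie header, and puts the same value into the form.
attacker_value = "attacker-chosen-value"
print(double_submit_ok(attacker_value, attacker_value))
```

Since the check only asks whether the two values match, a pair chosen entirely by the attacker passes just as well as a pair issued by the server.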
Adding strict Referer checking is the answer to this exact problem. With this check in place, only requests originating from https://example.com will be accepted at another endpoint of https://example.com. Insecure pages from the same domain will be treated as completely untrusted, and rightly so.
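As a rough sketch of what such a strict check amounts to (again, not Django's real implementation; `referer_allowed` and its parameters are names I made up for illustration):

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_allowed(referer: Optional[str], expected_host: str) -> bool:
    """Return True only if the Referer is an HTTPS URL on expected_host."""
    if not referer:
        # Strict checking: a missing Referer header is rejected.
        return False
    parts = urlsplit(referer)
    # Plain-HTTP pages are untrusted, even if they are on the same domain.
    if parts.scheme != "https":
        return False
    return parts.netloc == expected_host


print(referer_allowed("https://example.com/form/", "example.com"))
print(referer_allowed("http://example.com/form/", "example.com"))
```

The second call is the scenario from the quoted post: the form was served over plain HTTP from the same domain, so the request is refused even though cookie and token match.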
Now to come back to the question why plain HTTP requests are treated differently, we just have to imagine a site that doesn't use encryption at all. In that case, a man in the middle could also spoof the Referer headers sent with the actual form data, so checking those does not provide any additional security. In other words: There is no protection against CSRF attacks by a man in the middle - but, as I mentioned earlier, users do not expect this kind of security from plain HTTP sites.
Regarding your question about how other web frameworks handle this attack vector, I honestly have to say I don't know.
I'll briefly summarize what I've found since I asked this question.
A user's first request to the website cannot be guaranteed to be https (from the server side). An attacker might use this request to set a specific CSRF cookie. He can then use this cookie to do a cross-site request on behalf of the user from an http domain outside your control. This is what the Referer checking prevents.
A solution that might come to mind is to reset the CSRF token when starting an authenticated session. The attacker could then only fake requests to anonymous services, which has little benefit (he could just do them directly by himself, it's anonymous after all).
The problem with that is that the login form itself is also vulnerable to CSRF, despite needing a password. The attacker, in that case, doesn't take over the user's session, but logs the user in to the attacker's own account. The attacker might then be able to see things the user did or entered.
A few rare cases where I think it could be turned off, all assuming HSTS is on and you change the CSRF token on login:
- You have a login process where one of the following holds:
  - It involves two pages, with the CSRF token changed after the first one, so that it can't be automated (also set X-Frame-Options). E.g. required two-factor authentication, or just a second page with a manual okay button.
  - The login page has a CAPTCHA that protects against fake requests.
  - The user won't be tricked into using the wrong account for our particular service (e.g. it's highly personalized in a way that's obvious to the user but the attacker can't detect).
- You know the first request is going to be through https, e.g. the page is only supposed to be accessed through your app or software, rather than a normal browser.
- Browsers implement a way to see whether a cookie was https-only / was set on a https connection (then attackers couldn't set a usable cookie over http). But that's impossible at this time, and outside our control.
(Making CSRF tokens be session-dependent in a secret way, and be stored server-side, doesn't fix this. The attacker can't overwrite or generate the CSRF token, but he can just ask your server for it by opening a form using the session he chose).
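To make that last point concrete, here is a minimal sketch of a session-bound token, assuming the server derives it with an HMAC over the session ID. All names (`SERVER_SECRET`, `token_for_session`, `token_valid`) and the session IDs are hypothetical and only serve the illustration.

```python
import hashlib
import hmac

# Hypothetical secret known only to the server.
SERVER_SECRET = b"hypothetical-server-secret"


def token_for_session(session_id: str) -> str:
    """Derive the CSRF token from the session ID using the server secret."""
    return hmac.new(SERVER_SECRET, session_id.encode(), hashlib.sha256).hexdigest()


def token_valid(session_id: str, token: str) -> bool:
    """Check that the submitted token belongs to the given session."""
    return hmac.compare_digest(token_for_session(session_id), token)


# The attacker cannot compute the token himself, but he can open a form
# with a session of his choosing and read the token the server hands out.
attacker_session = "session-chosen-by-attacker"
token_from_server = token_for_session(attacker_session)

# He then plants that session cookie and the token in the victim's browser.
print(token_valid(attacker_session, token_from_server))
# The same token is useless for any other session.
print(token_valid("victim-session", token_from_server))
```

The token is unforgeable, yet the server still accepts the planted pair, because the attacker obtained both halves legitimately for a session he controls.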