New denial-of-service vulnerability in TLS protocol, based on shooting down other users' sessions?
To summarize:
- It may work, or not, depending on how the server manages its cache for session parameters.
- The RFCs are not consistent.
- It is not a "real" vulnerability.
TLS sessions were initially meant to be an optimization, to avoid client and server doing the full handshake with its "heavy asymmetric crypto" for each connection (the actual cost of such crypto is often overrated, but that's not the point). Existing, deployed clients and servers tend to remember sessions for some time, then forget them when it becomes inconvenient to keep remembering (typically, on the server side, when the RAM buffer dedicated to such storage is full, old sessions are evicted). The idea is that client and server can always fall back to a full handshake, transparently, when needed; session resumption is opportunistic.
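As a rough illustration of that "remember for a while, then evict" behaviour, here is a minimal sketch of a bounded server-side session cache. The names `SessionCache` and `handshake` are hypothetical, and the oldest-first eviction policy is an assumption for the example; real servers differ in how they size and purge their caches.

```python
from collections import OrderedDict
import os


class SessionCache:
    """Toy server-side session cache with a fixed capacity (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # session_id -> master_secret

    def store(self, session_id, master_secret):
        self._entries[session_id] = master_secret
        self._entries.move_to_end(session_id)
        # Once the buffer is full, evict the oldest sessions first.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def lookup(self, session_id):
        # None means "not cached": the server falls back to a full handshake.
        return self._entries.get(session_id)


def handshake(cache, session_id=None):
    """Simulate one connection; return ("resumed" or "full", session_id)."""
    if session_id is not None and cache.lookup(session_id) is not None:
        return "resumed", session_id
    # Unknown or evicted session: do a full handshake and cache the result.
    new_id = os.urandom(16)
    cache.store(new_id, os.urandom(48))
    return "full", new_id


if __name__ == "__main__":
    cache = SessionCache(capacity=2)
    _, first = handshake(cache)
    print(handshake(cache, first)[0])  # resumed
    handshake(cache)
    handshake(cache)                   # two newer sessions push the first one out
    print(handshake(cache, first)[0])  # full
```

The client never sees an error in the second case; it simply pays for a full handshake, which is the sense in which resumption is opportunistic.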
Some deployed systems rely on session resumption to work a bit more reliably; in particular, Web-based applications with client authentication with a smart card: using the smart card implies making a signature with the card, which has a small computational cost (say 1 second) and a high user cost (the human user may have to type a PIN code). However, even in these cases, it is understood that session parameters are stored in RAM only, so if the client browser is closed and then reopened, no session will be resumed and a full handshake will occur.
RFC 5246, in section 7.2.2, contains this paragraph:
Error handling in the TLS Handshake protocol is very simple. When an error is detected, the detecting party sends a message to the other party. Upon transmission or receipt of a fatal alert message, both parties immediately close the connection. Servers and clients MUST forget any session-identifiers, keys, and secrets associated with a failed connection. Thus, any connection terminated with a fatal alert MUST NOT be resumed.
This is a blanket statement meant, informally, to deter some as yet unspecified brute force attacks relying on the attacker "trying out" a lot of session resumptions. There are a few enlightening points that must be made about that prescription:
- Historically, sessions also had to be "forgotten" when a connection was not properly closed (with an explicit close_notify). However, existing Web servers close connections abruptly, so Web browsers have adapted by keeping sessions "alive" even when they were terminated in a way that TLS-1.0 would frown upon.
- This notion of forgetting the session upon a fatal alert does not work on the server when session tickets are used, since, by definition, a server with session tickets does not keep the session state in its own memory: the client stores it, inside the ticket. In that respect, RFC 5077 and RFC 5246 are inconsistent: this "MUST NOT be resumed" cannot be enforced by a server that uses session tickets. It so happens that most SSL implementations are reluctant to break laws of physics in order to comply with an RFC.
- Even without sending an alert, a "fatal condition" can always be forced by an attacker: it suffices for the attacker to open a connection by himself, reusing the same session ID as a connection used by the genuine client. The server will try to resume the session (ServerHello, ChangeCipherSpec, Finished), then expect ChangeCipherSpec then Finished from the client. Since the client is the attacker and the attacker does not know the master key for the session he is purporting to resume, his Finished message won't decrypt properly, triggering a bad_record_mac from the server. If section 7.2.2 is to be followed, then the server should then forget the session. It should be pointed out that the "kill the session by failed Finished" attack expressed here may be easier to pull off than the one you are describing, since it involves only observing a genuine connection (to get the session ID), not modifying that connection.
The third point above is important because it shows that until it has received a properly encrypted-and-MACed Finished message from the client, the server has no proof that it is really talking to the real client. Arguably, the session is not "resumed" until that point, and therefore should not be "invalidated" for any erroneous condition occurring before that point (be it an explicit alert from the client, or a MAC failure, or anything else). However, the RFC is not clear about that either, so what actually happens depends on how the server manages its cache, at the whim of the server implementor.
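To make the two possible cache behaviours concrete, here is a small simulation. It is only a sketch: the `Server` class, its `strict` flag and the `finished_mac` helper are hypothetical names, and an HMAC over a fixed transcript stands in for the real Finished computation, which uses the TLS PRF over the handshake messages. The "strict" server follows a literal reading of section 7.2.2; the "lenient" one does not touch the cache until the client's Finished has been verified.

```python
import hashlib
import hmac
import os


def finished_mac(master_secret, transcript):
    # Stand-in for the TLS Finished value (the real one uses the TLS PRF).
    return hmac.new(master_secret, transcript, hashlib.sha256).digest()


class Server:
    def __init__(self, strict):
        self.strict = strict  # True: literal reading of section 7.2.2
        self.cache = {}       # session_id -> master_secret

    def new_session(self):
        sid, secret = os.urandom(16), os.urandom(48)
        self.cache[sid] = secret
        return sid, secret

    def resume(self, sid, transcript, client_finished):
        secret = self.cache.get(sid)
        if secret is None:
            return "full handshake"
        if hmac.compare_digest(client_finished, finished_mac(secret, transcript)):
            return "resumed"
        # Bad Finished: fatal bad_record_mac. Only the strict server evicts.
        if self.strict:
            del self.cache[sid]
        return "fatal alert"


def attack_then_resume(strict):
    """What the genuine client gets after an attacker replayed its session ID."""
    server = Server(strict)
    sid, secret = server.new_session()            # genuine client's session
    server.resume(sid, b"hello", os.urandom(32))  # attacker knows sid, not the secret
    return server.resume(sid, b"hello", finished_mac(secret, b"hello"))


if __name__ == "__main__":
    print(attack_then_resume(strict=True))   # full handshake
    print(attack_then_resume(strict=False))  # resumed
```

In this model the attacker only needs the session ID, which travels in the clear, and the worst outcome for the genuine client is one extra full handshake.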
The vulnerability is not "real" in that any attacker who is in position to fiddle with client connections can already do a lot more harm by, for instance, responding to the client's ClientHello message with a synthetic alert message or simply random junk that will convince the client that there is no working SSL/TLS server on the other side -- a much more comprehensive denial-of-service than simply making the server and client spend a couple of milliseconds of CPU for a full handshake.
If you've empirically verified that it functions as proposed* then you should notify the OpenSSL security team.
The impact of this bug would potentially be on the Denial-of-Service spectrum; basically, you can force a client/server pair to perform full negotiations each time, which takes up more server resources (...which is why resumption exists in the first place). And, on the Availability scale, this is probably not huge - it merely reduces performance to the base level expected when resumption isn't used, which is the accepted worst-case scenario**.
I don't see a threat here to Confidentiality or Integrity at this time (although history is littered with examples of minor issue X and minor issue Y combining to make major issue Z).
As described so far, this could be an implementation issue or it could be a protocol issue. As you say, "nothing is said [in the RFCs] about either the full session resumption is really required for such dropping or not." So various SSL implementations might handle this case differently. Since you are looking at OpenSSL already, I'd continue along those lines and, when you have proof-of-concept code, you can test other SSL implementations to see how they handle it.
If it does turn out to be a wider protocol issue, and you're already working with OpenSSL security team, then I'd ask their advice on how to broaden the advisory.
* I recommend that you devise a proof-of-concept*** before escalating. Firstly, you'll ensure that you actually have an issue! Secondly, your issue will get better attention if you provide exploit code that enables them to reproduce it. It's not clear what you mean by "Have just checked the actual OpenSSL implementation", so I'm assuming you have only done code review at this time.
** The severity of this bug is rather low; perhaps "degradation of service" would be a better term than "denial of service." For this reason, I would not worry about having publicly disclosed this issue... it's just not impactful enough to call for secrecy or go through the cautious steps of "ethical disclosure". (That's not to minimize the issue; it appears to be a fascinating place where some ambiguity in the standard's language allowed an undesirable behavior to creep in.)
*** @Trueblacker created a proof-of-concept and opened an issue for the OpenSSL team - documenting here since comments are ephemeral.