What is the mask in a WebSocket frame?

WebSockets are defined in RFC 6455, which states in Section 5.3:

The unpredictability of the masking key is essential to prevent authors of malicious applications from selecting the bytes that appear on the wire.

In a blog entry about WebSockets I found the following explanation:

masking-key (32 bits): if the mask bit is set (and trust me, it is if you write for the server side) you can read four unsigned bytes here which are used to XOR the payload with. It's used to ensure that shitty proxies cannot be abused by attackers from the client side.
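To make the layout concrete, here is a minimal sketch of how a server might read the mask bit and the four key bytes and then undo the XOR. The function name `parse_frame` is my own, and the sketch assumes the whole frame is already in memory; it ignores fragmentation, control frames and extension bits. The sample bytes are the masked "Hello" example from Section 5.7 of the RFC.

```python
import struct


def parse_frame(data: bytes):
    """Parse one frame from the start of `data`.

    Returns (opcode, masked, payload). Hypothetical helper for illustration.
    """
    first, second = data[0], data[1]
    opcode = first & 0x0F          # low 4 bits of the first byte
    masked = bool(second & 0x80)   # mask bit: high bit of the second byte
    length = second & 0x7F         # 7-bit payload length (or 126/127 marker)
    offset = 2

    if length == 126:              # 16-bit extended length follows
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:            # 64-bit extended length follows
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8

    key = b""
    if masked:                     # masking-key: 4 bytes right before the payload
        key = data[offset:offset + 4]
        offset += 4

    payload = data[offset:offset + length]
    if masked:
        # Byte i of the payload is XORed with byte (i mod 4) of the key.
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return opcode, masked, payload


frame = bytes.fromhex("818537fa213d7f9f4d5158")
print(parse_frame(frame))  # (1, True, b'Hello')
```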

But the clearest answer I found was in a mailing list archive, where John Tamplin states:

Basically, WebSockets is unique in that you need to protect the network infrastructure, even if you have hostile code running in the client, full hostile control of the server, and the only piece you can trust is the client browser. By having the browser generate a random mask for each frame, the hostile client code cannot choose the byte patterns that appear on the wire and use that to attack vulnerable network infrastructure.
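As a rough illustration of "a random mask for each frame", this is how a client-side implementation could build a masked text frame. It is a sketch under my own assumptions, not how any particular browser does it: `build_client_text_frame` is a made-up name, and it only handles a single unfragmented text frame. The key comes from Python's `secrets` module so the calling code has no influence over it.

```python
import secrets
import struct


def build_client_text_frame(text: str) -> bytes:
    """Build one masked, unfragmented text frame (hypothetical helper)."""
    payload = text.encode("utf-8")
    key = secrets.token_bytes(4)       # fresh, unpredictable key for every frame

    header = bytearray([0x81])         # FIN bit set, opcode 0x1 (text)
    n = len(payload)
    if n < 126:
        header.append(0x80 | n)        # mask bit + 7-bit length
    elif n < 65536:
        header.append(0x80 | 126)      # mask bit + marker for 16-bit length
        header += struct.pack("!H", n)
    else:
        header.append(0x80 | 127)      # mask bit + marker for 64-bit length
        header += struct.pack("!Q", n)

    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + key + masked


# The same message sent twice gets two different keys, so the bytes on the
# wire will (with overwhelming probability) differ between the two frames.
print(build_client_text_frame("Hello").hex())
print(build_client_text_frame("Hello").hex())
```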

As kmkaplan stated, the attack vector is described in Section 10.3 of the RFC.
This is a measure to prevent proxy cache poisoning attacks [1]. It works by introducing randomness: the payload is XORed with a random masking key.
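The XOR step itself is tiny. Below is a minimal sketch (the name `apply_mask` is mine) using the key and payload from the masked "Hello" example in Section 5.7 of the RFC. Because XOR is its own inverse, the same function both masks and unmasks.

```python
def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR byte i of `data` with byte (i mod 4) of the 4-byte `key`."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


key = bytes.fromhex("37fa213d")          # key from the RFC example
masked = apply_mask(b"Hello", key)
print(masked.hex())                      # 7f9f4d5158
print(apply_mask(masked, key))           # b'Hello'
```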

By the way: masking isn't just recommended. It is obligatory for every frame a client sends to the server.

1: See Huang, Lin-Shung, et al. "Talking to yourself for fun and profit." Proceedings of W2SP (2011)


From this article:

Masking of WebSocket traffic from client to server is required because of the unlikely chance that malicious code could cause some broken proxies to do the wrong thing and use this as an attack of some kind. Nobody has proved that this could actually happen, but since the fact that it could happen was reason enough for browser vendors to get twitchy, masking was added to remove the possibility of it being used as an attack.

So, assuming attackers were able to compromise both the JavaScript code executed in a browser and the backend server, masking is designed to prevent the sequence of bytes sent between these two endpoints from being crafted in a way that could disrupt any broken proxies between them ("broken" here means proxies that might attempt to interpret a WebSocket stream as HTTP when they shouldn't).

The browser, not the JavaScript code running in it, has the final say on the randomly generated mask used to send each message, which is why attackers cannot know in advance what stream of bytes the proxy will actually see.
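A small sketch of that idea: the hostile script picks a payload that looks like an HTTP request, but the masking layer picks the key, so the script cannot predict the bytes that go out. `mask_as_browser` is a hypothetical stand-in for the browser's internal masking step, and its optional `key` argument exists only so the behaviour can be checked deterministically; real page script gets no such parameter.

```python
import secrets
from typing import Optional, Tuple


def mask_as_browser(payload: bytes, key: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (key, masked payload). The key is chosen here, not by the script."""
    if key is None:
        key = secrets.token_bytes(4)   # unpredictable to the calling code
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return key, masked


# Bytes an attacker would like a confused proxy to read as a second request.
hostile = b"GET /poison HTTP/1.1\r\nHost: victim.example\r\n\r\n"

# Two sends of the same payload: each gets its own key, so the attacker
# cannot arrange for the request line to appear verbatim on the wire.
key1, wire1 = mask_as_browser(hostile)
key2, wire2 = mask_as_browser(hostile)
print(wire1.hex())
print(wire2.hex())
```

The receiving server reads the key from the frame and applies the same XOR to recover the original payload, so nothing is lost; only intermediaries that misread the stream are denied attacker-chosen bytes.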

Note that masking is redundant if your WebSocket stream is encrypted with TLS (as it should be), although the protocol still requires it. From an article by the author of Python's Flask:

Why is there masking at all? Because apparently there is enough broken infrastructure out there that lets the upgrade header go through and then handles the rest of the connection as a second HTTP request which it then stuffs into the cache. I have no words for this. In any case, the defense against that is basically a strong 32bit random number as masking key. Or you know… use TLS and don't use shitty proxies.