How can a web server know when an HTTP request is fully received?
HTTP/1.1 is a text-based protocol, with binary POST data added in a somewhat hacky way. When writing a "receive loop" for HTTP, you cannot completely separate the data receiving part from the HTTP parsing part. This is because in HTTP, certain characters have special meaning. In particular, the CRLF (0x0D 0x0A) token is used to separate headers, but also to end the request using two CRLF tokens one after the other.
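To make the role of CRLF concrete, here is a minimal Python sketch that looks for the blank line and splits everything before it into a request line and header lines. The function name split_head and the sample request are made up for illustration, and this does not validate the request in any way.

```python
CRLF = b"\r\n"              # the bytes 0x0D 0x0A
END_OF_HEADERS = CRLF * 2   # two CRLF tokens in a row, i.e. an empty line


def split_head(raw: bytes):
    """Split raw bytes into (request_line, header_lines, rest).

    Returns None while the empty line has not been received yet.
    """
    end = raw.find(END_OF_HEADERS)
    if end == -1:
        return None  # keep receiving
    head = raw[:end]
    rest = raw[end + len(END_OF_HEADERS):]  # body bytes, if any
    # The first line is the request line; the remaining lines are headers.
    request_line, *header_lines = head.split(CRLF)
    return request_line, header_lines, rest


if __name__ == "__main__":
    sample = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print(split_head(sample))
```

Returning None for "not there yet" is what lets a receive loop call this after every read and simply continue until it gets a result.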
So you need to keep receiving data until one of the following happens:
- Timeout – follow up by sending a timeout response
- Two consecutive CRLF tokens in the request – follow up by parsing the request, then respond as needed (parsed correctly? request makes sense? send data?)
- Too much data – certain HTTP exploits aim to exhaust server resources like memory or processes (see e.g. Slowloris)
And perhaps other edge cases. Also note that this only applies to requests without a body. For POST requests, you first wait for two CRLF tokens, then read Content-Length bytes in addition. And this is even more complicated when the client sends the body with chunked transfer encoding (so there is no Content-Length up front) or is using multipart encoding.
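The "wait for the empty line, then read Content-Length more bytes" rule can be sketched as a check you run on the buffer after every read. This is only a rough sketch: request_is_complete is a made-up name, it treats a missing Content-Length as "no body", and it deliberately ignores chunked transfer encoding and any validation of the header values.

```python
HEADER_END = b"\r\n\r\n"


def request_is_complete(buf: bytes) -> bool:
    """Return True once buf holds the full header section plus the declared body."""
    head_end = buf.find(HEADER_END)
    if head_end == -1:
        return False  # the empty line has not arrived yet

    body_start = head_end + len(HEADER_END)
    content_length = 0  # assumption: no Content-Length means no body (e.g. a GET)

    # Skip the request line ([0]) and scan the header lines.
    for line in buf[:head_end].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            # Raises ValueError on a malformed value; a real server
            # would answer that with a 400 response.
            content_length = int(value.strip().decode("ascii"))

    return len(buf) - body_start >= content_length


if __name__ == "__main__":
    post = b"POST /form HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\n"
    print(request_is_complete(post + b"he"))     # body still incomplete
    print(request_is_complete(post + b"hello"))  # header and body both present
```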
A request header is terminated by an empty line (two CRLFs with nothing between them).
So, when the server has received a request header, and then receives an empty line, and if the request was a GET (which has no payload), it knows the request is complete and can move on to forming a response. In other cases, it can move on to reading Content-Length bytes of payload and act accordingly.
This is a reliable, well-defined property of the syntax.
No Content-Length is required or useful for a GET: the content is always zero-length. A hypothetical Header-Length is more like what you're asking about, but you'd have to parse the header first in order to find it, so it does not exist and we use this property of the syntax instead. As a result of this, though, you may consider adding an artificial timeout and maximum buffer size, on top of your normal parsing, to protect yourself from the occasional maliciously slow or long request.
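For what it's worth, a timeout plus a buffer cap could look roughly like the Python sketch below. The function name read_request_head and the two limits are arbitrary choices for illustration, not recommended values, and it only reads up to the empty line (so it suits a GET with no payload).

```python
import socket
import time

MAX_HEADER_BYTES = 64 * 1024   # assumed cap, pick your own
TIMEOUT_SECONDS = 10.0         # assumed overall deadline, pick your own


def read_request_head(conn: socket.socket,
                      max_bytes: int = MAX_HEADER_BYTES,
                      timeout: float = TIMEOUT_SECONDS) -> bytes:
    """Receive until the empty line, a deadline, or a size limit is hit."""
    deadline = time.monotonic() + timeout
    buf = bytearray()
    while b"\r\n\r\n" not in buf:
        if len(buf) >= max_bytes:
            raise ValueError("request header too large")
        # Use one overall deadline rather than a per-recv timeout, so a client
        # that trickles in one byte at a time cannot keep the loop alive forever.
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("request header not received in time")
        conn.settimeout(remaining)
        try:
            chunk = conn.recv(4096)
        except socket.timeout:
            raise TimeoutError("request header not received in time") from None
        if not chunk:
            raise ConnectionError("client closed the connection early")
        buf += chunk
    return bytes(buf)


if __name__ == "__main__":
    # Local demonstration with a connected socket pair instead of a real client.
    client, server = socket.socketpair()
    client.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
    print(read_request_head(server))
    client.close()
    server.close()
```

The buffer can overshoot max_bytes by up to one recv chunk before the check fires, which is usually fine for a safety limit.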
The solution is within your link
A GET request in HTTP/1.1 does not seem to include a Content-Length header. See e.g. this link.
There it says:
It must use CRLF line endings, and it must end in \r\n\r\n