What does enctype='multipart/form-data' mean?
When should we use it?
Quentin's answer is right: use multipart/form-data if the form contains a file upload, and application/x-www-form-urlencoded otherwise, which is the default if you omit enctype.
I'm going to:
- add some more HTML5 references
- explain why he is right with a form submit example
HTML5 references
There are three possibilities for enctype:
- application/x-www-form-urlencoded
- multipart/form-data (spec points to RFC 7578)
- text/plain. This is "not reliably interpretable by computer", so it should never be used in production, and we will not look further into it.
How to generate the examples
Once you see an example of each method, it becomes obvious how they work, and when you should use each one.
You can produce examples using:
- nc -l or an ECHO server: HTTP test server accepting GET/POST requests
- a user agent like a browser or cURL
Save the form to a minimal .html file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>upload</title>
</head>
<body>
<form action="http://localhost:8000" method="post" enctype="multipart/form-data">
<p><input type="text" name="text1" value="text default">
<p><input type="text" name="text2" value="aωb">
<p><input type="file" name="file1">
<p><input type="file" name="file2">
<p><input type="file" name="file3">
<p><button type="submit">Submit</button>
</form>
</body>
</html>
We set the default text value of text2 to aωb. The character ω is U+03C9, so aωb is encoded as the bytes 61 CF 89 62 in UTF-8.
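As a quick check of that byte sequence, here is a minimal Python sketch (Python is not used in the original answer, and the variable names are only illustrative) that encodes the same string and prints its UTF-8 bytes in hex:

```python
# Encode the sample field value and show its UTF-8 bytes in hex.
text = "a\u03c9b"  # the same string as aωb; \u03c9 is ω (U+03C9)
utf8_bytes = text.encode("utf-8")

# bytes.hex() with a separator requires Python 3.8 or later.
hex_bytes = utf8_bytes.hex(" ").upper()
print(hex_bytes)  # 61 CF 89 62
```

The two middle bytes, CF 89, are the UTF-8 encoding of ω; a and b are plain ASCII.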
Create files to upload:
echo 'Content of a.txt.' > a.txt
echo '<!DOCTYPE html><title>Content of a.html.</title>' > a.html
# Binary file containing 4 bytes: 'a', 0xCF, 0x89 and 'b'.
printf 'a\xCF\x89b' > binary
Run our little echo server:
while true; do printf '' | nc -l localhost 8000; done
Open the HTML file in your browser, select the files, click Submit, and check the terminal.
nc prints the request it received.
Tested on: Ubuntu 14.04.3, nc BSD 1.105, Firefox 40.
multipart/form-data
Firefox sent:
POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150
Content-Length: 834

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text1"

text default
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text2"

aωb
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file1"; filename="a.txt"
Content-Type: text/plain

Content of a.txt.

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file2"; filename="a.html"
Content-Type: text/html

<!DOCTYPE html><title>Content of a.html.</title>

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file3"; filename="binary"
Content-Type: application/octet-stream

aωb
-----------------------------735323031399963166993862150--
For the binary file and text field, the bytes 61 CF 89 62 (aωb in UTF-8) are sent literally. You could verify that with nc -l localhost 8000 | hd, which says that the bytes:
61 CF 89 62
were sent (61 == 'a' and 62 == 'b').
Therefore it is clear that:
- Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150 sets the content type to multipart/form-data and says that the fields are separated by the given boundary string.
  But note that the:
  boundary=---------------------------735323031399963166993862150
  has two fewer dashes -- than the actual delimiter:
  -----------------------------735323031399963166993862150
  This is because the standard requires every delimiter line to start with two dashes -- followed by the boundary value. The other dashes appear to be just how Firefox chose to implement the arbitrary boundary. RFC 7578 clearly mentions that those two leading dashes -- are required:
  4.1. "Boundary" Parameter of multipart/form-data
  As with other multipart types, the parts are delimited with a boundary delimiter, constructed using CRLF, "--", and the value of the "boundary" parameter.
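The delimiter rule quoted above can be sketched in a few lines of Python. This is only an illustration for plain text fields, not how Firefox builds its requests; build_multipart is a hypothetical helper, and the boundary value is the one from the captured request:

```python
CRLF = b"\r\n"


def build_multipart(boundary: str, fields: dict[str, bytes]) -> bytes:
    """Assemble a multipart/form-data body for simple (non-file) fields.

    Hypothetical helper for illustration only.
    """
    # Each delimiter line is "--" followed by the boundary parameter value.
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(
            b'Content-Disposition: form-data; name="' + name.encode("utf-8") + b'"'
        )
        lines.append(b"")  # blank line separating part headers from part data
        lines.append(value)  # data is written as-is, with no escaping
    # The closing delimiter has two extra trailing dashes.
    lines.append(delimiter + b"--")
    return CRLF.join(lines) + CRLF


boundary = "---------------------------735323031399963166993862150"
body = build_multipart(
    boundary,
    {"text1": b"text default", "text2": "aωb".encode("utf-8")},
)
print(body.decode("utf-8"))
```

The printed body has the same shape as the text1 and text2 parts of the request shown earlier: the header carries the boundary without the leading "--", and every delimiter line in the body adds it.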
- Every field gets some sub headers before its data: Content-Disposition: form-data;, the field name, and for file inputs the filename and Content-Type, followed by the data.
  The server reads the data until the next boundary string. The browser must choose a boundary that will not appear in any of the fields, so this is why the boundary may vary between requests.
Because we have the unique boundary, no encoding of the data is necessary: binary data is sent as is.
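One way to satisfy the "boundary must not appear in the data" requirement is to generate random candidates and retry on a collision. The Python sketch below assumes that approach; pick_boundary and the "----sketch" prefix are hypothetical, and real browsers may choose their boundaries differently:

```python
import secrets


def pick_boundary(payloads: list[bytes]) -> str:
    """Return a random boundary whose delimiter occurs in none of the payloads.

    Hypothetical helper; the "----sketch" prefix is arbitrary.
    """
    while True:
        candidate = "----sketch" + secrets.token_hex(16)
        delimiter = b"--" + candidate.encode("ascii")
        # Retry in the (very unlikely) case that the delimiter already
        # occurs somewhere in the data to be sent.
        if not any(delimiter in payload for payload in payloads):
            return candidate


payloads = [b"text default", "aωb".encode("utf-8"), b"a\xcf\x89b"]
chosen = pick_boundary(payloads)
print(chosen)
```

Because the candidate is random, the printed value differs on every run, which matches the observation that the boundary may vary between requests.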
TODO: what is the optimal boundary size (log(N) I bet), and name / running time of the algorithm that finds it? Asked at: https://cs.stackexchange.com/questions/39687/find-the-shortest-sequence-that-is-not-a-sub-sequence-of-a-set-of-sequences
- Content-Type is automatically determined by the browser.
  How it is determined exactly was asked at: How is mime type of an uploaded file determined by browser?
application/x-www-form-urlencoded
Now change the enctype to application/x-www-form-urlencoded, reload the browser, and resubmit.
Firefox sent:
POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: application/x-www-form-urlencoded
Content-Length: 51

text1=text+default&text2=a%CF%89b&file1=a.txt&file2=a.html&file3=binary
Clearly the file data was not sent, only the basenames. So this cannot be used for files.
As for the text field, we see that usual printable characters like a and b were sent in one byte, while non-printable ones like 0xCF and 0x89 took up 3 bytes each: %CF%89!
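The same encoding of the two text fields can be reproduced with Python's standard library. This is a sketch for comparison only, not what the browser runs internally; the field names and values are taken from the form above:

```python
from urllib.parse import quote_plus, urlencode

# The two text fields from the sample form.
fields = {"text1": "text default", "text2": "aωb"}

# urlencode() percent-encodes each value as UTF-8 and turns spaces into "+".
encoded = urlencode(fields)
print(encoded)  # text1=text+default&text2=a%CF%89b

# ω is 2 bytes in UTF-8 but 6 characters once percent-encoded.
omega_encoded = quote_plus("ω")
print(omega_encoded, len(omega_encoded))  # %CF%89 6
```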
Comparison
File uploads often contain lots of non-printable characters (e.g. images), while text forms almost never do.
From the examples we have seen that:
- multipart/form-data: adds a few bytes of boundary overhead to the message, and must spend some time calculating it, but sends each byte in one byte.
- application/x-www-form-urlencoded: has a single byte boundary per field (&), but adds a linear overhead factor of 3x for every non-printable character.
Therefore, even if we could send files with application/x-www-form-urlencoded, we wouldn't want to, because it is so inefficient.
But for printable characters found in text fields, it does not matter and generates less overhead, so we just use it.
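To put a number on that overhead, the following Python sketch percent-encodes a synthetic 1024-byte payload made only of bytes that must be escaped, and compares it with ordinary ASCII text. The payloads are made up for illustration:

```python
from urllib.parse import quote_from_bytes

# 1024 bytes in the range 0x80-0xFF: none of them can be sent unescaped.
binary_payload = bytes(range(128, 256)) * 8
binary_encoded_size = len(quote_from_bytes(binary_payload, safe=""))

# Plain ASCII letters pass through unchanged.
text_payload = b"textdefault"
text_encoded_size = len(quote_from_bytes(text_payload, safe=""))

print(len(binary_payload), binary_encoded_size)  # 1024 3072
print(len(text_payload), text_encoded_size)      # 11 11
```

With multipart/form-data the same 1024 bytes would be sent as 1024 bytes plus the fixed cost of the delimiter and part headers.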
When you make a POST request, you have to encode the data that forms the body of the request in some way.
HTML forms provide three methods of encoding.
- application/x-www-form-urlencoded (the default)
- multipart/form-data
- text/plain
Work was being done on adding application/json, but that has been abandoned.
(Other encodings are possible with HTTP requests generated using other means than an HTML form submission. JSON is a common format for use with web services and some still use SOAP.)
The specifics of the formats don't matter to most developers. The important points are:
- Never use text/plain.
When you are writing client-side code:
- use multipart/form-data when your form includes any <input type="file"> elements
- otherwise you can use multipart/form-data or application/x-www-form-urlencoded but application/x-www-form-urlencoded will be more efficient
When you are writing server-side code:
- Use a prewritten form handling library
Most (such as Perl's CGI->param or the one exposed by PHP's $_POST superglobal) will take care of the differences for you. Don't bother trying to parse the raw input received by the server.
Sometimes you will find a library that can't handle both formats. Node.js's most popular library for handling form data is body-parser which cannot handle multipart requests (but has documentation that recommends some alternatives which can).
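As an illustration of letting a library do the parsing, here is a Python sketch that uses only the standard library: urllib.parse for urlencoded bodies and the email package (which understands MIME multipart) for multipart bodies. Python is not mentioned in the original answer, the helper names and the boundary sketchboundary123 are hypothetical, and a real application would normally rely on its web framework's form handling instead:

```python
from email import message_from_bytes
from urllib.parse import parse_qs


def parse_urlencoded(body: bytes) -> dict[str, list[str]]:
    """Decode an application/x-www-form-urlencoded body (hypothetical helper)."""
    return parse_qs(body.decode("ascii"), keep_blank_values=True)


def parse_multipart(content_type: str, body: bytes) -> dict[str, bytes]:
    """Decode a multipart/form-data body (hypothetical helper)."""
    # Prepend the Content-Type header so the MIME parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("not a multipart body")
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_payload(decode=True)  # raw bytes of the part
    return fields


urlencoded_fields = parse_urlencoded(b"text1=text+default&text2=a%CF%89b")

multipart_body = (
    b"--sketchboundary123\r\n"
    b'Content-Disposition: form-data; name="text1"\r\n\r\n'
    b"text default\r\n"
    b"--sketchboundary123\r\n"
    b'Content-Disposition: form-data; name="text2"\r\n\r\n'
    + "aωb".encode("utf-8")
    + b"\r\n--sketchboundary123--\r\n"
)
multipart_fields = parse_multipart(
    "multipart/form-data; boundary=sketchboundary123", multipart_body
)

print(urlencoded_fields)
print(multipart_fields)
```

Both calls recover the same two fields; the calling code does not need to know which encoding the client used beyond passing along the Content-Type header.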
If you are writing (or debugging) a library for parsing or generating the raw data, then you need to start worrying about the format. You might also want to know about it for interest's sake.
application/x-www-form-urlencoded is more or less the same as a query string on the end of the URL.
multipart/form-data is significantly more complicated but it allows entire files to be included in the data. An example of the result can be found in the HTML 4 specification.
text/plain is introduced by HTML 5 and is useful only for debugging (from the spec: "They are not reliably interpretable by computer"), and I'd argue that the others combined with tools (like the Network Panel in the developer tools of most browsers) are better for that.
enctype='multipart/form-data' is an encoding type that allows files to be sent through a POST. Quite simply, without this encoding the files cannot be sent through POST.
If you want to allow a user to upload a file via a form, you must use this enctype.