MySQL - varchar length and performance
This is a very common "exam/interview question". I will answer as good as I can:
In the standard row formats for InnoDB and MyISAM (dynamic/compact) a VARCHAR(50)
and a VARCHAR(255)
will store the string text in the same way- 1 byte for the length and the actual string with between 1 and 4 bytes per character (depending on the encoding and the actual character stored).
In fact, if I remember correctly, I recall someone modifying the data dictionary with an hexadecimal editor in order to change something like a VARCHAR(50)
into a VARCHAR(100)
, so it could be done dynamically (normally, that requires a table reconstruction). And that was possible, because the actual data was not affected by that change.
That is not true with VARCHAR(256)
, because then 2 bytes (at least) for the length are always required.
So, that means that we should always do VARCHAR(255)
, shouldn't we? No. There are several reasons.
While InnoDB may store a varchar in a dynamic way, that is not true for other engines. MyISAM has a fixed row size format, and MEMORY tables are always fixed in size. Should we care about those other engines? Yes, we should, because even if we do not use them directly, MEMORY tables are very commonly used for intermediate results (temporary tables on memory), and as the results are not known beforehand, the table has to be created with the maximum size possible -VARCHAR(255)
if that is our type. If you can think about the wasted space, if we are using MySQL's 'utf8' charset
encoding, MEMORY will reserve 2 bytes for the length + 3 * 255 bytes per row (for values that may only take a few bytes on InnoDB). That is almost 1GB on a 1 million table -only for the VARCHAR. Not only this causes unnecessary memory stress, it may provoke the actions to be performed on disk, potentially slowing it down thousands of times. All of that because of a poor selection of its defined data type (independently of the contents).
It has some consequences for InnoDB, too. Index size is restricted to 3072 bytes and single column indexes, to 767 bytes*. So, it is very likely that you won't be able to index fully a VARCHAR(255)
field (assuming you use utf8 or any other variable length-encoding).
Additionally, the maximum inline row size for InnoDB is half a page (around 8000 bytes), and variable-lenght fields like BLOB or varchar, can be stored off-page if they do not fit on the half-page. That has some consequences in performance (sometimes good, sometimes bad, depending on the usage) that cannot be ignored. This caused some weirdness between the COMPACT and DYNAMIC formats. See, for example: error 1118: row size too large. utf8 innodb
Last but not least, as @ypercube has reminded me, more than 1 byte for the length may be required even if you are using VARCHAR(255)
, because the definition is in characters, while the length stores bytes. For example REPEAT('ñ', 255)
has more than 2^255 bytes in utf8, so it would require more than 1 byte for storing its length:
mysql> SELECT LENGTH(REPEAT('ñ', 255));
+---------------------------+
| LENGTH(REPEAT('ñ', 255)) |
+---------------------------+
| 510 |
+---------------------------+
1 row in set (0.02 sec)
mysql> SELECT CHAR_LENGTH(REPEAT('ñ', 255));
+--------------------------------+
| CHAR_LENGTH(REPEAT('ñ', 255)) |
+--------------------------------+
| 255 |
+--------------------------------+
1 row in set (0.00 sec)
So the general piece of advice is to use the smallest type possible, because it can potentially create performance or management problems otherwise. A VARCHAR(100)
is better than VARCHAR(255)
(although a VARCHAR(20)
would be better), even if you do not know the exact length. Try to be conservative because, unless the table is too large, you can always change the definition later.
Update: Because the exploding popularity of variable-length strings, for example, with the usage of emojis, Oracle has been pushing for improved performance for those cases. In the latest MySQL versions (5.6, 5.7), InnoDB has been set as the default engine for both intrinsic and explicit temporary tables meaning that variable-length fields are now first-class citizens. That means that there may be less reasons to have very constrained character lengths (but those still exist).
(*) Second Update: large_prefix_index is now enabled by default on the latest MySQL versions (8.0), but that is still true for older versions or if you are using lagacy innodb file/row formats (other than dynamic or compressed), but now by default, single column indexes can be up to those 3072 bytes.