Using MAX text or more specific, smaller type
Should I always use (n)varchar(max) for text columns?
No.
For SQL Server, the max data types should only be specified when there is no alternative. One should instead choose the correct base type (varchar or nvarchar) and specify an explicit maximum length that is appropriate to the data to be stored.
Physical storage is identical whether the column is typed as varchar(n) or varchar(max), so that is not the concern.
The reasons not to choose (n)varchar(max) everywhere revolve around features, plan quality, and performance.
An exhaustive list is probably not practical, but among other things, max columns:
Features
- Require a separate constraint to enforce a maximum length
- Cannot be a key in an index (so no unique constraints either)
- May prevent online DDL (including index rebuilds and adding a new non-null column)
- Are generally not supported for 'newer' features e.g. columnstore
- See the product documentation for more specific features and limitations. The general pattern is that there are awkward limitations and restrictions around max data types. Not all limitations and side-effects are documented.
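To make the first two feature points concrete, here is a minimal sketch. The table, column, and constraint names are hypothetical and only serve the illustration; it assumes a scratch database where you are free to create objects.

```sql
-- Hypothetical table and constraint names, for illustration only.
CREATE TABLE dbo.Product
(
    product_id int NOT NULL PRIMARY KEY,
    -- Explicit length: the limit is part of the type itself.
    sku_n   varchar(50)  NOT NULL,
    -- max type: the same 50-byte limit needs a separate CHECK constraint.
    sku_max varchar(max) NOT NULL
        CONSTRAINT CK_Product_sku_max_length CHECK (DATALENGTH(sku_max) <= 50)
);

-- Works: varchar(50) is a valid index key, so uniqueness can be enforced.
CREATE UNIQUE INDEX UQ_Product_sku_n ON dbo.Product (sku_n);

-- Fails if uncommented: a max column is not valid as an index key column.
-- CREATE UNIQUE INDEX UQ_Product_sku_max ON dbo.Product (sku_max);
```

Both columns end up accepting the same values, but only the varchar(50) column can carry the unique index.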
Performance
- Require special handling in the execution engine, to account for the potentially very large size. Typically, this involves using a less efficient code path, with a streaming interface
- May have similar unanticipated consequences for external code (and other SQL Server components like SSIS), which must also be prepared to handle data up to 2GB in size
- Are assumed to be 4000 bytes wide in memory grant calculations. This is likely to lead to excessive memory reservation, which limits concurrency, and pushes valuable index and data pages out of cache memory
- Disable several important performance optimizations
- May extend lock duration
- May prevent the optimizer choosing a (non-dynamic) seek plan
- Prevent filters being pushed into scans and seeks as a residual
- May increase tempdb pressure and contention (version dependent), since variables and parameters are also likely to be typed as max to match column definitions
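One way to see the memory grant point for yourself is a sketch along the following lines. The temporary table and its column names are hypothetical, and the row count is arbitrary. The actual grant sizes depend on version and configuration, so none are quoted here.

```sql
-- Hypothetical demo table: similar short strings stored in both column types.
CREATE TABLE #GrantDemo
(
    id       int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    name_n   varchar(50)  NOT NULL,
    name_max varchar(max) NOT NULL
);

-- Fill with 36-character strings (the text form of a GUID).
INSERT INTO #GrantDemo (name_n, name_max)
SELECT TOP (100000)
    CONVERT(varchar(50), NEWID()),
    CONVERT(varchar(max), NEWID())
FROM sys.all_columns AS c1
CROSS JOIN sys.all_columns AS c2;

-- Run each query with the actual execution plan enabled,
-- then compare the memory grant reported on the two plans.
SELECT name_n   FROM #GrantDemo ORDER BY name_n;
SELECT name_max FROM #GrantDemo ORDER BY name_max;

DROP TABLE #GrantDemo;
```

The data is the same size in both columns, so any difference in the grant comes from the declared type rather than from what is actually stored.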
In summary, there are so many subtle (and undesirable) side effects of unnecessarily using the max specifier that it makes no sense to do this. The minor 'convenience' of using a single declaration is no sort of compensation.
Evaluate each type in context, use the correct base type (varchar or nvarchar), and a sensible explicit length.
Further reading:
- Performance comparison of varchar(max) vs. varchar(n) by Remus Rusanu
- Read Committed and Large Objects by Craig Freedman
- Capacity planning for tempdb
- Deletes that split pages and forwarded ghosts (me)
This is going to read like a paranoid's answer, but there aren't only storage and performance considerations.
The database itself doesn't control its clients, and clients can't be assumed to always insert user input securely. Even if a database is designed to be used only with a .NET application that uses Entity Framework to encapsulate transactions and ensure parameterized queries are used systematically, you cannot know that this will always be the case.
I wouldn't know exactly how to do this, but if all text fields are varchar(max), and a client has Bobby Tables issues and/or your stored procedures' parameters are also varchar(max), then you're making it easier for an attacker to come up with a valid but cleverly evil parameter value that does things clients aren't supposed to be doing, whatever those are.
By limiting the length to what you actually need, you're not shielding yourself from these clever attacks (I'm not even sure what it's actually called, I just remember reading about it a while back), but you're not saying "go ahead, try to give me a 2GB script to run" either.