Which datatype to store XML data in: VARCHAR(MAX) or XML
what factors should I be considering when trying to decide between storing XML in an
xml
column vs. avarchar(MAX)
column
The factors are:
The
XML
type is queryable / parseable through XQuery expressions, including being able to use FLWOR Statement and IterationData in
XML
variables and columns can be modified inline using XQuery expressions via XML DML.XML
data is stored as UTF-16 LE (Little Endian), soVARCHAR(MAX)
would be a poor choice as it could result in data loss. Hence, the true decision should be betweenXML
andNVARCHAR(MAX)
, given thatNCHAR
/NVARCHAR
is also UTF-16 LE.XML
data can be validated against an XSD /XML SCHEMA COLLECTION
. No validation (outside of ensuring well-formedness) is done if no XML Schema Collection is specified, but this option is not available when usingNVARCHAR(MAX)
.One major benefit of the XML type is that it is stored in a highly optimized format (not
VARBINARY(MAX)
as stated in @Oleg's answer) that does not store the exact string representation that you see, but instead has a dictionary of Element and Attribute names and refers to them by their ID. It also removes whitespace. Try the following:DECLARE @Test1 XML = N'<Test><TagName>1</TagName><TagName>2</TagName></Test>'; DECLARE @String1 NVARCHAR(MAX) = CONVERT(NVARCHAR(MAX), @Test1); SELECT DATALENGTH(@Test1) AS [XmlBytes], LEN(@String1) AS [StringCharacters], DATALENGTH(@String1) AS [StringBytes]; SET @Test1 = N'<Test><TagName>1</TagName><TagName>2</TagName><TagName>3</TagName> <TagName>4</TagName><TagName>5</TagName><TagName>6</TagName></Test>'; SET @String1 = CONVERT(NVARCHAR(MAX), @Test1); SELECT DATALENGTH(@Test1) AS [XmlBytes], LEN(@String1) AS [StringCharacters], DATALENGTH(@String1) AS [StringBytes];
Returns:
<!-- language: lang-none -->
XmlBytes StringCharacters StringBytes
56 53 106
XmlBytes StringCharacters StringBytes
84 133 266
As you can see in the example output above, adding four elements (#s 3, 4, 5, and 6) added 80 characters (hence 80 bytes if using VARCHAR
) and 160 bytes to the NVARCHAR
variable. Yet, it only added 28 bytes to the XML variable, which is less than it added for VARCHAR
(just in case someone was going to argue in favor of VARCHAR
over XML
because XML
is UTF-16 which is [mostly] double-byte). This optimization can save tons of space, and is reason enough by itself to use the XML
datatype.
If Queries against the XML will happen by sql server xml capabilities, then use XML type to store a xml to avoid casting
And
keep in mind, that XML type may be stored little bit slower due to xml validation, but underlying type of XML is ordinary varbinary(max)