Which datatype to store XML data in: VARCHAR(MAX) or XML

what factors should I be considering when trying to decide between storing XML in an xml column vs. a varchar(MAX) column

The factors are:

  1. The XML type is queryable / parseable through XQuery expressions, including being able to use FLWOR Statement and Iteration

  2. Data in XML variables and columns can be modified inline using XQuery expressions via XML DML.

  3. XML data is stored as UTF-16 LE (Little Endian), so VARCHAR(MAX) would be a poor choice as it could result in data loss. Hence, the true decision should be between XML and NVARCHAR(MAX), given that NCHAR / NVARCHAR is also UTF-16 LE.

  4. XML data can be validated against an XSD / XML SCHEMA COLLECTION. No validation (outside of ensuring well-formedness) is done if no XML Schema Collection is specified, but this option is not available when using NVARCHAR(MAX).

  5. One major benefit of the XML type is that it is stored in a highly optimized format (not VARBINARY(MAX) as stated in @Oleg's answer) that does not store the exact string representation that you see, but instead has a dictionary of Element and Attribute names and refers to them by their ID. It also removes whitespace. Try the following:

     DECLARE @Test1 XML = N'<Test><TagName>1</TagName><TagName>2</TagName></Test>';
    
     DECLARE @String1 NVARCHAR(MAX) = CONVERT(NVARCHAR(MAX), @Test1);
    
     SELECT DATALENGTH(@Test1) AS [XmlBytes],
            LEN(@String1) AS [StringCharacters],
            DATALENGTH(@String1) AS [StringBytes];
    
     SET @Test1 = N'<Test><TagName>1</TagName><TagName>2</TagName><TagName>3</TagName>
     <TagName>4</TagName><TagName>5</TagName><TagName>6</TagName></Test>';
    
     SET @String1 = CONVERT(NVARCHAR(MAX), @Test1);
    
     SELECT DATALENGTH(@Test1) AS [XmlBytes],
            LEN(@String1) AS [StringCharacters],
            DATALENGTH(@String1) AS [StringBytes];
    

Returns:

<!-- language: lang-none -->

    XmlBytes   StringCharacters   StringBytes
    56         53                 106

    XmlBytes   StringCharacters   StringBytes
    84         133                266

As you can see in the example output above, adding four elements (#s 3, 4, 5, and 6) added 80 characters (hence 80 bytes if using VARCHAR) and 160 bytes to the NVARCHAR variable. Yet, it only added 28 bytes to the XML variable, which is less than it added for VARCHAR (just in case someone was going to argue in favor of VARCHAR over XML because XML is UTF-16 which is [mostly] double-byte). This optimization can save tons of space, and is reason enough by itself to use the XML datatype.


If Queries against the XML will happen by sql server xml capabilities, then use XML type to store a xml to avoid casting

And

keep in mind, that XML type may be stored little bit slower due to xml validation, but underlying type of XML is ordinary varbinary(max)