What Content-Type value should I send for my XML sitemap?
The difference between text/xml and application/xml is the default character encoding if the charset parameter is omitted:
Text/xml and application/xml behave differently when the charset parameter is not explicitly specified. If the default charset (i.e., US-ASCII) for text/xml is inconvenient for some reason (e.g., bad web servers), application/xml provides an alternative (see "Optional parameters" of application/xml registration in Section 3.2).
For text/xml:
Conformant with [RFC2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii"[ASCII]. In cases where the XML MIME entity is transmitted via HTTP, the default charset value is still "us-ascii".
For application/xml:
If an application/xml entity is received where the charset parameter is omitted, no information is being provided about the charset by the MIME Content-Type header. Conforming XML processors MUST follow the requirements in section 4.3.3 of [XML] that directly address this contingency. However, MIME processors that are not XML processors SHOULD NOT assume a default charset if the charset parameter is omitted from an application/xml entity.
So if the charset parameter is omitted, the character encoding of text/xml is US-ASCII while with application/xml the character encoding can be specified in the document itself.
Now a rule of thumb on the internet is: “Be strict with the output but be tolerant with the input.” That means make sure to meet the standards as much as possible when delivering data over the internet. But build in some mechanisms to overlook faults or to guess when receiving and interpreting data over the internet.
So in your case just pick one of the two types (I recommend application/xml) and make sure to specify the used character encoding properly (I recommend to use the respective default character encoding to play safe, so in case of application/xml use UTF-8 or UTF-16).
As a rule of thumb, the safest bet towards making your document be treated properly by all web servers, proxies, and client browsers, is probably the following:
- Use the application/xml content type
- Include a character encoding in the content type, probably UTF-8
- Include a matching character encoding in the encoding attribute of the XML document itself.
In terms of the RFC 3023 spec, which some browsers fail to implement properly, the major difference in the content types is in how clients are supposed to treat the character encoding, as follows:
For application/xml, application/xml-dtd, application/xml-external-parsed-entity, or any one of the subtypes of application/xml such as application/atom+xml, application/rss+xml or application/rdf+xml, the character encoding is determined in this order:
- the encoding given in the charset parameter of the Content-Type HTTP header
- the encoding given in the encoding attribute of the XML declaration within the document,
- utf-8.
For text/xml, text/xml-external-parsed-entity, or a subtype like text/foo+xml, the encoding attribute of the XML declaration within the document is ignored, and the character encoding is:
- the encoding given in the charset parameter of the Content-Type HTTP header, or
- us-ascii.
Most parsers don't implement the spec; they ignore the HTTP Context-Type and just use the encoding in the document. With so many ill-formed documents out there, that's unlikely to change any time soon.