XML vs comma delimited text files

Advantages

XML has a number of advantages over CSV:

  • Hierarchical data organization
  • Automatic data validation (XML Schemas or DTDs)
  • Easily convert formats (using XSL)
  • Easy to identify relational structure
  • Can be used in combination with XML-RPC
  • Suitable for object persistence (marshalling)
  • Simplifies business-to-business communications
  • Helpful related technologies (XPath, DOM)
  • Tight integration with modern Web browsers
  • Support in Extract, Transform, and Load (ETL) tools
  • Backwards file format compatibility (version attribute)
  • Digital signatures

Which format is the better choice depends entirely on the problem domain and what you are trying to solve.
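
To make the hierarchy and XPath points concrete, here is a minimal sketch using Python's standard library. The catalog layout, element names, and attribute names are made up for illustration, and ElementTree only supports a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical song catalog; element and attribute names are illustrative.
CATALOG = """
<catalog version="1.0">
  <artist name="Artist A">
    <album title="First Album">
      <song bpm="120">Opening Track</song>
      <song bpm="96">Slow Track</song>
    </album>
  </artist>
  <artist name="Artist B">
    <album title="Second Album">
      <song bpm="140">Fast Track</song>
    </album>
  </artist>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Select every song that sits under an album of "Artist A".
songs_by_a = [
    song.text
    for song in root.findall("./artist[@name='Artist A']/album/song")
]

# The nesting itself records which album belongs to which artist.
albums_per_artist = {
    artist.get("name"): [album.get("title") for album in artist.findall("album")]
    for artist in root.findall("artist")
}

print(songs_by_a)
print(albums_per_artist)
```

In CSV the same relationships would have to be flattened into repeated artist and album columns, or split across several files joined by keys.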

Example

Browser integration is something that many people miss when writing web pages. Consider the situation where you have a large data store of songs. Songs have artists, albums, beats per minute, and so forth. You could export the data to XML, write a simple XSL stylesheet that transforms the XML into XHTML, then point the browser at the XML file. The browser will apply the stylesheet and render the result as a web page.

You cannot do that with CSV.
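
As a rough sketch of the export step, the following Python builds such a document and prepends an `xml-stylesheet` processing instruction. The function name `build_song_document`, the file name `songs.xsl`, and the element names are assumptions for illustration; the stylesheet itself is not shown, and it only takes effect in a browser that supports client-side XSLT.

```python
import xml.etree.ElementTree as ET


def build_song_document(songs, stylesheet_href="songs.xsl"):
    """Return an XML string whose processing instruction points at an XSLT file.

    `stylesheet_href` is a hypothetical file name, not a real resource.
    """
    root = ET.Element("songs")
    for song in songs:
        node = ET.SubElement(root, "song", bpm=str(song["bpm"]))
        ET.SubElement(node, "title").text = song["title"]
        ET.SubElement(node, "artist").text = song["artist"]
    body = ET.tostring(root, encoding="unicode")
    # The xml-stylesheet instruction tells the browser which stylesheet to apply.
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{stylesheet_href}"?>\n'
    )
    return header + body


document = build_song_document([
    {"title": "Opening Track", "artist": "Artist A", "bpm": 120},
    {"title": "Fast Track", "artist": "Artist B", "bpm": 140},
])
print(document)
```

Saving the result next to the stylesheet and opening it in the browser is what triggers the transformation.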

Disadvantages

Joel Spolsky has a great article on why XML is a poor choice as a complex data store: it is slow. (Unlike a database, which can move to the previous or next record with a single CPU instruction, an XML document has to be parsed to get from one record to the next, which is much slower.) Arguably, this could be considered an optimization problem, resolved by waiting 18 months for faster hardware. Thus:

  • Slower to parse than other formats
  • Syntactical redundancy can detract from readability
  • Document bloat could affect storage costs
  • Cannot easily model overlapping (non-hierarchical) data structures
  • Poorly designed XML file formats are not uncommon (in my experience; citation needed)
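
The bloat point is easy to check for yourself. This is a minimal sketch with made-up sample rows that serializes the same records both ways and compares the byte counts; real-world ratios depend on the data and on the tag names chosen.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative sample data, not taken from any real data set.
rows = [
    {"title": "Opening Track", "artist": "Artist A", "bpm": "120"},
    {"title": "Slow Track", "artist": "Artist A", "bpm": "96"},
    {"title": "Fast Track", "artist": "Artist B", "bpm": "140"},
]

# CSV: field names appear once, in the header row.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["title", "artist", "bpm"])
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# XML: field names repeat as opening and closing tags on every record.
root = ET.Element("songs")
for row in rows:
    song = ET.SubElement(root, "song")
    for field, value in row.items():
        ET.SubElement(song, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

csv_size = len(csv_text.encode("utf-8"))
xml_size = len(xml_text.encode("utf-8"))
print(f"CSV: {csv_size} bytes, XML: {xml_size} bytes")
```

Compression narrows the gap considerably, since the repeated tags compress well, but the uncompressed difference still matters for parsing time.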

Related Question

See also: Why Should I Use A Human Readable File Format.


These aren't the only two options. You can also use JSON or YAML, which are much lighter weight than XML.

In general, if you have simple tabular data without many special characters, CSV isn't a bad choice. For structured data, consider using one of the other three.
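
Here is a small sketch of why special characters matter for CSV and why nested data is more comfortable in something like JSON. It uses only Python's standard library, and the sample values are invented.

```python
import csv
import io
import json

# Fields containing a comma, double quotes, and a newline.
tricky = ["Track, Part 2", 'The "Live" Version', "line one\nline two"]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(tricky)
encoded = buffer.getvalue()

# Naive splitting on commas miscounts the fields ...
naive_fields = encoded.strip().split(",")
# ... while a real CSV parser recovers them, at the cost of quoting rules.
parsed_fields = next(csv.reader(io.StringIO(encoded, newline="")))

# Nested data maps directly onto JSON without any flattening.
artist = {
    "name": "Artist A",
    "albums": [{"title": "First Album", "songs": ["Opening Track"]}],
}
restored = json.loads(json.dumps(artist))

print(len(naive_fields), len(parsed_fields))
print(restored == artist)
```

So CSV can carry awkward values, but only if every producer and consumer uses a proper CSV library rather than splitting on commas.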
