Why is JSON replacing XML as a data format?

Short answer: yes and no.

There are fundamental differences and trade-offs. XML is a markup language, particularly suitable for textual documents (XHTML, DocBook, various kinds of office documents), and good enough for many other tasks. Problems mostly arise from its hierarchical model (as opposed to the relational model of SQL or the object-graph model of OO languages).

JSON is an object notation, meaning it is a somewhat more natural fit for data-oriented use cases: cases where XML sort of works, but where more effort goes into overcoming the impedance mismatch between the object and hierarchical models. JSON is not a perfect fit (it's still data, not objects: no identity, and it can't represent full graphs), but it is more natural than XML. As such, it is easier to build tools that do decent and simple data binding.
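The "no identity, can't do full graphs" limitation is easy to observe with Python's standard `json` module. This is a minimal sketch: the `shared` and `cyclic` objects are made up for illustration. A value referenced twice is duplicated on the round trip, and a cyclic structure is rejected outright.

```python
import json

shared = {"name": "Alice"}
doc = {"author": shared, "reviewer": shared}  # one object referenced twice

restored = json.loads(json.dumps(doc))
# The values survive the round trip...
assert restored["author"] == restored["reviewer"]
# ...but identity does not: the shared object came back as two copies.
print(restored["author"] is restored["reviewer"])  # False

cyclic = {"name": "node"}
cyclic["self"] = cyclic  # a graph with a cycle, not a tree
cycle_error = None
try:
    json.dumps(cyclic)  # the encoder's circular-reference check fires here
except ValueError as err:
    cycle_error = err
print("cannot serialize a cycle:", cycle_error)
```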

So: there's plenty of room for both, and I would expect both to be used for a long time to come. Not always in an optimal way, but both can handle plenty of use cases well enough.

For what it's worth, since writing my original answer I have seen JSON absolutely annihilate XML for data-oriented and data-interchange use cases at the companies I have worked for. SOAP (etc.) will shrink significantly, and "plain old JSON" data interchange (especially with RESTful frameworks, such as JAX-RS for Java) will take over.

And yet XML is much better for textual markup.


My bold thesis is that such a replacement is impossible after all, since these data formats (JSON and XML) are different.

Short version: XML is not equivalent to JSON (or similar formats) because XML nodes (tags) support attributes and namespaces. This turns out to be crucial.
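To make the point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`; the `urn:example:shop` namespace and the `price` element are invented for the example. A single XML element carries three separate channels of information (a namespaced name, attributes, and content), whereas a JSON value such as `{"price": 9.99}` has only one.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<price xmlns="urn:example:shop" currency="EUR">9.99</price>')

# The namespace is folded into the element name in "{uri}local" form.
print(doc.tag)     # {urn:example:shop}price
# Attributes are a separate channel (unprefixed attributes get no namespace).
print(doc.attrib)  # {'currency': 'EUR'}
# The content itself is yet another channel.
print(doc.text)    # 9.99
```

Carrying `currency` and the namespace over to JSON requires picking some convention for where they go, which is exactly the escaping problem discussed further down.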

So the best way to answer this question is to show how these formats differ, i.e. to complete the comparison. Forgive me for stating the obvious; I hope this will be interesting or even useful. It will help if we first agree on some simple terminology:

  1. A data format is a formal language that governs how data is recognized in its representation, i.e. how to "read/write" it according to the way it is stored (e.g. in memory).
  2. A data structure is an abstract model (description) of how the data is organized or linked.

So the two concepts address different aspects of data handling (e.g. I/O). For example, an indexed array of a particular data type is a (homogeneous) structure, and it can be accessed (read/written) as a serial sequence (a contiguous format).
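The array example can be sketched with Python's standard `array` module, assuming native C `int` items and native byte order: the structure is the homogeneous indexed array, while the format is the contiguous byte sequence it is written to and read back from.

```python
from array import array

values = array("i", [1, 2, 3])   # structure: homogeneous, indexed array of C ints
raw = values.tobytes()           # format: one contiguous byte sequence (native byte order)

# Every item occupies the same fixed width, so the layout is fully serial.
assert len(raw) == len(values) * values.itemsize

restored = array("i")
restored.frombytes(raw)          # read the same serial sequence back into the structure
print(restored.tolist())         # [1, 2, 3]
```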

Wikipedia has a great article about JSON that lists a lot of alternatives, such as Lisp S-expressions (already mentioned), Python nested structures, PHP arrays, YAML, etc. (note that we are not considering flat dictionaries like .ini files, since they lack multi-level nesting). All these formats can be seen as representations of one particular data structure: a tree. In that sense they are isomorphic: each representation can be mapped to a tree without any extra processing (e.g. without changing the grammar of the formal language), and a reverse mapping also exists.
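A minimal sketch of that isomorphism, assuming a hypothetical S-expression encoding that tags nodes with `dict` and `list`: a JSON-compatible Python nested structure maps onto an S-expression node by node, and JSON itself round-trips the same tree unchanged. The reverse S-expression parser is omitted for brevity.

```python
import json
from typing import Any

def to_sexpr(node: Any) -> str:
    """Map a JSON-compatible value onto an S-expression (hypothetical 'dict'/'list' tagging)."""
    if isinstance(node, dict):
        pairs = "".join(f" ({json.dumps(k)} {to_sexpr(v)})" for k, v in node.items())
        return f"(dict{pairs})"
    if isinstance(node, list):
        items = "".join(" " + to_sexpr(v) for v in node)
        return f"(list{items})"
    # Scalars (str, int, float, bool, None) reuse JSON's literal spelling.
    return json.dumps(node)

data = {"name": "Ada", "langs": ["en", "fr"]}
print(to_sexpr(data))  # (dict ("name" "Ada") ("langs" (list "en" "fr")))

# The same tree survives a JSON round trip unchanged.
assert json.loads(json.dumps(data)) == data
```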

You may say that's "just" theory, but what does it mean in practice? The implication is that if we compare XML and JSON by:

  • design purpose and motivation
  • application domain: the set of tasks a format is used to solve
  • syntactic complexity (or rather simplicity: to what extent the format is readable, writable, human-friendly, etc.)
  • maturity (e.g. how long the format and its versions have been around)
  • and so on

we will discover further practical differences. The major one is that XML is a MARKUP language (as has been mentioned). For nesting, it can mix namespaces and attributes, which results in a higher order of "parallel" nesting.

For the past two years I have been busy converting XML representations into Python nested structures and back. My bitter conclusion is that they are very poorly compatible. To represent attributes and namespaces, one has to escape this information (e.g. with prefixes) in the tree representation. So, once again, XML is definitely not just a tree ;-) Thanks to its "markup" capabilities, it directly (without the need to encode, encapsulate or escape anything) allows the representation of much more sophisticated structures than trees, namely typed trees: trees with specialized types of vertices (again, via namespaces and attributes).
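Here is a minimal sketch of the kind of escaping this requires, using the standard `xml.etree.ElementTree`. The `@` prefix for attributes and the `#text` key for content are an assumed convention (similar to the one used by libraries such as xmltodict), and the `element_to_dict` helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem: ET.Element) -> dict:
    """Hypothetical converter: escape attributes with '@' and text with '#text'.

    Namespaced names keep ElementTree's '{uri}local' form. Grouping children
    by tag loses the interleaving order of different child tags, and mixed
    content is not handled faithfully.
    """
    node: dict = {}
    for name, value in elem.attrib.items():
        node["@" + name] = value          # attribute escaped into the key
    for child in elem:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text              # content escaped into a reserved key
    return node

xml_doc = '<order xmlns:p="urn:example:pricing" id="42"><item p:currency="EUR">Book</item></order>'
root = ET.fromstring(xml_doc)
result = {root.tag: element_to_dict(root)}
print(result)
```

The output only means something to a reader who knows the `@`/`#text` convention and the `{uri}` encoding; the XML carried that information natively.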

There are other difficulties and dangers too, such as parsing and mapping

<body>The <strong>marked up</strong> text</body>

into a tree without some pre-agreed convention (how should "The ... text" be split?) while still preserving the document order that XML keeps.
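ElementTree's own answer to the "how to split it" question is one such convention: text before the first child goes in `.text`, and text after a child goes in that child's `.tail`. The `to_segments` helper below is a hypothetical alternative mapping onto an ordered list of strings and single-key dicts, sketched to show that preserving order needs a deliberate design choice.

```python
import xml.etree.ElementTree as ET

body = ET.fromstring("<body>The <strong>marked up</strong> text</body>")
strong = body[0]

print(repr(body.text))    # 'The '       (text before the first child)
print(repr(strong.text))  # 'marked up'
print(repr(strong.tail))  # ' text'      (text after the child is attached to the child!)

def to_segments(elem: ET.Element) -> list:
    """Hypothetical mapping: mixed content as an ordered list of strings and {tag: [...]} dicts."""
    segments: list = []
    if elem.text:
        segments.append(elem.text)
    for child in elem:
        segments.append({child.tag: to_segments(child)})
        if child.tail:
            segments.append(child.tail)
    return segments

print(to_segments(body))  # ['The ', {'strong': ['marked up']}, ' text']
```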

Obviously, things that are not equivalent naturally have trouble substituting for each other. In that sense, XML is more complex than nested structures.

The part of the question regarding industries seems pretty well answered by the prognosis that XML will remain a server-side, document-oriented technology, mainly because of its superior data-typing abilities. Also, a lot of research has been done that is motivated by XML purely as a markup language.

Excuse me for drifting further off topic to discuss the popularity of JSON, but it seems partially relevant ;)

I want to emphasize that JSON (being an object notation) by design (it comes from JavaScript) completely fails to capture any custom type information: it only enumerates values of its native types, without providing a "runtime" reference or a context, and hence fails to carry tightly coupled, objectified data. Type information is always reduced to JSON's native types (object, array, string, number, boolean, null). This limits type-oriented development (type checking, constraining, casting, delegation, etc.). But IMHO this crucial problem is shared with JSON by most modern programming languages I know, which lack the kind of sophisticated nested custom data typing that XML has (objects or functions are not documents). It seems that XML does this only by accident, not by design.
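A short sketch of that type erasure with Python's standard `json` module; the `Invoice` class is a hypothetical domain type. Without a `default` hook the encoder refuses the `date` field, and with one the value arrives on the other side as a plain string inside a plain dict: both the class and the field type are gone.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class Invoice:  # hypothetical domain type
    number: int
    issued: date

inv = Invoice(number=7, issued=date(2020, 1, 31))

plain_error = None
try:
    json.dumps(asdict(inv))  # date is not a JSON native type
except TypeError as err:
    plain_error = err

# A default hook makes it serializable by flattening the date to a string.
payload = json.dumps(asdict(inv), default=lambda o: o.isoformat())
restored = json.loads(payload)

print(payload)                  # {"number": 7, "issued": "2020-01-31"}
print(type(restored).__name__)  # dict  (not Invoice)
print(type(restored["issued"]).__name__)  # str (not date)
```

Recovering an `Invoice` with a real `date` requires out-of-band knowledge on the receiving side.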

As a result, when working with JSON one applies tactics similar to those used for processing duck-typed data in popular dynamic languages. So this is another characteristic of JSON: it allows fast coding, but risks getting bulky as the data grows too big (deeply nested and complex).
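In Python this duck-typed style typically looks like probing the parsed structure at runtime. The `dig` helper and the `order` document below are hypothetical; the point is that every access path needs its own defensive handling, which is what grows bulky as nesting deepens.

```python
import json
from typing import Any

def dig(data: Any, *path: str) -> Any:
    """Hypothetical helper: follow a key path through nested dicts, returning None on any miss."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data

order = json.loads('{"customer": {"address": {"city": "Berlin"}}}')

city = dig(order, "customer", "address", "city")
phone = dig(order, "customer", "phone")  # missing key: no error, just None

# The same probing written inline, as it often appears in practice:
city_chained = order.get("customer", {}).get("address", {}).get("city")
print(city, phone, city_chained)  # Berlin None Berlin
```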

JSON is more of a Swiss Army knife than XML, since it is simpler.

So JSON does not help interoperation with strongly typed languages like Java, but on the other hand it allows lower coupling by encouraging abstract decomposition. Since losing type information may sometimes be a good thing (a reduction factor), it allows simpler architectures. ActionScript de facto prefers to communicate in JSON (although its makers also proposed their own AMF). Finally, JSON works great with KISS (e.g. RESTful) designs. JSON wins on speed and simplicity. But what people usually tend to ignore is that when KISS is impossible and the domain logic is too complicated, the work of designing DTDs and XSDs, thinking formats through and so on still has to be done by someone (often later on, when the cool KISS approach has failed for lack of design competence and experience). The point is that JSON is a great tool that does not scale to large, complex applications.
