What is the difference between YAML and JSON?
Technically YAML is a superset of JSON. This means that, in theory at least, a YAML parser can understand JSON, but not necessarily the other way around.
See the official specs, in the section entitled "YAML: Relation to JSON".
In general, there are certain things I like about YAML that are not available in JSON.
- As @jdupont pointed out, YAML is visually easier to look at. In fact the YAML homepage is itself valid YAML, yet it is easy for a human to read.
- YAML has the ability to reference other items within a YAML file using "anchors." Thus it can handle relational information as one might find in a MySQL database.
- YAML is more robust about embedding other serialization formats such as JSON or XML within a YAML file.
In practice neither of these last two points will likely matter for things that you or I do, but in the long term, I think YAML will be a more robust and viable data serialization format.
Right now, AJAX and other web technologies tend to use JSON. YAML is currently being used more for offline data processes. For example, it is included by default in the C-based OpenCV computer vision package, whereas JSON is not.
You will find C libraries for both JSON and YAML. YAML's libraries tend to be newer, but I have had no trouble with them in the past. See for example Yaml-cpp.
Differences:
- YAML, depending on how you use it, can be more readable than JSON
- JSON is often faster and is probably still interoperable with more systems
- It's possible to write a "good enough" JSON parser very quickly
- Duplicate keys, which are potentially valid JSON, are definitely invalid YAML.
- YAML has a ton of features, including comments and relational anchors. YAML syntax is accordingly quite complex, and can be hard to understand.
- It is possible to write recursive structures in yaml:
{a: &b [*b]}
, which will loop infinitely in some converters. Even with circular detection, a "yaml bomb" is still possible (see xml bomb). - Because there are no references, it is impossible to serialize complex structures with object references in JSON. YAML serialization can therefore be more efficient.
- In some coding environments, the use of YAML can allow an attacker to execute arbitrary code.
Observations:
- Python programmers are generally big fans of YAML, because of the use of indentation, rather than bracketed syntax, to indicate levels.
- Many programmers consider the attachment of "meaning" to indentation a poor choice.
- If the data format will be leaving an application's environment, parsed within a UI, or sent in a messaging layer, JSON might be a better choice.
- YAML can be used, directly, for complex tasks like grammar definitions, and is often a better choice than inventing a new language.
Bypassing esoteric theory
This answers the title, not the details as most just read the title from a search result on google like me so I felt it was necessary to explain from a web developer perspective.
- YAML uses space indentation, which is familiar territory for Python developers.
- JavaScript developers love JSON because it is a subset of JavaScript and can be directly interpreted and written inside JavaScript, along with using a shorthand way to declare JSON, requiring no double quotes in keys when using typical variable names without spaces.
- There are a plethora of parsers that work very well in all languages for both YAML and JSON.
- YAML's space format can be much easier to look at in many cases because the formatting requires a more human-readable approach.
- YAML's form while being more compact and easier to look at can be deceptively difficult to hand edit if you don't have space formatting visible in your editor. Tabs are not spaces so that further confuses if you don't have an editor to interpret your keystrokes into spaces.
- JSON is much faster to serialize and deserialize because of significantly less features than YAML to check for, which enables smaller and lighter code to process JSON.
- A common misconception is that YAML needs less punctuation and is more compact than JSON but this is completely false. Whitespace is invisible so it seems like there are less characters, but if you count the actual whitespace which is necessary to be there for YAML to be interpreted properly along with proper indentation, you will find YAML actually requires more characters than JSON. JSON doesn't use whitespace to represent hierarchy or grouping and can be easily flattened with unnecessary whitespace removed for more compact transport.
The Elephant in the room: The Internet itself
JavaScript so clearly dominates the web by a huge margin and JavaScript developers prefer using JSON as the data format overwhelmingly along with popular web APIs so it becomes difficult to argue using YAML over JSON when doing web programming in the general sense as you will likely be outvoted in a team environment. In fact, the majority of web programmers aren't even aware YAML exists, let alone consider using it.
If you are doing any web programming, JSON is the default way to go because no translation step is needed when working with JavaScript so then you must come up with a better argument to use YAML over JSON in that case.