Best practice for embedding arbitrary JSON in the DOM?

This method of embedding JSON in a script tag has a potential security issue. If the JSON data originates from user input, an attacker can craft a value that breaks out of the script tag and injects markup directly into the DOM. See here:

http://jsfiddle.net/YmhZv/1/

Here is the injection:

<script type="application/json" id="stuff">
{
    "unicorns": "awesome",
    "abc": [1, 2, 3],
    "badentry": "blah </script><div id='baddiv'>I should not exist.</div><script type="application/json" id='stuff'> "
}
</script>

There is just no way around escaping/encoding.
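
For the script-tag case, a minimal sketch of that escaping, assuming the JSON is serialized with JavaScript on the server (for example in Node.js). The helper name jsonForScriptTag is hypothetical. Writing every "<" as the JSON escape \u003c keeps the text valid JSON, but the payload can no longer contain a literal </script>:

```javascript
// Hypothetical helper: serialize data for use inside
// <script type="application/json">...</script>.
function jsonForScriptTag(data) {
  // "<" becomes the six characters \u003c, which JSON.parse
  // turns back into "<" when the block is read.
  return JSON.stringify(data).replace(/</g, '\\u003c');
}

const payload = {
  unicorns: 'awesome',
  badentry: "blah </script><div id='baddiv'>I should not exist.</div>"
};
const safe = jsonForScriptTag(payload);
console.log(safe);
```

JSON.parse(safe) returns the original strings unchanged, so the reading side needs no extra step.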


As a general direction, I would try using HTML5 data attributes instead. There's nothing to stop you from putting valid JSON in one, e.g.:

<div id="mydiv" data-unicorns='{"unicorns":"awesome", "abc":[1,2,3]}' class="hidden"></div>

If you're using jQuery, then retrieving it is as easy as:

var stuff = JSON.parse($('#mydiv').attr('data-unicorns'));
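
The attribute value still has to be HTML-escaped when the page is generated, otherwise a quote inside the data ends the attribute early. A rough sketch, assuming the markup is built with server-side JavaScript; escapeAttribute and dataAttributeDiv are hypothetical names, not part of jQuery or any library:

```javascript
// Hypothetical helper: escape text for use inside a quoted HTML attribute.
function escapeAttribute(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first so later entities stay intact
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: build the carrier element with the JSON in data-unicorns.
function dataAttributeDiv(id, data) {
  const value = escapeAttribute(JSON.stringify(data));
  return '<div id="' + escapeAttribute(id) + '" data-unicorns="' + value +
    '" class="hidden"></div>';
}

console.log(dataAttributeDiv('mydiv', { unicorns: 'awesome', abc: [1, 2, 3] }));
```

The browser decodes the entities when it parses the attribute, so the JSON.parse call above sees the original JSON text.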

See Rule #3.1 in OWASP's XSS prevention cheat sheet.

Say you want to include this JSON in HTML:

{
    "html": "<script>alert(\"XSS!\");</script>"
}

Create a hidden <div> in HTML. Next, escape your JSON by encoding unsafe characters as HTML entities (e.g., &, <, >, ", ', and /) and put it inside the element.

<div id="init_data" style="display:none">
        {&#34;html&#34;:&#34;&lt;script&gt;alert(\&#34;XSS!\&#34;);&lt;/script&gt;&#34;}
</div>

Now you can access it by reading the textContent of the element using JavaScript and parsing it:

var text = document.querySelector('#init_data').textContent;
var json = JSON.parse(text);
console.log(json); // {html: "<script>alert("XSS!");</script>"}
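
The escaping step can be sketched as follows, assuming it runs in server-side JavaScript while the page is generated. The names HTML_ENTITIES, escapeHtml and hiddenDataDiv are hypothetical, and the numeric entities are just one valid choice:

```javascript
// Entity for each character listed above.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&#34;',
  "'": '&#39;',
  '/': '&#47;'
};

// Hypothetical helper: replace each unsafe character with its entity.
function escapeHtml(text) {
  return String(text).replace(/[&<>"'\/]/g, (ch) => HTML_ENTITIES[ch]);
}

// Hypothetical helper: wrap the escaped JSON in the hidden carrier element.
function hiddenDataDiv(id, data) {
  return '<div id="' + escapeHtml(id) + '" style="display:none">' +
    escapeHtml(JSON.stringify(data)) + '</div>';
}

console.log(hiddenDataDiv('init_data', { html: '<script>alert("XSS!");</script>' }));
```

Because the browser decodes the entities while parsing the element, textContent hands back the original JSON text.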

I think your original method is the best. The HTML5 spec even addresses this use:

"When used to include data blocks (as opposed to scripts), the data must be embedded inline, the format of the data must be given using the type attribute, the src attribute must not be specified, and the contents of the script element must conform to the requirements defined for the format used."

Read here: http://dev.w3.org/html5/spec/Overview.html#the-script-element

You've done exactly that. What is not to love? No character encoding is needed, as it is with attribute data. You can format it if you want. It's expressive and the intended use is clear. It doesn't feel like a hack (as using CSS to hide your "carrier" element does). It's perfectly valid.
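
Reading such a data block back takes only a couple of lines. A small sketch, assuming the block has id "stuff" as in the example markup; readJsonBlock is a hypothetical helper, and the doc parameter exists only so the function can be tried outside a browser:

```javascript
// Hypothetical helper: parse the JSON held in <script type="application/json" id="...">.
// In a browser, call readJsonBlock('stuff'); doc then defaults to the global document.
function readJsonBlock(id, doc = document) {
  const el = doc.getElementById(id);
  // Return null when the block is missing instead of throwing.
  return el ? JSON.parse(el.textContent) : null;
}

// Stand-in for a real document, for illustration only.
const fakeDocument = {
  getElementById: (id) =>
    id === 'stuff'
      ? { textContent: '{"unicorns": "awesome", "abc": [1, 2, 3]}' }
      : null
};

console.log(readJsonBlock('stuff', fakeDocument).unicorns);
```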