XSS prevention and .innerHTML

A simple way to make sure the contents of your element are properly encoded (and will not be parsed as HTML) is to use textContent instead of innerHTML:

element.textContent = "User provided variable with <img src=a>";

Another option is to use innerHTML only after you have encoded (preferably on the server if you get the chance) the values you intend to use.

Could somebody provide an example of why I should HTML encode and then JS encode, and not double encode in HTML when using the .innerHTML method?

Sure.

Assuming the "user provided data" is populated in your JavaScript by the server, then you will have to JS encode to get it there.

The following is pseudocode on the server side, but JavaScript on the front end:

var userProdividedData = "<%=serverVariableSetByUser %>";
element.innerHTML = userProdividedData;

As in ASP.NET, <%= %> outputs the server-side variable without encoding. If the user is "good" and supplies the value foo, then this results in the following JavaScript being rendered:

var userProdividedData = "foo";
element.innerHTML = userProdividedData;

So far no problems.

Now say a malicious user supplies the value "; alert("xss attack!");//. This would be rendered as:

var userProdividedData = ""; alert("xss attack!");//";
element.innerHTML = userProdividedData;

which would result in an XSS exploit, because the injected code is actually executed on the first line of the above.
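To see the substitution in isolation, here is a minimal sketch. The renderUnencoded function is a hypothetical stand-in for the server-side template (it is not part of any framework) and, like <%= %>, it inserts the value into the script source unchanged.

```javascript
// Hypothetical stand-in for the server-side template: the value is
// inserted into the script source without any encoding.
function renderUnencoded(serverVariableSetByUser) {
  return 'var userProdividedData = "' + serverVariableSetByUser + '";';
}

const benign = renderUnencoded("foo");
const malicious = renderUnencoded('"; alert("xss attack!");//');

console.log(benign);    // var userProdividedData = "foo";
console.log(malicious); // var userProdividedData = ""; alert("xss attack!");//";
```

The second result is two statements followed by a comment, which is why the alert call runs.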

To prevent this, as you say, you JS encode. The OWASP XSS prevention cheat sheet rule #3 says:

Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute.

So to secure against this your code would be

var userProdividedData = "<%=JsEncode(serverVariableSetByUser) %>";
element.innerHTML = userProdividedData;

where JsEncode encodes as per the OWASP recommendation.

This would prevent the above attack as it would now render as follows:

var userProdividedData = "\x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f";
element.innerHTML = userProdividedData;

Now you have secured your JavaScript variable assignment against XSS.
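A JsEncode helper along the lines of the quoted rule could be sketched as follows. It is written in JavaScript purely for illustration (in practice it would run on the server, in the server's language), the name jsEncode is hypothetical, and escaping characters at or above 256 as \uHHHH is my own assumption that goes beyond the quoted rule.

```javascript
// Sketch of a JS string-literal encoder following the quoted OWASP rule:
// ASCII letters and digits pass through, every other character below 256
// becomes \xHH. Escaping code units >= 256 as \uHHHH is an added assumption.
function jsEncode(value) {
  const input = String(value);
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    const code = input.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
    } else if (code < 256) {
      out += "\\x" + code.toString(16).padStart(2, "0");
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0");
    }
  }
  return out;
}

const encoded = jsEncode('"; alert("xss attack!");//');
console.log(encoded);
// \x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f
```

Because the quote, backslash and every other piece of punctuation is escaped, the value cannot terminate the string literal it is placed in.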

However, what if a malicious user supplied <img src="xx" onerror="alert('xss attack')" /> as the value? This would be fine for the variable assignment part, as it would simply get converted into the equivalent \xHH escape sequences like above.

However the line

element.innerHTML = userProdividedData;

would cause alert('xss attack') to be executed when the browser renders the inner HTML. This resembles a DOM-based XSS attack, as it goes through rendered JavaScript rather than HTML. However, as the value passes through the server, it is still classed as reflected or stored XSS, depending on where the value is initially set.
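The reason is that the JavaScript parser decodes the \xHH escapes before innerHTML ever sees the string. A small sketch of the rendered assignment, with spaces left unescaped here only for readability:

```javascript
// The JS-encoded payload as it would appear in the rendered script.
// By the time the assignment has run, the escapes are already decoded.
const userProdividedData =
  "\x3cimg src\x3d\x22xx\x22 onerror\x3d\x22alert\x28\x27xss attack\x27\x29\x22 \x2f\x3e";

console.log(userProdividedData);
// <img src="xx" onerror="alert('xss attack')" />
```

The variable holds the raw markup again, so assigning it to innerHTML parses it as HTML.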

This is why you would need to HTML encode too. This can be done via a function such as:

function escapeHTML (unsafe_str) {
    // Replace & first so the entities produced below are not double-encoded.
    return unsafe_str
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/\"/g, '&quot;')
      .replace(/\'/g, '&#39;')
      .replace(/\//g, '&#x2F;');
}

making your code

element.innerHTML = escapeHTML(userProdividedData);

or it could be done via jQuery's text() function.
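As a quick check of what escapeHTML produces for the payload above, here is a self-contained sketch that repeats the function and prints the result. It runs without a DOM, so the innerHTML assignment itself is left out.

```javascript
// Same replacements as the escapeHTML function above, repeated so the
// example runs on its own.
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}

const escaped = escapeHTML("<img src=\"xx\" onerror=\"alert('xss attack')\" />");
console.log(escaped);
// &lt;img src=&quot;xx&quot; onerror=&quot;alert(&#39;xss attack&#39;)&quot; &#x2F;&gt;
```

Assigned to innerHTML, this string is displayed as text and no img element is created.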

Update regarding question in comments

I just have one more question: You mentioned that we must JS encode because an attacker could enter "; alert("xss attack!");//. But if we would use HTML encoding instead of JS encoding, wouldn't that also HTML encode the " sign and make this attack impossible because we would have: var userProdividedData ="&quot;; alert(&quot;xss attack!&quot;);//";

I'm taking your question to mean the following: Rather than JS encoding followed by HTML encoding, why don't we just HTML encode in the first place, and leave it at that?

Because an attacker could submit a payload such as <img src="xx" onerror="alert('xss attack')" /> written entirely in the \xHH format. The JavaScript parser would decode it into the desired HTML sequence, without the input containing any of the characters that HTML encoding would affect.

There are some other attacks too: If the attacker entered \ then they could force the browser to miss the closing quote (as \ is the escape character in JavaScript).

This would render as:

var userProdividedData = "\";

which would trigger a JavaScript error because the string literal is never terminated. This could cause a Denial of Service to the application if it is rendered in a prominent place.
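This can be checked without a browser. The sketch below builds the rendered line by hand (standing in for the unencoded template) and asks the JavaScript engine to compile it with new Function, which parses the source without running it.

```javascript
// The rendered line when the user supplies a single backslash and the
// template inserts it unencoded.
const rendered = 'var userProdividedData = "' + "\\" + '";';

let parseError = null;
try {
  new Function(rendered); // compile only; the body is never called
} catch (err) {
  parseError = err;
}

console.log(rendered);                          // var userProdividedData = "\";
console.log(parseError instanceof SyntaxError); // true
```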

Additionally say there were two pieces of user controlled data:

var userProdividedData = "<%=serverVariableSetByUser1 %>" + ' - ' + "<%=serverVariableSetByUser2 %>";

the user could then enter \ in the first and ;alert('xss');// in the second. This would change the string concatenation into one big assignment, followed by an XSS attack:

var userProdividedData = "\" + ' - ' + ";alert('xss');//";
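A sketch of this two-value case, again with a hypothetical render function standing in for the unencoded template. The alert function is replaced by a stub that records its arguments, so the example also runs outside a browser.

```javascript
// Hypothetical stand-in for the template with two unencoded values.
function render(first, second) {
  return 'var userProdividedData = "' + first + '" + \' - \' + "' + second + '";';
}

const rendered = render("\\", ";alert('xss');//");

// Run the rendered script with a stub in place of alert, then read back
// the variable it assigned.
const alertCalls = [];
const run = new Function("alert", rendered + "\nreturn userProdividedData;");
const value = run((message) => alertCalls.push(message));

console.log(rendered);   // var userProdividedData = "\" + ' - ' + ";alert('xss');//";
console.log(value);      // " + ' - ' + 
console.log(alertCalls); // [ 'xss' ]
```

The backslash escapes the first closing quote, so the concatenation operators become part of one string, and the second value starts executing as code.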

Because of edge cases like these, it is recommended to follow the OWASP guidelines, as they are as close to bulletproof as you can get. You might think that adding \ to the list of HTML-encoded characters solves this. However, there are other reasons to JS encode first and HTML encode second when rendering content in this manner, because JS encoding also works for data in attribute values:

<a href="javascript:void(0)" onclick="myFunction('<%=JsEncode(serverVariableSetByUser) %>'); return false">

Regardless of whether it is single or double quoted:

<a href='javascript:void(0)' onclick='myFunction("<%=JsEncode(serverVariableSetByUser) %>"); return false'>

Or even unquoted:

<a href=javascript:void(0) onclick=myFunction("<%=JsEncode(serverVariableSetByUser) %>");return false;>

If you HTML encoded the value as mentioned in your comment, and placed it in an attribute such as:

onclick='var userProdividedData ="&quot;;"' (shortened version)

the code is actually run via the browser's HTML parser first, so userProdividedData would be

";;

instead of

&quot;;

so when you add it to the innerHTML call you would have XSS again. Note that <script> blocks are not processed via the browser's HTML parser, except for the closing </script> tag, but that's another story.
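The effect of that first HTML parsing pass can be approximated in a sketch. The decodeEntities function below is a deliberately tiny, hypothetical stand-in for what the browser does to an attribute value (a real HTML parser handles far more entities), and alert is again replaced by a recording stub.

```javascript
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}

// Hypothetical, minimal imitation of the browser decoding entities in an
// attribute value before the script inside it is handed to the JS engine.
function decodeEntities(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&#x2F;/g, '/')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

const payload = '"; alert("xss attack!");//';

// HTML encoding only, placed inside an event handler attribute.
const attributeValue = 'var userProdividedData ="' + escapeHTML(payload) + '";';
const scriptSeenByJs = decodeEntities(attributeValue);

const calls = [];
new Function("alert", scriptSeenByJs)((message) => calls.push(message));

console.log(attributeValue);
console.log(scriptSeenByJs);
console.log(calls); // [ 'xss attack!' ]
```

The entities are gone by the time the JavaScript engine sees the handler, so HTML encoding alone gives no protection in this position.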

It is always wise to encode as late as possible, as shown above. Then, if you need to output the value somewhere that does not render HTML (e.g. an actual alert box), it will still display correctly.

That is, with the above I can call

alert(userProdividedData);

just as easily as setting HTML

element.innerHTML = escapeHTML(userProdividedData);

In both cases it will be displayed correctly, without certain characters disrupting the output or causing undesirable code execution.
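A small sketch of the "encode late" idea, using an arbitrary sample value of my own: the variable keeps the raw text, and escapeHTML is applied only at the point where the value is about to be used as HTML.

```javascript
function escapeHTML(unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}

// Sample value chosen for illustration; it is kept unencoded in the variable.
const userProdividedData = "Tom & Jerry <3";

const forAlert = userProdividedData;            // plain-text sink: use as is
const forHtml = escapeHTML(userProdividedData); // HTML sink: encode at the last moment

console.log(forAlert); // Tom & Jerry <3
console.log(forHtml);  // Tom &amp; Jerry &lt;3
```

Had the value been HTML encoded earlier, the alert box would show the entities literally.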
