Remove HTML Tags in Javascript with Regex

This is an old question, but I stumbled across it and thought I'd share the method I used:

var body = '<div id="anid">some <a href="link">text</a></div> and some more text';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;

sanitized will now contain: "some text and some more text"

Simple, no jQuery needed, and it shouldn't let you down even in more complex cases.

Warning

This can't safely deal with user content, because it's vulnerable to script injections. For example, running this:

var body = '<img src=fake onerror=alert("dangerous")> Hello';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;

Leads to an alert being emitted.


Here is how TextAngular (WYSISYG Editor) is doing it. I also found this to be the most consistent answer, which is NO REGEX.

@license textAngular
Author : Austin Anderson
License : 2013 MIT
Version 1.5.16
// turn html into pure text that shows visiblity
function stripHtmlToText(html)
{
    var tmp = document.createElement("DIV");
    tmp.innerHTML = html;
    var res = tmp.textContent || tmp.innerText || '';
    res.replace('\u200B', ''); // zero width space
    res = res.trim();
    return res;
}

This worked for me.

   var regex = /(&nbsp;|<([^>]+)>)/ig
      ,   body = tt
     ,   result = body.replace(regex, "");
       alert(result);

Try this, noting that the grammar of HTML is too complex for regular expressions to be correct 100% of the time:

var regex = /(<([^>]+)>)/ig
,   body = "<p>test</p>"
,   result = body.replace(regex, "");

console.log(result);

If you're willing to use a library such as jQuery, you could simply do this:

console.log($('<p>test</p>').text());