PowerShell: remove HTML tags from string content
For a pure regex, it should be as easy as <[^>]+>:
$string -replace '<[^>]+>',''
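If you need this in more than one place, the one-liner can be wrapped in a small function. The following is only a sketch: the name Remove-HtmlTag and the parameter $Html are made up for illustration, and the extra entity-decoding step is an assumption about what you probably want, since the regex leaves entities such as &amp; untouched.

```powershell
# Hypothetical helper (the name and parameter are illustrative only).
function Remove-HtmlTag {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string]$Html
    )
    process {
        # Strip anything that looks like a tag.
        $text = $Html -replace '<[^>]+>', ''
        # Assumption: entities such as &amp; should become plain characters.
        [System.Net.WebUtility]::HtmlDecode($text)
    }
}

$clean = '<p>Fish &amp; chips</p>' | Remove-HtmlTag
# $clean is 'Fish & chips'
```

Decoding happens after the tags are stripped, so escaped markup such as &lt;b&gt; survives as the literal text <b> instead of being removed.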
Note that this can fail on HTML comments that contain a > character, or on the contents of <pre> tags that contain unescaped angle brackets.
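To make the caveat concrete, here is a short sketch with made-up sample strings (they are not from the question) showing one input where the regex behaves and two where it does not:

```powershell
$pattern = '<[^>]+>'

# Simple markup: every tag is removed cleanly.
$simple = '<p>Hello <b>world</b></p>' -replace $pattern, ''
# $simple is 'Hello world'

# A '>' inside a comment ends the match early, so part of the comment is left behind.
$comment = 'a<!-- x > y -->b' -replace $pattern, ''
# $comment is 'a y -->b'

# Unescaped '<' and '>' inside <pre> are treated as a tag, and real content is lost.
$pre = '<pre>if (a < b && b > c)</pre>' -replace $pattern, ''
# $pre is 'if (a  c)'
```

In both failing cases the regex has no notion of document structure; it simply pairs each < with the next >.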
Instead, you could use the HTML Agility Pack, which is designed for use in .NET code. I've used it successfully in PowerShell before:
Add-Type -Path 'C:\packages\HtmlAgilityPack.1.4.6\lib\Net40-client\HtmlAgilityPack.dll'
$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($string)
$doc.DocumentNode.InnerText
The HTML Agility Pack also copes well with imperfect HTML.