Convert HTML to plain text in VBA
Tim's answer is excellent. However, one small adjustment avoids a foreseeable error: a Null value passed in as sHTML.
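Function HtmlToText(sHTML) As String
    Dim oDoc As HTMLDocument
    ' Return an empty string instead of raising an error on Null input
    If IsNull(sHTML) Then
        HtmlToText = ""
        Exit Function
    End If
    Set oDoc = New HTMLDocument
    ' Let the HTML parser build the document, then read back the text only
    oDoc.body.innerHTML = sHTML
    HtmlToText = oDoc.body.innerText
End Function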
Tim's solution was great and worked like a charm.
I'd like to contribute: use this code to add the "Microsoft HTML Object Library" reference at runtime:
Set ID = ThisWorkbook.VBProject.References
ID.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 2, 5
It worked on Windows XP and Windows 7.
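Calling AddFromGuid a second time raises an error if the reference is already present, and any access to VBProject requires "Trust access to the VBA project object model" to be enabled in the Trust Center. A minimal sketch of a guarded version follows; the name EnsureHtmlReference is hypothetical, and the version numbers are simply copied from the snippet above.

```vba
' Hypothetical helper: adds the Microsoft HTML Object Library reference
' only if it is not already part of the project.
' Assumes "Trust access to the VBA project object model" is enabled.
Sub EnsureHtmlReference()
    Const MSHTML_GUID As String = "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}"
    Dim ref As Object

    ' Walk the existing references and stop if the library is already there
    For Each ref In ThisWorkbook.VBProject.References
        If StrComp(ref.GUID, MSHTML_GUID, vbTextCompare) = 0 Then Exit Sub
    Next ref

    ' Not found: add it, using the same version numbers as the snippet above
    ThisWorkbook.VBProject.References.AddFromGuid MSHTML_GUID, 2, 5
End Sub
```

It could be called once, for example from Workbook_Open, before any code that declares an HTMLDocument runs.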
Set a reference to "Microsoft HTML Object Library".
Function HtmlToText(sHTML) As String
    Dim oDoc As HTMLDocument
    Set oDoc = New HTMLDocument
    oDoc.body.innerHTML = sHTML
    HtmlToText = oDoc.body.innerText
End Function
Tim
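If setting a reference is not an option, the same idea can probably be expressed with late binding. The sketch below is a variant, not part of the answers above: it assumes the "htmlfile" ProgID is available on the machine, and the name HtmlToTextLateBound is made up for illustration.

```vba
' Late-bound sketch: no project reference needed.
' Assumes CreateObject("htmlfile") succeeds on this system.
Function HtmlToTextLateBound(sHTML As Variant) As String
    Dim oDoc As Object

    ' A String function returns "" by default, so Null simply falls through
    If IsNull(sHTML) Then Exit Function

    Set oDoc = CreateObject("htmlfile")
    ' Same approach as the early-bound version: parse, then read the text
    oDoc.body.innerHTML = sHTML
    HtmlToTextLateBound = oDoc.body.innerText
End Function
```

The trade-off is the usual one for late binding: no IntelliSense or compile-time checking, but nothing to configure in Tools > References.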
A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.
Function StripTags(ByVal html As String) As String
    Dim text As String
    Dim accumulating As Boolean
    Dim n As Long   ' Long, not Integer: HTML is often longer than 32,767 characters
    Dim c As String
    text = ""
    accumulating = True
    n = 1
    Do While n <= Len(html)
        c = Mid(html, n, 1)
        If c = "<" Then
            ' Entering a tag: stop collecting characters
            accumulating = False
        ElseIf c = ">" Then
            ' Leaving a tag: resume collecting
            accumulating = True
        Else
            If accumulating Then
                text = text & c
            End If
        End If
        n = n + 1
    Loop
    StripTags = text
End Function
This can leave a lot of extraneous whitespace, but it does remove the tags.
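One way to deal with the leftover whitespace is a second pass that turns line breaks and tabs into spaces and then squeezes runs of spaces down to one. This is only a sketch; CollapseWhitespace is a hypothetical helper name, and it deliberately discards all line structure.

```vba
' Hypothetical helper: collapses runs of whitespace into single spaces.
Function CollapseWhitespace(ByVal s As String) As String
    ' Treat line breaks and tabs as ordinary spaces
    s = Replace(s, vbCr, " ")
    s = Replace(s, vbLf, " ")
    s = Replace(s, vbTab, " ")

    ' Keep squeezing double spaces until none remain
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop

    ' Drop leading and trailing spaces
    CollapseWhitespace = Trim$(s)
End Function
```

It can be applied directly to the result of the function above, for example CollapseWhitespace(StripTags(html)).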