Convert html to plain text in VBA

Tim's answer is excellent. However, a minor adjustment can be added to avoid one foreseeable error response.

 Function HtmlToText(sHTML) As String
      Dim oDoc As HTMLDocument

      If IsNull(sHTML) Then
        HtmlToText = ""
        Exit Function
        End-If

      Set oDoc = New HTMLDocument
      oDoc.body.innerHTML = sHTML
      HtmlToText = oDoc.body.innerText
    End Function

Tim's solution was great, worked liked a charm.

I´d like to contribute: Use this code to add the "Microsoft HTML Object Library" in runtime:

Set ID = ThisWorkbook.VBProject.References
ID.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 2, 5

It worked on Windows XP and Windows 7.


Set a reference to "Microsoft HTML object library".

Function HtmlToText(sHTML) As String
  Dim oDoc As HTMLDocument
  Set oDoc = New HTMLDocument
  oDoc.body.innerHTML = sHTML
  HtmlToText = oDoc.body.innerText
End Function

Tim


A very simple way to extract text is to scan the HTML character by character, and accumulate characters outside of angle brackets into a new string.

Function StripTags(ByVal html As String) As String
    Dim text As String
    Dim accumulating As Boolean
    Dim n As Integer
    Dim c As String

    text = ""
    accumulating = True

    n = 1
    Do While n <= Len(html)

        c = Mid(html, n, 1)
        If c = "<" Then
            accumulating = False
        ElseIf c = ">" Then
            accumulating = True
        Else
            If accumulating Then
                text = text & c
            End If
        End If

        n = n + 1
    Loop

    StripTags = text
End Function

This can leave lots of extraneous whitespace, but it will help in removing the tags.