Parse HTML using C
I would use HTML Tidy (libtidy) to convert the page to well-formed XHTML, then feed the result to any XML parser, such as expat or libxml. It depends on what you're looking for.
You want to use HTML Tidy for this. The libcurl site has example source code to get you going; it shows how to traverse the DOM tree. You don't need a separate XML parser, and Tidy doesn't fail on badly formatted HTML.
http://curl.haxx.se/libcurl/c/htmltidy.html