How to parse an HTML file using Clojure?
Enlive is a great tool for this. In short:
(ns foo.bar
  (:require [net.cgrand.enlive-html :as html]))

(defn fetch-page [url]
  (html/html-resource (java.net.URL. url)))
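Once a page is parsed, Enlive's select function can pull nodes out of it. Here is a minimal sketch, assuming Enlive is on the classpath. The sample HTML, the foo.scrape namespace, and the helper names link-hrefs and heading-text are made up for illustration; a java.io.StringReader is used instead of a URL so the example runs without network access.

```clojure
(ns foo.scrape
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical HTML, used only for this illustration.
(def sample-html
  "<html><body><h1>Title</h1><a href=\"/a\">First</a><a href=\"/b\">Second</a></body></html>")

;; html-resource also accepts a Reader, so a string can stand in for a URL.
(def page
  (html/html-resource (java.io.StringReader. sample-html)))

(defn link-hrefs
  "Returns the href attribute of every <a> element in the parsed nodes."
  [nodes]
  (map #(get-in % [:attrs :href]) (html/select nodes [:a])))

(defn heading-text
  "Returns the text content of every <h1> element in the parsed nodes."
  [nodes]
  (map html/text (html/select nodes [:h1])))
```

Calling (link-hrefs (fetch-page "http://example.com")) should work the same way on a live page, since fetch-page returns the same kind of node sequence.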
Here is a nice tutorial on using it both as a scraper/parser and as a template engine:
Here is a short example of scraping a page.
Another option is clj-tagsoup. Enlive also uses tagsoup, but in addition has a pluggable parser so you can add support for other parsers.
Clojure's built-in XML parsing library, clojure.xml, is also available. Note that it expects well-formed XML, so it suits XHTML but will fail on typical malformed HTML. From the docstring of clojure.xml/parse:
Parses and loads the source s, which can be a File, InputStream or String naming a URI. Returns a tree of the xml/element struct-map, which has the keys :tag, :attrs, and :content, and accessor fns tag, attrs, and content. Other parsers can be supplied by passing startparse, a fn taking a source and a ContentHandler and returning a parser.
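The tree described in that docstring can be explored with ordinary map functions. A small sketch, assuming well-formed input; the sample markup and the helpers parse-string and tags-in are hypothetical names introduced here. The string is wrapped in an InputStream because a bare String argument is treated as a URI.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical well-formed XHTML, used only for this illustration.
(def sample-xhtml
  "<html><body><p class=\"intro\">Hello</p></body></html>")

(defn parse-string
  "Parses a string of well-formed XML into a tree of :tag/:attrs/:content maps."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn tags-in
  "Returns the tag of every element in the tree, in document order."
  [tree]
  (->> (xml-seq tree)   ; walks elements and text nodes depth-first
       (filter map?)    ; keep elements, drop text nodes (plain strings)
       (map :tag)))

(def tree (parse-string sample-xhtml))

(:tag tree)      ; => :html
(tags-in tree)   ; => (:html :body :p)
```

For real-world HTML that is not well-formed, a TagSoup-based parser such as the ones mentioned above is the safer choice.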
Or use Enlive, a library written in Clojure, or the Java-based HtmlCleaner.