How do I parse an HTML file using Clojure?

Enlive is a great tool for this. In short:

(ns foo.bar
  (:require [net.cgrand.enlive-html :as html]))

(defn fetch-page
  "Fetch url and parse it into a sequence of Enlive nodes."
  [url]
  (html/html-resource (java.net.URL. url)))
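Once a page is parsed, Enlive selectors pull nodes out of it. The following is only a minimal sketch, assuming Enlive is on the classpath; the namespace foo.scrape, the sample markup, and the helper names parse-string, page-title and links are made up for illustration. It parses from a string so it runs without network access.

```clojure
(ns foo.scrape
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical sample markup, used instead of a live URL.
(def sample-html
  "<html><head><title>Demo</title></head>
   <body><a href=\"/a\">First</a> <a href=\"/b\">Second</a></body></html>")

(defn parse-string
  "Parse an HTML string into Enlive nodes (html-resource accepts a Reader)."
  [s]
  (html/html-resource (java.io.StringReader. s)))

(defn page-title
  "Text of the first <title> element, or nil if there is none."
  [nodes]
  (some-> (html/select nodes [:title]) first html/text))

(defn links
  "One map per <a> element, with its href attribute and link text."
  [nodes]
  (mapv (fn [a] {:href (get-in a [:attrs :href])
                 :text (html/text a)})
        (html/select nodes [:a])))
```

The same helpers should work on the result of fetch-page, since both go through html/html-resource.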

Here is a nice tutorial on using it both as a scraper/parser and as a template engine.

Here is a short example of scraping a page.

Another option is clj-tagsoup. Enlive also uses tagsoup, but in addition has a pluggable parser so you can add support for other parsers.


Clojure's built-in XML parsing library (clojure.xml) is there for you, provided the page is well-formed XML such as XHTML. From the docstring of clojure.xml/parse:

Parses and loads the source s, which can be a File, InputStream or String naming a URI. Returns a tree of the xml/element struct-map, which has the keys :tag, :attrs, and :content, and accessor fns tag, attrs, and content. Other parsers can be supplied by passing startparse, a fn taking a source and a ContentHandler and returning a parser.
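As a rough illustration of the tree described above, here is a small sketch that uses only clojure.xml and clojure.core. The sample document and the helper names parse-xml-string and texts-for-tag are hypothetical, and the input has to be well-formed XML for the default parser to accept it.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical well-formed sample document.
(def sample-xhtml
  "<html><head><title>Hi</title></head><body><p class=\"x\">Hello</p></body></html>")

(defn parse-xml-string
  "Parse an XML string by wrapping it in an InputStream,
   since a plain String argument is treated as a URI."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn texts-for-tag
  "Walk the whole tree with xml-seq and collect the string
   content of every element whose :tag matches."
  [tree tag]
  (for [node (xml-seq tree)
        :when (= tag (:tag node))
        child (:content node)
        :when (string? child)]
    child))

(def tree (parse-xml-string sample-xhtml))
```

With this, (texts-for-tag tree :title) walks the tree and returns the title text, and (:attrs node) on the p element gives its attribute map.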

Or use Enlive, a library written in Clojure, or the Java-based HtmlCleaner.
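For the HtmlCleaner route, plain Java interop is enough. This is only a sketch, assuming the HtmlCleaner jar is on the classpath; the helper name link-hrefs is made up for illustration.

```clojure
(import '(org.htmlcleaner HtmlCleaner TagNode))

(defn link-hrefs
  "Clean possibly malformed HTML text and return the href
   attribute of every <a> element found anywhere in it."
  [^String html-text]
  (let [^TagNode root (.clean (HtmlCleaner.) html-text)]
    (mapv (fn [^TagNode a] (.getAttributeByName a "href"))
          ;; true = search recursively through the whole tree
          (.getElementsByName root "a" true))))
```

Because HtmlCleaner repairs the markup before building the tree, this should cope with pages that a strict XML parser would reject.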
