How do I convert Linux man pages to HTML without using groff?
There are plenty of alternatives, such as roffit, troff, and man2html. There are also Perl-based online man page browsers, such as manServer.
My favorite is pandoc, though sadly it doesn't seem to support roff input by default (you can probably still use it if you need to chain multiple transformation filters together).
man2html example:
zcat /usr/share/man/man1/dd.1.gz \
| man2html \
| sudo tee /var/www/html/dd.html
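The `zcat` call above assumes the page is gzip-compressed, which is not true on every system. As a minimal sketch (the helper name `page_source` is hypothetical, not part of man2html or any other tool mentioned here), a small shell function can pick the decompressor from the file suffix:

```shell
# page_source FILE
# Write the roff source of a man page file to stdout, decompressing it
# first when the file name ends in a known compression suffix.
# (Hypothetical helper; the name is an assumption made for illustration.)
page_source() {
    case "$1" in
        *.gz)  gzip -dc -- "$1" ;;
        *.bz2) bzip2 -dc -- "$1" ;;
        *.xz)  xz -dc -- "$1" ;;
        *)     cat -- "$1" ;;   # assume the file is uncompressed roff source
    esac
}
```

With this defined, something like `page_source "$(man -w dd)" | man2html > dd.html` should convert whichever file `man` resolves for `dd`, assuming your `man` supports `-w` to print the page's path.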
roffit example:
git clone https://github.com/bagder/roffit.git
cd roffit
zcat /usr/share/man/man1/dd.1.gz \
| perl roffit \
| sudo tee /var/www/html/dd-roffit.html
Other tools:
- troffcvt does about the same thing.
- The 'real' troff
- Gonna try out http://heirloom.sourceforge.net/doctools.html. I suspect schily has OpenSolaris and friends in mind :-).
This first bit is a shameless rip from the official website:
mandoc is a suite of tools compiling mdoc, the roff macro language of choice for BSD manual pages, and man, the predominant historical language for UNIX manuals. It is small, ISO C, ISC-licensed, and quite fast. The main component of the toolset is the mandoc utility program, based on the libmandoc validating compiler, to format output for UNIX terminals (with support for wide-character locales), XHTML, HTML, PostScript, and PDF.
mandoc has predominantly been developed on OpenBSD and is both an OpenBSD and a BSD.lv project. We strive to support all interested free operating systems, in particular FreeBSD, NetBSD, DragonFly, illumos, Minix 3, and GNU/Linux, as well as all systems running the pkgsrc portable package build system. To support mandoc development, consider donating to the OpenBSD foundation.
pacman informs me that my locally installed mdocml package is 3.28 MB in size, and that it includes the following binaries in /usr/bin:
/usr/bin/demandoc
/usr/bin/makewhatis
/usr/bin/mandoc
/usr/bin/mapropos
/usr/bin/mman
/usr/bin/mwhatis
With it I can do:
mman -Thtml mman >/tmp/html
firefox file:///tmp/html
You can apply your own stylesheets as you like. All of the documentation is online as well, and I believe all of it is compiled with mandoc too.
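To make the stylesheet point concrete, here is a minimal sketch that calls mandoc directly with its `-T html` output mode and the `-O style=` option, which links an external stylesheet from the generated page. The function name `render_html` and the `MANDOC` override variable are hypothetical, introduced only for illustration:

```shell
# render_html PAGE STYLESHEET
# Render a man page source file to HTML on stdout with mandoc, linking an
# external stylesheet through mandoc's "-O style=" HTML output option.
# MANDOC is a hypothetical override for the binary name or path.
render_html() {
    "${MANDOC:-mandoc}" -T html -O "style=$2" "$1"
}
```

A call such as `render_html dd.1 style.css > /tmp/dd.html` would then write a page that references `style.css`; both file names here are placeholders.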
Firstly, it should be noted that there is more than one program called man2html.
One utility called man2html is a C program originally written in the late 1990s by Richard Verhoeven at the Eindhoven University of Technology. The program has substantially quirky internals. However, it has the advantage that it works with the raw man page source, rather than troff or nroff output. This program was added to Federico Lucifredi's man suite.
The program understands the semantics of the man and mandoc macros, and outputs a reasonable HTML structure. For instance, when you use indented paragraphs, like this:
.IP word
Definition of word.
.RS
the program will put out an HTML definition list.
I maintain one very large man page (most of a megabyte of source, and nearly 400 pages long when converted to letter-size PDF by groff):
$ ls -l txr.1
-rw-rw-r-- 1 kaz kaz 980549 Jan 3 11:38 txr.1
When I needed to convert this to HTML, some five years ago, the only thing I found which did a reasonable job was the man2html C program, plus post-processing of its output to "season to taste".
Eventually, I wanted a much better quality HTML document, so I started writing troff macros. The limitations of the C program became painfully apparent, so I forked it. On my git site, you can find a git repo with 30 patches to man2html. These patches fix a number of bugs, and enhance the program with a much improved ability to interpret troff macros, conditionals, loops and other constructs. I also added an M2 register by means of which you can write code which detects that it's running under man2html and can conditionally do some things differently (scroll down for an example). As well, I added a .M2SS command which lets you emit a custom HTML header section.
My large manpage is hosted here. This is produced with man2html, post-processed by my genman.txr program, which rearranges the sections and adds hyperlinks throughout the document. It also rewrites the internal links in the table of contents to be stable URLs (based on hashing rather than arbitrary enumeration) and makes the table of contents collapsible via some JavaScript.
The exact commands used by my Makefile:
man2html txr.1 | ./txr genman.txr - > txr-manpage.html
tbl txr.1 | pdfroff -man --no-toc - > txr-manpage.pdf
For an example of how the output is conditionally different between HTML and nroff, we can look at a section of the man output:
9.19.4 Macro defstruct

  Syntax:

    (defstruct {<name> | (<name> <arg>*)} <super>
      <slot-specifier>*)

  The defstruct macro defines a new structure type and registers it under
  <name>, which must be a bindable symbol, according to the bindable
  function. Likewise, the name of every <slot> must also be a bindable
  symbol.
Above, note how parameters are denoted in <angle> <brackets>. In the HTML version, they appear in italics.
The syntax section appears in the source code like this:
.coNP Macro @ defstruct
.synb
.mets (defstruct >> { name | >> ( name << arg *)} < super
.mets \ \ << slot-specifier *)
.syne
which consists entirely of custom macros defined in the same document. Under .mets, < b means b is a meta-syntactic variable. >> a b means a is concrete syntax, next to which is the meta-syntactic variable b without any intervening space, and <> a b c means b is a meta-syntactic variable crunched between the literals a and c.
My improved version of man2html understands the fairly complicated macro which implements these markup conventions.
Also, note how the manual has automatically numbered sections: that's all done by troff code, which man2html understands.