How to create a PDF from Linux man pages so that the style is preserved?

If you did not know, man pages are already written in a typesetting system that is even older than TeX. This is the *roff family: at different points in history, related programs have gone by names like RUNOFF, runoff, roff, nroff, troff, ditroff, groff, etc. If you're using a Unix-like system (a good guess as you're asking about man pages), you probably have groff installed on your system already, and possibly under names like troff and nroff.

In fact, man pages are written in a macro package for the *roff typesetter. If you type man -d malloc, you get some debugging information: on my computer the last line shows what command it would have run, to (typeset and) display the malloc man page:

cd '/usr/share/man' && (echo ".ll 18.3i"; echo ".nr LL 18.3i"; /bin/cat '/usr/share/man/man3/malloc.3') | /usr/bin/tbl | /usr/bin/groff -Wall -mtty-char -Tascii -mandoc -c | (/usr/bin/less -is || true)

This shows that the file /usr/share/man/man3/malloc.3 is passed through first the tbl preprocessor (which deals with tables), and then is formatted by groff for display on screen. The “input” /usr/share/man/man3/malloc.3 file itself has instructions like this:

The
.Fn malloc
function allocates
.Fa size
bytes of memory and returns a pointer to the allocated memory.

This is analogous to writing (in some hypothetical LaTeX package) something like

The \functionname{malloc} function allocates \functionarg{size} bytes of memory and returns a pointer to the allocated memory.

This is “typeset” by the preprocessors ending with /usr/bin/groff -Wall -mtty-char -Tascii -mandoc -c (definitions of these .Fn and .Fa *roff macros, and the fact that function names should be typeset in bold and arguments should be underlined, are in the manpage-related macro package) and this is why it ends up on screen with appropriate bold and underlines as:

[Image: line from malloc man page]


Therefore: if you want to generate PDF instead, you just have to change the output format. This you can do in multiple ways (unfortunately there are some differences in the output so you may want to try each of them and pick the one you like the most):

man -t malloc > malloc.ps
ps2pdf malloc.ps

or (yes there exist programs other than TeX that can generate DVI files!)

groff -T dvi -m mandoc '/usr/share/man/man3/malloc.3' > malloc.dvi
dvipdfmx malloc.dvi

or:

groff -T ps -m mandoc '/usr/share/man/man3/malloc.3' > malloc.ps
ps2pdf malloc.ps

or on Linux, where man pages are usually stored gzip-compressed, you may need:

zcat '/usr/share/man/man3/malloc.3.gz' | tbl | groff -T ps -m mandoc > malloc.ps
ps2pdf malloc.ps

or variants where you use a different converter from PS to PDF or from DVI to PDF. Then you can include the PDF directly into your LaTeX document; you can search on this site for many ways of doing that. If you don't like the page margins, line lengths etc., there are ways you can specify them to groff.
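
The groff variants above differ only in the output device and in whether the source file is compressed. As a rough sketch (certainly not the only way to do it), the same steps could be driven from Python. The names read_man_source, groff_pipeline and typeset are made up for this illustration, and tbl and groff are assumed to be on your PATH:

```python
import gzip
import subprocess
from pathlib import Path


def read_man_source(path):
    """Return the raw *roff source as bytes, decompressing .gz files."""
    path = Path(path)
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as f:
            return f.read()
    return path.read_bytes()


def groff_pipeline(device):
    """Commands to run in order, mirroring `tbl | groff -T <device> -m mandoc`."""
    return [["tbl"], ["groff", "-T", device, "-m", "mandoc"]]


def typeset(path, device="ps"):
    """Feed the man page source through each command and return the result."""
    data = read_man_source(path)
    for argv in groff_pipeline(device):
        data = subprocess.run(
            argv, input=data, stdout=subprocess.PIPE, check=True
        ).stdout
    return data
```

You would then write the result of typeset('/usr/share/man/man3/malloc.3.gz') to malloc.ps and run ps2pdf malloc.ps as before, or pass device="dvi" and use dvipdfmx instead.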

Another alternative is to use the mandoc program, which understands the source format of man files:

zcat '/usr/share/man/man3/malloc.3.gz' | mandoc -T pdf > malloc.pdf

or

zcat '/usr/share/man/man3/malloc.3.gz' | mandoc -T html > malloc.html
# convert from html to pdf or to latex in your preferred way

Note that a conversion to html opens up various possibilities for converting into LaTeX. For example, you can use pandoc. Here is an example that matches some aspects of the display in your screenshot:

  • all bold text is displayed in red (as your terminal evidently does)
  • the background is not white
  • instead of italics, underlines are used (your terminal does that because it does not have an italic font; you may consider whether you want to match that in PDF: underlining is usually considered poor typography)

Create mancolours.tex containing:

\usepackage{pagecolor}

% Set background colour (of the page)
\definecolor{weirdbgcolor}{HTML}{FCF4F0}
\pagecolor{weirdbgcolor}

% Make bold text appear in a particular colour
\definecolor{boldcolor}{HTML}{6E0002}
\let\realtextbf=\textbf
\renewcommand{\textbf}[1]{\textcolor{boldcolor}{\realtextbf{#1}}}

% Use underlines instead of emphasis (ugh)
\renewcommand{\emph}[1]{\underline{#1}}

% % Use fixed-width font by default
% \renewcommand*\familydefault{\ttdefault}

and then:

zcat '/usr/share/man/man3/malloc.3.gz' | mandoc -T html > malloc.html
pandoc -s -o malloc.tex --include-in-header=mancolours.tex malloc.html
pdflatex malloc.tex

This produces stuff like:

[Image: output matching the screenshot closely]


Finally, if none of these are satisfactory, you can look at the source of the man page and write your own tool for translating *roff macros into whatever LaTeX macros you'd like as equivalents. There aren't too many of those, so this should be reasonably doable. (There are some scripts online where people have written similar translators, but I tried a couple and neither worked well enough. So it would be better to write your own.) You may also consider operating on the output from mandoc -Thtml or mandoc -Ttree, if you find those easier.
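
To give an idea of what such a home-made translator might look like, here is a minimal sketch that handles only the two macros from the malloc excerpt near the top, mapping them to the hypothetical \functionname and \functionarg commands used there. The function name mdoc_lines_to_latex is made up, and real man page source has many more macros, trailing punctuation arguments, escapes and so on, none of which this attempts:

```python
# Mapping from *roff macro names to (hypothetical) LaTeX commands.
MACRO_TO_LATEX = {
    "Fn": r"\functionname",
    "Fa": r"\functionarg",
}


def mdoc_lines_to_latex(lines):
    """Translate a few source lines into one line of LaTeX.

    Lines starting with '.' are macro calls; everything else is plain text.
    Macros not listed in MACRO_TO_LATEX are silently skipped in this sketch.
    """
    pieces = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("."):
            name, _, argument = line[1:].partition(" ")
            command = MACRO_TO_LATEX.get(name)
            if command is not None:
                pieces.append("%s{%s}" % (command, argument.strip()))
        else:
            pieces.append(line)
    return " ".join(pieces)


source = [
    "The",
    ".Fn malloc",
    "function allocates",
    ".Fa size",
    "bytes of memory and returns a pointer to the allocated memory.",
]
print(mdoc_lines_to_latex(source))
```

For the excerpt above this prints the same \functionname / \functionarg line as the hypothetical LaTeX shown earlier.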


Yet another option, if you want to match formatted terminal output exactly, is dumping that to a file along with the formatting. When you run man malloc, the pager invoked is most likely something like less. If you dump to a file everything that is displayed, and open the file in a decent editor, you'll see how the terminal does it:

 The m^Hma^Hal^Hll^Hlo^Hoc^Hc() function allocates _^Hs_^Hi_^Hz_^He bytes of memory and…

(the actual character in the file is byte 8, I have changed it to the two characters ^H above so that you can see it). So: to make a character bold, it prints the character, then ^H, and then the character again. To make something underlined, it prints a _, then ^H and then the character. (These make sense if you imagine that ^H acted like moving backwards, and overprinting a character on itself made it bold — this is actually how things worked at some point historically.) On top of that, your terminal preferences get applied, for how it displays such bold and underlined characters.
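
To make the encoding concrete, here is a small sketch that produces and removes such overstrike sequences. The function names are made up for this illustration; the only thing taken from the dump above is the rule itself (character, byte 8, character for bold; underscore, byte 8, character for underline):

```python
import re

BS = "\x08"  # the backspace character, shown as ^H above


def overstrike_bold(text):
    """Print each character, back up, and print it again."""
    return "".join(c + BS + c for c in text)


def overstrike_underline(text):
    """Print an underscore, back up, and print the character over it."""
    return "".join("_" + BS + c for c in text)


def strip_overstrike(text):
    """Drop every 'character + backspace' pair, keeping what was printed last."""
    return re.sub(".\x08", "", text)


line = "The " + overstrike_bold("malloc") + "() function allocates " + overstrike_underline("size")
print(repr(line))
print(strip_overstrike(line))
```

Stripping is handy if you only want the plain text; to keep the formatting you need to remember which rule matched, which is what the longer script does.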

So, now that you have this file, you can extract the formatting in it, into a format suitable for LaTeX. For example, with the following Python script I turn those into \bold{...} and \underline{...} respectively (man malloc happens to contain no backslashes, but if it did you'd probably want to replace those too):

import re
import sys

def parseFormatting(text):
    """Detects 'bold' and 'underlined' characters in the output to a terminal."""
    chars = [(c, '') for c in text]
    while True:  # Detect bold characters
        m = re.search('(.)\x08\\1', ''.join(c[0] for c in chars))
        if not m: break
        s = m.start()
        chars[s : s + 3] = [(chars[s + 2][0], 'bold')]
    while True:  # Detect underlined characters
        m = re.search('_\x08.', ''.join(c[0] for c in chars))
        if not m: break
        s = m.start()
        chars[s : s + 3] = [(chars[s + 2][0], 'underline')]
    i = 0
    while i < len(chars):  # Collapse runs of identical formatting (for efficiency later)
        j = i
        while j < len(chars) and chars[j][1] == chars[i][1]: j += 1
        chars[i : j] = [(''.join(chars[k][0] for k in range(i, j)), chars[i][1])]
        i += 1
    return chars

def parseFileReplaceFormatting(filename):
    # Read the whole dump and split it into lines; formatting never spans a line break.
    with open(filename, 'rb') as f:
        text = f.read().decode('utf-8').split('\n')
    newtext = ''
    for line in text:
        for c in parseFormatting(line):
            if c[1] == '':
                newtext += c[0]
            elif c[1] == 'bold':
                newtext += '\\bold{%s}' % c[0]
            elif c[1] == 'underline':
                newtext += '\\underline{%s}' % c[0]
            else: assert False, ('Unknown formatting', c[1], 'for', c[0])
        newtext += '\n'
    return newtext

if __name__ == '__main__':
    infilename = sys.argv[1]
    outfilename = sys.argv[2]
    updated = parseFileReplaceFormatting(infilename)
    with open(outfilename, 'wb') as f:
        f.write(updated.encode('utf-8'))

So after saving the above script (say as manformat.py; the name is arbitrary) and running it with something like:

python manformat.py malloc.man.less malloc.man.less.py2

you can process (\input) the resulting file with TeX. If you wish, you can even preserve line-breaks and whatever crude hyphenation-and-justification your terminal did! (Of course by doing this you lose all the benefits of TeX's beautiful line-breaking algorithm, but you get to match the terminal output exactly.) You just have to make sure that the width of your pages and your terminal are roughly compatible:

\documentclass{article}

\usepackage[paperwidth=11in, textwidth=10in, textheight=4in, paperheight=5in]{geometry}

\usepackage{fontspec}
\setmainfont{Consolas}

\usepackage{xcolor}
% Set background colour (of the page)
\definecolor{weirdbgcolor}{HTML}{FCF4F0}
\usepackage[pagecolor=weirdbgcolor]{pagecolor}
% Make bold text appear in a particular colour
\definecolor{boldcolor}{HTML}{6E0002}
\newcommand{\bold}[1]{\textcolor{boldcolor}{\textbf{#1}}}

\begin{document}
% Foreground colour
\definecolor{fgcolor}{HTML}{A57716}
\color{fgcolor}

\def\nextline{\null\par} % \null so that a blank line in input (two consecutive newlines) becomes an empty paragraph.
{\catcode`\^^M=\active \def^^M{\nextline} \catcode`#=12 \catcode`_=12 \catcode32=12\relax\input{malloc.man.less.py2}}

\end{document}

[Image: page 1 of output generated from above TeX document]

(You can tell that the above output was generated by TeX because of the black page number at the bottom!)
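
As mentioned earlier, man malloc happens to contain no backslashes, but other pages will. A possible helper for that case is sketched below; the name escape_tex and the particular replacements are my own choices, not something the conversion script already does. It would be applied to each piece of text before it is wrapped in \bold{...} or \underline{...}. It deliberately leaves # and _ alone, because the TeX document above already makes them ordinary characters via \catcode:

```python
# Characters that are still special to TeX in the document above,
# and one possible replacement for each.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "%": r"\%",
    "$": r"\$",
    "&": r"\&",
    "^": r"\^{}",
    "~": r"\~{}",
}


def escape_tex(text):
    """Escape TeX-special characters one at a time.

    Working character by character avoids escaping the backslashes
    and braces that the replacements themselves introduce.
    """
    return "".join(TEX_ESCAPES.get(c, c) for c in text)


print(escape_tex("printf(\"%d\\n\", x & 1)"))
```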


Maybe man2html plus pandoc could be a simple, good start:

$ zcat '/usr/share/man/man3/malloc.3.gz' | man2html > malloc.html
$ pandoc -s -f html -t latex  malloc.html -o malloc.tex

But if you do not need to modify the LaTeX source, you can export directly to PDF:

$ pandoc -f html -t latex malloc.html -o malloc.pdf

Then, with a new preamble (and removing some first lines):

\documentclass[10pt]{hitec}
\usepackage[tmargin=.5in]{geometry}
\usepackage[english]{babel}
\settextfraction {1}
\setlength\leftmarginwidth{4em}
\setlength\textwidth{.84\paperwidth}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[colorlinks]{hyperref}
\usepackage{longtable,booktabs}
\usepackage{parskip}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em}  % prevent overfull lines
\setcounter{secnumdepth}{0}
\usepackage{xcolor}
\definecolor{texto}{HTML}{801c35}
\definecolor{fondo}{HTML}{FDF6F3}
\definecolor{textob}{HTML}{BB8B04}
\pagecolor{fondo}\color{textob}
\let\oldbfseries\bfseries
\def\bfseries{\color{texto}\oldbfseries}
\def\textbf#1{\textcolor{texto}{\oldbfseries #1}}
\pagestyle{empty}
\title{Man page of MALLOC}

\begin{document}

\section{MALLOC}\label{malloc}
\subsection{NAME}\label{name}

malloc, free, calloc, realloc - allocate and free dynamic memory

... % remaining text is not changed 

\end{document}
