Auto generate List of \url usages within document
The following example uses hyperref (the question mentions "hyperlinking") and hooks into \hyper@linkurl to get the URLs. The caught URLs are written into the index file \jobname-url.idx as entries of the form \urlentry{<hex coded URL>}{<page number>}. The URLs are hex encoded to avoid trouble with special characters.
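The hex encoding step can be tried on its own. The following is a minimal sketch, not part of the answer's code, that encodes a simple URL with \EdefEscapeHex of package pdfescape and decodes it again with \EdefUnescapeHex. The macro names \HexURL and \PlainURL are hypothetical. The first line shows the hex string as \write@url would write it into \jobname-url.idx, the second line the recovered URL.

```latex
\documentclass{article}
\usepackage{pdfescape}
\begin{document}
% Encode a simple URL as hex digits (as \write@url does before writing)
\EdefEscapeHex\HexURL{http://www.ctan.org/}
% Decode the hex digits again (as \urlitem does before typesetting)
\EdefUnescapeHex\PlainURL{\HexURL}
\noindent
Hex: \texttt{\HexURL}\par
\noindent
URL: \texttt{\PlainURL}\par
\end{document}
```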
Package filecontents helps to create the style file \jobname-url.mst for makeindex. Makeindex automatically looks for a style file with the same name as the input file but with extension .mst, so only the .idx file needs to be given as argument to makeindex. Makeindex then generates the file \jobname-url.ind:
\begin{theurls} \urlitem{<hex coded URL>}{<page list>} ... \end{theurls}
Environment theurls and \urlitem are defined appropriately to print the list of URLs. \listurlname contains the title of the section.
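To illustrate how an entry of the .ind file is turned back into a URL, here is a standalone sketch. It assumes a hand-written file \jobname-demo.ind (a hypothetical name, standing in for the file makeindex would produce) containing the hex string for http://www.ctan.org/ and the page list 1, 3. The definitions of theurls and \urlitem are simplified stand-ins for the ones in the answer and use package url instead of hyperref.

```latex
% Hand-written stand-in for \jobname-url.ind; the hex string encodes
% http://www.ctan.org/ (file name and content are hypothetical)
\begin{filecontents*}{\jobname-demo.ind}
\begin{theurls}
\urlitem{687474703A2F2F7777772E6374616E2E6F72672F}{1, 3}
\end{theurls}
\end{filecontents*}
\documentclass{article}
\usepackage{url}
\usepackage{pdfescape}
\makeatletter
% Simplified stand-ins for the answer's theurls and \urlitem
\newenvironment{theurls}{%
  \section*{List of URLs}%
  \ttfamily
  \raggedright
}{%
  \par
}
\newcommand*{\urlitem}[2]{%
  \begingroup
    \EdefUnescapeHex\@tmp{#1}% hex string -> URL characters
    \expandafter\url\expandafter{\@tmp}% typeset the decoded URL
  \endgroup
  \ (#2)\par% page list as produced by makeindex
}
\makeatother
\begin{document}
\input{\jobname-demo.ind}
\end{document}
```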
Remarks:
- Makeindex takes care of the sorting and removes duplicates.
- Hooking into \hyper@linkurl has the advantage that the URL is normalized (e.g., % and \% are the same: both arrive as a % with catcode 12/other).
- Hex encoding has the advantage that special characters such as percent, hash or characters with special meaning for makeindex (at sign, ...) do not need special treatment.
Example file:
\RequirePackage{filecontents}
\begin{filecontents*}{\jobname-url.mst}
% Input style specifiers
keyword "\\urlentry"
% Output style specifiers
preamble "\\begin{theurls}"
postamble "\\end{theurls}\n"
group_skip ""
headings_flag 0
item_0 "\n\\urlitem{"
delim_0 "}{"
delim_t "}"
line_max 500
\end{filecontents*}
\documentclass{article}
\usepackage[colorlinks]{hyperref}
\usepackage{pdfescape}
\makeatletter
\newwrite\file@url
\openout\file@url=\jobname-url.idx\relax
\newcommand*{\write@url}[1]{%
\begingroup
\EdefEscapeHex\@tmp{#1}%
\protected@write\file@url{}{%
\protect\urlentry{\@tmp}{\thepage}%
}%
\endgroup
}
\let\saved@hyper@linkurl\hyper@linkurl
\renewcommand*{\hyper@linkurl}[2]{%
\write@url{#2}%
\saved@hyper@linkurl{#1}{#2}%
}
\newcommand*{\listurlname}{List of URLs}
\newcommand*{\printurls}{%
\InputIfFileExists{\jobname-url.ind}{}{}%
}
\newenvironment{theurls}{%
\section*{\listurlname}%
\@mkboth{\listurlname}{\listurlname}%
\let\write@url\@gobble
\ttfamily
\raggedright
}{%
\par
}
\newcommand*{\urlitem}[2]{%
\hangindent=1em
\hangafter=1
\begingroup
\EdefUnescapeHex\@tmp{#1}%
\expandafter\url\expandafter{\@tmp}%
\endgroup
\par
}
\makeatother
\usepackage[T1]{fontenc}
\usepackage[variablett]{lmodern}
\begin{document}
This file answers the
\href{http://tex.stackexchange.com/q/121977/16967}{question}
on \href{http://tex.stackexchange.com/}{\TeX.SE}.
Further examples for URLs:
\url{http://www.dante.de/}\\
\url{http://www.ctan.org/}\\
\url{mailto:[email protected]/}\\
\url{ftp://ftp.dante.de/pub/tex/}\\
\url{http://www.example.com/\%7efoo/index.html}\\
\url{http://www.example.com/%7efoo/index.html}
\printurls
\end{document}
The following commands generate the result (Linux/bash), assuming the example is saved as test.tex:
$ pdflatex test
(generates test-url.mst and test-url.idx)
$ makeindex test-url
(generates test-url.ind)
$ pdflatex test
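On the first pdflatex run \jobname-url.ind does not exist yet, so \printurls prints nothing, because the false branch of \InputIfFileExists is empty. A possible variant, sketched here as a hypothetical change that is not part of the answer, writes a reminder to the terminal in that case.

```latex
\documentclass{article}
% Hypothetical variant of \printurls: report on the terminal when the
% .ind file does not exist yet (first LaTeX run, before makeindex)
\newcommand*{\printurls}{%
  \InputIfFileExists{\jobname-url.ind}{}{%
    \typeout{No file \jobname-url.ind yet:
      run makeindex \jobname-url, then LaTeX again.}%
  }%
}
\begin{document}
Some text.
\printurls
\end{document}
```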
Update for page numbers
There are many ways to format the page numbers. The following example uses dots to separate the URL from the page numbers, which appear at the end of the line (similar to the index of package doc). As requested, the page numbers are prefixed with "p." if only one page number follows and with "pp." otherwise. This is implemented with the help of package xstring by testing whether the page number list contains a comma separator or a hyphen from a range specifier.
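The "p."/"pp." decision and the dotted fill can be tried in isolation. This sketch uses a hypothetical helper \pageprefix with the same nested \IfSubStr test as \urlitem. As a simpler substitute for the leaders borrowed from package doc, it uses the kernel's \dotfill together with \parfillskip set to 0pt, so that the dots push the page list to the right margin.

```latex
\documentclass{article}
\usepackage{xstring}
% Hypothetical helper: "p." for a single page, "pp." if the list from
% makeindex contains a comma (several pages) or a hyphen (a range)
\newcommand*{\pageprefix}[1]{%
  \IfSubStr{#1}{,}{pp}{\IfSubStr{#1}{-}{pp}{p}}.~#1%
}
\begin{document}
{% local group: no stretch at the end of the last line,
 % so the dots fill the space up to the page list
 \setlength{\parfillskip}{0pt}%
 \noindent http://www.dante.de/\dotfill\pageprefix{3}\par             % p. 3
 \noindent http://www.ctan.org/\dotfill\pageprefix{1, 2}\par          % pp. 1, 2
 \noindent http://tex.stackexchange.com/\dotfill\pageprefix{2--4}\par % pp. 2--4
}
\end{document}
```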
\RequirePackage{filecontents}
\begin{filecontents*}{\jobname-url.mst}
% Input style specifiers
keyword "\\urlentry"
% Output style specifiers
preamble "\\begin{theurls}"
postamble "\n\\end{theurls}\n"
group_skip ""
headings_flag 0
item_0 "\n\\urlitem{"
delim_0 "}{"
delim_t "}"
line_max 500
\end{filecontents*}
\documentclass{article}
\usepackage[colorlinks]{hyperref}
\usepackage{pdfescape}
\usepackage{xstring}
\makeatletter
\newwrite\file@url
\openout\file@url=\jobname-url.idx\relax
\newcommand*{\write@url}[1]{%
\begingroup
\EdefEscapeHex\@tmp{#1}%
\protected@write\file@url{}{%
\protect\urlentry{\@tmp}{\thepage}%
}%
\endgroup
}
\let\saved@hyper@linkurl\hyper@linkurl
\renewcommand*{\hyper@linkurl}[2]{%
\write@url{#2}%
\saved@hyper@linkurl{#1}{#2}%
}
\newcommand*{\listurlname}{List of URLs}
\newcommand*{\printurls}{%
\InputIfFileExists{\jobname-url.ind}{}{}%
}
\newenvironment{theurls}{%
\section*{\listurlname}%
\@mkboth{\listurlname}{\listurlname}%
\let\write@url\@gobble
\ttfamily
\raggedright
\setlength{\parfillskip}{0pt}%
}{%
\par
}
\newcommand*{\urlitem}[2]{%
\hangindent=1em
\hangafter=1
\begingroup
\EdefUnescapeHex\@tmp{#1}%
\expandafter\url\expandafter{\@tmp}%
\endgroup
\urlindex@pfill
\IfSubStr{#2}{,}{pp}{%
\IfSubStr{#2}{-}{pp}{p}%
}.\@\space\ignorespaces
#2%
\par
}
\newcommand*{\urlindex@pfill}{% from \pfill of package `doc'
\unskip~\urlindex@dotfill
\penalty500\strut\nobreak
\urlindex@dotfil~\ignorespaces
}
\newcommand*{\urlindex@dotfill}{% from \dotfill of package `doc'
\leaders\hbox to.6em{\hss .\hss}\hskip\z@ plus 1fill\relax
}
\newcommand*{\urlindex@dotfil}{% from \dotfil of package `doc'
\leaders\hbox to.6em{\hss .\hss}\hfil
}
\makeatother
\usepackage[T1]{fontenc}
\usepackage[variablett]{lmodern}
\begin{document}
This file answers the
\href{http://tex.stackexchange.com/q/121977/16967}{question}
on \href{http://tex.stackexchange.com/}{\TeX.SE}.
Further examples for URLs:
\url{http://www.dante.de/}\\
\url{http://www.ctan.org/}\\
\url{mailto:[email protected]/}\\
\url{ftp://ftp.dante.de/pub/tex/}\\
\url{http://www.example.com/\%7efoo/index.html}\\
\url{http://www.example.com/%7efoo/index.html}
% further pages to generate more page numbers for testing the url index
\newpage
\url{http://www.ctan.org}
\newpage
\url{http://www.ctan.org}
\url{http://tex.stackexchange.com/}
\newpage
\printurls
\end{document}
Attention: the following code works only for simple URLs, that is, URLs that do not contain special characters like %. For a complete solution, please refer to Heiko's answer.
As Nicola mentioned in the comments, redefining \url might be an interesting idea, but some characters in the URL might cause problems. Sadly my TeX-fu isn't good enough to overcome this issue, but here's a preliminary start:
\documentclass{article}
\usepackage{url}
\usepackage{imakeidx}
\let\originalurl\url
\makeindex[name=urls, title={Links found in this document}, columns=1]
\renewcommand{\url}[1]{\originalurl{#1}\index[urls]{\protect\originalurl{#1}}% avoid a spurious space
}
\begin{document}
Hello, make sure to visit \url{http://www.google.com} and,
of course, our own place \url{http://tex.stackexchange.com}.
By the way, \url{http://tex.stackexchange.com} is awesome!
\printindex[urls]
\end{document}
The list is then generated:
Hope it helps. :)