Find all broken URLs and references in a LaTeX document with footnote URLs and BibTeX URLs
If your URLs are consistently marked up, e.g. as \url{...}, then it should be easy to extract a full list of them, either with a quick sed pass or by redefining \url to write them out. Given such a list it is easy to check that every URL links to an available document; you could use a command-line tool like wget or an online link checker such as http://validator.w3.org/checklink
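For example, a minimal sed sketch for the extraction step could look like this (my assumption, not a tested recipe: it expects \url{...} arguments without nested braces and picks up at most one URL per input line):

sed -n 's/.*\\url{\([^}]*\)}.*/\1/p' *.tex

The resulting list can then be checked with wget or with the online checker mentioned above.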
Having the same problem, I just used the following scripts. Surely some Perl hacker can make it a one-liner ;-). The first extracts all explicit URLs (\url{...} from .tex files and url = {...} from .bib files). I called it as extractlinks.pl *.tex *.bib | sort | uniq > urls.txt to get a list of URLs in a file:
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp qw(read_file);

foreach my $file (@ARGV) {
    foreach my $line (read_file($file)) {
        # .bib files: match "url = {...}" fields (field name case-insensitive);
        # .tex files: match \url{...} commands.
        my @urls = ($file =~ /\.bib$/)
            ? $line =~ m/^\s*url\s*=\s*{([^}]+)}/i
            : $line =~ m/\\url{([^}]+)}/g;
        print "$_\n" for @urls;
    }
}
The second script tries to download each URL with wget. On success the URL is printed to STDOUT; on failure it is printed to STDERR. I called the script as ./checklinks.sh < urls.txt > url-ok.txt 2> url-fail.txt:
#!/bin/bash
# Read one URL per line from STDIN and try to fetch it with wget.
# Reachable URLs are echoed to STDOUT, unreachable ones to STDERR.
while read -r url; do
    if wget -O /dev/null -q "$url"; then
        echo "$url"
    else
        echo "$url" 1>&2
    fi
done