Find all broken URLs and references in a LaTeX document with footnote URLs and BibTeX URLs

If your URLs are consistently marked up, e.g. as \url{...}, then it should be easy to extract a full list of them, either with something like sed or by redefining \url to write them out. Given such a list, it is easy to check that each URL points to an available document; you could use a command-line tool like wget or an online link checker such as http://validator.w3.org/checklink
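
A minimal sketch of that approach, assuming GNU grep (for its -o and -P options) and that every link in the .tex files is written as \url{...}:

# Sketch: list \url{...} targets (GNU grep) and report the unreachable ones
grep -ohP '\\url\{\K[^}]+' *.tex | sort -u > urls.txt
while read -r url; do
    wget -q -O /dev/null "$url" || echo "broken: $url"
done < urls.txt

Here wget fetches each target into /dev/null and its exit status decides whether the URL is reported as broken.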

Having the same problem, I just used the following two scripts. Surely some Perl hacker can make it a one-liner ;-). The first extracts all explicit URLs (\url{...} from .tex files and url = {...} from .bib files). I called it as extractlinks.pl *.tex *.bib | sort | uniq > urls.txt to get a list of URLs in a file:

#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp qw(read_file);

foreach my $file (@ARGV) {
    foreach my $line (read_file($file)) {
        # In .bib files match "url = {...}" fields, in .tex files match \url{...}
        my @urls = ($file =~ /\.bib$/)
            ? $line =~ m/^\s*url\s*=\s*{([^}]+)}/
            : $line =~ m/\\url{([^}]+)}/g;
        print "$_\n" for @urls;
    }
}

The second script tries to download each URL with wget. On success the URL is printed to STDOUT, on failure it is printed to STDERR. I called the script as ./checklinks.sh < urls.txt > url-ok.txt 2> url-fail.txt:

#!/bin/bash
# Read one URL per line from STDIN and try to fetch it with wget.
while read -r url; do
    if wget -q -O /dev/null "$url"; then
        echo "$url"        # reachable: print to STDOUT
    else
        echo "$url" >&2    # broken: print to STDERR
    fi
done
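
If downloading every target is too slow, wget's --spider option checks a link without saving the body, e.g.:

wget -q --spider "$url"

Note that some servers treat such requests differently from a normal download, so the full download above is the more reliable test.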