How to download all files (but not HTML) from a website using wget?
This downloaded the entire website for me:
wget --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://site/path/
wget -m -p -E -k -K -np http://site/path/
The wget man page explains what each of these options does.
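For reference, here is the second command with each short option replaced by its long name and a brief comment. This is a sketch: the array `opts` and the function `mirror_site` are hypothetical names, and the comments paraphrase the wget manual rather than quote it.

```shell
#!/usr/bin/env bash
# Long-option spelling of: wget -m -p -E -k -K -np http://site/path/
opts=(
  --mirror             # -m: recursion + timestamping, infinite depth (-r -N -l inf --no-remove-listing)
  --page-requisites    # -p: also fetch the images, CSS, etc. each page needs to display
  --adjust-extension   # -E: save HTML/CSS with a .html/.css suffix if the URL lacks one
  --convert-links      # -k: rewrite links in saved pages so the copy works offline
  --backup-converted   # -K: keep each file's pre-conversion version as *.orig
  --no-parent          # -np: never ascend above the starting directory
)

# Hypothetical helper; call it as: mirror_site http://site/path/
mirror_site() {
  wget "${opts[@]}" "$1"
}
```

Calling `mirror_site http://site/path/` performs the same download as the short-option command.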
wget only follows links: if no link to a file can be reached from the index page (or from the pages it links to), wget does not know the file exists and therefore will not download it. In other words, it helps if all files are linked to from web pages or from directory indexes.
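When the server generates directory listings, those listing pages are what wget follows to find the files. A minimal sketch, assuming http://site/path/ returns such a listing; `fetch_from_index` is a hypothetical helper name. The `-R 'index.html*'` reject pattern is there to discard the listing pages once wget has read the links in them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: crawl a server-generated directory listing and keep
# the files it links to, but not the listing pages themselves.
fetch_from_index() {
  # -r   follow the links found in the listing page(s)
  # -np  never ascend to the parent directory's listing
  # -nH  do not create a top-level directory named after the host
  # -R 'index.html*'  reject the listing pages; wget still downloads them
  #      to read their links, then deletes them
  wget -r -np -nH -R 'index.html*' "$1"
}
```

Usage: `fetch_from_index http://site/path/`.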
To filter for specific file extensions:
wget -A pdf,jpg -m -p -E -k -K -np http://site/path/
Or, if you prefer long option names:
wget --accept pdf,jpg --mirror --page-requisites --adjust-extension --convert-links --backup-converted --no-parent http://site/path/
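This mirrors the site, but files without a jpg or pdf extension are automatically removed. wget still has to download the HTML pages to find the links in them, but it deletes them afterwards because they do not match the accept list.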
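To confirm that a run with `--accept pdf,jpg` left no HTML behind, one option is to search the download directory afterwards. A minimal sketch; `leftover_html` is a hypothetical helper name, and it only looks at `.html`/`.htm` suffixes, so pages saved under other names would not be reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files under a download directory whose
# names end in .html or .htm (case-insensitive match via -iname).
leftover_html() {
  find "$1" -type f \( -iname '*.html' -o -iname '*.htm' \) -print
}

# Usage after the mirror finishes (by default wget names the top-level
# directory after the host, e.g. "site" for http://site/path/):
#   leftover_html site
```

An empty result means only the accepted file types remain.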