Pipe output of cat to cURL to download a list of files
xargs -P 10 | curl
GNU xargs -P can run multiple curl processes in parallel. E.g. to run 10 processes:
xargs -P 10 -n 1 curl -O < urls.txt
This will speed up the download by up to 10x if your maximum download speed is not reached and the server does not throttle IPs, which is the most common scenario.
Just don't set -P too high or your RAM may be overwhelmed.
GNU parallel can achieve similar results.
The downside of those methods is that they don't use a single connection for all files, which is what curl does if you pass multiple URLs to it at once, as in:
curl -o out1.txt http://exmple.com/1 -o out2.txt http://exmple.com/2
as mentioned at https://serverfault.com/questions/199434/how-do-i-make-curl-use-keepalive-from-the-command-line
Maybe combining both methods would give the best results? But I imagine that parallelization is more important than keeping the connection alive.
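One way to try combining both, as a rough sketch rather than a tested recipe: let xargs start a few curl processes in parallel, but hand each one a batch of URLs so it can reuse its connection within that batch (assuming the URLs in a batch point at the same host). The function name fetch_batched and the CURL_CMD override are made up for illustration; --remote-name-all is the curl option that acts like -O for every URL on the command line.

```shell
# Hypothetical helper: run up to $2 curl processes in parallel, each given
# up to $3 URLs from the list file $1.
# CURL_CMD is an illustrative override (not a curl feature) that allows a
# dry run, e.g. CURL_CMD="echo curl".
fetch_batched() {
  # -P: number of parallel processes; -n: URLs per curl invocation.
  # --remote-name-all makes curl save every URL under its remote name.
  xargs -P "$2" -n "$3" ${CURL_CMD:-curl} --remote-name-all < "$1"
}
```

For example, `fetch_batched urls.txt 4 8` would run up to 4 curls at a time with 8 URLs each. Whether that beats plain `-n 1` depends on the server and on how connection setup cost compares with transfer time.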
See also: Parallel download using Curl command line utility
A very simple solution would be the following: If you have a file 'file.txt' like
url="http://www.google.de"
url="http://www.yahoo.de"
url="http://www.bing.de"
Then you can use curl and simply do
curl -K file.txt
And curl will fetch all URLs contained in your file.txt!
So if you have control over your input-file-format, maybe this is the simplest solution for you!
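If your list is a plain file with one URL per line, a small conversion step can produce that format. This is only a sketch: the function name urls_to_curl_config is invented here, and it assumes the URLs contain no double quotes. It also adds a remote-name-all line (the config-file spelling of --remote-name-all) so each target is saved to a file rather than printed; drop that line if you want the output on stdout.

```shell
# Hypothetical helper: turn a one-URL-per-line file ($1) into a curl
# config file on stdout, suitable for "curl -K".
urls_to_curl_config() {
  # Save every URL under its remote file name (like -O for each one).
  printf 'remote-name-all\n'
  # Emit one url = "..." line per non-blank input line.
  awk 'NF { printf "url = \"%s\"\n", $0 }' "$1"
}
```

Usage would look like `urls_to_curl_config urls.txt > file.txt` followed by `curl -K file.txt`.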
This works for me:
$ xargs -n 1 curl -O < urls.txt
I'm in FreeBSD. Your xargs may work differently.
Note that this runs sequential curls, which you may view as unnecessarily heavy. If you'd like to save some of that overhead, the following may work in bash:
$ mapfile -t urls < urls.txt
$ curl ${urls[@]/#/-O }
This saves your URL list to an array, then expands the array with options to curl to cause targets to be downloaded. The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1), but it needs the -O option before each one in order to download and save each target. Note that characters within some URLs may need to be escaped to avoid interacting with your shell.
Or if you are using a POSIX shell rather than bash:
$ curl $(printf ' -O %s' $(cat urls.txt))
This relies on printf's behaviour of repeating the format pattern to exhaust the list of data arguments; not all stand-alone printfs will do this.
Note that this non-xargs method also may bump up against system limits for very large lists of URLs. Research ARG_MAX and MAX_ARG_STRLEN if this is a concern.
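If the escaping issue worries you, here is a sketch of a POSIX variant that builds the argument list one line at a time, so characters such as ? or [ in a URL are never re-read by the shell. The function name fetch_all and the CURL_CMD override are invented for illustration. It still runs a single curl, so the same argument-length limits apply.

```shell
# Hypothetical helper: fetch every URL listed in file $1 with ONE curl call.
# CURL_CMD is an illustrative override for dry runs, e.g. CURL_CMD="echo curl".
fetch_all() {
  list=$1
  set --                      # start with an empty argument list
  # Read line by line; the "|| [ -n ... ]" keeps a last line without newline.
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue # skip blank lines
    set -- "$@" -O "$url"     # append "-O URL" as two separate, quoted words
  done < "$list"
  [ "$#" -gt 0 ] || return 0  # nothing to fetch
  ${CURL_CMD:-curl} "$@"
}
```

Running `CURL_CMD="echo curl" fetch_all urls.txt` prints the command that would be executed instead of downloading anything.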
Or you could just do this:
cat urls.txt | xargs -n 1 curl -O
You only need to use the -I parameter when you want to insert the cat output in the middle of a command.
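To illustrate the -I case, here is a small sketch where something has to come after each URL, so the URL can no longer simply be appended at the end. The function name fetch_each_with_retry and the CURL_CMD override are made up for this example; --retry is a regular curl option, picked only as a placeholder for "something after the URL".

```shell
# Hypothetical helper: run one curl per line of file $1, placing the URL
# in the middle of the command line via the {} placeholder.
# CURL_CMD is an illustrative override for dry runs, e.g. CURL_CMD="echo curl".
fetch_each_with_retry() {
  # -I {} makes xargs run the command once per input line and substitute
  # that line wherever {} appears.
  xargs -I {} ${CURL_CMD:-curl} -O {} --retry 3 < "$1"
}
```

With `CURL_CMD="echo curl"` this prints one `curl -O <url> --retry 3` line per URL, which is a convenient way to check the substitution before downloading anything.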