Parallel download using Curl command line utility
Well, curl is just a simple UNIX process. You can run as many of these curl processes in parallel as you like, each sending its output to a different file.
curl can use the filename part of the URL to name the local file. Just use the -O option (see man curl for details).
You could use something like the following:
urls="http://example.com/?page1.html http://example.com/?page2.html" # add more URLs here
for url in $urls; do
    # run the curl job in the background so we can start another job
    # and disable the progress bar (-s)
    echo "fetching $url"
    curl -s -O "$url" &
done
wait # wait for all background jobs to terminate
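One limitation of a bare wait is that it does not tell you which downloads failed. A minimal sketch of one way around that, assuming bash (for arrays): remember each job's PID and wait on them one at a time. The names fetch_one and fetch_all are hypothetical helpers made up for this example, not part of curl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a single URL quietly, keeping the remote name.
fetch_one() {
    curl -s -O "$1"
}

# Hypothetical helper: start one background job per URL, then wait on each
# PID in turn so a failed download can be reported by URL.
fetch_all() {
    local -a urls=("$@") pids=()
    local url i failed=0
    for url in "${urls[@]}"; do
        fetch_one "$url" &
        pids+=("$!")            # PID of the job we just started
    done
    for i in "${!pids[@]}"; do
        if ! wait "${pids[$i]}"; then
            echo "failed: ${urls[$i]}" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage (not run here):
#   fetch_all "http://example.com/?page1.html" "http://example.com/?page2.html"
```

Like the loop above, this still starts every job at once, so it only makes sense for a short list of URLs.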
My answer is a bit late, but I believe all of the existing answers fall just a little short. The way I do things like this is with xargs, which is capable of running a specified number of commands in subprocesses.
The one-liner I would use is, simply:
$ seq 1 10 | xargs -n1 -P2 bash -c 'i=$0; url="http://example.com/?page${i}.html"; curl -O -s "$url"'
This warrants some explanation. The use of -n 1 instructs xargs to process a single input argument at a time. In this example, the numbers 1 ... 10 are each processed separately. And -P 2 tells xargs to keep 2 subprocesses running all the time, each one handling a single argument, until all of the input arguments have been processed. The number shows up as $0 because bash -c assigns the first argument after the command string to $0.
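If you want to see what xargs will do before touching the network, a dry run helps. This is only a sketch: dry_run_pages is a hypothetical name, and it echoes the curl command instead of running it. It also passes a placeholder "_" as $0, so the page number lands in the more conventional $1. Note that -P is not in POSIX, though both GNU and BSD xargs support it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the curl command for pages 1..N, one page per
# subprocess (-n1), with at most two subprocesses at a time (-P2).
dry_run_pages() {
    seq 1 "$1" |
        xargs -n1 -P2 sh -c 'echo "curl -O -s http://example.com/?page$1.html"' _
}

# Usage: dry_run_pages 10
```

Since the subprocesses run in parallel, the lines can come out in any order.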
You can think of this as MapReduce in the shell. Or perhaps just the Map phase. Regardless, it's an effective way to get a lot of work done while ensuring that you don't fork bomb your machine. It's possible to do something similar with a for loop in the shell, but you end up doing the process management yourself, which starts to seem pretty pointless once you realize how insanely great this use of xargs is.
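For comparison, here is roughly what that hand-rolled process management might look like. Treat it as a sketch under assumptions: it needs bash 4.3 or later for wait -n, the limit must be at least 1, and run_limited is a hypothetical name I made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run_limited MAX CMD ARG...
# Runs CMD once per ARG in the background, never more than MAX at a time.
run_limited() {
    local max=$1 cmd=$2 arg running=0
    shift 2
    for arg in "$@"; do
        if [ "$running" -ge "$max" ]; then
            wait -n                     # block until any one job exits
            running=$((running - 1))
        fi
        "$cmd" "$arg" &
        running=$((running + 1))
    done
    wait                                # drain whatever is still running
}

# Usage (not run here):
#   fetch() { curl -s -O "$1"; }
#   run_limited 2 fetch "http://example.com/?page1.html" "http://example.com/?page2.html"
```

That is a dozen lines of bookkeeping to get what -P 2 gives you for free.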
Update: I suspect that my example with xargs could be improved (at least on Mac OS X and BSD, with the -J flag). With GNU Parallel, the command is a bit less unwieldy as well:
parallel --jobs 2 curl -O -s 'http://example.com/?page{}.html' ::: {1..10}
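Coming back to xargs: one possible tidy-up that does not depend on -J is the -I option, which substitutes each input line into the command and so removes the need for the bash -c wrapper. A sketch only, with echo standing in for the real download so it is safe to try; preview_with_replace is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the commands xargs would run for pages 1..N.
# -I{} replaces {} with each input line; -P2 keeps two jobs going.
# Drop the "echo" to actually download.
preview_with_replace() {
    seq 1 "$1" |
        xargs -P2 -I{} echo curl -O -s 'http://example.com/?page{}.html'
}

# Usage: preview_with_replace 10
```

The URL is single-quoted so the shell does not try to treat the ? as a glob character.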