What's the best way to perform a parallel copy on Unix?

If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed, you can do this:

parallel -j10 cp {} destdir/ ::: *

You can install GNU Parallel simply by:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22444
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351 93a7668d
21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80 e02a2244 40e8a43f
$ bash install.sh

Explanation of commands, arguments, and options

parallel --- Fairly obvious; a call to the parallel command
- build and execute shell command lines from standard input in parallel
-j10 ------- Run 10 jobs in parallel
- Number of jobslots on each machine. Run up to N jobs in parallel. 0 means as many as possible. Default is 100% which will run one job per CPU on each machine.
cp -------- The command to run in parallel
{} --------- Replace received values here, i.e. the source_file argument of cp.
- This replacement string will be replaced by a full line read from the input source. The input source is normally stdin (standard input), but can also be given with -a, :::, or ::::. The replacement string {} can be changed with -I. If the command line contains no replacement strings then {} will be appended to the command line.
destdir/ - The destination directory
::: -------- Tell parallel to use the following arguments as input instead of stdin
- Use arguments from the command line as input source instead of stdin (standard input). Unlike other options for GNU parallel ::: is placed after the command and before the arguments.
* ---------- Expanded by the shell to every non-hidden entry (files and directories) in the current directory
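
For comparison, on a machine without GNU Parallel roughly the same one-file-per-job copy can be sketched with find and xargs. This is a different tool, not the command explained above: xargs -P plays the role of -j, and -I{} the role of the {} replacement string. The function name pcopy_flat and its arguments are hypothetical, and -maxdepth, -print0, -0 and -P are GNU/BSD extensions rather than POSIX. Unlike the * glob, find also picks up hidden files, and -type f skips directories.

```shell
# pcopy_flat SRC DEST [JOBS]   (hypothetical helper, not part of GNU Parallel)
# Copy every regular file directly inside SRC into DEST,
# running up to JOBS cp processes at once (default 10).
pcopy_flat() {
    src=$1 dest=$2 jobs=${3:-10}
    mkdir -p "$dest" || return 1
    # -maxdepth 1 : do not descend into subdirectories
    # -type f     : regular files only
    # -print0/-0  : NUL-separated names, safe for spaces and newlines
    # -I{}        : substitute each name for {}, so one cp runs per file
    find "$src" -maxdepth 1 -type f -print0 |
        xargs -0 -P "$jobs" -I{} cp {} "$dest"/
}
```

Called as pcopy_flat . destdir 10, this should behave much like the parallel command above for plain files.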

Learn more

Your command line will love you for it.

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Get the book 'GNU Parallel 2018' at http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html or download it at: https://doi.org/10.5281/zenodo.1146014 Read at least chapters 1 and 2. It should take you less than 20 minutes.

Print the cheat sheet: https://www.gnu.org/software/parallel/parallel_cheat.pdf

As long as you limit the number of copy commands running at once, you could probably use a script like the one posted by Scrutinizer:

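SOURCEDIR="$1"
TARGETDIR="$2"
MAX_PARALLEL=4
# Count entries one per line; wc -w would miscount names containing spaces.
nroffiles=$(ls -1 "$SOURCEDIR" | wc -l)
# Files per batch, so that at most MAX_PARALLEL cp processes are started.
setsize=$(( nroffiles / MAX_PARALLEL + 1 ))
# xargs -n joins the names into lines of up to $setsize names each.
ls -1d "$SOURCEDIR"/* | xargs -n "$setsize" | {
  while read -r workset; do
    # Unquoted on purpose: $workset must split back into separate file names.
    # Names containing whitespace or quotes are therefore not supported.
    cp -p $workset "$TARGETDIR" &
  done
  # wait must run in the same (sub)shell that started the background jobs.
  wait
}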
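
The script above relies on word splitting, so it cannot cope with file names that contain spaces. A minimal sketch of the same batching idea using NUL-separated names might look like the following. The function name pcopy_batched, the BATCH parameter and its default of 100 are assumptions made for illustration, and -maxdepth, -print0, -0 and -P are GNU/BSD extensions rather than POSIX.

```shell
# pcopy_batched SOURCEDIR TARGETDIR [MAX_PARALLEL] [BATCH]   (hypothetical helper)
# Copy the regular files directly inside SOURCEDIR into the existing
# directory TARGETDIR, BATCH files per cp, MAX_PARALLEL cp processes at once.
pcopy_batched() {
    sourcedir=$1 targetdir=$2 max_parallel=${3:-4} batch=${4:-100}
    [ -d "$targetdir" ] || return 1
    # xargs appends up to $batch file names after "sh $targetdir", so inside
    # the inline script $1 is the target and the remaining arguments are files.
    find "$sourcedir" -maxdepth 1 -type f -print0 |
        xargs -0 -n "$batch" -P "$max_parallel" \
            sh -c 'dest=$1; shift; [ "$#" -gt 0 ] || exit 0; cp -p "$@" "$dest"' \
            sh "$targetdir"
}
```

Handing several files to each cp keeps the number of processes low, and xargs starts the next batch as soon as a slot frees up instead of fixing the split in advance.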

Honestly, the best tool I've found is Google's gsutil. With its -m option (gsutil -m cp -r SRC DEST) it performs parallel copies with directory recursion, which most of the other methods I've seen can't handle. The docs don't specifically mention local-filesystem-to-local-filesystem copies, but it works like a charm.

It's another binary to install, but one you may well already be running, given how widely cloud services are used nowadays.

