Bash split a list of files
Method #1 - Using head & tail
You can use the head command to pull out the first 40 files from a file listing like so:
$ head -40 input_file | xargs ...
To get the next 40:
$ tail -n +41 input_file | head -40 | xargs ...
...
$ tail -n +161 input_file | head -40 | xargs ...
You can keep walking down the list 40 at a time using this same technique.
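Rather than typing each tail/head pair by hand, the starting offsets (1, 41, 81, ...) can be generated in a loop. This is a minimal sketch that assumes one filename per line; the function name chunk_with_head_tail is made up for illustration, a temporary file stands in for input_file, and xargs echo stands in for whatever command you actually run.

```shell
#!/usr/bin/env bash
# Walk a one-name-per-line listing in chunks of SIZE lines using tail + head.
# Usage: chunk_with_head_tail FILE SIZE  (prints each chunk on one line via xargs echo)
chunk_with_head_tail() {
  local file=$1 size=$2 total start
  total=$(wc -l < "$file")
  for (( start = 1; start <= total; start += size )); do
    # tail -n +START begins output at line START; head keeps the next SIZE lines
    tail -n +"$start" "$file" | head -n "$size" | xargs echo
  done
}

# Demo: a temporary stand-in for input_file holding the "filenames" 1..200
input_file=$(mktemp)
seq 200 > "$input_file"
chunk_with_head_tail "$input_file" 40
rm -f "$input_file"
```

Each pass re-reads the listing from the top, which is fine for a few hundred names; for very long listings a single xargs -n pass avoids the repeated reads.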
Method #2 - Using xargs
If you happen to have all your filenames in a variable, you can use xargs like so to break up the list into chunks of N elements.
Example
Pretend my files are named 1 through 200, so I load them into a variable like so:
$ files=$(seq 200)
You can see the first couple of items in this variable:
$ echo $files | head -c 20
1 2 3 4 5 6 7 8 9 10
Now we use xargs to divide it up:
$ xargs -n 40 <<<$files
1 2 3 4 5 6 7 8 9 10 ...
41 42 43 44 45 46 47 ...
81 82 83 84 85 86 87 ...
121 122 123 124 125 ...
161 162 163 164 165 ...
You could then pipe that output into a second xargs, using -L 1 so that each 40-name line becomes one run of your program:
$ xargs -n 40 <<<$files | xargs -L 1 ...
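A quick way to check that the chunking survives the second xargs. Here sh -c '...' sh is only a stand-in for your program (an assumption, not part of the original command); it reports how many arguments each invocation received and what the first one was.

```shell
#!/usr/bin/env bash
# Stand-in filenames 1..200, as in the example above
files=$(seq 200)

# Stage 1: xargs -n 40 reflows the names into lines of 40.
# Stage 2: xargs -L 1 runs the command once per input line, i.e. once per chunk.
# sh -c '...' sh stands in for "your program": it prints its argument count and first argument.
xargs -n 40 <<<"$files" | xargs -L 1 sh -c 'echo "got $# args, first=$1"' sh
```

This prints five lines, from "got 40 args, first=1" to "got 40 args, first=161". Without -L 1, the second xargs would typically pack all 200 names back into a single invocation, which defeats the split.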
If the list of files isn't easily accessible from a variable, you can give xargs a file instead:
$ xargs -n 40 <input_file
1 2 3 4 5 6 7 8 9 10 ...
41 42 43 44 45 46 47 ...
81 82 83 84 85 86 87 ...
121 122 123 124 125 ...
161 162 163 164 165 ...
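Plain xargs splits on any blank and treats quotes and backslashes specially, so a listing containing a name like "report 1.txt" would be chopped into two arguments. A minimal sketch of a safer variant, assuming no filename contains a newline and that your xargs supports -0 (GNU and BSD xargs both do); show_batches and the sample names are hypothetical.

```shell
#!/usr/bin/env bash
# Batch a one-name-per-line listing while keeping names with spaces intact.
# Assumption: no filename contains a newline; tr turns each newline into a NUL and
# xargs -0 then splits only on NUL (quotes and blanks are taken literally).
# Usage: show_batches SIZE FILE  (prints one line per invocation, each argument in brackets)
show_batches() {
  tr '\n' '\0' < "$2" | xargs -0 -n "$1" sh -c 'printf "[%s]" "$@"; echo' sh
}

# Demo with hypothetical filenames, two of which contain spaces
input_file=$(mktemp)
printf '%s\n' "report 1.txt" "report 2.txt" "notes.txt" > "$input_file"
show_batches 2 "$input_file"
rm -f "$input_file"
```

The demo prints "[report 1.txt][report 2.txt]" followed by "[notes.txt]"; swap the sh -c stand-in for your real command.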
Method #3 - Bash arrays
Say you have your filenames in a Bash array. Again, I'm using the numbers 1-200 to represent my filenames.
$ foo=( $(seq 200) )
You can see the contents of the array like so:
$ echo "${foo[@]}"
1 2 3 4 5 ...
Now, to get the first 40:
$ echo "${foo[@]:0:40}"
The second 40, and so on:
$ echo "${foo[@]:40:40}"
...
$ echo "${foo[@]:160:40}"
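The offsets 0, 40, 80, ... can also be produced by a C-style for loop instead of being written out. A minimal sketch; print_slices is a hypothetical helper, and echo stands in for your command (calling something like your_command "${arr[@]:i:size}" instead would pass each slice as separate, correctly quoted arguments).

```shell
#!/usr/bin/env bash
# Stand-in filenames 1..200, as in the example above
foo=( $(seq 200) )

# print_slices SIZE ITEM...: print the items SIZE at a time, one slice per line.
print_slices() {
  local size=$1; shift
  local arr=("$@") i
  for (( i = 0; i < ${#arr[@]}; i += size )); do
    # ${arr[@]:offset:length}; the final slice is simply shorter if items run out
    echo "${arr[@]:i:size}"
  done
}

print_slices 40 "${foo[@]}"
```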
This is a perfect recipe for xargs:
cat list_of_files | xargs -n 40 command
Quoting from man xargs (BSD/macOS version):
-n number Set the maximum number of arguments taken from standard input
for each invocation of the utility. An invocation of utility
will use less than number standard input arguments if the
number of bytes accumulated (see the -s option) exceeds the
specified size or there are fewer than number arguments
remaining for the last invocation of utility. The current
default value for number is 5000.
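To see the "fewer than number arguments remaining" clause in action, this sketch counts the arguments each invocation receives. The sh -c 'echo "$#"' sh command is just a stand-in that prints its own argument count.

```shell
#!/usr/bin/env bash
# 100 stand-in names in batches of 40: the last invocation gets only the 20 left over.
seq 100 | xargs -n 40 sh -c 'echo "$#"' sh     # prints 40, 40, 20 on separate lines

# Without -n, xargs packs as many names per invocation as its limits allow,
# so 100 short names go to a single invocation.
seq 100 | xargs sh -c 'echo "$#"' sh           # prints 100
```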
To perform a different action for each set, you'd need to extract the relevant lines before passing them to xargs:
sed -n '1,40p' list_of_files | xargs command1
sed -n '41,80p' list_of_files | xargs command2
...
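The sed ranges can be computed in a loop, with the command chosen per batch. A minimal sketch under these assumptions: command1, command2, ... from the example are replaced by a hypothetical array batch_commands whose entries ("echo A", "echo B", "echo C") are stand-ins, run_batches is a made-up name, and batches beyond the end of the array fall back to plain echo.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for command1, command2, command3; each entry is split on spaces,
# so it can carry fixed leading arguments.
batch_commands=("echo A" "echo B" "echo C")

run_batches() {   # run_batches FILE SIZE
  local file=$1 size=$2 total start end i=0 cmd
  total=$(wc -l < "$file")
  for (( start = 1; start <= total; start += size )); do
    end=$(( start + size - 1 ))
    cmd=${batch_commands[i]:-echo}   # fall back to plain echo if we run out of commands
    # $cmd is left unquoted on purpose so "echo A" becomes the command echo plus argument A
    sed -n "${start},${end}p" "$file" | xargs $cmd
    i=$(( i + 1 ))
  done
}

# Demo: a temporary stand-in for list_of_files holding the "filenames" 1..100
list_of_files=$(mktemp)
seq 100 > "$list_of_files"
run_batches "$list_of_files" 40
rm -f "$list_of_files"
```

The demo prints three lines: "A" followed by 1-40, "B" followed by 41-80, and "C" followed by 81-100.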
FYI, I LOVE the xargs -n 40 <<<$files approach, but since it puts 40 args on each line, I did this instead:
threads=10
xargs -n $((40/threads)) <<<$files
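Or, if the items are in an array, split them into at most $threads chunks and run each chunk in its own background job:
n=(1 2 3 4 5 6)
# round the chunk size up; plain ${#n[@]}/threads would be 6/10 = 0, and xargs rejects -n 0
per_job=$(( (${#n[@]} + threads - 1) / threads ))
while read -r input; do
  for item in $input; do
    <..stuff..>
  done &
done <<< "$(printf '%s\n' "${n[@]}" | xargs -n "$per_job")"
wait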
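A runnable version of the array-plus-background-jobs pattern, as a sketch. The names process_item and run_in_chunks are hypothetical, and process_item merely echoes its argument in place of the real work.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real per-item work
process_item() {
  echo "processed $1"
}

run_in_chunks() {   # run_in_chunks THREADS ITEM...
  local threads=$1; shift
  local items=("$@") per_job line item
  if (( ${#items[@]} == 0 )); then return 0; fi
  # ceiling division: at least 1 item per job, and no more than $threads jobs
  per_job=$(( (${#items[@]} + threads - 1) / threads ))
  while read -r line; do
    for item in $line; do   # unquoted on purpose: split the chunk back into items
      process_item "$item"
    done &                  # one background job per chunk
  done <<< "$(printf '%s\n' "${items[@]}" | xargs -n "$per_job")"
  wait                      # block until every chunk has finished
}

# Jobs finish in any order, so sort the output for a stable view.
run_in_chunks 3 $(seq 10) | sort -n -k2
```

With 10 items and 3 threads the chunk size rounds up to 4, giving three jobs (1-4, 5-8, 9-10). Because the jobs run concurrently, their output lines can arrive interleaved, hence the sort.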