cat a very large number of files together in correct order
Using find, sort and xargs:
find . -maxdepth 1 -type f -name 'file_*.pdb' -print0 |
sort -zV |
xargs -0 cat >all.pdb
The find command finds all relevant files and passes their pathnames to sort, which does a "version sort" to get them in the right order (if the numbers in the filenames had been zero-filled to a fixed width, we would not have needed -V). xargs takes this list of sorted pathnames and runs cat on them in batches that are as large as possible.
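To see why -V matters here, the following minimal sketch compares a plain byte-wise sort with a version sort on three made-up names, and also shows the zero-padded case. It assumes a sort that supports the non-standard -V flag (GNU sort, for instance); the sample names are purely illustrative.

```shell
# Three illustrative names; the numbers are not zero-padded.
names=$(printf '%s\n' file_1.pdb file_2.pdb file_10.pdb)

# A plain byte-wise sort compares character by character,
# so file_10.pdb lands before file_2.pdb.
plain=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort (-V, non-standard) compares the digit runs as numbers.
version=$(printf '%s\n' "$names" | sort -V)

# With zero-padded numbers, a plain sort already gives the right order.
padded=$(printf '%s\n' file_10.pdb file_02.pdb file_01.pdb | LC_ALL=C sort)

printf 'plain:\n%s\nversion:\n%s\npadded:\n%s\n' "$plain" "$version" "$padded"
```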
This should work even if the filenames contain strange characters such as newlines and spaces. We use -print0 with find to give sort nul-terminated names to sort, and sort handles these using -z. xargs too reads nul-terminated names with its -0 flag.
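As a quick check of the nul-terminated handling, this sketch runs the same pipeline in a scratch directory that contains one deliberately awkward filename. The directory, the file contents and the odd name are all made up for illustration, and it assumes the same non-standard flags as the pipeline itself (plus mktemp -d, which is common but not POSIX).

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Illustrative input files; each one holds its own number.
for n in 1 2 10; do
    printf '%s\n' "$n" > "file_$n.pdb"
done

# A deliberately awkward name containing a space and a newline.
odd_name='file_3 odd
name.pdb'
printf '3\n' > "$odd_name"

# Same pipeline as above: every stage passes nul-terminated pathnames.
find . -maxdepth 1 -type f -name 'file_*.pdb' -print0 |
    sort -zV |
    xargs -0 cat > all.pdb

# The contents should come out in numeric order of the filenames.
cat all.pdb
```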
Note that I'm writing the result to a file whose name does not match the pattern file_*.pdb, so the output file is never picked up as one of the inputs.
The above solution uses some non-standard flags for some utilities. These are supported by the GNU implementation of these utilities and at least by the OpenBSD and macOS implementations.
The non-standard flags used are:
-maxdepth 1, to make find only enter the top-most directory but no subdirectories. POSIXly, use find . ! -name . -prune ...
-print0, to make find output nul-terminated pathnames (this was considered by POSIX but rejected). One could use -exec printf '%s\0' {} + instead.
-z, to make sort take nul-terminated records. There is no POSIX equivalent.
-V, to make sort sort e.g. 200 after 3. There is no POSIX equivalent, but it could be replaced by a numeric sort on specific parts of the filename if the filenames have a fixed prefix.
-0, to make xargs read nul-terminated records. There is no POSIX equivalent. POSIXly, one would need to quote the file names in a format recognised by xargs.
If the pathnames are well behaved, and if the directory structure is flat (no subdirectories), then one could make do without these flags, except for -V with sort.
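For well-behaved names with the fixed prefix file_, a rough sketch using only standard flags might look like the following. It replaces -V with a numeric sort on the part after the underscore, which only works because the prefix is fixed; the scratch directory and file contents are assumptions made for the demonstration (and mktemp -d itself is not POSIX).

```shell
# Scratch directory with a few illustrative input files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
for n in 1 2 10; do
    printf '%s\n' "$n" > "file_$n.pdb"
done

# -prune stands in for -maxdepth 1, and plain newline-terminated output
# replaces -print0. sort splits each pathname on "_" and sorts numerically
# on the second field, standing in for -V.
find . ! -name . -prune -type f -name 'file_*.pdb' |
    sort -t _ -k 2,2n |
    xargs cat > all.pdb

cat all.pdb
```

This breaks as soon as a name contains blanks, quotes, backslashes or newlines, which is the case the nul-terminated variant is meant to handle.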
With zsh (where that {1..15000} operator comes from):
autoload zargs # best in ~/.zshrc
zargs file_{1..15000}.pdb -- cat > file_all.pdb
Or for all file_<digits>.pdb files in numerical order:
zargs file_<->.pdb(n) -- cat > file_all.pdb
(where <x-y> is a glob operator that matches on decimal numbers x to y. With no x nor y, it's any decimal number. Equivalent to extendedglob's [0-9]## or kshglob's +([0-9]) (one or more digits)).
With ksh93, using its builtin cat command (so not affected by that limit of the execve() system call since there's no execution):
command /opt/ast/bin/cat file_{1..15000}.pdb > file_all.pdb
With bash/zsh/ksh93 (which support zsh's {x..y} and have printf builtin):
printf '%s\n' file_{1..15000}.pdb | xargs cat > file_all.pdb
On a GNU system or compatible, you could also use seq:
seq -f 'file_%.17g.pdb' 15000 | xargs cat > file_all.pdb
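A plausible reason for the %.17g format (this is an inference, not something stated above) is that a bare %g keeps only six significant digits and switches to exponent notation beyond that. For 15000 files either would do, but the sketch below shows where they differ, assuming GNU seq:

```shell
# %g keeps 6 significant digits and switches to exponent notation beyond that.
short=$(seq -f 'file_%g.pdb' 1000000 1000000)

# %.17g has enough digits to print the integer in full.
full=$(seq -f 'file_%.17g.pdb' 1000000 1000000)

printf '%s\n%s\n' "$short" "$full"
```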
For the xargs-based solutions, special care would have to be taken for file names that contain blanks, single or double quotes or backslashes. Like for -It's a trickier filename - 12.pdb, use:
seq -f "\"./-It's a trickier filename - %.17g.pdb\"" 15000 |
xargs cat > file_all.pdb
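The effect of that quoting can be seen in isolation with a small sketch. Here printf stands in for cat so the arguments xargs builds are visible, and the file names are hypothetical:

```shell
# Without quoting, xargs splits the line on blanks into three arguments.
unquoted=$(printf '%s\n' 'my file 1.pdb' | xargs printf '[%s]\n')

# Wrapped in double quotes, the name reaches the command as one argument,
# even though it contains blanks and a single quote.
quoted=$(printf '"%s"\n' "It's my file 1.pdb" | xargs printf '[%s]\n')

printf '%s\n' "$unquoted" "$quoted"
```

Double-quote wrapping does not help for names that themselves contain double quotes or newlines; those need a different quoting form or the nul-terminated approach.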
A for loop is possible, and very simple.
for i in file_{1..15000}.pdb; do cat "$i" >> file_all.pdb; done
The downside is that you invoke cat a hell of a lot of times. But if you can't remember exactly how to do the stuff with find and the invocation overhead isn't too bad in your situation, then it's worth keeping in mind.
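For shells without brace expansion, a counter-based variant of the same idea could look like this sketch. The count of 12, the scratch directory and the file contents are only there to make the example self-contained; redirecting once after done, rather than appending on every iteration, is a small variation on the loop above.

```shell
# Scratch directory with a few illustrative input files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
count=12   # illustrative; the loop above uses 15000
i=1
while [ "$i" -le "$count" ]; do
    printf 'record %s\n' "$i" > "file_$i.pdb"
    i=$((i + 1))
done

# Counter-based loop: no brace expansion needed, the name is quoted,
# and the output file is opened once for the whole loop.
i=1
while [ "$i" -le "$count" ]; do
    cat "file_$i.pdb"
    i=$((i + 1))
done > file_all.pdb
```

cat is still started once per file, so the invocation overhead is the same as before.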