Fastest way to concatenate files
Nope, cat is surely the best way to do this. Why use python when there is a program already written in C for this purpose? However, you might want to consider using xargs in case the command line length exceeds ARG_MAX and you need more than one cat invocation. Using GNU tools, this is equivalent to what you already have:
find . -maxdepth 1 -type f -name 'input_file*' -print0 |
sort -z |
xargs -0 cat -- >>out
Allocating the space for the output file first may improve the overall speed as the system won't have to update the allocation for every write.
For instance, if on Linux:
size=$({ find . -maxdepth 1 -type f -name 'input_file*' -printf '%s+'; echo 0;} | bc)
fallocate -l "$size" out &&
find . -maxdepth 1 -type f -name 'input_file*' -print0 |
sort -z | xargs -r0 cat 1<> out
Another benefit is that if there's not enough free space, the copy will not be attempted.
If on btrfs, you could cp --reflink=always the first file (which implies no data copy and would therefore be almost instantaneous), and append the rest. If there are 10000 files, that probably won't make much difference though unless the first file is very big.
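A minimal sketch of that approach (assuming GNU head, tail and cp, a btrfs filesystem, and that no file name ends in a newline character) could look like:
# reflink the first file into out (no data blocks copied on btrfs),
# then append the remaining files in the same sorted order
first=$(find . -maxdepth 1 -type f -name 'input_file*' -print0 |
  sort -z | head -zn 1 | tr -d '\0')
cp --reflink=always -- "$first" out &&
find . -maxdepth 1 -type f -name 'input_file*' -print0 |
sort -z | tail -zn +2 | xargs -r0 cat -- >> out
Only the first file's blocks are shared with out; the appended data is still copied normally.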
There's an API to generalise that to ref-copy all the files (the BTRFS_IOC_CLONE_RANGE ioctl), but I could not find any utility exposing that API, so you'd have to do it in C (or python or other languages provided they can call arbitrary ioctls).
If the source files are sparse or have large sequences of NUL characters, you could make a sparse output file (saving time and disk space) with (on GNU systems):
find . -maxdepth 1 -type f -name 'input_file*' -print0 |
sort -z | xargs -r0 cat | cp --sparse=always /dev/stdin out
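If you want to check how much space the sparse output actually ends up using, GNU du can report both the apparent size and the allocated size:
du --apparent-size -h out   # logical size of the file
du -h out                   # disk space actually allocated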