Speed up copying 1000000 small files
Assuming that

- entries returned by readdir are not sorted by inode number
- reading files in inode order reduces the number of seek operations
- the content of most files is in the initial 8k allocation (an ext4 optimization), which should also yield fewer seek operations

you can try to speed up copying by copying files in inode order.

That means using something like this:
$ cd /mnt/src
$ ls -U -i | sort -k1,1 -n | cut -d' ' -f2- > ~/clist
$ xargs cp -t /mnt2/dst < ~/clist
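If your find supports -printf (GNU findutils does), a variation on the same idea sidesteps two caveats of the ls pipeline: ls can pad the inode column with leading blanks, which trips up cut -d' ', and xargs splits names on whitespace by default. This sketch only picks up regular files; the paths are the same placeholders as above:
$ cd /mnt/src
$ find . -maxdepth 1 -type f -printf '%i\t%p\n' | sort -n | cut -f2- > ~/clist
$ xargs -d '\n' cp -t /mnt2/dst < ~/clist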
GNU tar - in the pax tradition - handles hardlinks on its own.
cd "$srcdir" ; tar --hard-dereference -cf - ./* |
tar -C"${tgtdir}" -vxf -
That way you only have the two tar processes and you don't need to keep invoking cp over and over again.
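A slight variation on the same sketch (assuming GNU tar; not part of the original answer): archiving . instead of ./* also picks up dot files, and -p/--same-owner on the extracting side makes permission and ownership preservation explicit when running as root:
tar -C "$srcdir" --hard-dereference -cf - . |
    tar -C "${tgtdir}" -p --same-owner -xf -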
In a similar vein to @maxschlepzig's answer, you can parse the output of filefrag to sort files in the order in which their first fragments appear on disk:
find . -maxdepth 1 -type f |
  xargs -d'\n' filefrag -v |
  sed -n '
    # first extent line: keep only the physical start block, save it in the hold space
    /^ *0: *0\.\./ {
      s/^.\{28\}\([0-9][0-9]*\).*/\1/
      h
    }
    # per-file summary line ("...: N extents found"): strip the summary, append the
    # file name to the held block number and print "block name"
    / found$/ {
      s/:[^:]*$//
      H
      g
      s/\n/ /p
    }' |
  sort -nk 1,1 |
  cut -d' ' -f 2- |
  cpio -p dest_dir
YMMV with the above sed script, so be sure to test thoroughly.
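One way to follow that advice is to dry-run the pipeline on a small scratch tree first and compare the result against the source; the /tmp/fragtest paths below are arbitrary examples, not from the answer:
# build a small scratch tree of random 4 KiB files
mkdir -p /tmp/fragtest/src /tmp/fragtest/dst
for i in $(seq 1 100); do
    head -c 4096 /dev/urandom > "/tmp/fragtest/src/file$i"
done

# run the filefrag/sed/cpio pipeline above from /tmp/fragtest/src with
# /tmp/fragtest/dst as dest_dir, then check that nothing was lost
diff -r /tmp/fragtest/src /tmp/fragtest/dst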
Otherwise, whatever you do, filefrag (part of e2fsprogs) will be much faster to use than hdparm, as it can take multiple file arguments. Just the cost of running hdparm 1,000,000 times would add a lot of overhead on its own.
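To illustrate the difference (the file name below is just a placeholder, and hdparm --fibmap needs root):
# one filefrag process can map extents for a whole batch of files
find . -maxdepth 1 -type f -print0 | xargs -0 filefrag -v

# hdparm --fibmap reports extents for a single file per invocation,
# so a million files means a million process start-ups
sudo hdparm --fibmap ./somefile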
Also, it probably wouldn't be so difficult to write a perl script (or C program) to do a FIEMAP ioctl for each file, create a sorted array of the blocks that should be copied and the files they belong to, and then copy everything in order by reading the size of each block from the corresponding file (be careful not to run out of file descriptors, though).