for vs find in Bash
I tried this on a directory with 2259 entries, and used the time command.
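(For anyone wanting to reproduce this, a rough sketch of the setup; the path and file count are arbitrary choices of mine:)
# Scratch directory with a few thousand empty files to time against
mkdir /tmp/glob-vs-find && cd /tmp/glob-vs-find
i=0; while [ "$i" -lt 2259 ]; do touch "file$i"; i=$((i+1)); done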
The output of time for f in *; do echo "$f"; done (minus the files!) is:
real 0m0.062s
user 0m0.036s
sys 0m0.012s
The output of time find * -prune | while read f; do echo "$f"; done (minus the files!) is:
real 0m0.131s
user 0m0.056s
sys 0m0.060s
I ran each command several times, so as to eliminate cache misses. This suggests that keeping it in bash (for f in ...) is quicker than using find and piping the output (to bash).
Just for completeness, I dropped the pipe from find, since in your example it's wholly redundant. The output of just find * -prune is:
real 0m0.053s
user 0m0.016s
sys 0m0.024s
Also, time echo * (output isn't newline separated, alas):
real 0m0.009s
user 0m0.008s
sys 0m0.000s
At this point, I suspect the reason echo * is quicker is that it's not outputting so many newlines, so the output isn't scrolling as much. Let's test...
time find * -prune | while read f; do echo "$f"; done > /dev/null yields:
real 0m0.109s
user 0m0.076s
sys 0m0.032s
while time find * -prune > /dev/null yields:
real 0m0.027s
user 0m0.008s
sys 0m0.012s
and time for f in *; do echo "$f"; done > /dev/null yields:
real 0m0.040s
user 0m0.036s
sys 0m0.004s
and finally: time echo * > /dev/null yields:
real 0m0.011s
user 0m0.012s
sys 0m0.000s
Some of the variation can be accounted for by random factors, but it seems clear:
- output is slow
- piping costs a bit
- for f in *; do ... is slower than find * -prune on its own, but for the constructions above involving pipes, it is quicker
Also, as an aside, both approaches appear to handle names with spaces just fine.
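(A quick sketch of that check, using a throwaway file name with embedded spaces:)
touch 'name with spaces'
for f in *; do echo "$f"; done                      # prints: name with spaces
find * -prune | while read f; do echo "$f"; done    # likewise; interior spaces survive read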
EDIT: Timings for find . -maxdepth 1 > /dev/null vs. find * -prune > /dev/null:
time find . -maxdepth 1 > /dev/null:
real 0m0.018s
user 0m0.008s
sys 0m0.008s
find * -prune > /dev/null:
real 0m0.031s
user 0m0.020s
sys 0m0.008s
So, additional conclusion: find * -prune is slower than find . -maxdepth 1 - in the former, the shell is processing a glob, then building a (large) command line for find. NB: find . -prune returns just "." (the current directory).
More tests: time find . -maxdepth 1 -exec echo {} \; > /dev/null:
real 0m3.389s
user 0m0.040s
sys 0m0.412s
Conclusion:
- slowest way to do it so far. As was pointed out in the comments for the answer where this approach was suggested, -exec ... \; spawns a new process for each file.
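(An aside not timed above: POSIX find also has an -exec ... {} + form that batches many file names into each invocation instead of spawning one process per file, which should avoid most of that cost; note that, like echo *, it then prints several names per line.)
find . -maxdepth 1 -exec echo {} +    # a few /bin/echo processes, not one per file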
I would definitely go with find, although I would change your find to just this:
find . -maxdepth 1 -exec echo {} \;
Performance-wise, find is a lot faster, depending on your needs of course. What you currently have with for will only display the files/directories in the current directory, but not the directories' contents. If you use find, it will also show the contents of the sub-directories.
I say find is better, since with your for the * will have to be expanded first, and I'm afraid that if you have a directory with a huge number of files, it might give the error "argument list too long". The same goes for find *. As an example, on one of the systems that I currently use, there are a couple of directories with over 2 million files (<100k each):
find *
-bash: /usr/bin/find: Argument list too long
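(A sketch of the way around that limit: let find walk the directory itself, so the shell never builds a giant argument list.)
find . -maxdepth 1    # no glob expansion, so no "Argument list too long"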
1.
The first one:
for f in *; do echo "$f"; done
fails for files called -n, -e and variants like -nene, and with some bash deployments, for filenames containing backslashes.
The second:
find * -prune | while read f; do echo "$f"; done
fails for even more cases (files called !, -H, -name, (, file names that start or end with blanks or contain newline characters...).
It's the shell that expands *; find does nothing but print the files it receives as arguments. You could just as well have used printf '%s\n' instead, which, since printf is a builtin, would also avoid the potential too many args error.
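(As a sketch, that safer variant:)
# printf is a builtin, so the expanded * never goes through execve()'s
# argument-size limit, and '%s\n' prints names like -n literally, one per line.
printf '%s\n' *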
2.
The expansion of * is sorted; you can make it a bit faster if you don't need the sorting. In zsh:
for f (*(oN)) printf '%s\n' $f
or simply:
printf '%s\n' *(oN)
bash has no equivalent as far as I can tell, so you'd need to resort to find.
3.
find . ! -name . -prune ! -name '.*' -print0 |
while IFS= read -rd '' f; do
printf '%s\n' "$f"
done
(the above uses the non-standard GNU/BSD -print0 extension).
That still involves spawning a find command and using a slow while read loop, so it will probably be slower than using the for loop unless the list of files is huge.
4.
Also, contrary to shell wildcard expansion, find will do an lstat system call on each file, so it's unlikely that the lack of sorting will compensate for that.
With GNU/BSD find, that can be avoided by using their -maxdepth extension, which will trigger an optimization saving the lstat:
find . -maxdepth 1 ! -name '.*' -print0 |
while IFS= read -rd '' f; do
printf '%s\n' "$f"
done
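(On Linux, one rough way to check whether that optimization kicks in is strace's syscall summary; the exact stat call reported varies with the libc, e.g. newfstatat on current systems:)
strace -c find . ! -name . -prune > /dev/null    # expect roughly one stat-family call per file
strace -c find . -maxdepth 1 > /dev/null         # expect far fewer if the optimization applies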
Because find starts outputting file names as soon as it finds them (except for stdio output buffering), where it may be faster is when what you do in the loop is time-consuming and the list of file names is larger than a stdio buffer (4/8 kB). In that case, the processing within the loop will start before find has finished finding all the files. On GNU and FreeBSD systems, you may use stdbuf to cause that to happen sooner (by disabling stdio buffering).
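(A sketch of that; slow-cmd is a hypothetical stand-in for whatever time-consuming per-file processing the loop does:)
# stdbuf -o0 disables find's stdio output buffering, so names reach the
# loop as soon as they are found, not a buffer-full at a time.
stdbuf -o0 find . -maxdepth 1 ! -name '.*' -print0 |
while IFS= read -rd '' f; do
slow-cmd "$f"    # hypothetical per-file work; starts before find is done
done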
5.
The POSIX/standard/portable way to run commands for each file with find is to use the -exec predicate:
find . ! -name . -prune ! -name '.*' -exec some-cmd {} ';'
In the case of echo though, that's less efficient than doing the looping in the shell, as the shell will have a builtin version of echo while find will need to spawn a new process and execute /bin/echo in it for each file.
If you need to run several commands, you can do:
find . ! -name . -prune ! -name '.*' -exec cmd1 {} ';' -exec cmd2 {} ';'
But beware that cmd2 is only executed if cmd1 is successful.
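(If cmd2 should run whether or not cmd1 succeeds, one sketch is to wrap both in a single sh invocation per file; the {} + form in the next point is the more efficient generalisation of this:)
find . ! -name . -prune ! -name '.*' -exec sh -c 'cmd1 "$1"; cmd2 "$1"' sh {} ';'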
6.
A canonical way to run complex commands for each file is to call a shell with -exec ... {} +:
find . ! -name . -prune ! -name '.*' -exec sh -c '
for f do
cmd1 "$f"
cmd2 "$f"
done' sh {} +
That time, we're back to being efficient with echo since we're using sh's builtin one, and the -exec + version spawns as few sh processes as possible.
7.
In my tests on a directory with 200,000 files with short names on ext4, the zsh one (paragraph 2.) is by far the fastest, followed by the first simple for f in * loop (though as usual, bash is a lot slower than other shells for that).