for vs find in Bash
I tried this on a directory with 2259 entries, and used the time command.
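(For anyone wanting to reproduce this, a rough sketch of the setup; the path and file count are arbitrary choices of mine:)
# Scratch directory with a few thousand empty files to time against
mkdir /tmp/glob-vs-find && cd /tmp/glob-vs-find
i=0; while [ "$i" -lt 2259 ]; do touch "file$i"; i=$((i+1)); done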
The output of time for f in *; do echo "$f"; done (minus the files!) is:
real 0m0.062s
user 0m0.036s
sys 0m0.012s
The output of time find * -prune | while read f; do echo "$f"; done (minus the files!) is:
real 0m0.131s
user 0m0.056s
sys 0m0.060s
I ran each command several times, so as to eliminate cache misses. This suggests that keeping it in bash (for f in ...) is quicker than using find and piping the output (to bash).
Just for completeness, I dropped the pipe from find, since in your example it's wholly redundant. The output of just find * -prune is:
real 0m0.053s
user 0m0.016s
sys 0m0.024s
Also, time echo * (output isn't newline separated, alas):
real 0m0.009s
user 0m0.008s
sys 0m0.000s
At this point, I suspect the reason echo * is quicker is that it's not outputting so many newlines, so the output isn't scrolling as much. Let's test...
time find * -prune | while read f; do echo "$f"; done > /dev/null yields:
real 0m0.109s
user 0m0.076s
sys 0m0.032s
while time find * -prune > /dev/null yields:
real 0m0.027s
user 0m0.008s
sys 0m0.012s
and time for f in *; do echo "$f"; done > /dev/null yields:
real 0m0.040s
user 0m0.036s
sys 0m0.004s
and finally: time echo * > /dev/null yields:
real 0m0.011s
user 0m0.012s
sys 0m0.000s
Some of the variation can be accounted for by random factors, but it seems clear:
- output is slow
- piping costs a bit
- for f in *; do ... is slower than find * -prune on its own, but for the constructions above involving pipes, it is quicker
Also, as an aside, both approaches appear to handle names with spaces just fine.
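(A quick sketch of that check, using a throwaway file name with embedded spaces:)
touch 'name with spaces'
for f in *; do echo "$f"; done                      # prints: name with spaces
find * -prune | while read f; do echo "$f"; done    # likewise; interior spaces survive read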
EDIT: Timings for find . -maxdepth 1 > /dev/null vs. find * -prune > /dev/null:
time find . -maxdepth 1 > /dev/null:
real 0m0.018s
user 0m0.008s
sys 0m0.008s
find * -prune > /dev/null:
real 0m0.031s
user 0m0.020s
sys 0m0.008s
So, additional conclusion: find * -prune is slower than find . -maxdepth 1 - in the former, the shell is processing a glob, then building a (large) command line for find. NB: find . -prune returns just "." (the current directory).
More tests: time find . -maxdepth 1 -exec echo {} \; > /dev/null:
real 0m3.389s
user 0m0.040s
sys 0m0.412s
Conclusion:
- slowest way to do it so far. As was pointed out in the comments for the answer where this approach was suggested, -exec ... \; spawns a new process for each file.
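(An aside not timed above: POSIX find also has an -exec ... {} + form that batches many file names into each invocation instead of spawning one process per file, which should avoid most of that cost; note that, like echo *, it then prints several names per line.)
find . -maxdepth 1 -exec echo {} +    # a few /bin/echo processes, not one per file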
I would definitely go with find, although I would change your find to just this:
find . -maxdepth 1 -exec echo {} \;
Performance-wise, find is a lot faster, depending on your needs of course. What you currently have with for will only display the files/directories in the current directory, but not the directories' contents. If you use find, it will also show the contents of the sub-directories.
I say find is better, since with your for the * will have to be expanded first, and I'm afraid that if you have a directory with a huge number of files, it might give the error "argument list too long". The same goes for find *. As an example, on one of the systems that I currently use, there are a couple of directories with over 2 million files (<100k each):
find *
-bash: /usr/bin/find: Argument list too long
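(A sketch of the way around that limit: let find walk the directory itself, so the shell never builds a giant argument list.)
find . -maxdepth 1    # no glob expansion, so no "Argument list too long"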
1.
The first one:
for f in *; do echo "$f"; done
fails for files called -n, -e and variants like -nene, and with some bash deployments, for filenames containing backslashes.
The second:
find * -prune | while read f; do echo "$f"; done
fails for even more cases (files called !, -H, -name, (, file names that start or end with blanks or contain newline characters...).
It's the shell that expands *; find does nothing but print the files it receives as arguments. You could just as well have used printf '%s\n' instead, which, since printf is a builtin, would also avoid the potential too many args error.
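(As a sketch, that safer variant:)
# printf is a builtin, so the expanded * never goes through execve()'s
# argument-size limit, and '%s\n' prints names like -n literally, one per line.
printf '%s\n' *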
2.
The expansion of * is sorted; you can make it a bit faster if you don't need the sorting. In zsh:
for f (*(oN)) printf '%s\n' $f
or simply:
printf '%s\n' *(oN)
bash has no equivalent as far as I can tell, so you'd need to resort to find.
3.
find . ! -name . -prune ! -name '.*' -print0 |
while IFS= read -rd '' f; do
printf '%s\n' "$f"
done
(the above uses the non-standard GNU/BSD -print0 extension).
That still involves spawning a find command and using a slow while read loop, so it will probably be slower than using the for loop unless the list of files is huge.
4.
Also, contrary to shell wildcard expansion, find will do an lstat system call on each file, so it's unlikely that the lack of sorting will compensate for that.
With GNU/BSD find, that can be avoided by using their -maxdepth extension, which will trigger an optimization saving the lstat:
find . -maxdepth 1 ! -name '.*' -print0 |
while IFS= read -rd '' f; do
printf '%s\n' "$f"
done
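(On Linux, one rough way to check whether that optimization kicks in is strace's syscall summary; the exact stat call reported varies with the libc, e.g. newfstatat on current systems:)
strace -c find . ! -name . -prune > /dev/null    # expect roughly one stat-family call per file
strace -c find . -maxdepth 1 > /dev/null         # expect far fewer if the optimization applies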
Because find starts outputting file names as soon as it finds them (except for stdio output buffering), where it may be faster is when what you do in the loop is time-consuming and the list of file names is larger than a stdio buffer (4/8 kB). In that case, the processing within the loop will start before find has finished finding all the files. On GNU and FreeBSD systems, you may use stdbuf to cause that to happen sooner (by disabling stdio buffering).
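(A sketch of that; slow-cmd is a hypothetical stand-in for whatever time-consuming per-file processing the loop does:)
# stdbuf -o0 disables find's stdio output buffering, so names reach the
# loop as soon as they are found, not a buffer-full at a time.
stdbuf -o0 find . -maxdepth 1 ! -name '.*' -print0 |
while IFS= read -rd '' f; do
slow-cmd "$f"    # hypothetical per-file work; starts before find is done
done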
5.
The POSIX/standard/portable way to run commands for each file with find is to use the -exec predicate:
find . ! -name . -prune ! -name '.*' -exec some-cmd {} ';'
In the case of echo though, that's less efficient than doing the looping in the shell, as the shell will have a builtin version of echo while find will need to spawn a new process and execute /bin/echo in it for each file.
If you need to run several commands, you can do:
find . ! -name . -prune ! -name '.*' -exec cmd1 {} ';' -exec cmd2 {} ';'
But beware that cmd2 is only executed if cmd1 is successful.
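(If cmd2 should run whether or not cmd1 succeeds, one sketch is to wrap both in a single sh invocation per file; the {} + form in the next point is the more efficient generalisation of this:)
find . ! -name . -prune ! -name '.*' -exec sh -c 'cmd1 "$1"; cmd2 "$1"' sh {} ';'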
6.
A canonical way to run complex commands for each file is to call a shell with -exec ... {} +:
find . ! -name . -prune ! -name '.*' -exec sh -c '
for f do
cmd1 "$f"
cmd2 "$f"
done' sh {} +
That time, we're back to being efficient with echo since we're using sh's builtin one, and the -exec + version spawns as few sh processes as possible.
7.
In my tests on a directory with 200,000 files with short names on ext4, the zsh one (paragraph 2.) is by far the fastest, followed by the first simple for f in * loop (though as usual, bash is a lot slower than other shells for that).