How do I parse the output of the find command when filenames have spaces in them?
Ideally you don't do it that way at all, because parsing filenames properly in a shell script is always difficult (fix it for spaces and you'll still have problems with other embedded characters, newline in particular). This is even listed as the first entry on the BashPitfalls page.
That said, there is a way to almost do what you want:
oIFS=$IFS
IFS=$'\n'
find . -name '*.txt' | while read -r i; do
    # use "$i" with whatever you're doing
done
IFS=$oIFS
Remember to also quote $i when using it, to avoid other things interpreting the spaces later. Also remember to set $IFS back after using it, because not doing so will cause bewildering errors later.
This does have one other caveat attached: what happens inside the while loop may take place in a subshell, depending on the exact shell you're using, so variable settings may not persist. The for loop version avoids that, but at a price: even if you apply the $IFS solution to avoid issues with spaces, you will then get in trouble if the find returns too many files.
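One way to sidestep both the subshell and the newline problem can be sketched as follows. This is only an illustration and it assumes bash (process substitution and read -d are not POSIX) plus a find that supports -print0, as the GNU and BSD versions do. The demo directory, the file names, and the count and file variables are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical demo tree; the file names are invented for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt" "$demo_dir/skip.log"
touch "$demo_dir/"$'new\nline.txt'   # a name with an embedded newline

count=0
# -print0 ends every name with a NUL byte; read -d '' reads up to the next NUL.
# IFS= keeps read from trimming leading or trailing whitespace in the name.
# Feeding the loop through < <(...) instead of a pipe keeps it in the
# current shell, so the counter is still set after the loop ends.
while IFS= read -r -d '' file; do
    count=$((count + 1))
    printf 'found: [%s]\n' "$file"
done < <(find "$demo_dir" -name '*.txt' -print0)

echo "count=$count"
rm -rf "$demo_dir"
```

Because IFS is set only for the read command itself, there is nothing to restore afterwards.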
At some point the correct fix for all of this becomes doing it in a language such as Perl or Python instead of shell.
Use find -print0 and pipe it to xargs -0, or write your own little C program and pipe it to that instead. This is what -print0 and -0 were invented for.
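A minimal sketch of that pipeline, assuming a find and xargs that support -print0 and -0 (GNU and BSD both do). The demo directory, the file names, and the line_total variable are invented for the example, and cat stands in for whatever command you actually want to run.

```shell
# Hypothetical demo tree; the file names are invented for illustration.
demo_dir=$(mktemp -d)
printf 'one\ntwo\n' > "$demo_dir/with space.txt"
printf 'three\n'    > "$demo_dir/plain.txt"

# find ends each name with a NUL byte and xargs -0 splits only on NUL,
# so cat receives every path as exactly one argument, spaces included.
# tr strips the padding that some wc implementations put around the count.
line_total=$(find "$demo_dir" -name '*.txt' -print0 | xargs -0 cat | wc -l | tr -d ' ')

echo "total lines: $line_total"
rm -rf "$demo_dir"
```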
Shell scripts aren't the best way to handle filenames with spaces in them: you can do it, but it gets clunky.
You can set the "internal field separator" (IFS) to something other than space for the loop argument splitting, e.g.
ORIGIFS=${IFS}
NL='
'
IFS=${NL}
for i in $(find . -name '*.txt'); do
    IFS=${ORIGIFS}
    # do stuff
done
IFS=${ORIGIFS}
I reset IFS inside the loop, right after the output of find has been split, mostly because it looks nice. I haven't seen any problems from leaving it set to newline, but I think this is "cleaner".
Another method, depending on what you want to do with the output from find, is to either use -exec directly with the find command, or use -print0 and pipe it into xargs -0. In the first case find hands each file name to the command as a single argument, so no escaping is needed. In the -print0 case, find prints its output with a null separator, and then xargs splits on this. Since no file name can contain that character (as far as I know), this is always safe as well. This is mostly useful in simple cases, and usually is not a great substitute for a full for loop.
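If you do want a loop body with the -exec approach, one option is to have find start a small inline shell script. The following is a sketch under assumptions, not the only way to do it: the demo directory, the file names, and the path and listing variables are invented for the example.

```shell
# Hypothetical demo tree; the file names are invented for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/with space.txt" "$demo_dir/plain.txt" "$demo_dir/skip.log"

# "{} +" passes many paths per invocation, each as a separate argument.
# The inline script sees them as "$@"; the extra "sh" after it fills $0.
listing=$(find "$demo_dir" -name '*.txt' -exec sh -c '
    for path in "$@"; do
        # Strip the directory part and print the bare file name in brackets.
        printf "[%s]\n" "${path##*/}"
    done
' sh {} + | LC_ALL=C sort)

echo "$listing"
rm -rf "$demo_dir"
```

Since the names never pass through word splitting, IFS does not need to be touched at all. Note that the loop runs in a child shell, so variables set inside it are not visible to the calling script.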