Why is `while IFS= read` used so often, instead of `IFS=; while read..`?
The trap is that `IFS=; while read..` sets `IFS` for the whole shell environment, so the change persists after the loop, whereas `while IFS= read` redefines it only for the `read` invocation (except in the Bourne shell).
You can check this. After a loop like
while IFS= read xxx; ... done
the command `echo "blabalbla $IFS ooooooo"` prints
blabalbla
ooooooo
whereas after
IFS=; while read xxx; ... done
`IFS` stays redefined (empty), so the same `echo "blabalbla $IFS ooooooo"` now prints
blabalbla  ooooooo
So if you use the second form, you have to remember to reset it afterwards: `IFS=$' \t\n'` (the `$'...'` quoting works in bash, ksh93 and zsh).
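Here's a small script you can run to see the difference for yourself. It's a sketch that assumes bash (for `printf %q` and `$'...'`) and assumes the script starts with the default `IFS`; the variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Sketch (bash assumed): compare IFS after the two loop forms.
# Assumes the script starts with the default IFS (space, tab, newline).

# Form 1: IFS= is a prefix of read, so it applies to read only.
while IFS= read -r line; do :; done <<<'some input'
ifs_after_prefix=$IFS

# Form 2: IFS= is a command of its own, so the change outlives the loop.
IFS=; while read -r line; do :; done <<<'some input'
ifs_after_standalone=$IFS

IFS=$' \t\n'   # reset to the default, as recommended above

printf 'after while IFS= read:  %q\n' "$ifs_after_prefix"      # $' \t\n'
printf 'after IFS=; while read: %q\n' "$ifs_after_standalone"  # ''
```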
Let's look at an example, with some carefully crafted input text:
text=' hello  world\
foo\bar'
That's two lines, the first beginning with a space and ending with a backslash. First, let's look at what happens without any precautions around `read` (but using `printf '%s\n' "$text"` to print `$text` carefully, without any risk of expansion). (Below, `$` is the shell prompt.)
$ printf '%s\n' "$text" |
while read line; do printf '%s\n' "[$line]"; done
[hello  worldfoobar]
`read` ate up the backslashes: a backslash-newline pair causes the newline to be ignored, and a backslash before any other character is dropped, leaving that character. To avoid backslashes being treated specially, we use `read -r`.
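To make the backslash handling concrete, here's a sketch (bash assumed, variable names illustrative) comparing plain `read` with `read -r`. Besides joining continuation lines, a backslash without `-r` also quotes the next character, so an escaped space is not treated as a field separator.

```shell
#!/usr/bin/env bash
# Sketch (bash assumed): what a backslash does to read with and without -r.

# Without -r, backslash-newline joins the two lines into one...
read joined <<<$'one\\\ntwo'
# ...and a backslash before a space stops read from splitting there.
read a b <<<'x\ y z'

# With -r, backslashes are ordinary characters.
read -r first_only <<<$'one\\\ntwo'
read -r ra rb <<<'x\ y z'

printf '[%s]\n' "$joined"          # [onetwo]
printf '[%s] [%s]\n' "$a" "$b"     # [x y] [z]
printf '[%s]\n' "$first_only"      # [one\]
printf '[%s] [%s]\n' "$ra" "$rb"   # [x\] [y z]
```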
$ printf '%s\n' "$text" |
while read -r line; do printf '%s\n' "[$line]"; done
[hello  world\]
[foo\bar]
That's better: we have two lines as expected. The two lines almost contain the desired content: the double space between `hello` and `world` has been retained, because it's within the `line` variable. On the other hand, the initial space was eaten up. That's because `read` splits the line into as many words as you pass it variables, except that the last variable receives the rest of the line; but that rest still starts at the first word, i.e. the initial spaces are discarded.
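A quick sketch of that word distribution with the default `IFS` (bash assumed; the sample line and variable names are made up). It also shows that, in bash, trailing whitespace is trimmed from the last variable while internal spacing is kept.

```shell
#!/usr/bin/env bash
# Sketch (bash assumed): how read distributes a line over variables
# when IFS has its default value (space, tab, newline).
line='  one  two   three  '

# Two variables: the first gets the first word; the last gets the rest,
# internal spacing kept, leading and trailing whitespace removed.
read -r first rest <<<"$line"
printf '[%s] [%s]\n' "$first" "$rest"   # [one] [two   three]

# One variable: it is also the "last" one, so it still loses the outer spaces.
read -r whole <<<"$line"
printf '[%s]\n' "$whole"                # [one  two   three]
```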
So, in order to read each line literally, we need to make sure that no word splitting is going on. We do this by setting the `IFS` variable to an empty value.
$ printf '%s\n' "$text" |
while IFS= read -r line; do printf '%s\n' "[$line]"; done
[ hello  world\]
[foo\bar]
Note how we set `IFS` specifically for the duration of the `read` built-in. `IFS= read -r line` sets the environment variable `IFS` (to an empty value) specifically for the execution of `read`.
This is an instance of the general simple command syntax: a (possibly empty) sequence of variable assignments followed by a command name and its arguments (you can also throw in redirections at any point). Since `read` is a built-in, the variable never actually ends up in an external process's environment; nonetheless, the value of `$IFS` is what we assign there for as long as `read` is executing¹. Note that `read` is not a special built-in, so the assignment lasts only for its duration.
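Here's a sketch of that simple-command rule, assuming bash and an environment where the made-up variable `DEMO_VAR` is not already set. For an external command the prefix assignment goes into the child's environment; for the `read` built-in it only affects how that one `read` splits; afterwards neither is visible in the current shell.

```shell
#!/usr/bin/env bash
# Sketch (bash assumed): prefix assignments last only for their command.
# DEMO_VAR is a hypothetical name used only for this illustration.

# External command: the assignment is placed in the child's environment.
child_saw=$(DEMO_VAR=hello sh -c 'printf %s "$DEMO_VAR"')

# Built-in read: the assignment changes how this read splits its input.
IFS=: read -r a b c <<<'x:y:z'

# Afterwards, neither assignment is visible in the current shell.
printf 'child saw: %s\n' "$child_saw"            # child saw: hello
printf 'fields: %s %s %s\n' "$a" "$b" "$c"       # fields: x y z
printf 'DEMO_VAR now: %s\n' "${DEMO_VAR-unset}"  # DEMO_VAR now: unset
```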
Thus we're taking care not to change the value of `IFS` for other instructions that may rely on it. This code will work no matter what the surrounding code has set `IFS` to initially, and it will not cause any trouble if the code inside the loop relies on `IFS`.
For example, consider this code snippet, which looks up files in a colon-separated path. The list of file names is read from a file, one file name per line.
IFS=":"; set -f   ## split unquoted expansions on ":" only; disable globbing
while IFS= read -r name; do
for dir in $PATH; do
## At this point, "$IFS" is still ":"
if [ -e "$dir/$name" ]; then echo "$dir/$name"; fi
done
done <filenames.txt
If the loop were `while IFS=; read -r name; do …`, then `for dir in $PATH` would not split `$PATH` into colon-separated components. If the code were `IFS=; while read …`, it would be even more obvious that `IFS` is not set to `:` in the loop body.
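Here's a self-contained variant of the snippet above that you can actually run. It's a sketch assuming bash and `mktemp -d`; `find_in_path`, `search_path` and the sample directory and file names are hypothetical. Writing the function body as `( ... )` runs it in a subshell, so its `IFS=":"` and `set -f` don't leak into the caller, and the inner `IFS= read` still keeps the leading space of ` beta`.

```shell
#!/usr/bin/env bash
# Sketch: the path lookup as a function whose body is a ( ... ) subshell,
# so the IFS=":" and set -f below do not leak into the caller.
# find_in_path and the sample directories are hypothetical names.
find_in_path() (
  search_path=$1
  IFS=":"; set -f
  while IFS= read -r name; do            # IFS is empty only for this read
    for dir in $search_path; do          # split on ":" because IFS is ":" here
      if [ -e "$dir/$name" ]; then printf '%s\n' "$dir/$name"; fi
    done
  done
)

tmp=$(mktemp -d)
mkdir "$tmp/bin one" "$tmp/bin two"
touch "$tmp/bin one/alpha" "$tmp/bin two/ beta"   # note the leading space

result=$(printf '%s\n' alpha ' beta' | find_in_path "$tmp/bin one:$tmp/bin two")
printf '%s\n' "$result"
rm -r "$tmp"
```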
Of course, it would be possible to restore the value of `IFS` after executing `read`. But that would require knowing the previous value, which is extra effort. `IFS= read` is the simple way (and, conveniently, also the shortest way).
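For comparison, here's roughly what the save-and-restore approach looks like. It's a sketch assuming bash; `read_line_literally`, `saved_ifs`, `ifs_was_set` and `status` are hypothetical names. Part of the extra effort is remembering whether `IFS` was unset, since an unset `IFS` behaves like the default rather than like an empty one.

```shell
#!/usr/bin/env bash
# Sketch (bash assumed): the "save and restore" alternative to IFS= read.
# read_line_literally and its helper variables are hypothetical names.
read_line_literally() {
  # Remember whether IFS was set at all, not just its value.
  if [ -n "${IFS+set}" ]; then ifs_was_set=1; saved_ifs=$IFS; else ifs_was_set=0; fi
  IFS=
  read -r line
  status=$?                                   # keep read's exit status
  if [ "$ifs_was_set" = 1 ]; then IFS=$saved_ifs; else unset IFS; fi
  return "$status"                            # so it still works in while loops
}

read_line_literally <<<'  padded  '
printf '[%s]\n' "$line"     # [  padded  ]

# The same result with nothing to save or restore:
IFS= read -r line2 <<<'  padded  '
printf '[%s]\n' "$line2"    # [  padded  ]
```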
¹ And, if `read` is interrupted by a trapped signal, possibly while the trap is executing; this is not specified by POSIX and depends on the shell in practice.
Apart from the (already clarified) `IFS` scoping differences between the `while IFS='' read`, `IFS=''; while read` and `while IFS=''; read` idioms (per-command vs. script/shell-wide `IFS` scoping), the take-home lesson is that you lose the leading and trailing spaces of an input line if `IFS` contains a space, as its default value does.
This can have pretty serious consequences if file paths are being processed.
Therefore, setting `IFS` to the empty string is anything but a bad idea, since it ensures that a line's leading and trailing whitespace does not get stripped.
See also: Bash, read line by line from file, with IFS
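A quick bash demonstration, using a file whose name starts and ends with a space:
(
  shopt -s nullglob                 # bash: an unmatched glob expands to nothing
  touch ' file with spaces '        # the name has a leading and a trailing space
  # Default IFS (space, tab, newline): the outer spaces are stripped, so
  # $file is 'file with spaces' and ls looks for a file that is not there.
  IFS=$' \t\n' read -r file <<<"$(printf '%s' *file*with*spaces*)"
  ls -l "$file"
  # Empty IFS: the name is read verbatim and ls finds the file.
  IFS='' read -r file <<<"$(printf '%s' *file*with*spaces*)"
  ls -l "$file"
)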