Understanding "IFS= read -r line"
In POSIX shells, read, without any option, doesn't read a line: it reads words from a (possibly backslash-continued) line, where words are $IFS-delimited and backslash can be used to escape the delimiters (or continue lines).
The generic syntax is:
read word1 word2... remaining_words
read reads stdin one byte at a time¹ until it finds an unescaped newline character (or end of input), splits that according to complex rules and stores the result of that splitting into $word1, $word2... $remaining_words.
For instance, on an input like:
<tab> foo bar\ baz bl\ah blah\
whatever whatever
and with the default value of $IFS, read a b c would assign:
$a ⇐ foo
$b ⇐ bar baz
$c ⇐ blah blahwhatever whatever
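As a minimal, runnable sketch of that example (the printf escapes recreate the tab, the escaped spaces and the trailing backslash of the input above; the expected values are shown as comments):
printf '\t foo bar\\ baz bl\\ah blah\\\nwhatever whatever\n' | {
  read a b c
  printf '<%s>\n' "$a" "$b" "$c"
}
# <foo>
# <bar baz>
# <blah blahwhatever whatever>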
Now if passed only one argument, that doesn't become read line. It's still read remaining_words. Backslash processing is still done, and IFS whitespace characters are still removed from the beginning and end.
The -r option removes the backslash processing. So that same command above with -r would instead assign:
$a ⇐ foo
$b ⇐ bar\
$c ⇐ baz bl\ah blah\
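The same sketch with -r (only the read invocation changes; again the expected values are in comments):
printf '\t foo bar\\ baz bl\\ah blah\\\nwhatever whatever\n' | {
  read -r a b c
  printf '<%s>\n' "$a" "$b" "$c"
}
# <foo>
# <bar\>
# <baz bl\ah blah\>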
Now, for the splitting part, it's important to realise that there are two classes of characters for $IFS: the IFS whitespace characters (namely space and tab (and newline, though here that doesn't matter unless you use -d), which also happen to be in the default value of $IFS) and the others. The treatment for those two classes of characters is different.
With IFS=: (: being not an IFS whitespace character), an input like :foo::bar:: would be split into "", "foo", "", "bar" and "" (and an extra "" with some implementations, though that doesn't matter except for read -a). While if we replace that : with space, the splitting is done into only foo and bar. That is, leading and trailing ones are ignored, and sequences of them are treated like one. There are additional rules when whitespace and non-whitespace characters are combined in $IFS. Some implementations can add/remove the special treatment by doubling the characters in IFS (IFS=:: or IFS='  ').
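To see the two behaviours side by side, here is a small sketch (the inputs are kept without a trailing separator to stay clear of the implementation difference mentioned above):
# ":" is not an IFS whitespace character: empty fields are preserved
printf ':foo::bar\n' | { IFS=: read -r a b c d; printf '<%s>\n' "$a" "$b" "$c" "$d"; }
# <>
# <foo>
# <>
# <bar>
# space is an IFS whitespace character: leading ones are ignored and
# sequences of them count as a single separator
printf ' foo  bar\n' | { IFS=' ' read -r a b c d; printf '<%s>\n' "$a" "$b" "$c" "$d"; }
# <foo>
# <bar>
# <>
# <>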
So here, if we don't want the leading and trailing unescaped whitespace characters to be stripped, we need to remove those IFS whitespace characters from IFS.
Even with IFS non-whitespace characters, if the input line contains one (and only one) of those characters and it's the last character in the line (like IFS=: read -r word on an input like foo:), then with POSIX shells (not zsh nor some pdksh versions) that input is considered as one foo word, because in those shells the characters of $IFS are considered as terminators, so word will contain foo, not foo:.
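A quick sketch of that terminator behaviour (in zsh, per the note above, you would get foo: instead):
printf 'foo:\n' | { IFS=: read -r word; printf '<%s>\n' "$word"; }
# <foo>    (the trailing ":" acts as a terminator, not as a separator)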
So, the canonical way to read one line of input with the read builtin is:
IFS= read -r line
(note that for most read implementations, that only works for text lines as the NUL character is not supported except in zsh).
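As a usage sketch built on that idiom, the usual way to process a file line by line (the path is just a placeholder):
while IFS= read -r line; do
  printf '%s\n' "$line"     # replace with whatever processing you need
done < /path/to/some/file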
Using the var=value cmd syntax makes sure IFS is only set differently for the duration of that cmd command.
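A small sketch of that scoping (the here-document just provides some input with leading spaces; sed -n l is only used to make the value of IFS visible):
IFS=' zyx'                # give IFS a recognisable value first
IFS= read -r line <<EOF
   some input
EOF
printf '%s\n' "$line"     # "   some input": the leading spaces were kept
printf '%s\n' "$IFS" | sed -n l
#  zyx$                   # IFS is back to " zyx": the IFS= prefix only
                          # applied to that one read invocation
unset IFS                 # restore the default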
History note
The read builtin was introduced by the Bourne shell and was already used to read words, not lines. There are a few important differences with modern POSIX shells.
The Bourne shell's read didn't support a -r option (which was introduced by the Korn shell), so there was no way to disable backslash processing other than pre-processing the input with something like sed 's/\\/&&/g' there.
The Bourne shell didn't have that notion of two classes of characters (which again was introduced by ksh). In the Bourne shell, all characters undergo the same treatment as IFS whitespace characters do in ksh, that is, IFS=: read a b c on an input like foo::bar would assign bar to $b, not the empty string.
In the Bourne shell, with:
var=value cmd
if cmd is a built-in (like read is), var remains set to value after cmd has finished. That's particularly critical with $IFS because in the Bourne shell, $IFS is used to split everything, not only the expansions. Also, if you remove the space character from $IFS in the Bourne shell, "$@" no longer works.
In the Bourne shell, redirecting a compound command causes it to run in a subshell (in the earliest versions, even things like read var < file or exec 3< file; read var <&3 didn't work), so it was rare in the Bourne shell to use read for anything but user input on the terminal (where that line-continuation handling made sense).
Some Unices (like HP/UX; there's also one in util-linux) still have a line command to read one line of input (that used to be a standard UNIX command up until the Single UNIX Specification version 2).
That's basically the same as head -n 1 except that it reads one byte at a time to make sure it doesn't read more than one line. On those systems, you can do:
line=`line`
Of course, that means spawning a new process, executing a command and reading its output through a pipe, so it's a lot less efficient than ksh's IFS= read -r line, but still a lot more intuitive.
¹ though on seekable input, some implementations can revert to reading by blocks and seeking back afterwards as an optimisation. ksh93 goes even further and remembers what was read and uses it for the next read invocation, though that's currently broken.
The Theory
There are two concepts that are in play here:
- IFS is the Input Field Separator, which means the string read will be split based on the characters in IFS. On a command line, IFS is normally any whitespace character; that's why the command line splits at spaces.
- Doing something like VAR=value command means "modify the environment of command so that VAR will have the value value". Basically, the command command will see VAR as having the value value, but any command executed after that will still see VAR as having its previous value. In other words, that variable will be modified only for that statement.
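A minimal sketch of that second point (sh -c is used here only as a stand-in for "some command that looks at the variable", and it assumes GREETING was not already set in your shell):
$ GREETING=hello sh -c 'echo "inside:  $GREETING"'
inside:  hello
$ echo "outside: $GREETING"
outside: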
In this case
So when doing IFS= read -r line, what you are doing is setting IFS to an empty string (no character will be used to split, therefore no splitting will occur) so that read will read the entire line and see it as one word that will be assigned to the line variable. The changes to IFS only affect that statement, so any following commands won't be affected by the change.
As a side note
While the command is correct and will work as intended, setting IFS in this case might[1] not be necessary. As written in the bash man page, in the read builtin section:
One line is read from the standard input [...] and the first word is assigned to the first name, the second word to the second name, and so on, with leftover words and their intervening separators assigned to the last name. If there are fewer words read from the input stream than names, the remaining names are assigned empty values. The characters in IFS are used to split the line into words. [...]
Since you only have the line variable, every word will be assigned to it anyway, so if you don't need any of the preceding and trailing whitespace characters[1] you could just write read -r line and be done with it.
[1] Just as an example of how an unset or default $IFS value will cause read to strip leading/trailing IFS whitespace, you might try:
echo ' where are my spaces? ' | {
    unset IFS
    read -r line
    printf %s\\n "$line"
} | sed -n l
Run it and you will see that the preceding and trailing whitespace characters don't survive when IFS is unset (or left at its default value). Furthermore, some strange things could happen if $IFS were to be modified somewhere earlier in the script.
You should read that statement in two parts: the first one clears the value of the IFS variable, i.e. is equivalent to the more readable IFS="", and the second one reads the line variable from stdin: read -r line.
What is specific in this syntax is that the IFS assignment is transient and only valid for the read command.
Unless I'm missing something, in that particular case clearing IFS has no effect though, as whatever IFS is set to, the whole line will be read into the line variable. There would have been a change in behaviour only if more than one variable had been passed as a parameter to the read instruction.
Edit:
The -r is there to allow input ending with \ not to be processed specially, i.e. for the backslash to be included in the line variable and not treated as a continuation character allowing multi-line input.
$ read line; echo "[$line]"
abc\
> def
[abcdef]
$ read -r line; echo "[$line]"
abc\
[abc\]
Clearing IFS has the side effect of preventing read from trimming potential leading and trailing space or tab characters, e.g.:
$ echo " a b c " | { IFS= read -r line; echo "[$line]" ; }
[ a b c ]
$ echo " a b c " | { read -r line; echo "[$line]" ; }
[a b c]
Thanks to rici for pointing out that difference.