When can I use a temporary IFS for field splitting?
The basic idea is that VAR=VALUE some-command sets VAR to VALUE for the execution of some-command when some-command is an external command, and it doesn't get more fancy than that. If you combine this intuition with some knowledge of how a shell works, you should come up with the right answer in most cases. The POSIX reference is “Simple Commands” in the chapter “Shell Command Language”.
If some-command is an external command, VAR=VALUE some-command is equivalent to env VAR=VALUE some-command. VAR is exported in the environment of some-command, and its value (or lack of a value) in the shell doesn't change.
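A minimal sketch of the external-command case. It assumes a POSIX sh with sh and env on PATH; sh -c stands in for some-command because it is an external program that can report what it received.

```shell
#!/bin/sh
# VAR=VALUE before an external command: the child gets VAR in its environment...
unset VAR
VAR=VALUE sh -c 'printf "child sees: %s\n" "$VAR"'      # child sees: VALUE
env VAR=VALUE sh -c 'printf "child sees: %s\n" "$VAR"'  # same thing, spelled with env
# ...while the shell's own VAR is left as it was (here: unset).
printf 'shell sees: %s\n' "${VAR-<unset>}"              # shell sees: <unset>
```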
If some-command is a function, then VAR=VALUE some-command is equivalent to VAR=VALUE; some-command, i.e. the assignment remains in place after the function has returned, and the variable is not exported into the environment. The reason for that has to do with the design of the Bourne shell (and subsequently with backward compatibility): it had no facility to save and restore variable values around the execution of a function. Not exporting the variable makes sense since a function executes in the shell itself. However, ksh (including both AT&T ksh93 and pdksh/mksh), bash and zsh implement the more useful behavior where VAR is set only during the execution of the function (it's also exported). In ksh, this is done if the function is defined with the ksh syntax function NAME …, not if it's defined with the standard syntax NAME (). In bash, this is done only in bash mode, not in POSIX mode (when run with POSIXLY_CORRECT=1). In zsh, this is done if the posix_builtins option is not set; this option is not set by default but is turned on by emulate sh or emulate ksh.
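A sketch of the bash native-mode behavior. It assumes bash is installed, and show is a hypothetical function name. The function sees VAR, a child process started from the function sees it too (it is exported), and VAR is gone again once the call returns.

```shell
#!/bin/sh
bash <<'EOF'
show() {
  printf 'function sees: %s\n' "${VAR-<unset>}"
  sh -c 'printf "child sees: %s\n" "${VAR-<unset>}"'
}
unset VAR
VAR=VALUE show                               # function sees: VALUE, child sees: VALUE
printf 'after call: %s\n' "${VAR-<unset>}"   # after call: <unset>
EOF
```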
If some-command is a builtin, the behavior depends on the type of builtin. Special builtins behave like functions. Special builtins are the ones that affect the state of the shell itself (e.g. break affects control flow, set affects positional parameters and options, unset affects variables…). Other builtins are built in only for performance and convenience (mostly: e.g. the bash feature printf -v can only be implemented by a builtin), and they behave like an external command.
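A sketch of the special versus regular builtin difference. It assumes bash is installed and runs it with --posix, since bash documents the persistence of assignments before special builtins as a POSIX-mode behavior. Here : (colon) is a special builtin and true is a regular one.

```shell
#!/bin/sh
bash --posix <<'EOF'
unset A B
A=1 :       # ":" is a special builtin: A=1 stays set afterwards
B=2 true    # "true" is a regular builtin: B=2 applies to that command only
printf 'A=%s B=%s\n' "${A-<unset>}" "${B-<unset>}"   # A=1 B=<unset>
EOF
```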
The assignment takes place after alias expansion, so if some-command is an alias, expand it first to find what happens.
Note that in all cases, the assignment is performed after the command line is parsed, including any variable substitution on the command line itself. So var=a; var=b echo $var prints a, because $var is evaluated before the assignment takes place. And thus IFS=. printf "%s\n" $var uses the old IFS value to split $var.
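Both examples from the paragraph above as a runnable snippet. This assumes a POSIX sh such as dash or bash; zsh does not split an unquoted $var by default, so the second example is not meaningful there.

```shell
#!/bin/sh
var=a
var=b echo $var              # prints "a": $var was expanded before var=b applied

var=a.b.c
IFS=. printf '%s\n' $var     # prints "a.b.c": split with the old (default) IFS
```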
I've covered all the types of commands, but there's one more case: when there is no command to execute, i.e. if the command consists only of assignments (and possibly redirections). In that case, the assignment remains in place: VAR=VALUE OTHERVAR=OTHERVALUE is equivalent to VAR=VALUE; OTHERVAR=OTHERVALUE. So after IFS=. arr=($var), IFS remains set to . (a dot). Since you could use $IFS in the assignment to arr with the expectation that it already has its new value, it makes sense that the new value of IFS is used for the expansion of $var.
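The syntax arr=($var) exists in bash, ksh and zsh but not in plain POSIX sh. Here is a sketch of the same effect in POSIX sh: an assignment-only command, followed by set -- to do the splitting. The variable other is a hypothetical second assignment.

```shell
#!/bin/sh
var=a.b.c
IFS=. other=x        # no command word: both assignments stay in effect
set -f               # turn off globbing so only field splitting happens
set -- $var          # split with the new IFS
printf '%s\n' "$#"   # 3
printf '%s\n' "$IFS" # .
unset IFS            # an unset IFS splits like the default <space><tab><newline>
set +f
```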
In summary, you can use IFS for temporary field splitting only:
- by starting a new shell or a subshell (e.g. third=$(IFS=.; set -f; set -- $var; echo "$3") is a complicated way of doing third=${var#*.*.}, except that the two behave differently when the value of var contains fewer or more than two . characters);
- in ksh, with IFS=. some-function where some-function is defined with the ksh syntax function some-function …;
- in bash and zsh, with IFS=. some-function, as long as they are operating in native mode as opposed to compatibility mode.
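A sketch comparing the subshell technique from the first bullet with the parameter-expansion shortcut. It assumes a POSIX sh, and the sample values are made up to show where the two approaches disagree.

```shell
#!/bin/sh
for var in a.b.c a.b a.b.c.d; do
  # IFS and set -f are changed only inside the command substitution's subshell.
  third=$(IFS=.; set -f; set -- $var; echo "$3")
  stripped=${var#*.*.}   # remove the shortest prefix matching *.*.
  printf '%-8s split=[%s] strip=[%s]\n' "$var" "$third" "$stripped"
done
# a.b.c    split=[c] strip=[c]
# a.b      split=[] strip=[a.b]
# a.b.c.d  split=[c] strip=[c.d]
```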
The answer by @Gilles is really great; he explains a complex issue in detail.
However, I believe that the answer to why this command:
$ IFS=. printf "%s\n" $var
a.b.c
works as it does comes down to a simple idea: the whole command line is parsed before it is executed, and each "word" is processed only once by the shell.
The assignments, like IFS=., are delayed (step 4 is the last one):
4.- Each variable assignment shall be expanded ...
until just before the command is executed, and all expansions in the arguments are processed first to build this executable line:
$ IFS=. printf "%s\n" a.b.c ## IFS=. goes to the environment.
a.b.c
The value of $var is expanded with the "old" IFS to a.b.c before the command printf is given the arguments "%s\n" and a.b.c.
Eval
One level of delay may be introduced by eval:
$ IFS=. eval printf "'%s\n'" \$var
a
b
c
The line is parsed a first time, IFS=. is placed in the environment, and what eval receives is:
$ printf '%s\n' $var
Then eval parses it again, and the expansion of $var gives:
$ printf '%s\n' a b c
Executing that prints:
a
b
c
The value of $var (a.b.c) is split with the IFS value in use at that point, which is . (a dot).
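The same eval trick as a small script, assuming a POSIX sh such as dash or bash. It passes one single-quoted string to eval instead of the escaped words used above. The parentheses run it in a subshell because eval is a special builtin, so depending on the shell, IFS=. could otherwise stay set afterwards.

```shell
#!/bin/sh
var=a.b.c
# eval parses its argument again while IFS=. is in effect,
# so this second expansion of $var is split on dots.
( IFS=. eval 'printf "%s\n" $var' )   # prints a, b and c on separate lines
```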
Environment
The complex and tricky part is knowing which values are in effect in the environment, and when.
That is explained very well in the first part of Gilles' answer, with one additional detail.
When this command is executed:
$ IFS=. arr=($var)
The value of IFS is indeed retained in the current shell environment:
$ printf '<%s> ' "${arr[@]}" "$IFS"
<a> <b> <c> <.>
But this can be avoided (see the question "Setting IFS for a single statement"):
$ IFS=. command eval arr\=\(\$var\)
$ printf '<%s> ' "${arr[@]}" "$IFS"
<a> <b> <c> <
>
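A runnable version of that bash session. It assumes bash is installed, since arrays and $'…' quoting are bash features. Instead of printing "$IFS" raw, it compares IFS with the default <space><tab><newline>. The idea behind command here is that it strips eval of its special-builtin status, so the IFS=. prefix applies only while eval runs.

```shell
#!/bin/sh
bash <<'EOF'
var=a.b.c
IFS=. command eval 'arr=($var)'   # IFS=. applies only while eval runs
printf '<%s> ' "${arr[@]}"; echo  # <a> <b> <c>
if [ "$IFS" = $' \t\n' ]; then
  echo 'IFS is back to its default value'
fi
EOF
```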
Your question regarding
var=a.b.c
IFS=. printf "%s\n" $var
is a corner case.
This is because the parameter expansion (and field splitting) of $var in the command happens before the shell variable assignment IFS=. takes effect.
In other words: when $var is expanded, the previous IFS value is still active; only then is IFS set to '.'.