When can I use a temporary IFS for field splitting?

The basic idea is that VAR=VALUE some-command sets VAR to VALUE for the execution of some-command when some-command is an external command, and it doesn't get more fancy than that. If you combine this intuition with some knowledge of how a shell works, you should come up with the right answer in most cases. The POSIX reference is “Simple Commands” in the chapter “Shell Command Language”.

If some-command is an external command, VAR=VALUE some-command is equivalent to env VAR=VALUE some-command. VAR is exported in the environment of some-command, and its value (or lack of a value) in the shell doesn't change.
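
As a minimal sketch of the external-command case (GREETING is a made-up variable name, and sh stands in for any external program), the prefix form and the env form can be compared directly:

```shell
# GREETING is a hypothetical variable used only for this demonstration.
unset GREETING

# Prefix assignment: the child process sees the variable.
GREETING=hello sh -c 'echo "prefix form, child sees: $GREETING"'

# Same effect through env.
env GREETING=hello sh -c 'echo "env form, child sees: $GREETING"'

# The calling shell itself was never changed.
echo "calling shell has: ${GREETING-<unset>}"
```

Both child invocations should report hello, while the last line reports that the variable is still unset in the calling shell.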

If some-command is a function, then VAR=VALUE some-command is equivalent to VAR=VALUE; some-command, i.e. the assignment remains in place after the function has returned, and the variable is not exported into the environment. The reason for that has to do with the design of the Bourne shell (and subsequently with backward compatibility): it had no facility to save and restore variable values around the execution of a function. Not exporting the variable makes sense since a function executes in the shell itself. However, ksh (including both AT&T ksh93 and pdksh/mksh), bash and zsh implement the more useful behavior where VAR is set only during the execution of the function (it's also exported). In ksh, this is done if the function is defined with the ksh syntax function NAME …, not if it's defined with the standard syntax NAME (). In bash, this is done only in bash mode, not in POSIX mode (when run with POSIXLY_CORRECT=1). In zsh, this is done if the posix_builtins option is not set; this option is not set by default but is turned on by emulate sh or emulate ksh.
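
Since the outcome depends on the shell and its mode, a small probe is more useful than a fixed answer. This is only a sketch; probe and VAR are hypothetical names:

```shell
# probe and VAR are made-up names for this experiment.
probe() { inside=${VAR-unset}; }

unset VAR
VAR=temp probe            # prefix assignment in front of a function call
after=${VAR-unset}

echo "inside the function: $inside"   # the function always sees the new value
echo "after the function:  $after"    # unset or temp, depending on shell and mode
```

Running the same file under bash, bash --posix, and other shells shows which of the two behaviors described above each one implements.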

If some-command is a builtin, the behavior depends on the type of builtin. Special builtins behave like functions. Special builtins are the ones that have to be implemented inside the shell because they affect the state of the shell (e.g. break affects control flow, export changes which variables are passed to commands, set affects positional parameters and options…). Other builtins are built in only for performance and convenience (mostly; some, such as cd or the bash feature printf -v, can only be implemented by a builtin), and they behave like an external command.
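
The difference between the two kinds of builtin can be probed the same way. In this sketch, true stands for a regular builtin and : for a special one; VAR is a hypothetical name:

```shell
# VAR is a made-up variable name.
unset VAR
VAR=temp true             # regular builtin: treated like an external command
after_regular=${VAR-unset}

unset VAR
VAR=temp :                # special builtin: may leave the assignment in place
after_special=${VAR-unset}

echo "after regular builtin: $after_regular"   # unset
echo "after special builtin: $after_special"   # unset or temp, depending on shell and mode
```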

The assignment takes place after alias expansion, so if some-command is an alias, expand it first to find what happens.

Note that in all cases, the assignment is performed after the command line is parsed, including any variable substitution on the command line itself. So var=a; var=b echo $var prints a, because $var is evaluated before the assignment takes place. And thus IFS=. printf "%s\n" $var uses the old IFS value to split $var.
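
Both statements are easy to check. The sketch below assumes IFS still has its default value and captures the output in variables (first and second are made-up names) only so that it can be printed afterwards:

```shell
var=a
# $var is expanded to "a" before var=b is applied to echo's environment.
first=$(var=b echo $var)

var=a.b.c
# $var is split with the old IFS, so printf receives the single word a.b.c.
second=$(IFS=. printf '%s\n' $var)

echo "first:  $first"     # first:  a
echo "second: $second"    # second: a.b.c
```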

I've covered all the types of commands, but there's one more case: when there is no command to execute, i.e. if the command consists only of assignments (and possibly redirections). In that case, the assignment remains in place. VAR=VALUE OTHERVAR=OTHERVALUE is equivalent to VAR=VALUE; OTHERVAR=OTHERVALUE. So after IFS=. arr=($var), IFS remains set to '.'. Since you could use $IFS in the assignment to arr with the expectation that it already has its new value, it makes sense that the new value of IFS is used for the expansion of $var.
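
Here is a portable sketch of the persistence. It uses the positional parameters instead of a bash array so that it also runs in plain sh; old_ifs, copy and count are made-up names:

```shell
old_ifs=$IFS
var=a.b.c

IFS=. copy=$var           # no command name: both assignments stay in effect
ifs_after=$IFS            # still '.'

set -f                    # no globbing while $var is expanded unquoted
set -- $var               # split with the IFS that is still in effect
set +f
count=$#

IFS=$old_ifs              # restore by hand, since nothing else will
echo "IFS after the assignments: '$ifs_after', fields: $count"
# IFS after the assignments: '.', fields: 3
```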

In summary, you can use IFS for temporary field splitting only:

  • by starting a new shell or a subshell (e.g. third=$(IFS=.; set -f; set -- $var; echo "$3") is a complicated way of doing third=${var#*.*.} except that they behave differently when the value of var does not contain exactly two . characters);
  • in ksh, with IFS=. some-function where some-function is defined with the ksh syntax function some-function …;
  • in bash and zsh, with IFS=. some-function as long as they are operating in native mode as opposed to compatibility mode.
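
The subshell variant from the first item is the only one that works the same in every shell, so here is a sketch of it (third, short and the other variable names are made up), including a value with a single dot where the two forms disagree:

```shell
ifs_before=$IFS

var=a.b.c
# IFS and set -f are changed only inside the command substitution's subshell.
third=$(IFS=.; set -f; set -- $var; printf '%s' "$3")
echo "third field: $third"                 # third field: c

short=a.b
via_subshell=$(IFS=.; set -f; set -- $short; printf '%s' "$3")
via_expansion=${short#*.*.}                # pattern does not match, value unchanged
echo "subshell: '$via_subshell', expansion: '$via_expansion'"
# subshell: '', expansion: 'a.b'

[ "$IFS" = "$ifs_before" ] && echo "IFS in the calling shell is untouched"
```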

The answer by @Gilles is really great; it explains a complex issue in detail.

However, I believe that the answer to why this command:

$ IFS=. printf "%s\n" $var
a.b.c

works as it does is the simple idea that the whole command line is parsed before it is executed, and that each "word" is processed only once by the shell.
The assignments, like IFS=., are delayed (step 4 is the last one):

4.- Each variable assignment shall be expanded ...

until just before the command is executed and all expansions in arguments are processed first to build this executable line:

$ IFS=. printf "%s\n" a.b.c           ## IFS=. goes to the environment.
a.b.c

The value of $var is expanded with the "old" IFS to a.b.c before the command printf is given the arguments "%s\n" and a.b.c.

Eval

One level of delay may be introduced by eval:

$ IFS=. eval printf "'%s\n'" \$var
a
b
c

The line is parsed a first time, IFS=. is placed in the environment of eval, and eval receives this line:

$ printf '%s\n' $var

Then it is parsed again to this:

$ printf '%s\n' a b c

And executed to this:

a
b
c

The value of $var (a.b.c) is split with the value of IFS in use: '.'.
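
To see the two cases side by side, the following sketch runs both forms inside command substitutions (direct and delayed are made-up names), which also keeps IFS in the calling shell untouched whatever the shell does with assignments in front of eval:

```shell
var=a.b.c

# Without eval: $var is expanded before IFS=. applies, so there is one word.
direct=$(IFS=. printf '%s\n' $var)

# With eval: the escaped \$var is expanded during the second parse, with IFS=. in effect.
delayed=$(IFS=. eval printf "'%s\n'" \$var)

echo "direct:"
echo "$direct"      # one line: a.b.c
echo "delayed:"
echo "$delayed"     # three lines: a, b, c
```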

Environment

The complex and tricky part is what is valid in the environment, and when.

That is explained very well in the first part of Gilles's answer, to which I'll add one detail.

When this command is executed:

$ IFS=. arr=($var)

The value of IFS is retained in the present environment, yes:

$ printf '<%s>  ' "${arr[@]}" "$IFS"
<a>  <b>  <c>  <.> 

IFS for a single statement

But that can be avoided, as described in "Setting IFS for a single statement":

$ IFS=. command eval arr\=\(\$var\)

$  printf '<%s>  ' "${arr[@]}" "$IFS"
<a>  <b>  <c>  < 
> 
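
The same trick can be sketched without bash arrays by letting eval set the positional parameters instead. This assumes the shell treats command as removing the special-builtin behavior of eval, as POSIX describes; ifs_before and count are made-up names, and the value of var contains no glob characters:

```shell
var=a.b.c
ifs_before=$IFS

# IFS=. applies only while "command eval" runs; the eval'd code still
# executes in the current shell, so the positional parameters are kept.
IFS=. command eval 'set -- $var'
count=$#

echo "fields: $count"     # fields: 3
[ "$IFS" = "$ifs_before" ] && echo "IFS is back to its previous value"
```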

Your question regarding

var=a.b.c
IFS=. printf "%s\n" $var

is a corner case.

This is because the parameter expansion in the command happens before the shell variable assignment IFS=. takes effect.

In other words: when $var is expanded, the previous IFS value is still active; only afterwards is IFS set to '.'.
