Understanding IFS

  1. Yes, they are the same.
  2. Yes.
  3. In bash, and similar shells, you could do something like IFS=$' \t\n'. Otherwise, you could insert the literal control codes by using [space] CTRL+V [tab] CTRL+V [enter]. If you are planning to do this, however, it's better to use another variable to temporarily store the old IFS value, and then restore it afterwards (or temporarily override it for one command by using the var=foo command syntax).
    • The first code snippet will put the entire line read, verbatim, into $line, as there are no field separators to perform word splitting on. Bear in mind, however, that since many shells store strings as C strings, the first NUL byte in the input may still make the line appear to be cut short.
    • The second code snippet may not put an exact copy of the input into $line. With a single variable, read strips leading and trailing IFS whitespace (spaces and tabs by default), so the line may lose its surrounding whitespace.
    • The third code snippet will do the same as the second, except it will only split on a space (not the usual space, tab, or newline).
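
To illustrate the var=foo command syntax mentioned in point 3, here is a minimal sketch. The colon-separated record and the variable names are made up for the example; the point is only that the assignment prefix applies to that single read and leaves the shell's own IFS alone.

```shell
# Hypothetical sample record; the colon-separated layout is an assumption.
record='alice:x:1000'

# Remember the current value so we can check it afterwards
# (IFS is assumed to be set here, as it is at shell startup).
before=$IFS

# The assignment prefix applies to this one "read" invocation only.
IFS=: read -r user pass uid <<EOF
$record
EOF

printf 'user=%s uid=%s\n' "$user" "$uid"
[ "$IFS" = "$before" ] && echo 'IFS unchanged after read'
```

This avoids the save-and-restore dance entirely when only one command needs the different separator.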

Q1: Yes. “Field splitting” and “word splitting” are two terms for the same concept.

Q2: Yes. If IFS is unset (i.e. after unset IFS), it is equivalent to IFS being set to $' \t\n' (a space, a tab and a newline). If IFS is set to an empty value (that's what “null” means here) (i.e. after IFS= or IFS='' or IFS=""), no field splitting is performed at all (and "$*", which normally joins the positional parameters with the first character of $IFS, concatenates them with no separator).
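
A quick way to see the difference between unset and empty is to count the fields an unquoted expansion produces. This is only a sketch for a POSIX-style sh; count_args is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

s=$(printf 'a b\tc')      # one space and one tab between the letters

unset IFS                 # unset: behaves like the default value
n_unset=$(count_args $s)  # unquoted on purpose, so it gets split

IFS=''                    # empty: no field splitting at all
n_empty=$(count_args $s)

unset IFS                 # leave the shell with default behavior
echo "unset: $n_unset fields, empty: $n_empty field"
```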

Q3: If you want to have the default IFS behavior, you can use unset IFS. If you want to set IFS explicitly to this default value, you can put the literal characters space, tab, newline in single quotes. In ksh93, bash or zsh, you can use IFS=$' \t\n'. Portably, if you want to avoid having a literal tab character in your source file, you can use

IFS=" $(echo t | tr t \\t)
"

Q4: With IFS set to an empty value, read -r line sets line to the whole line except its terminating newline. With IFS=" ", spaces at the beginning and at the end of the line are trimmed. With the default value of IFS, tabs and spaces are trimmed.
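
The trimming can be made visible by wrapping the result in brackets. A small sketch, with an input string invented for the example:

```shell
input='  one two  '       # two leading and two trailing spaces

# Empty IFS: the line is stored as-is.
IFS= read -r keep <<EOF
$input
EOF

# IFS set to a space: leading and trailing spaces are stripped.
IFS=' ' read -r trim <<EOF
$input
EOF

printf '[%s]\n[%s]\n' "$keep" "$trim"
```

With a single variable name, the space between "one" and "two" is kept in both cases; only the outer whitespace differs.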


Q1. Field splitting.

Is field splitting the same as word splitting?

Yes, both point to the same idea.

Q2: When is IFS null?

Is setting IFS='' the same as null, the same as an empty string too?

Yes, all three mean the same: no field/word splitting shall be performed. This also affects how fields are printed: with echo "$*", all fields are concatenated together with no space between them.
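
The effect on "$*" can be checked with three positional parameters. This sketch assumes a POSIX-style sh; note that it overwrites the current positional parameters with set --.

```shell
set -- a b c               # three positional parameters for illustration

IFS=:
with_colon="$*"            # joined with the first character of IFS

IFS=''
with_empty="$*"            # empty IFS: concatenated, no separator

unset IFS
with_unset="$*"            # unset IFS: joined with a space

printf '%s\n' "$with_colon" "$with_empty" "$with_unset"
```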

Q3: (part a) Unset IFS.

In the POSIX specification, I read the following:

If IFS is not set, the shell shall behave as if the value of IFS is <space><tab><newline>.

Which is exactly equivalent to:

With an unset IFS, the shell shall behave as if IFS is default.

That means that field splitting will be exactly the same with an unset IFS as with the default IFS value.
That does NOT mean that IFS will work the same way in all situations. More specifically, executing OldIFS=$IFS while IFS is unset will set the variable OldIFS to null, not to the default value. Trying to restore IFS afterwards with IFS=$OldIFS will then set IFS to null instead of leaving it unset as it was before. Watch out!
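
One way around that trap is to record whether IFS was set at all, not just its value. A minimal sketch follows; save_ifs, restore_ifs and the two old_ifs variables are hypothetical names chosen for this example.

```shell
# Remember both the value of IFS and whether it was set at all.
save_ifs() {
    if [ -n "${IFS+set}" ]; then   # expands to "set" even when IFS is empty
        old_ifs=$IFS
        old_ifs_was_set=yes
    else
        old_ifs_was_set=no
    fi
}

# Put IFS back exactly as it was, including the unset state.
restore_ifs() {
    if [ "$old_ifs_was_set" = yes ]; then
        IFS=$old_ifs
    else
        unset IFS
    fi
}

save_ifs
IFS=,                              # temporary value for some work
restore_ifs
```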

Q3: (part b) Restore IFS.

How could I restore the value of IFS to default. Say I want to restore the default value of IFS. How do I do that? (more specifically, how do I refer to <tab> and <newline>?)

For zsh, ksh, and bash (AFAIK), IFS could be set to the default value as:

IFS=$' \t\n'        # works with zsh, ksh, bash.

Done, you need to read nothing else.

But if you need to re-set IFS for sh, it may become complex.

Let's go through the options, from the easiest one to the most complete one, which has no drawbacks (except complexity).

1.- Unset IFS.

We could just unset IFS (Read Q3 part a, above.).

2.- Swap chars.

As a workaround, swapping the positions of tab and newline makes it simpler to set the value of IFS, and the result works in an equivalent way.

Set IFS to <space><newline><tab>:

sh -c 'IFS=$(printf " \n\t"); printf "%s" "$IFS"|xxd'      # Works.

3.- A simple? solution:

If there are child scripts that need IFS correctly set, you could always manually write:

IFS='   
'

The sequence typed by hand between the quotes was: space, tab, newline. However, a copy/paste from your browser is likely to break it, because the browser may squeeze or convert the whitespace (typically turning the tab into spaces). That makes code written this way difficult to share.

4.- Complete solution.

Code that can be copied safely usually relies on unambiguous printable escapes.

We need some code that "produces" the expected value. But, even if conceptually correct, this code will NOT set a trailing \n:

sh -c 'IFS=$(printf " \t\n"); printf "%s" "$IFS"|xxd'      # wrong.

That happens because, under most shells, all trailing newlines of $(...) or `...` command substitutions are removed on expansion.

We need to use a trick for sh:

sh -c 'IFS="$(printf " \t\nx")"; IFS="${IFS%x}"; printf "%s" "$IFS"|xxd'  # Correct.
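
If xxd is not at hand, the effect of the trailing x can also be checked by comparing string lengths. A small sketch; the variable names are only for illustration.

```shell
stripped=$(printf ' \t\n')    # trailing newline removed by $(...)
guarded=$(printf ' \t\nx')    # the sentinel "x" protects the newline
guarded=${guarded%x}          # drop the sentinel again

echo "stripped: ${#stripped}, guarded: ${#guarded}"
```

The first value has 2 characters (space and tab), the second has all 3.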

An alternative may be to set IFS as an environment variable from bash (for example) and then call sh (for the versions of it that accept IFS from the environment), like this:

env IFS=$' \t\n' sh -c 'printf "%s" "$IFS"|xxd'

In short, sh makes resetting IFS to default quite an odd adventure.

Q4: In actual code:

Finally, how would this code:

while IFS= read -r line
do
    echo $line
done < /path_to_text_file

behave if we change the first line to

while read -r line # Use the default IFS value

or to:

while IFS=' ' read -r line

First: I do not know whether the echo $line (with the variable NOT quoted) is there on purpose or not. It introduces a second level of field splitting, which read itself does not have. So I'll answer both. :)
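
That second level of splitting is easy to see in isolation. A minimal sketch, with a sample value invented for the example and IFS left at its default behavior:

```shell
unset IFS                     # make sure the default behavior applies
line='  bar   baz  '          # hypothetical value, as read would store it with IFS=

unquoted=$(echo $line)        # split into words, re-joined by echo with single spaces
quoted=$(echo "$line")        # passed to echo as one untouched argument

printf '[%s]\n[%s]\n' "$unquoted" "$quoted"
```

So even when read preserves the whitespace, an unquoted echo $line throws it away again.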

You can confirm the results with the following script (it needs the useful xxd):

#!/bin/ksh
# Correctly set IFS as described above.
defIFS="$(printf " \t\nx")"; defIFS="${defIFS%x}";
IFS="$defIFS"
printf "IFS value: "
printf "%s" "$IFS"| xxd -p

a='   bar   baz   quz   '; l="${#a}"
printf "var value          : %${l}s-" "$a" ; printf "%s\n" "$a" | xxd -p

printf "%s\n" "$a" | while IFS='x' read -r line; do
    printf "IFS --x--          : %${l}s-" "$line" ;
    printf "%s" "$line" |xxd -p; done;

printf '%s\n' "Values      quoted :"  # With values quoted:
printf "%s\n" "$a" | while IFS='' read -r line; do
    printf "IFS null    quoted : %${l}s-" "$line" ;
    printf "%s" "$line" |xxd -p; done;

printf "%s\n" "$a" | while IFS="$defIFS" read -r line; do
    printf "IFS default quoted : %${l}s-" "$line" ;
    printf "%s" "$line" |xxd -p; done;

unset IFS; printf "%s\n" "$a" | while read -r line; do
    printf "IFS unset   quoted : %${l}s-" "$line" ;
    printf "%s" "$line" |xxd -p; done;
    IFS="$defIFS"   # set IFS back to default.

printf "%s\n" "$a" | while IFS=' ' read -r line; do
    printf "IFS space   quoted : %${l}s-" "$line" ;
    printf "%s" "$line" |xxd -p; done;

printf '%s\n' "Values unquoted :"   # Now with values unquoted:
printf "%s\n" "$a" | while IFS='x' read -r line; do
    printf "IFS --x-- unquoted : "
    printf "%s, " $line; printf "%s," $line |xxd -p; done

printf "%s\n" "$a" | while IFS='' read -r line; do
    printf "IFS null  unquoted : ";
    printf "%s, " $line; printf "%s," $line |xxd -p; done

printf "%s\n" "$a" | while IFS="$defIFS" read -r line; do
    printf "IFS defau unquoted : ";
    printf "%s, " $line; printf "%s," $line |xxd -p; done

unset IFS; printf "%s\n" "$a" | while read -r line; do
    printf "IFS unset unquoted : ";
    printf "%s, " $line; printf "%s," $line |xxd -p; done
    IFS="$defIFS"   # set IFS back to default.

printf "%s\n" "$a" | while IFS=' ' read -r line; do
    printf "IFS space unquoted : ";
    printf "%s, " $line; printf "%s," $line |xxd -p; done

I get:

$ ./stackexchange-Understanding-IFS.sh
IFS value: 20090a
var value          :    bar   baz   quz   -20202062617220202062617a20202071757a2020200a
IFS --x--          :    bar   baz   quz   -20202062617220202062617a20202071757a202020
Values      quoted :
IFS null    quoted :    bar   baz   quz   -20202062617220202062617a20202071757a202020
IFS default quoted :       bar   baz   quz-62617220202062617a20202071757a
IFS unset   quoted :       bar   baz   quz-62617220202062617a20202071757a
IFS space   quoted :       bar   baz   quz-62617220202062617a20202071757a
Values unquoted :
IFS --x-- unquoted : bar, baz, quz, 6261722c62617a2c71757a2c
IFS null  unquoted : bar, baz, quz, 6261722c62617a2c71757a2c
IFS defau unquoted : bar, baz, quz, 6261722c62617a2c71757a2c
IFS unset unquoted : bar, baz, quz, 6261722c62617a2c71757a2c
IFS space unquoted : bar, baz, quz, 6261722c62617a2c71757a2c

The first value is just the correct value of IFS: space, tab, newline (20 09 0a).

The next line shows all the hex values of the variable $a, with a newline (0a) at the end, as that is what each read command is given.

In the two lines for IFS set to x and for IFS null, no field splitting takes place: the input contains no x, and a null IFS disables splitting altogether. Only the newline is removed (as expected).

In the next three lines, IFS contains a space, so read removes the leading and trailing spaces and sets the variable line to what remains.

The last five lines show what an unquoted variable does: the value is split on the (several) spaces and printed as: bar, baz, quz,
