How does ` ... | awk '$1=$1'` remove extra spaces?
When a value is assigned to a field variable (here, the value of $1 is assigned back to field $1), awk rebuilds $0 by concatenating the fields with the output field separator (OFS), which is a single space by default.
The same thing happens in the following cases:
echo -e "foo foo\tbar\t\tbar" | awk '$1=$1'
foo foo bar bar
echo -e "foo foo\tbar\t\tbar" | awk -v OFS=',' '$1=$1'
foo,foo,bar,bar
echo -e "foo foo\tbar\t\tbar" | awk '$3=1'
foo foo 1 bar
For GNU AWK this behavior is documented here:
https://www.gnu.org/software/gawk/manual/html_node/Changing-Fields.html
$1 = $1 # force record to be reconstituted
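To make the role of the assignment explicit, here is a small sketch (the variable names unchanged and rebuilt are just for illustration) showing that setting OFS alone does nothing until a field is assigned and the record is rebuilt. It uses printf rather than echo -e, on the assumption that printf is the more portable way to produce the tabs.

```shell
# Same sample input as above: one space, one tab, then two tabs.
input='foo foo\tbar\t\tbar\n'

# OFS is set, but no field is assigned, so $0 is printed exactly as read.
unchanged=$(printf "$input" | awk -v OFS=',' '1')

# Assigning $1 to itself forces $0 to be rebuilt with OFS between fields.
rebuilt=$(printf "$input" | awk -v OFS=',' '{ $1 = $1 } 1')

printf '%s\n' "$unchanged"
printf '%s\n' "$rebuilt"
```

The first line printed still contains the original space and tabs; the second is foo,foo,bar,bar.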
echo "$string" | awk '$1=$1'
causes AWK to evaluate $1=$1, which assigns the field to itself, and has the side-effect of re-evaluating $0; then AWK considers the value of the expression, and because it’s non-zero and non-empty, it executes the default action, which is to print $0.
The extra spaces are removed when AWK re-evaluates $0: it does so by concatenating all the fields using OFS as a separator, and that’s a single space by default. When AWK parses a record, $0 contains the whole record, as-is, and $1 to $NF contain the fields, without the separators; when any field is assigned to, $0 is reconstructed from the field values.
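The reconstruction can be observed directly by printing $0 before and after the assignment. This is only an illustrative sketch: the input string and the before_after variable are made up, and the square brackets are there just to make the whitespace visible.

```shell
# Print $0 as parsed, then assign a field and print $0 again.
before_after=$(printf 'a   b\t\tc\n' |
  awk '{ print "[" $0 "]"; $1 = $1; print "[" $0 "]" }')

printf '%s\n' "$before_after"
```

The first bracketed line keeps the original spaces and tabs; the second is [a b c], rebuilt from the three fields with the default OFS.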
Whether AWK outputs anything in this example is dependent on the input:
echo "0 0" | awk '$1=$1'
won’t output anything, because $1 is 0, so the expression evaluates to zero and the default action is skipped.
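One common way to avoid depending on the input is to do the assignment inside an action and use a separate always-true pattern to print. The sketch below compares the two forms on a made-up three-line input (a line whose first field is 0, an empty line, and an ordinary line); the variable names bare and always are only for illustration.

```shell
# Sample input: first field 0, an empty line, and an ordinary line.
input='0   0\n\nfoo   bar\n'

# Assignment used as the pattern: lines whose $1 is 0 or empty are dropped.
bare=$(printf "$input" | awk '$1=$1')

# Assignment in an action, followed by the always-true pattern 1:
# every record is rebuilt and printed.
always=$(printf "$input" | awk '{ $1 = $1 } 1')

printf 'pattern form:\n%s\n' "$bare"
printf 'action form:\n%s\n' "$always"
```

With the pattern form only foo bar survives; with the action form all three lines are printed, each with its whitespace squeezed.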