How does awk '!a[$0]++' work?
Here is an "intuitive" answer; for a more in-depth explanation of awk's mechanism, see @Cuonglm's answer.
In this case, !a[$0]++, the post-increment ++ can be set aside for a moment: it does not change the value of the expression. So, look only at !a[$0]. Here:
a[$0] uses the current line $0 as the key into the array a, taking the value stored there. If this particular key was never referenced before, a[$0] evaluates to the empty string.
!a[$0]
The ! negates the value from before. If it was empty or zero (false), we now have a true result. If it was non-zero (true), we have a false result. If the whole expression evaluated to true, meaning that a[$0] was not set to begin with, the whole line is printed as the default action.
Also, regardless of the old value, the post-increment operator adds one to a[$0], so the next time the same key in the array is accessed, its value will be positive and the whole condition will fail.
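To make the explanation above concrete, here is a minimal sketch that runs the one-liner next to a spelled-out version of the same logic. The function names dedup and dedup_expanded and the sample input are made up for illustration; they are not part of the original answer.

```shell
# Hypothetical helper names and sample data, for illustration only.
dedup() {
  # Pattern-only program: a line is printed when !a[$0]++ is true,
  # which happens the first time that exact line is seen.
  awk '!a[$0]++'
}

dedup_expanded() {
  # The same logic with the test and the increment written out separately.
  awk '{ if (a[$0] == 0) print $0; a[$0] = a[$0] + 1 }'
}

printf 'apple\nbanana\napple\ncherry\nbanana\n' | dedup
printf 'apple\nbanana\napple\ncherry\nbanana\n' | dedup_expanded
```

Both calls should print apple, banana and cherry once each, in the order of first appearance.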
Here is the processing:
- a[$0]: look at the value of key $0, in associative array a. If it does not exist, automatically create it with an empty string.
- a[$0]++: increment the value of a[$0], return the old value as value of expression. The ++ operator returns a numeric value, so if a[$0] was empty to begin with, 0 is returned and a[$0] incremented to 1.
- !a[$0]++: negate the value of expression. If a[$0]++ returned 0 (a false value), the whole expression evaluates to true, and makes awk perform the default action print $0. Otherwise, if the whole expression evaluates to false, no further action is taken.
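The three steps above can be watched line by line with a small tracing sketch. The function name trace_seen, the variable names old and keep, and the input are assumptions chosen for this example.

```shell
# Hypothetical tracing helper: prints the intermediate values for each input line.
trace_seen() {
  awk '{
    old  = a[$0]++      # post-increment: old receives the value before the increment
    keep = !old         # negation of that old value: 1 on first sight, 0 afterwards
    printf "%s old=%d keep=%d count=%d\n", $0, old, keep, a[$0]
  }'
}

printf 'x\ny\nx\nx\n' | trace_seen
# x old=0 keep=1 count=1
# y old=0 keep=1 count=1
# x old=1 keep=0 count=2
# x old=2 keep=0 count=3
```

Only the lines with keep=1 would be printed by the original one-liner.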
References:
- Expression in awk
- gawk - Increment and Decrement Operators
With gawk, we can use dgawk (or gawk --debug with newer versions) to debug a gawk script. First, create a gawk script named test.awk:
BEGIN {
a = 0;
!a++;
}
Then run:
dgawk -f test.awk
or:
gawk --debug -f test.awk
In the debugger console:
$ dgawk -f test.awk
dgawk> trace on
dgawk> watch a
Watchpoint 1: a
dgawk> run
Starting program:
[ 1:0x7fe59154cfe0] Op_rule : [in_rule = BEGIN] [source_file = test.awk]
[ 2:0x7fe59154bf80] Op_push_i : 0 [PERM|NUMCUR|NUMBER]
[ 2:0x7fe59154bf20] Op_store_var : a [do_reference = FALSE]
[ 3:0x7fe59154bf60] Op_push_lhs : a [do_reference = TRUE]
Stopping in BEGIN ...
Watchpoint 1: a
Old value: untyped variable
New value: 0
main() at `test.awk':3
3 !a++;
dgawk> step
[ 3:0x7fe59154bfc0] Op_postincrement :
[ 3:0x7fe59154bf40] Op_not :
Watchpoint 1: a
Old value: 0
New value: 1
main() at `test.awk':3
3 !a++;
dgawk>
As you can see, Op_postincrement was executed before Op_not.
You can also use si or stepi instead of s or step to see this more clearly:
dgawk> si
[ 3:0x7ff061ac1fc0] Op_postincrement :
3 !a++;
dgawk> si
[ 3:0x7ff061ac1f40] Op_not :
Watchpoint 1: a
Old value: 0
New value: 1
main() at `test.awk':3
3 !a++;
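The ordering seen in the trace can also be checked without the debugger. The sketch below stores the result of !a++ in a variable and prints it together with a; the function name not_postinc and the variable names start and r are assumptions for this example.

```shell
# Hypothetical helper: shows the value of !a++ and the value of a afterwards.
not_postinc() {
  # r receives the negation of the OLD value of a; a is incremented as a side effect.
  awk -v start="$1" 'BEGIN { a = start; r = !a++; print r, a }'
}

not_postinc 0   # prints "1 1": old value 0 is false, so !0 is 1, and a becomes 1
not_postinc 3   # prints "0 4": old value 3 is true, so !3 is 0, and a becomes 4
```

If the negation were applied before the increment took the old value, the two printed numbers would not match this pattern, so the output is consistent with the debugger trace.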