Separator between statements in awk
Very good question! I think the key is this: "Thus, the program shown at the start of this section could also be written this way:"
It is not mandatory to write it this way; it is just an alternative. This means (and testing confirms it) that both of the statements below are correct:
$ awk '/12/ { print $0 } /21/ { print $0 }' file
$ awk '/12/ { print $0 } ; /21/ { print $0 }' file
I think the semicolon is there to support really short, idiomatic code, for example when we omit the action part and want to apply multiple rules on the same line. Without a separator, that fails:
$ awk '/12//21/' file
awk: cmd. line:2: /12//21/
awk: cmd. line:2: ^ unexpected newline or end of string
In this case using a semicolon is mandatory to separate rules (=conditions):
$ awk '/12/;/21/' file
Since the {action} part is omitted in both rules (both conditions), the default action, {print $0}, is performed for each rule.
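As a quick check of that equivalence, here is a minimal sketch. The sample input is made up for illustration (the line 121 matches both patterns), and the variable names short and long are just placeholders:

```shell
# Hypothetical sample input: "121" contains both "12" and "21".
# Pattern-only rules separated by a semicolon (default action applies).
short=$(printf '12\n121\n21\n' | awk '/12/;/21/')

# The same two rules with the default action written out explicitly.
long=$(printf '12\n121\n21\n' | awk '/12/ { print $0 } /21/ { print $0 }')

printf '%s\n' "$short"
# 12
# 121
# 121
# 21

# Both forms produce identical output.
[ "$short" = "$long" ] && echo "same output"
```

Note that 121 is printed twice in both forms, once for each rule it matches, because the two rules are independent.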
In gawk, these two quotes from the manual describe the issue:
An action consists of one or more awk statements, enclosed in braces (‘{…}’). Each statement specifies one thing to do. The statements are separated by newlines or semicolons.
A semicolon is a "separator" but not a "terminator".
The only valid terminator of an action is a closing brace (}).
Therefore, what follows an action's closing brace (}) must be some other pattern{action}.
The mawk man page ("man mawk") has another description that may help clarify what awk should do:
Statements are terminated by newlines, semi-colons or both. Groups of statements such as actions or loop bodies are blocked via { ... } as in C. The last statement in a block doesn't need a terminator.
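To illustrate the last sentence of that quote, here is a small sketch. It assumes any POSIX-style awk and uses a BEGIN rule so that no input is needed; the variable names are placeholders:

```shell
# No terminator after the last statement in the block.
without=$(awk 'BEGIN { print "a"; print "b" }')

# An explicit semicolon after the last statement is accepted too.
with=$(awk 'BEGIN { print "a"; print "b"; }')

printf '%s\n' "$without"
# a
# b

[ "$without" = "$with" ] && echo "same output"
```

Either way the closing brace ends the action, so the trailing semicolon is a matter of taste.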
The "man nawk" explains it like this:
The pattern comes first, and then the action. Action statements are enclosed in { and }.
And, if you want to delve into the details, read the POSIX description:
action : '{' newline_opt '}'
| '{' newline_opt terminated_statement_list '}'
| '{' newline_opt unterminated_statement_list '}'
;
Then look up what an "unterminated" statement list is.
Or, simpler, search for Action to read:
Any single statement can be replaced by a statement list enclosed in curly braces. The application shall ensure that statements in a statement list are separated by <newline> or <semicolon> characters.
Again: are separated by <newline> or <semicolon> characters
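Since a newline counts as a separator just like a semicolon, the same program can be written with no semicolons at all. A minimal sketch, assuming a multi-line program passed in single quotes (the sample input and messages are only illustrative):

```shell
# Newlines separate the statements inside the action and also the two rules.
out=$(printf 'foo\nbar\n' | awk '
/foo/ {
    print "foo found"
    print "whee"
}
/bar/
')

printf '%s\n' "$out"
# foo found
# whee
# bar
```

Here the pattern-only rule /bar/ is ended by the newline, so it falls back to the default action and prints the matching line.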
The semicolon between conditional blocks appears to be optional; only the semicolons between statements within blocks appear to be mandatory:
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" } /bar/ {print "bar found"}'
foo found
bar found
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" }; /bar/ {print "bar found"}'
foo found
bar found
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found"; print "whee" }'
foo found
whee
$ echo -e "foo\nbar" | gawk '/foo/ { print "foo found" print "whee" }'
gawk: cmd. line:1: /foo/ { print "foo found" print "whee" }
gawk: cmd. line:1: ^ syntax error
However, when the actual code block between two conditionals is omitted in favor of the default (i.e. {print}), the semicolon becomes necessary:
$ echo -e "foo\nbar" | gawk '/foo/ /bar/'
gawk: cmd. line:2: /foo/ /bar/
gawk: cmd. line:2: ^ unexpected newline or end of string
$ echo -e "foo\nbar" | gawk '/foo/; /bar/'
foo
bar