Piping awk's print/printf output into a shell command makes that statement run after all other unrelated print/printf statements
If you want the cat process to terminate (and the Y to be printed) before the Xs, then just call close("cat") after the print "Y" | "cat".
The rest is explained in the awk man page, which is worth reading.
Why isn't Y printed first given that it's supposed to run before the other statements?
The cat is not supposed to write its output and terminate before the other statements. It may write its output before, after or in between your two print "X" calls.
When you use something like print ... | "command ..." in awk, command ... is started as an asynchronous process with its stdin connected to a pipe (via popen("command ...", "w")), and that process will not necessarily terminate and write its output before you call close("command ...") (or before awk does that implicitly when it terminates).
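A minimal sketch of the close() fix, assuming a script shaped like the one in the question (one print "Y" | "cat" followed by two print "X" calls). The variable name with_close is only for illustration.

```sh
# With close("cat"), awk flushes the pipe, sends EOF to cat and waits for it
# to exit, so Y is written before either X.
with_close=$(awk 'BEGIN {
    print "Y" | "cat"
    close("cat")      # flush the pipe, send EOF, wait for cat to finish
    print "X"
    print "X"
}')
printf '%s\n' "$with_close"
```

Without the close("cat") call, the position of Y relative to the two Xs depends on buffering and scheduling, so it should not be relied upon.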
Consider an example like:
BEGIN {
    print "foo" | "cat > file"
    print "bar" | "cat > file"
}
The result will be that file will contain both lines, foo and bar; the cat > file command will not be run separately for each line.
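One way to see that the command is reused is to compare it with a variant that closes the pipe between the two prints. The sketch below runs both in a scratch directory; the names dir, reused and reopened are illustrative, not part of the original example.

```sh
dir=$(mktemp -d)

# One cat process receives both lines, so the file ends up with foo and bar.
( cd "$dir" && awk 'BEGIN {
    print "foo" | "cat > file"
    print "bar" | "cat > file"
    close("cat > file")
}' )
reused=$(cat "$dir/file")

# Closing in between starts a second cat, whose "> file" truncates the file.
( cd "$dir" && awk 'BEGIN {
    print "foo" | "cat > file"
    close("cat > file")
    print "bar" | "cat > file"
    close("cat > file")
}' )
reopened=$(cat "$dir/file")

printf 'reused:\n%s\nreopened:\n%s\n' "$reused" "$reopened"
rm -rf "$dir"
```

In the second run only bar survives, because the shell redirection inside the command is performed again each time the command is started.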
Redirections and pipes in awk are similar to redirections and pipes in sh, but there is one major difference. In sh, foo >bar keeps bar open only for the duration of the foo command, and foo | bar waits for both foo and bar to terminate. In awk, a redirection or pipe remains open until it's closed explicitly, and redirecting or piping to the same file name or command multiple times reuses the open redirection/pipe.
For example, in sh, this prints a, b, c, a, b, because each sort command gets just two lines of input:
{ echo b; echo a; } | sort
echo c
{ echo b; echo a; } | sort
But in awk, this prints c, a, a, b, b (assuming that awk's output is line-buffered, otherwise c could be delayed) because there is a single sort command and it won't print anything until it has all of its input data, which only happens when the input side of the pipe gets closed.
print "b" | "sort"; print "a" | "sort";
print "c";
print "b" | "sort"; print "a" | "sort";
To make a piped command terminate, call the close function explicitly. Awk implicitly closes all open pipes and redirections when it exits. This prints a, b, c, a, b:
print "b" | "sort"; print "a" | "sort"; close("sort");
print "c";
print "b" | "sort"; print "a" | "sort"; close("sort");
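The two awk variants can be compared side by side with a sketch like the following. It assumes an awk that supports fflush() with no arguments (gawk, mawk and BusyBox awk do), which is used here only to pin down when c is written, since awk's standard output is typically fully buffered when captured. The variable names are illustrative.

```sh
# No close(): a single sort sees all four lines and only prints at exit.
one_sort=$(awk 'BEGIN {
    print "b" | "sort"; print "a" | "sort"
    print "c"; fflush()
    print "b" | "sort"; print "a" | "sort"
}')

# close() after each pair: two separate sort processes, each finishing in turn.
two_sorts=$(awk 'BEGIN {
    print "b" | "sort"; print "a" | "sort"; close("sort")
    print "c"; fflush()
    print "b" | "sort"; print "a" | "sort"; close("sort")
}')

printf 'one sort:  %s\n' "$(printf '%s' "$one_sort" | tr '\n' ' ')"
printf 'two sorts: %s\n' "$(printf '%s' "$two_sorts" | tr '\n' ' ')"
```

Note that each print needs its own | "sort": in awk the pipe belongs to an individual print or printf statement, not to a braced group of statements.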
Likewise, this awk snippet creates a two-line file, since foo is opened once by the first line and is still open when the second line runs:
print "hello" >"foo";
print "world" >"foo";
Whereas this sh snippet creates a one-line file, because the second line opens the file that was created by the first line and truncates it before writing world:
echo hello >foo
echo world >foo
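Both behaviours can be checked in a scratch directory with a sketch like this one. The file names foo_awk and foo_sh and the variable names are chosen here for illustration, so that the two results can be inspected separately.

```sh
dir=$(mktemp -d)

# awk: "foo_awk" is opened (and truncated) once, then reused for the second print.
( cd "$dir" && awk 'BEGIN {
    print "hello" > "foo_awk"
    print "world" > "foo_awk"
}' )

# sh: each redirection reopens and truncates the file.
( cd "$dir" && { echo hello > foo_sh; echo world > foo_sh; } )

awk_result=$(cat "$dir/foo_awk")
sh_result=$(cat "$dir/foo_sh")
printf 'awk wrote:\n%s\nsh wrote:\n%s\n' "$awk_result" "$sh_result"
rm -rf "$dir"
```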
The main reason awk is designed this way is that there's an implicit loop around the processing of each line. In sh, if you want to process lines in a loop, you'd typically write the redirection around the loop:
while IFS= read -r line; do
    if condition "$line"; then
        process "$line"
    fi
done >output
But in awk, there would be no way to apply the redirection to the implicit loop, so you write
condition($0) { process $0 >"output" }
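As a concrete instance of the two patterns, here is a sketch in which the hypothetical condition is "the line starts with a" and the hypothetical processing is simply printing the line. The input data and file names are made up for the example.

```sh
dir=$(mktemp -d)
printf '%s\n' apple banana avocado > "$dir/input"

# sh: the redirection wraps the whole loop, so out_sh is opened once.
while IFS= read -r line; do
    case $line in
        a*) printf '%s\n' "$line" ;;
    esac
done < "$dir/input" > "$dir/out_sh"

# awk: the redirection sits on the print, but out_awk is still opened only once.
( cd "$dir" && awk '/^a/ { print $0 > "out_awk" }' input )

sh_out=$(cat "$dir/out_sh")
awk_out=$(cat "$dir/out_awk")
printf 'sh:\n%s\nawk:\n%s\n' "$sh_out" "$awk_out"
rm -rf "$dir"
```

Both files end up with the same two lines, even though the awk redirection is written inside the per-line action.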
The awk way is also more powerful because you can open and close pipes at will, even in the middle of a loop or other blocks. In sh, this is possible for redirections with the exec builtin, but not for pipes: a pipe has to be applied to a (possibly compound) command as a whole.
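To illustrate closing a pipe in the middle of awk's implicit loop, the following sketch sorts each blank-line-separated group of input lines independently. The input is made up, and fflush() with no arguments is assumed to be available (as in gawk, mawk and BusyBox awk) so that the separator line is written in order.

```sh
sorted_groups=$(printf '%s\n' b a '' d c | awk '
    /^$/ { close("sort"); print ""; fflush(); next }   # group boundary
         { print | "sort" }                            # feed current group
    END  { close("sort") }
')
printf '%s\n' "$sorted_groups"
```

Each close("sort") ends one sort process, and the next print | "sort" starts a fresh one, which is the part that has no direct equivalent for pipes in sh.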