awk "date" | getline var caches its value - but only sometimes

GNU awk's manual mentions that:

If the same file name or the same shell command is used with getline more than once during the execution of an awk program (see section Explicit Input with getline), the file is opened (or the command is executed) the first time only. At that time, the first record of input is read from that file or command. The next time the same file or command is used with getline, another record is read from it, and so on.
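The same per-name reuse applies to files. Here is a minimal sketch, assuming a POSIX shell with mktemp(1) and any awk on the PATH. Each getline on the same file name reads the next line of the already open file. It does not reopen the file from the start.

```shell
# Put two lines in a scratch file (mktemp is just a convenient assumption)
tmp=$(mktemp)
printf '%s\n' first second > "$tmp"

# Each input record triggers one getline on the same file name:
# the file is opened once, read line by line, and then hits EOF.
printf '%s\n' a b c |
  awk -v f="$tmp" '{
    if ((getline line < f) > 0)
      print $0, line
    else
      print $0, "eof"
  }'

rm -f "$tmp"
```

This should print `a first`, `b second` and `c eof`. After two reads the file is exhausted, just as the date pipe is exhausted after one.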

So it runs the command only once. Since date prints just one line, every further read gets EOF, and getline leaves the old value of x unchanged. Compare with what happens if we trash x after each read:

$ for f in {1..3}; do echo $f; sleep 2; done |
   awk '{ "date" | getline x; printf ">>%s<<\n", x; x ="done" }'
>>Mon Jun 29 13:37:53 EEST 2020<<
>>done<<
>>done<<

If we replace the date command here with something that keeps a record of each time it runs, the record also shows that it gets executed only once.
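One way to do that, as a sketch: wrap date in a shell command that first appends a line to a log file, then count the lines afterwards. The mktemp path and the variable name logf are illustrative choices. Note that the name log would not work here, because it is an awk built-in function.

```shell
logf=$(mktemp)

# The command appends "ran" to the log every time the shell actually runs it.
for f in 1 2 3; do echo "$f"; done |
  awk -v logf="$logf" '{
    cmd = "echo ran >> " logf "; date"
    cmd | getline x
  }'

# Count how many times the command was executed (prints 1).
grep -c ran "$logf"
rm -f "$logf"
```

With close(cmd) added after the getline, the same count should come out as 3.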

getline also returns zero at EOF and -1 on error (and 1 when it reads a record), so we could check that:

$ for f in {1..3}; do echo $f; sleep 2; done |
    awk '{ if (("date" | getline x) > 0) printf ">>%s<<\n", x; else printf "error or eof\n"; }'
>>Mon Jun 29 13:46:58 EEST 2020<<
error or eof
error or eof
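To see the three return values directly, here is a sketch that prints them instead of the data. It swaps date for echo hi only to keep the output stable. It also reads from a deliberately missing file, whose path is arbitrary, to trigger -1, because failing to open a file is one case where getline reports an error.

```shell
printf '%s\n' 1 2 3 |
  awk '{
    # 1 on the first record, then 0 (EOF) because the pipe stays open
    r = ("echo hi" | getline x)
    printf "pipe: %d\n", r
  }
  END {
    # -1: the file cannot be opened
    r = (getline y < "/nonexistent/file")
    printf "file: %d\n", r
  }'
```

This should print `pipe: 1`, `pipe: 0`, `pipe: 0` and then `file: -1`.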

You need to close() the pipe explicitly to have awk reopen it the next time.

$ for f in {1..3}; do echo $f; sleep 2; done |
   awk '{ "date" | getline x; printf ">>%s<<\n", x; x = "done"; close("date") }'
>>Mon Jun 29 13:39:19 EEST 2020<<
>>Mon Jun 29 13:39:21 EEST 2020<<
>>Mon Jun 29 13:39:23 EEST 2020<<
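If this pattern comes up often, it can be wrapped in a helper. This is a sketch with a hypothetical function name, run(). It reads every line the command prints, checks getline's return value, and closes the pipe so that the next call starts the command again.

```shell
for f in 1 2 3; do echo "$f"; done |
  awk '
  # run(cmd): execute cmd, return all of its output lines joined by "\n".
  # The extra parameters are the usual awk idiom for local variables.
  function run(cmd,    line, out, n) {
    out = ""
    while ((cmd | getline line) > 0)
      out = (n++ ? out "\n" : "") line
    close(cmd)    # so the next call re-executes cmd
    return out
  }
  { printf ">>%s<<\n", run("date") }'
```

Because the loop reads until EOF, this also works for commands that print more than one line. The helper itself does not depend on date.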

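With "date; : " NR | getline x, every record builds a different command string, so you get a separate pipe for each. (: is the shell's no-op builtin, so appending NR changes only the string, not what runs.)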

With "date; : " $1 | getline x, the command string repeats whenever $1 does, and you get the same issue as in the first case: the second read from the same pipe hits EOF.
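The two variants can be compared side by side. In this sketch, echo hi stands in for date to keep the output stable. Each output line shows the field followed by the getline return values for the NR-based and the $1-based command strings.

```shell
printf '%s\n' a a b |
  awk '{
    by_nr = "echo hi; : " NR    # distinct command string per record
    by_f1 = "echo hi; : " $1    # repeats whenever $1 repeats
    r1 = (by_nr | getline x1)
    r2 = (by_f1 | getline x2)
    print $1, r1, r2
  }'
```

This should print `a 1 1`, `a 1 0` and `b 1 1`. Only the repeated `a` hits EOF on the $1-based pipe.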
