Make awk produce error on non-numeric
A reasonable way to test would be to compare the field using tests similar to strtod, which is the method that awk uses to convert strings to numbers:
$2 !~ /^ *[+-]?[[:digit:]]/ { print "NAN: " $2; exit 1; }
The above differs from strtod in that it does not consider INFINITY or NAN to be "numbers". Like strtod, it only inspects the start of the field, so the regex is anchored with ^; without the anchor, any field that merely contains a digit would pass. The allowance for leading spaces can be dropped under awk's default field-splitting behavior, because fields then never contain leading whitespace:
$2 !~ /^[+-]?[[:digit:]]/ { print "NAN: " $2; exit 1; }
A further refinement, thanks to Stéphane's comment and answer here:
$2 !~ /^[+-]?([[:digit:]]*\.?[[:digit:]]*([eE][-+]?[[:digit:]]+)?|0[xX][[:xdigit:]]*\.?[[:xdigit:]]*([pP][-+]?[[:digit:]]+)?)$/ { print "NAN: " $2; exit 1; }
Broken out for slightly better legibility, that regex is:
/^[+-]?([[:digit:]]*\.?[[:digit:]]*([eE][-+]?[[:digit:]]+)?|\
0[xX][[:xdigit:]]*\.?[[:xdigit:]]*([pP][-+]?[[:digit:]]+)?)$/
... where the intention is to allow an optional leading + or -, then either a floating-point number or a hexadecimal number. The floating-point number has optional leading digits, an optional separator (here fixed to be a period, .), followed by some number of digits, optionally followed by an exponent. The hex number must start with 0x or 0X, followed by hex digits, an optional separator, more hex digits, and optionally a "power" (exponent). The entire second field must match one of those formats (as anchored by ^ and $). Omitted here, for the purposes of this question, are the NAN and INFINITY options.
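Because every run of digits in the regex above is optional, it also accepts an empty field, a lone period, or a bare exponent such as e5. A minimal sketch of a stricter variant follows, assuming only the decimal branch is needed (the hex branch is left out) and using [0-9] rather than [[:digit:]] in case an older awk lacks bracket classes. The function name check_second_field is hypothetical.

```shell
# Hypothetical helper: prints "NAN: <field>" for every input line whose
# second field is not a decimal number; exits non-zero if any was found.
check_second_field() {
    awk '
        # Require at least one digit, either before or after the period.
        $2 !~ /^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$/ {
            print "NAN: " $2
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# "e5" and "." are reported; "1.5" and "-2e3" pass.
printf 'a 1.5\nb e5\nc .\nd -2e3\n' | check_second_field ||
    echo "non-numeric input found"
```

Unlike the one-liners above, this sketch reports every offending line before failing, rather than exiting at the first one.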
Another option would be to force a numeric conversion and compare the result to zero. If the result is zero, check whether the original input really looks like a zero: does it start with an optional + or -, followed either by zeros or by a period and zeros?
{
    number = 0 + $2
    # Group the alternation so that both branches are anchored at the start.
    if (!number && $2 !~ /^[+-]?(0+|\.0+)/)
        print "NAN: " $2
}
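The conversion-based check can be tried on a few sample fields. This is a sketch, with the alternation grouped so that both branches are anchored at the start of the field. The wrapper name flag_non_numeric and the sample data are made up for illustration.

```shell
# Hypothetical wrapper around the conversion-based check: a field is
# reported when it converts to 0 but does not look like a zero.
flag_non_numeric() {
    awk '{
        number = 0 + $2
        if (!number && $2 !~ /^[+-]?(0+|\.0+)/)
            print "NAN: " $2
    }'
}

# "abc" and "x.0" are reported; "0" and "-.0" are accepted as zeros.
# "12abc" converts to 12 and therefore slips through unreported.
printf 'a 0\nb -.0\nc abc\nd 12abc\ne x.0\n' | flag_non_numeric
```

The 12abc case shows the limit of this approach: it only catches fields that convert to zero, not fields with trailing junk after a valid number.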
I ended up with this:
awk -v col="$col" '
typeof($col) != "strnum" {
print "Error on line " NR ": " $col " is not numeric"
noprint=1
exit 1
}
{
sum+=$col
}
END {
if(!noprint)
print sum
}' "$file"
This uses typeof, which is a GNU awk extension. typeof($col) returns 'strnum' if $col is a valid number, and 'string' or 'unassigned' if it is not.
See Can I determine type of an awk variable?
awk -v col=2 '
$col+0==0 && $col!~/^[+-]?0/ { print "bad number " $col > "/dev/stderr" }
{sum+=$col}
END{print sum}' input-file
It's up to you to complicate it if you want it to also handle .0 or .0e+33 as valid representations of 0; notice that awk will ignore trailing junk when converting strings to numbers ("1.4e1e3"+0, "1.4e1.e7"+0 and "14+13"+0 will all be equal to 14).
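The trailing-junk behavior is easy to confirm from the command line. In this small sketch the function name trailing_junk_demo is only for illustration:

```shell
# Each string is converted by taking its longest valid numeric prefix
# ("1.4e1" or "14"); the rest of the string is silently ignored.
trailing_junk_demo() {
    awk 'BEGIN { print "1.4e1e3" + 0, "1.4e1.e7" + 0, "14+13" + 0 }'
}

trailing_junk_demo    # prints: 14 14 14
```

This is why a check based only on $col+0 cannot detect fields such as 12abc; a full-field regex match is needed for that.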