Using awk to sum the values of a column, based on the values of another column

awk -F '|' '$1 ~ /smiths/ {sum += $3} END {print sum}' inputfilename
  • The -F flag sets the field separator; I put it in single quotes because it is a special shell character.
  • Then $1 ~ /smiths/ applies the following {code block} only to lines where the first field matches the regex /smiths/.
  • The rest is the same as your code.

Note that since you're not really using a regex here, just a specific value, you could just as easily use:

awk -F '|' '$1 == "smiths" {sum += $3} END {print sum}' inputfilename

Which checks string equality. This is equivalent to using the regex /^smiths$/, as mentioned in another answer, which includes the ^ anchor to only match the start of the string (the start of field 1) and the $ anchor to only match the end of the string. Not sure how familiar you are with regexes. They are very powerful, but for this case you could use a string equality check just as easily.


Another approach is to use awk associative arrays, more info here. This line produces the desired output:

awk -F '|' '{a[$1] += $3} END{print a["smiths"]}' filename.txt

As a side effect, the array stores all other values:

awk -F '|' '{a[$1] += $3} END{for (i in a) print i, a[i]}' filename.txt

Output:

smiths 212
denniss 100
olivert 10

Very good so far. All you need to do is add a selector before the block to add the sum. Here we check that the first argument contains only "smiths":

awk 'BEGIN {FS = "|"} ; $1 ~ /^smiths$/ {sum+=$3} END {print sum}'

You could shorten this a bit by specifying the field separator as an option. In awk it's generally a good idea to initialize variables on the command line:

awk -F'|' '$1 ~ /^smiths$/ {sum+=$3} END {print sum}'

Tags:

Linux

Awk