sum pair of columns based on matching fields

I would do this in Perl:

$ perl -lane '$k{"$F[0] $F[1]"}+=$F[2]+$F[3]; 
              END{print "$_ $k{$_}" for keys(%k) }' file 
2 1019 15
2 1021 4
2 1030 6
2 1031 8
2 1022 9

Or awk:

awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file

If you want the output sorted according to the second column you could just pipe to sort:

awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file | sort -k2

Note that both solutions include the 1st column as well. The idea is to use the first and second columns as keys to a hash (in perl) or an associative array (in awk). The key in each solution is column1 column2 so if two lines have the same column two but a different column one, they will be grouped separately:

$ cat file
2 1019 2 3
2 1019 4 1
3 1019 2 2

$ awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file
3 1019 4
2 1019 10

Maybe this could help, but is column 1 always 2 and does results depend on it?

awk '{ map[$2] += $3 + $4; } END { for (i in map) { print "2", i, map[i] | "sort -t't'" } }' file

or as mentioned by glenn jackman in comments about sorting:

gawk '{ map[$2] += $3 + $4; } END { PROCINFO["sorted_in"] = "@ind_str_asc"; for (i in map) { print 2, i, map[i] } }' file

You could pre-sort the data and let awk handle the details:

sort -n infile | awk 'NR>1 && p!=$2 {print p,s} {s+=$3+$4} {p=$2}'

You may want to reset the accumulator:

sort -n infile | awk 'NR>1 && p!=$2 {print p,s;s=0} {s+=$3+$4} {p=$2}'

Output:

If you really want the keep the first column, do something like this:

sort -n infile | awk 'NR>1 && p!=$1FS$2 {print p,s} {s+=$3+$4} {p=$1FS$2}'

Output:

Explanation

The p variable holds the $2 value of the previous line, or $1FS$2 in the second case above. This means that the {print p,s} is triggered when $2 of the previous line is not the same as the one on the current line (p!=$2).

sum pair of columns based on matching fields

Explanation

Tags:

Awk

Text Processing

Uniq

Sort

Related

Recent Posts