sum pair of columns based on matching fields
I would do this in Perl:
$ perl -lane '$k{"$F[0] $F[1]"}+=$F[2]+$F[3];
END{print "$_ $k{$_}" for keys(%k) }' file
2 1019 15
2 1021 4
2 1030 6
2 1031 8
2 1022 9
Or awk:
awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file
If you want the output sorted according to the second column you could just pipe to sort
:
awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file | sort -k2
Note that both solutions include the 1st column as well. The idea is to use the first and second columns as keys to a hash (in perl) or an associative array (in awk). The key in each solution is column1 column2
so if two lines have the same column two but a different column one, they will be grouped separately:
$ cat file
2 1019 2 3
2 1019 4 1
3 1019 2 2
$ awk '{a[$1" "$2]+=$3+$4}END{for (i in a){print i,a[i]}}' file
3 1019 4
2 1019 10
Maybe this could help, but is column 1 always 2 and does results depend on it?
awk '{ map[$2] += $3 + $4; } END { for (i in map) { print "2", i, map[i] | "sort -t't'" } }' file
or as mentioned by glenn jackman in comments about sorting:
gawk '{ map[$2] += $3 + $4; } END { PROCINFO["sorted_in"] = "@ind_str_asc"; for (i in map) { print 2, i, map[i] } }' file
You could pre-sort the data and let awk handle the details:
sort -n infile | awk 'NR>1 && p!=$2 {print p,s} {s+=$3+$4} {p=$2}'
You may want to reset the accumulator:
sort -n infile | awk 'NR>1 && p!=$2 {print p,s;s=0} {s+=$3+$4} {p=$2}'
Output:
1019 15
1021 19
1022 28
1030 34
If you really want the keep the first column, do something like this:
sort -n infile | awk 'NR>1 && p!=$1FS$2 {print p,s} {s+=$3+$4} {p=$1FS$2}'
Output:
2 1019 15
2 1021 19
2 1022 28
2 1030 34
Explanation
The p
variable holds the $2
value of the previous line, or $1FS$2
in the second case above. This means that the {print p,s}
is triggered when $2
of the previous line is not the same as the one on the current line (p!=$2
).