Getting average of values in a field based on variable in another field
You're not too far off. Try arrays indexed by $1
:
awk '{ total[$1] += $3; count[$1]++ } END {for (t in total) print t, total[t]/count[t]}' file
a 2.5
b 9.66667
Or, if you want a maximum of two decimal points, as you show in your question:
$ awk '{ total[$1] += $3; count[$1]++ } END {for (t in total) printf "%s %.2f\n", t, total[t]/count[t]}' file
a 2.50
b 9.67
With GNU datamash
:
$ datamash -R2 -W -s -g 1 mean 3 <file
a 2.50
b 9.67
$ datamash -W -s -g 1 mean 3 <file
a 2.5
b 9.6666666666667
-R2
round to two decimal places-W
use spaces and/or tabs as field separators-s
sort before grouping-g 1
group on the first fieldmean 3
mean of the values, third field
You can remove -W
if you have tabs as separators add --output-delimiter=' '
if you want to replace tabs with space characters in the output.
Miller is also handy for tasks like this ex.
$ mlr --nidx stats1 -a mean -f 3 -g 1 file.txt
a 2.500000
b 9.666667
or (with a more recent version that has the format-values
verb)
$ mlr --nidx stats1 -a mean -f 3 -g 1 then format-values -f '%.2f' file.txt
a 2.50
b 9.67