Making summary of sentences
The general approach would be:
$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%s signed %d time(s)\n", name, count[name])
}' <file
Harold signed 1 time(s)
Dan signed 1 time(s)
Sebastian signed 1 time(s)
Suzie signed 4 time(s)
Jordan signed 2 time(s)
Suzan signed 1 time(s)
That is, use an associative array (hash) to count how many times each name is seen. In the END block, iterate over all the names and print the summary for each.
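Note that the order in which `for (name in count)` visits the names is unspecified, which is why the output above is not sorted. Below is a minimal sketch of sorting the summary by count, assuming a made-up sample file `signatures.txt` whose second whitespace-separated field is the name (both the file name and its contents are hypothetical, not the data from the question):

```shell
# Hypothetical sample data: the second field is the name.
printf '%s\n' '1 Suzie' '2 Jordan' '3 Suzie' '4 Dan' '5 Suzie' > signatures.txt

# Count each name, then sort by the count (third output field), highest
# first, using the name as a tie-breaker.
summary=$(awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%s signed %d time(s)\n", name, count[name])
}' signatures.txt | sort -k3,3nr -k1,1)

printf '%s\n' "$summary"
# Suzie signed 3 time(s)
# Dan signed 1 time(s)
# Jordan signed 1 time(s)
```

Sorting outside awk keeps the awk program unchanged and works with any awk implementation.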
For slightly nicer formatting, change the %s placeholder in the printf() call to something like %-10s to reserve 10 characters for the names (left-justified).
$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%-10s signed %d time(s)\n", name, count[name])
}' <file
Harold     signed 1 time(s)
Dan        signed 1 time(s)
Sebastian  signed 1 time(s)
Suzie      signed 4 time(s)
Jordan     signed 2 time(s)
Suzan      signed 1 time(s)
More fiddling around with the output (because I'm bored):
$ awk '{ count[$2]++ }
END {
    for (name in count)
        printf("%-10s signed %d time%s\n", name, count[name],
               count[name] > 1 ? "s" : "")
}' <file
Harold     signed 1 time
Dan        signed 1 time
Sebastian  signed 1 time
Suzie      signed 4 times
Jordan     signed 2 times
Suzan      signed 1 time
awk keeps its associative array in memory, so that approach is limited by the memory you have available if there are very many distinct names. As an alternative, you could do the following instead:
sort -k2,2 infile | uniq -c
Or, to format the output the way you want:
sort -k2,2 infile |uniq -c |awk '{ print $3, "signed", $1, "time(s)" }'
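One caveat: `uniq -c` compares whole lines, so the pipeline above only merges lines that are identical in every field, not just in the name. Below is a sketch of a variant that extracts the name before counting, assuming the name is the second whitespace-separated field (the sample file `infile.sample` and its contents are made up for illustration):

```shell
# Hypothetical sample data: the first field differs on every line,
# and there is one blank line.
printf '%s\n' '1 Suzie' '2 Jordan' '3 Suzie' '' '4 Dan' > infile.sample

# Keep only the name (skipping blank lines), sort so equal names are
# adjacent, count them with uniq -c, then reformat the "count name" pairs.
result=$(awk 'NF { print $2 }' infile.sample | sort | uniq -c |
    awk '{ print $2, "signed", $1, "time(s)" }')

printf '%s\n' "$result"
# Dan signed 1 time(s)
# Jordan signed 1 time(s)
# Suzie signed 2 time(s)
```

Because `sort` does the grouping, the output comes out ordered by name.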
This is a job for awk. You need an array[index] to do it:
awk 'NF {name[$2]++} END{for (each in name) {print each " signed " name[each] " time(s)"}}' file
Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)
The NF condition skips blank lines (NF is zero when a line has no fields).
The data is stored in the index and value of the array. Values are referenced with the corresponding index.