Sort and count the number of occurrences of lines
| sort | uniq -c
As stated in the comments.
Piping the output into sort organises it into alphabetical/numerical order. This is a requirement because uniq only matches repeated lines that are adjacent, e.g.
a
b
a
If you use uniq on this text file, it will return the following:
a
b
a
This is because the two a's are separated by the b, so they are not consecutive lines. However, if you first sort the data into alphabetical order, like
a
a
b
Then uniq will remove the repeating lines. The -c option of uniq counts how many times each line occurs and provides output in the form:
2 a
1 b
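The behaviour described above can be reproduced with a small sketch. The input is generated inline with printf rather than read from a file, and the variable names (input, unsorted, counted) are only illustrative:

```shell
# Sample input from the explanation above: a, b, a on separate lines.
input=$(printf 'a\nb\na')

# Without sorting, the two "a" lines are not adjacent, so uniq keeps all three.
unsorted=$(printf '%s\n' "$input" | uniq)

# After sorting, the duplicates are adjacent and uniq -c can count them.
counted=$(printf '%s\n' "$input" | sort | uniq -c)

printf 'uniq alone:\n%s\n' "$unsorted"
printf 'sort | uniq -c:\n%s\n' "$counted"
```

Note that uniq -c typically pads the count with leading spaces, and the amount of padding differs between implementations, so scripts should not rely on a fixed column width.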
References:
sort(1)
uniq(1)
[your command] | sort | uniq -c | sort -nr
The accepted answer is almost complete; you might want to add an extra sort -nr at the end so that the lines that occur most often come first.
uniq options:
-c, --count
prefix lines by the number of occurrences
sort options:
-n, --numeric-sort
compare according to string numerical value
-r, --reverse
reverse the result of comparisons
In the particular case where the lines you are sorting are numbers, you may need to use sort -gr instead of sort -nr; see comment.
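As a rough illustration of the full pipeline, here is a minimal sketch using made-up sample data in place of [your command]. The variable name ranked is only for the demo:

```shell
# Hypothetical sample data: "a" appears 3 times, "b" twice, "c" once.
# sort groups identical lines, uniq -c counts them,
# and sort -nr orders the counts from highest to lowest.
ranked=$(printf 'b\na\nc\na\nb\na\n' | sort | uniq -c | sort -nr)

printf '%s\n' "$ranked"
```

The most frequent line ("a") should come out on top, followed by "b" and then "c".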
You can use an associative array in awk and then, optionally, sort:
$ awk ' { tot[$0]++ } END { for (i in tot) print tot[i],i } ' access.log | sort
output:
1 c.php
1 d.php
2 b.php
3 a.php
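To try this without a real log, here is a self-contained sketch. The sample data is a hypothetical stand-in for access.log, assuming one requested page per line (the real file's format is not shown above):

```shell
# Hypothetical stand-in for access.log: one page name per line.
log=$(printf 'a.php\nb.php\na.php\nc.php\nb.php\nd.php\na.php')

# tot[$0]++ uses the whole line as the array key and increments its counter.
# The END block prints "count line" for every key; the iteration order of
# an awk array is unspecified, which is why the result is piped through sort.
counts=$(printf '%s\n' "$log" |
  awk '{ tot[$0]++ } END { for (i in tot) print tot[i], i }' |
  sort)

printf '%s\n' "$counts"
```

Unlike sort | uniq -c, this approach does not need the input to be sorted first, since awk keeps a running count per distinct line in memory.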