Get list of user-agents from nginx log
Solution 1:
awk -F'"' '/GET/ {print $6}' /var/log/nginx-access.log | cut -d' ' -f1 | sort | uniq -c | sort -rn
awk(1) - selecting full User-Agent string of GET requests
cut(1) - using first word from it
sort(1) - sorting
uniq(1) - count
sort(1) - sorting by count, reversed
PS. Of course it can be replaced by a single awk/sed/perl/python/etc. script. I just wanted to show how rich the Unix way is.
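As a rough sketch of the single-awk variant mentioned in the PS, the counting can be done inside awk itself, leaving only the final sort. The function name count_user_agents is hypothetical, and the sketch assumes the same log layout as the one-liner above, with the User-Agent as the 6th field when splitting on ".

```shell
# Hypothetical helper: count the first word of each User-Agent in one awk pass.
# Assumes the User-Agent is the 6th field when the line is split on double quotes.
count_user_agents() {
    awk -F'"' '
        /GET/ {
            split($6, words, " ")   # words[1] is the first word of the User-Agent
            count[words[1]]++       # tally it in an associative array
        }
        END {
            for (agent in count) print count[agent], agent
        }
    ' "$1" | sort -rn               # highest count first
}
```

It would be called as count_user_agents /var/log/nginx-access.log. Unlike uniq -c, the counts are not left-padded with spaces, so the output is not byte-for-byte identical to the pipeline above.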
Solution 2:
While the one-liner by SaveTheRbtz does the job, it took several hours to parse my nginx access log.
Here is a faster version based on his, which takes less than 1 minute per 100MB of log file (corresponding to about 1 million lines):
sed -n 's!.* "GET.* "\([[:alnum:].]\+/*[[:digit:].]*\)[^"]*"$!\1!p' /var/log/nginx/access.log | sort | uniq -c | sort -rfg
It works with the default access log format of nginx, which is the same as the combined format of Apache's httpd and has the User-Agent as the last field, delimited by ".
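To illustrate what the sed expression extracts, here is a small sketch that runs it on a single line. The sample line is made up for illustration (it is not from a real log), and the wrapper name extract_agent is hypothetical. Note that \+ is a GNU sed extension, so this assumes GNU sed.

```shell
# Made-up sample line in the combined log format, for illustration only.
sample='203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"'

# Hypothetical wrapper around the same sed expression as above.
# It reads log lines on stdin and prints the leading product/version
# token of the User-Agent for GET requests only.
extract_agent() {
    sed -n 's!.* "GET.* "\([[:alnum:].]\+/*[[:digit:].]*\)[^"]*"$!\1!p'
}

printf '%s\n' "$sample" | extract_agent
```

For this sample the captured group is Mozilla/5.0: the name, an optional slash, and the version digits. Lines that are not GET requests produce no output because of -n combined with the p flag.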
Solution 3:
This is a slight variation of the accepted answer, using fgrep and cut.
cat your_file.log | fgrep '"GET ' | cut -d'"' -f6 | cut -d' ' -f1 | sort | uniq -c | sort -rn
There is something appealing about using "weaker" commands when it is possible.
Solution 4:
Awstats should do the trick, but will supply far more information. I hope this helps...
Solution 5:
Webalizer can do it.
Example:
webalizer -o reports_folder -M 5 log_file
-o reports_folder - specifies folder where report is generated
-M 5 - displays only the browser name and the major version number
log_file - specifies log file name
source: ftp://ftp.mrunix.net/pub/webalizer/README