Here is a quick command to list the 20 most-requested paths in an Apache web server’s access log:
gawk '{ print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -nr | head -n 20
Parts of the command explained:
gawk '{ print $7}': print only the 7th whitespace-delimited field of each line in the access log. In Apache's common and combined log formats, that field is the requested path.
sort: sort those paths so that identical ones end up next to each other.
uniq -c: collapse each run of identical adjacent lines into one line, prefixed with the number of times it occurred. Because uniq only merges adjacent duplicates, the preceding sort is required.
sort -nr: sort the counted lines numerically in reverse order, so the most-requested paths come first.
head -n 20: discard all but the first 20 lines.
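To see the stages in action without a real server log, here is a minimal sketch that runs the same pipeline over a few made-up log lines. It assumes Apache's combined log format, where the requested path is the 7th whitespace-separated field. It swaps gawk for plain POSIX awk, which runs this one-line program identically, and wraps the pipeline in a hypothetical top_paths function so that it can read either a named file or standard input.

```shell
#!/bin/sh
# Sketch: the pipeline from above, wrapped in a hypothetical function
# (top_paths) and using POSIX awk in place of gawk.
top_paths() {
  # With no file argument, awk reads log lines from standard input.
  awk '{ print $7 }' "$@" | sort | uniq -c | sort -nr | head -n 20
}

# Made-up sample lines in Apache's combined log format; field 7 is the path.
# Spaces in the later user-agent field do not shift field 7.
sample_log='192.0.2.10 - - [05/Apr/2007:10:00:00 -0400] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
192.0.2.11 - - [05/Apr/2007:10:00:01 -0400] "GET /feed HTTP/1.1" 200 8192 "-" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.12 - - [05/Apr/2007:10:00:02 -0400] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
192.0.2.13 - - [05/Apr/2007:10:00:03 -0400] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"
192.0.2.14 - - [05/Apr/2007:10:00:04 -0400] "GET /feed HTTP/1.1" 304 0 "-" "Mozilla/5.0"
192.0.2.15 - - [05/Apr/2007:10:00:05 -0400] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"'

# Prints each distinct path with its request count, busiest first:
# 3 for /, 2 for /feed, 1 for /robots.txt (the count padding depends on uniq).
printf '%s\n' "$sample_log" | top_paths
```

On a real server you would call it as top_paths /var/log/httpd/access_log, although the log's location and name vary between distributions and Apache configurations.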
The result should look something like this:
83361 /
49582 /feed
39616 /robots.txt
36265 /favicon.ico
17048 /?feed=rss2
10798 /archives/3
10036 /wp-content/uploads/2007/05/img_7870_header.jpg
 9913 /wp-includes/images/smilies/icon_smile.gif
 9425 /wp-comments-post.php
 8274 /feed/
 7508 /archives/category/work/feed
 7367 /archives/88
 7312 /photos/10_small/IMG_3023.JPG.jpg
 7175 /photos/10_small/IMG_3028.JPG.jpg
 7151 /photos/10_small/IMG_3024.JPG.jpg
 7096 /photos/10_small/IMG_3026.JPG.jpg
 6381 /photosetToKML.php?set=72157594417350372&size=small
 6253 /qtvr/2007-04-05_back_deck_snow%20-%2010000x5000%20-%20SLIN%20-%20Blended%20Layer0002.jpg
 5798 /photosetToKML.php
 4344 /archives/category/photography