BASH tip: Top web pages

Here is a quick command to list the 20 most-requested paths in an Apache web server's access log:

gawk '{ print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -nr | head -n 20
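If you run this regularly, you can wrap the same pipeline in a small shell function that takes the log file and the number of lines as arguments. This is a minimal sketch. The name `top_pages` and its two optional arguments are hypothetical. It uses plain `awk` instead of `gawk`, which works just as well here because the script only prints one field.

```bash
#!/usr/bin/env bash
# top_pages (hypothetical helper): print the N most-requested paths in an
# Apache access log, using the same pipeline as the one-liner above.
#   $1  log file (defaults to the path used above)
#   $2  number of lines to show (defaults to 20)
top_pages() {
  local log="${1:-/var/log/httpd/access_log}"
  local n="${2:-20}"
  awk '{ print $7 }' "$log" | sort | uniq -c | sort -nr | head -n "$n"
}
```

For example, `top_pages /var/log/httpd/access_log 10` shows the top ten.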

Parts of the command explained:

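  1. gawk '{ print $7}': print only the 7th whitespace-delimited field of each line. In Apache's common and combined log formats, that field is the requested path (including any query string), taken from the quoted request line "GET /path HTTP/1.1".
  2. sort: sort the paths so that identical ones end up on adjacent lines. uniq needs this because it only collapses adjacent duplicates.
  3. uniq -c: collapse each run of identical lines into a single line, prefixed with the number of times it occurred.
  4. sort -nr: sort those lines numerically by the leading count, in descending order.
  5. head -n 20: keep only the first 20 lines, i.e. the 20 most-requested paths.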
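To see why the path is field 7, split a log line the same way awk does. The sample line below is made up and uses the common log format. The timestamp contains a space, so it takes up two fields. The quoted request line then splits into method, path and protocol. If you use a custom LogFormat, the path may end up in a different field.

```bash
#!/usr/bin/env bash
# A made-up access-log line in Apache's common log format.
line='127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# Number each whitespace-delimited field the way awk splits it.
# The timestamp spans fields 4-5; the request line spans fields 6-8.
printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'
```

Field 7 comes out as `/apache_pb.gif`. Field 6 is `"GET` and field 8 is `HTTP/1.0"`, with the quote characters still attached.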

The result should look something like this:

  83361 /
  49582 /feed
  39616 /robots.txt
  36265 /favicon.ico
  17048 /?feed=rss2
  10798 /archives/3
  10036 /wp-content/uploads/2007/05/img_7870_header.jpg
   9913 /wp-includes/images/smilies/icon_smile.gif
   9425 /wp-comments-post.php
   8274 /feed/
   7508 /archives/category/work/feed
   7367 /archives/88
   7312 /photos/10_small/IMG_3023.JPG.jpg
   7175 /photos/10_small/IMG_3028.JPG.jpg
   7151 /photos/10_small/IMG_3024.JPG.jpg
   7096 /photos/10_small/IMG_3026.JPG.jpg
   6381 /photosetToKML.php?set=72157594417350372&size=small
   6253 /qtvr/2007-04-05_back_deck_snow%20-%2010000x5000%20-%20SLIN%20-%20Blended%20Layer0002.jpg
   5798 /photosetToKML.php
   4344 /archives/category/photography
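The output above counts some pages more than once. `/feed` and `/feed/` appear as separate entries, and so do `/photosetToKML.php` and `/photosetToKML.php?set=72157594417350372&size=small`, because the 7th field is the raw request path, query string included. To group such entries, one option is to normalize the path inside awk before counting. This is a sketch under that assumption, and the function name `top_paths_normalized` is hypothetical. It also folds `/?feed=rss2` into `/`, which may not be what you want.

```bash
#!/usr/bin/env bash
# top_paths_normalized (hypothetical helper): like the one-liner above, but
# strips query strings and trailing slashes before counting.
#   $1  log file (defaults to the path used above)
#   $2  number of lines to show (defaults to 20)
top_paths_normalized() {
  local log="${1:-/var/log/httpd/access_log}"
  local n="${2:-20}"
  awk '{
    path = $7
    sub(/[?].*/, "", path)                 # drop any query string
    if (path != "/") sub(/\/$/, "", path)  # fold /feed/ into /feed
    print path
  }' "$log" | sort | uniq -c | sort -nr | head -n "$n"
}
```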
