Adam October 14th, 2010
Filed under: Computers and Technology, Work/Professional
Tags: Apache, BASH, Linux, web-development
Here is a quick command to generate a list of the top pages in the Apache web-server’s access log:
gawk '{ print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -nr | head -n 20
Parts of the command explained:
gawk '{ print $7}': return only the 7th whitespace-delimited column of each line of the access log, which in Apache's common and combined log formats is the path requested.
sort: sort the lines of the output so that identical paths end up on adjacent lines, which uniq needs because it only collapses adjacent duplicates.
uniq -c: condense the output to unique lines, prepending each line with the number of times that line occurs.
sort -nr: sort the resulting lines numerically in reverse order, so the most-requested paths come first.
head -n 20: chop off all but the first 20 lines.
The result should look something like this:
83361 /
49582 /feed
39616 /robots.txt
36265 /favicon.ico
17048 /?feed=rss2
10798 /archives/3
10036 /wp-content/uploads/2007/05/img_7870_header.jpg
9913 /wp-includes/images/smilies/icon_smile.gif
9425 /wp-comments-post.php
8274 /feed/
7508 /archives/category/work/feed
7367 /archives/88
7312 /photos/10_small/IMG_3023.JPG.jpg
7175 /photos/10_small/IMG_3028.JPG.jpg
7151 /photos/10_small/IMG_3024.JPG.jpg
7096 /photos/10_small/IMG_3026.JPG.jpg
6381 /photosetToKML.php?set=72157594417350372&size=small
6253 /qtvr/2007-04-05_back_deck_snow%20-%2010000x5000%20-%20SLIN%20-%20Blended%20Layer0002.jpg
5798 /photosetToKML.php
4344 /archives/category/photography
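In the output above, /photosetToKML.php is counted separately with and without its query string. If you would rather group such requests together, one possible variation is to strip everything from the first "?" onward before counting. The following is only a sketch: the top_pages function name and its two arguments are my own illustration, not part of the original command, and it assumes the same log layout in which the 7th field is the requested path.

```shell
# top_pages LOGFILE [COUNT]
# Hypothetical helper (name and arguments are illustrative assumptions).
# Prints the COUNT most-requested paths in LOGFILE, ignoring query strings.
top_pages() {
    tp_log="$1"
    tp_count="${2:-20}"   # default to the top 20, as in the original command

    # Copy field 7 (the requested path), drop any "?query" suffix,
    # then count and rank the paths exactly as the one-liner does.
    awk '{ path = $7; sub(/\?.*$/, "", path); print path }' "$tp_log" \
        | sort | uniq -c | sort -nr | head -n "$tp_count"
}
```

It would be called as: top_pages /var/log/httpd/access_log 20. Note that this does not merge paths that differ only by a trailing slash, such as /feed and /feed/.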
Adam June 14th, 2010
Filed under: Computers and Technology, Work/Professional
Tags: caching, PHP, reverse-proxy, Varnish, web-development
Note: This is a cross-post of documentation I am writing about Lazy Sessions.
Why use reverse-proxy caching?
For most public-facing web applications, the large majority of traffic comes from anonymous, non-authenticated users. Even with internal data caches and other good optimizations, a PHP application executes a large amount of code to generate a page, even if the content of that page will be the same for many users. Code and query optimization are very important for improving the experience of all users of a web application, but even the most basic “Hello World” script will top out at about 3k requests/second due to the overhead of Apache and PHP, and many real applications top out at less than 200 requests/second. Varnish, a light-weight proxy server that can run on the same host as the web server, can cache pages in memory and serve them at rates of more than 10k requests/second with thousands of concurrent connections.
While the point of web applications is to have content be dynamic and easily changeable, for most applications and most anonymous users, receiving content that is slightly stale (cached for 5 minutes or something similar) isn’t a big deal. Sure, visitors to your blog might not see the latest post for a few minutes, but they will get their response in 4 milliseconds rather than 2 seconds.
Should your site get posted on Slashdot, a caching reverse-proxy server will give anonymous visitor #2 and up the same page from cache (until expiration), while authenticated users continue to have their requests passed through to the Apache/PHP back-end. Everyone wins.