Archive for the 'web-development' Tag  

BASH tip: Top web pages

October 14th, 2010

Filed under: Computers and Technology, Work/Professional

Here is a quick command to generate a list of the top pages in the Apache web-server’s access log:

gawk '{ print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -nr | head -n 20

Parts of the command explained:

  1. gawk '{ print $7}' — return only the 7th [white-space delimited] column of text from the access log, which happens to be the path requested.
  2. sort — sort the lines of the output.
  3. uniq -c — condense the output to unique lines, prepending each line with the number of times that line occurs.
  4. sort -nr — sort the resulting lines numerically in reverse order.
  5. head -n 20 — chop off all but the first 20 lines.

The result should look something like this:

  83361 /
  49582 /feed
  39616 /robots.txt
  36265 /favicon.ico
  17048 /?feed=rss2
  10798 /archives/3
  10036 /wp-content/uploads/2007/05/img_7870_header.jpg
   9913 /wp-includes/images/smilies/icon_smile.gif
   9425 /wp-comments-post.php
   8274 /feed/
   7508 /archives/category/work/feed
   7367 /archives/88
   7312 /photos/10_small/IMG_3023.JPG.jpg
   7175 /photos/10_small/IMG_3028.JPG.jpg
   7151 /photos/10_small/IMG_3024.JPG.jpg
   7096 /photos/10_small/IMG_3026.JPG.jpg
   6381 /photosetToKML.php?set=72157594417350372&size=small
   6253 /qtvr/2007-04-05_back_deck_snow%20-%2010000x5000%20-%20SLIN%20-%20Blended%20Layer0002.jpg
   5798 /photosetToKML.php
   4344 /archives/category/photography
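
A small variation on the same pipeline can narrow the report. As a rough sketch (the log path and the list of asset extensions to exclude are assumptions about your setup), this filters out requests for common static files before counting, so only page-like paths show up in the top 20:

  # Top 20 requested paths, ignoring common static assets
  gawk '{ print $7 }' /var/log/httpd/access_log \
    | grep -Ev '\.(jpe?g|png|gif|ico|css|js)$' \
    | sort | uniq -c | sort -nr | head -n 20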

Adding reverse-proxy caching to PHP applications

June 14th, 2010

Filed under: Computers and Technology, Work/Professional

Note: This is a cross-post of documentation I am writing about Lazy Sessions.

Why use reverse-proxy caching?

For most public-facing web applications, the significant majority of traffic comes from anonymous, non-authenticated users. Even with a variety of internal data caches and other good optimizations, a large amount of PHP code still executes to generate each page, even when that page’s content will be the same for many users. Code and query optimization are very important to improving the experience for all users of a web application, but even the most basic “Hello World” script will top out at about 3k requests/second due to the overhead of Apache and PHP, and many real applications top out at less than 200 requests/second. Varnish, a light-weight proxy server that can run on the same host as the web server, caches pages in memory and can serve them at rates of more than 10k requests/second with thousands of concurrent connections.
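
Those throughput figures will vary with hardware and application, but ApacheBench gives a quick feel for your own ceiling. A minimal sketch, assuming Apache/PHP listens on port 8080 and Varnish on port 80 of the same host (both port numbers are assumptions about your setup):

  # Hit the PHP back-end directly
  ab -n 1000 -c 50 http://localhost:8080/
  # Hit the same page through the caching proxy
  ab -n 1000 -c 50 http://localhost:80/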

While the point of a web application is to serve dynamic, easily changeable content, for most applications and most anonymous users, receiving content that is slightly stale (cached for 5 minutes or so) isn’t a big deal. Sure, visitors to your blog might not see the latest post for a few minutes, but they will get their response in 4 milliseconds rather than 2 seconds.

Should your site get posted on Slashdot, a caching reverse-proxy server will give anonymous visitor #2 and up the same page from cache (until expiration), while authenticated users continue to have their requests passed through to the Apache/PHP back-end. Everyone wins.
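
A quick way to confirm that split behavior is to inspect response headers with curl. This is only a sketch: the hostname is a placeholder, and it assumes a fairly standard Varnish setup where cached responses carry an Age header and requests with a session cookie are passed through to the back-end:

  # Anonymous request: after the first hit, Age should climb while the page stays cached
  curl -sI http://www.example.com/ | grep -iE '^(age|x-varnish):'
  # Request with a session cookie: typically passed through, so Age stays at 0 or is absent
  curl -sI -H 'Cookie: PHPSESSID=test' http://www.example.com/ | grep -iE '^(age|x-varnish):'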

Continue Reading »