Adam June 14th, 2010
Note: This is a cross-post of documentation I am writing about Lazy Sessions.
Why use reverse-proxy caching?
For most public-facing web applications, the significant majority of traffic comes from anonymous, non-authenticated users. Even with a variety of internal data-cache mechanisms and other good optimizations, a PHP application executes a large amount of code to generate a page, even if the content of that page will be the same for many users. Code and query optimization are very important to improving the experience for all users of a web application, but even the most basic “Hello World” script will top out at about 3k requests/second due to the overhead of Apache and PHP, and many real applications top out at less than 200 requests/second. Varnish, a light-weight proxy server that can run on the same host as the web server, can cache pages in memory and serve them at rates of more than 10k requests/second with thousands of concurrent connections.
While the point of web-applications is to have content be dynamic and easily changeable, for most applications and most of the anonymous users, receiving content that is slightly stale (cached for 5 minutes or something similar) isn’t a big deal. Sure, visitors to your blog might not see the latest post for a few minutes, but they will get their response in 4 milliseconds rather than 2 seconds.
Should your site get posted on Slashdot, a caching reverse-proxy server will give anonymous visitor #2 and up the same page from cache (until expiration), while authenticated users continue to have their requests passed through to the Apache/PHP back-end. Everyone wins.
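As a rough illustration of the split described above, an application can tell the proxy which responses are safe to share. This is only a minimal sketch and is not taken from Lazy Sessions: the function name cacheControlFor(), the cookie name SESSID, and the 300-second lifetime are all assumptions, and it is written for PHP 7 or later. It also only helps if anonymous requests carry no session cookie in the first place.

```php
<?php
/**
 * Choose a Cache-Control header value for the current request.
 * Hypothetical helper: the cookie name and TTL are illustrative assumptions.
 */
function cacheControlFor(array $cookies, string $sessionCookie = 'SESSID', int $ttl = 300): string
{
    // A session cookie means the visitor may be authenticated, so the
    // response must not be stored by a shared cache.
    if (isset($cookies[$sessionCookie])) {
        return 'private, no-cache';
    }
    // Anonymous visitors can all share one cached copy for $ttl seconds.
    return 'public, max-age=' . $ttl;
}

// Only send the header when running under a web server.
if (PHP_SAPI !== 'cli') {
    header('Cache-Control: ' . cacheControlFor($_COOKIE));
}
```

With this in place, a slightly stale page is served from cache to anonymous visitors, while anyone holding a session cookie is always passed through to the back-end.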
Adam September 9th, 2009
One of the requirements in the migration of our web sites to Drupal is that we create a robust and redundant platform that can stay running or degrade gracefully when hardware or software problems inevitably arise. While our sites get heavy use from our communities and the public, our traffic numbers are nowhere near those of a top-1000 site, and the sites could comfortably run off of one machine running both the database and the web server.

Single Machine Configuration
This simple configuration, however, has the major weakness that any hiccup in the hardware or software of the machine will likely take the site offline until the issue can be addressed. In order to give our site a better chance of staying up as failures occur, we separate some of the functional pieces of the site onto discrete machines and then ensure that each function is redundant or fail-safe. This post and the next will detail a few of the techniques we have used to build a robust site.
Adam October 13th, 2008
I have been using Twitter as a log of my daily doings and wished to export my time-line for reformatting into a calendar format. Unfortunately TweetDumpr just retrieves the list of Tweets using a single fetch request which is limited by the Twitter API to a maximum of 200 Tweets. (Update: apparently TweetDumpr can get more than 200 Tweets. It just didn’t say so in its description.)
I wanted to export all 600+ of my tweets, so I wrote the following little PHP script to accomplish this. I have not yet tested it with many concurrent users or added a form to select which user to update. Until I do so, I won’t be providing it as an end-user service. You are free to put it on your own machine and use it though.
TwitterExport.php
<?php
/**
* This script will allow the export of complete user time-lines from the twitter
* service. It joins together all pages of status updates into one large XML block
* that can then be reformatted/processed with other tools.
*
* @since 10/13/08
*
* @copyright Copyright © 2008, Adam Franco
* @license http://www.gnu.org/copyleft/gpl.html GNU General Public License (GPL)
*/
$user = 'afranco_work'; // Replace this with your user name.
header('Content-type: text/plain');

// Combined document that will hold the statuses from every page.
$allDoc = new DOMDocument;
$root = $allDoc->appendChild($allDoc->createElement('statuses'));
$root->setAttribute('type', 'array');

$page = 1;
do {
    $numStatus = 0;
    $pageDoc = new DOMDocument;
    $res = @$pageDoc->load('http://twitter.com/statuses/user_timeline/'.$user.'.xml?page='.$page);
    if (!$res) {
        print "\n\n**** Error loading page $page ****";
        exit;
    }
    // Copy each status from this page into the combined document.
    foreach ($pageDoc->getElementsByTagName('status') as $status) {
        $root->appendChild($allDoc->createTextNode("\n"));
        $root->appendChild($allDoc->importNode($status, true));
        $numStatus++;
    }
    print "\nLoaded page $page with $numStatus status updates.";
    flush();
    $page++;
    sleep(1); // Pause between requests to go easy on the service.
} while ($numStatus); // A page with no statuses marks the end of the time-line.

print "\nDone loading timeline.";
print "\n\n\n";
$root->appendChild($allDoc->createTextNode("\n"));
print $allDoc->saveXml();
Usage (assuming PHP is installed)
- Save the code above on your machine as twitter_export.php
- Edit the code to change the $user variable to be your own Twitter username
- From the command line run: php twitter_export.php
- Copy/paste the XML output into a file for safe keeping and further processing
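As one possible first step toward the calendar format mentioned above, the saved XML could be grouped by day. This is a minimal sketch for PHP 7 or later, not part of the export script: the function name groupStatusesByDay() is made up for illustration, and it assumes each status element starts with a created_at child and contains a text child, with timestamps shaped like "Mon Oct 13 14:05:00 +0000 2008".

```php
<?php
/**
 * Group exported status updates by calendar day.
 * Assumes each <status> has <created_at> and <text> children; adjust the
 * element names if your export differs.
 */
function groupStatusesByDay(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $days = [];
    foreach ($doc->getElementsByTagName('status') as $status) {
        $created = $status->getElementsByTagName('created_at')->item(0);
        $text = $status->getElementsByTagName('text')->item(0);
        if (!$created || !$text) {
            continue; // Skip entries missing either field.
        }
        // Assumed timestamp shape: "Mon Oct 13 14:05:00 +0000 2008"
        $date = DateTime::createFromFormat('D M d H:i:s O Y', trim($created->textContent));
        if ($date === false) {
            continue; // Skip timestamps that do not match the assumed shape.
        }
        $days[$date->format('Y-m-d')][] = trim($text->textContent);
    }
    ksort($days); // Oldest day first.
    return $days;
}
```

The result is an array keyed by date (for example 2008-10-13), each holding that day's updates, which can then be printed in whatever layout is wanted.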
Adam September 6th, 2007
I’ve recently developed a small PHP script, the WPEnclosureAdder (source | try), that goes through each item in an RSS feed, looks for links to YouTube videos or GoogleVideo videos, and then adds enclosure tags for the videos. If multiple videos are found embedded in a post, then that post is duplicated in the feed for each additional URL to provide compatibility with the many RSS readers/video-podcast viewers that expect a single enclosure per post.
I wrote this script because I have recently been making heavy use of Miro (formerly known as “The Democracy Player”) to download videos from YouTube in order to watch them off-line. Miro also provides a nice UI for aggregating videos and remembers my spot when I go back to watching later (nice for long documentaries). Miro, however, expects links to videos in RSS enclosure tags, something that WordPress (and probably other blogging software) doesn’t do for embedded videos.
Throw Away Your Telescreen is a video blog done by one of my favorite geo-political bloggers, Dave on Fire, and a few others. In it they link out to the most interesting “documentaries, lectures, and interviews that follow a different editorial line” from the corporate press. I highly recommend all of the videos on it that I have seen.
Throw Away Your Telescreen has all the makings of an indie-news channel, perfect for Miro which was developed to encourage participatory media and culture. The only thing missing was to get the videos embedded in Throw Away Your Telescreen’s posts in such a way that Miro can find them. With the WPEnclosureAdder, this has now been done. Use this feed to view Throw Away Your Telescreen in Miro.
More about the WPEnclosureAdder:
- View the source-code of the latest version. (save-as to download)
- License: GNU General Public License (GPL) version 3 or later
- Requirements (for hosting it yourself): PHP version 5.2 or later
- Git Repository: http://www2.adamfranco.com/WPEnclosureAdder.git
I wrote this script with Throw Away Your Telescreen in mind, but it should work with any other WordPress blog, and probably with RSS feeds generated from other blogging tools. To point it at another blog’s RSS feed, enter the feed URL in the form below:
Using my version will use my default search strings for YouTube and GoogleVideo videos. If you would like to change what is being searched for, please download the script, change the configuration, and host it on your own website. I have licensed the WPEnclosureAdder under the GNU General Public License (GPL) version 3 or later, so you are free to copy and modify this script as per the terms of that license.
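For readers curious how the enclosure-adding step described at the top of this post might look, here is a minimal sketch. It is not the WPEnclosureAdder source: the function names, the YouTube URL pattern, and the placeholder length and type values are my assumptions for illustration, and it is written for PHP 7.1 or later rather than 5.2.

```php
<?php
/** Append an <enclosure> element for $url to an RSS <item>. */
function appendEnclosure(DOMDocument $doc, DOMElement $item, string $url): void
{
    $enclosure = $doc->createElement('enclosure');
    $enclosure->setAttribute('url', $url);
    // Placeholder values: the real size and MIME type are unknown here.
    $enclosure->setAttribute('length', '0');
    $enclosure->setAttribute('type', 'video/x-flv');
    $item->appendChild($enclosure);
}

/** Add one enclosure per item, duplicating items that link several videos. */
function addVideoEnclosures(string $rssXml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($rssXml);
    // Copy the node list to an array because items are inserted while looping.
    $items = iterator_to_array($doc->getElementsByTagName('item'), false);
    foreach ($items as $item) {
        // Assumed search pattern for YouTube links inside the item text.
        preg_match_all('#https?://(?:www\.)?youtube\.com/(?:watch\?v=|v/)[\w-]+#', $item->textContent, $m);
        $urls = array_values(array_unique($m[0]));
        if (!$urls) {
            continue; // No videos in this post.
        }
        // Duplicate the untouched item once for every additional video.
        foreach (array_slice($urls, 1) as $url) {
            $copy = $item->cloneNode(true);
            appendEnclosure($doc, $copy, $url);
            $item->parentNode->insertBefore($copy, $item->nextSibling);
        }
        appendEnclosure($doc, $item, $urls[0]);
    }
    return $doc->saveXML();
}
```

Cloning each item before adding the first enclosure keeps every copy at exactly one enclosure, which is what single-enclosure readers expect.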
Adam August 29th, 2007
As of a few days ago, I am now able to generate KML versions of Flickr photosets for viewing in Google Earth/Maps. With that taken care of, I also want to easily combine these KML documents of images together with other KML files that show additional information, such as paths traveled, points of interest, etc.
To accomplish this task, I have written a new script, the KML Joiner, that will combine any KML documents on the web together into a single (referenced) KML document. (try it out)
More Detail: for those interested in KML
The resulting document is a collection of network links, each of which points to one of the KML URLs specified. Doing this rather than combining their text together into a static KML document prevents style collisions as well as allows changes in the source data to propagate to the combined document.
Refresh intervals can optionally be specified for every source document allowing for a server-friendly combination of static data with rapidly changing data. By default, no refresh interval is specified, making the linked documents load only once when first accessed.
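To make the structure concrete, here is a minimal sketch of generating such a document of network links. It is not the KML Joiner source: the function name joinKml(), the shape of the input array, and the choice of the KML 2.2 namespace are assumptions for illustration, and it is written for PHP 7.1 or later.

```php
<?php
/**
 * Build a KML document containing one NetworkLink per source.
 * $sources is a list of ['name' => ..., 'url' => ..., 'refresh' => seconds or null].
 */
function joinKml(array $sources): string
{
    $ns = 'http://www.opengis.net/kml/2.2'; // Assumed KML version.
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $kml = $doc->appendChild($doc->createElementNS($ns, 'kml'));
    $document = $kml->appendChild($doc->createElementNS($ns, 'Document'));

    // Helper that appends a namespaced child element with escaped text.
    $add = function (DOMNode $parent, string $tag, ?string $text = null) use ($doc, $ns) {
        $el = $parent->appendChild($doc->createElementNS($ns, $tag));
        if ($text !== null) {
            $el->appendChild($doc->createTextNode($text));
        }
        return $el;
    };

    foreach ($sources as $source) {
        $networkLink = $add($document, 'NetworkLink');
        $add($networkLink, 'name', $source['name']);
        $link = $add($networkLink, 'Link');
        $add($link, 'href', $source['url']);
        // Without a refresh interval the linked document loads only once.
        if (!empty($source['refresh'])) {
            $add($link, 'refreshMode', 'onInterval');
            $add($link, 'refreshInterval', (string) $source['refresh']);
        }
    }
    return $doc->saveXML();
}
```

Because each source stays in its own linked document, its styles never collide with the others, and a source can be refreshed on its own schedule.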
Example:
View the KML Joiner with fields filled in that generates the map below.
The map above is of the trip mentioned in a previous blog post, but this time the data sources (1. a static KML file with the path and house placemark, 2. a dynamic KML document generated with my Photo set to KML script) were joined together with the KML Joiner script instead of being manually put together with a text editor.
Usage:
You are welcome to use this script hosted on my site, or you can download it and run it on your own computer/webserver.
This script is available under the GNU General Public License (GPL) version 3 or later. (Source Code)
Please post any suggestions for fixes or changes. Thanks!
Adam August 23rd, 2007
One of the things I (and others) have found lacking when working with geotagged images on Flickr is the ability to retrieve a “photo set” (Flickr’s take on a slideshow) as a KML document that can then be displayed in GoogleEarth, GoogleMaps, or other geo-browsers. Flickr provides some KML links and GeoRSS feeds, but these are either limited to 20 items or can only be pointed at tags or users’ photo-streams, not a particular photo set.
To fill this niche, I present a small script I wrote to generate a KML file from the geotagged photos in a set:
Photo Set to KML (try it out)
Features:
- Generate a KML file from a Flickr photo set
- Directly open the KML file in Google Maps
- Choose what size image to include in the placemark description for each photo.
- Optionally draw a path (line) from photo to photo ordered in one of several ways: by date taken, by date uploaded, by set order. Useful for making a quick and dirty map of a trip.
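The path option in the last feature boils down to sorting the photos and writing their positions into a KML coordinates element. The following is a minimal sketch of that idea, not the script's actual code: the function name pathCoordinates() and the photo array keys (lat, lon, taken, uploaded) are hypothetical, and it is written for PHP 7 or later.

```php
<?php
/**
 * Return the text for a KML <coordinates> element tracing the photos in order.
 * Each photo is a hypothetical array: ['lat' => ..., 'lon' => ...,
 * 'taken' => 'YYYY-MM-DD HH:MM:SS', 'uploaded' => unix timestamp].
 */
function pathCoordinates(array $photos, string $orderBy = 'set'): string
{
    // 'set' keeps the order the photos arrived in; the others sort by that key.
    if ($orderBy === 'taken' || $orderBy === 'uploaded') {
        usort($photos, function (array $a, array $b) use ($orderBy) {
            return $a[$orderBy] <=> $b[$orderBy];
        });
    }
    $points = [];
    foreach ($photos as $photo) {
        // KML wants longitude first, then latitude, then altitude.
        $points[] = $photo['lon'] . ',' . $photo['lat'] . ',0';
    }
    return implode(' ', $points);
}
```

The returned string would go inside a LineString element in the generated KML.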
Examples:
- KML / GoogleMaps – A nice set of graffiti in Toronto.
- KML / GoogleMaps – A set of photos from a trip I took around Turkey, with lines drawn chronologically. Since this is a large set that causes GoogleMaps to time-out, I’ve downloaded the KML file and then re-uploaded it to my website. This is the method I recommend for large photo sets.
You are welcome to use this script hosted on my site, or you can download it and run it on your own computer/webserver. If you would like to run it yourself, please be aware of the following…
System Requirements:
This script is available under the GNU General Public License (GPL) version 3 or later. (Source Code)
Updates:
- 2007-08-27
  - Now uses htmlspecialchars() to clean titles instead of htmlentities(), the latter of which was causing excessive translation of German characters. Thanks Stefan Geens, for pointing this out.
  - Form now generates valid XHTML 1.0 strict.
  - Now can use image thumbnails instead of camera icons. Thanks for the idea Nicolas Hoizey.
- 2007-08-24
  - Now escapes ampersands in titles and descriptions. Thanks Jesse for pointing this out.
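The difference between the two escaping functions mentioned in these updates is easy to see with a short example. This is just an illustration with a made-up title, assuming the source file is saved as UTF-8:

```php
<?php
$title = 'Köln & Bonn';

// htmlentities() also converts letters such as ö to named HTML entities,
// which XML parsers do not know unless a DTD defines them.
$entities = htmlentities($title, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only escapes the characters that are special in markup
// (&, <, >, and quotes), leaving the UTF-8 letters untouched.
$special = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

print $entities . "\n"; // K&ouml;ln &amp; Bonn
print $special . "\n";  // Köln &amp; Bonn
```

Since KML is XML rather than HTML, the second form is the one that stays valid.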
Future Improvement Ideas:
- Add an option for icon size.
- Add options for custom icon/path styles. I’m not sure whether to give several options, or just provide a field for a block of arbitrary KML style-markup.
Adam May 16th, 2007
Quite a few of my friends and relatives make use of Blogger/Blogspot for their weblogs. While Blogger seems to be a great service and very easy to use, what annoys me is that RSS feeds are often disabled on Blogger weblogs. Maybe people are setting this on purpose, maybe they are turning off RSS feeds unintentionally, or maybe that is the default. Either way, I read all of my news and blogs in an RSS reader. Friends & family blogs and photo-streams make up about 25 of the 100+ feeds I subscribe to. If a blog doesn’t have an RSS feed for me to subscribe to, I’m never going to remember to read it.
So, my work-around for Blogger was to write a screen-scraping RSS generator that creates an RSS feed from a Blogger weblog. BlogspotRSS is a simple PHP5 script that makes use of XPath queries to turn the Blogger weblog into an RSS document. If you have a web server with PHP5, please download BlogspotRSS and run it on your own web server to save my bandwidth.
The BlogspotRSS script is licensed under the GNU General Public License (GPL).
Notes:
- It seems that some of the Blogger themes change the HTML quite a bit. I’ll have to fix up the RSS generator to make it work with a few more themes than the ones I tested it on…
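For anyone curious what the XPath approach looks like in general, here is a minimal sketch. It is not the BlogspotRSS source: the function name blogHtmlToRss() and the class names "post" and "post-title" are assumptions about one possible theme, which is exactly the kind of detail that varies between themes. It is written for PHP 7 or later.

```php
<?php
/**
 * Turn blog HTML into a minimal RSS 2.0 document.
 * The class names "post" and "post-title" are assumptions about the theme.
 */
function blogHtmlToRss(string $html, string $blogUrl): string
{
    $page = new DOMDocument();
    // Real-world HTML is rarely valid, so silence parser warnings.
    @$page->loadHTML($html);
    $xpath = new DOMXPath($page);

    $rss = new DOMDocument('1.0', 'UTF-8');
    $root = $rss->appendChild($rss->createElement('rss'));
    $root->setAttribute('version', '2.0');
    $channel = $root->appendChild($rss->createElement('channel'));

    // Helper that appends a child element with safely escaped text.
    $add = function (DOMNode $parent, string $tag, string $text) use ($rss) {
        $el = $parent->appendChild($rss->createElement($tag));
        $el->appendChild($rss->createTextNode($text));
        return $el;
    };

    $pageTitle = $xpath->query('//title')->item(0);
    $add($channel, 'title', $pageTitle ? trim($pageTitle->textContent) : $blogUrl);
    $add($channel, 'link', $blogUrl);
    $add($channel, 'description', 'Feed generated by scraping ' . $blogUrl);

    // Match elements whose class list contains the exact word "post".
    $posts = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " post ")]');
    foreach ($posts as $post) {
        $item = $channel->appendChild($rss->createElement('item'));
        $heading = $xpath->query('.//h3[contains(@class, "post-title")]', $post)->item(0);
        $add($item, 'title', $heading ? trim($heading->textContent) : 'Untitled');
        $anchor = $xpath->query('.//h3[contains(@class, "post-title")]//a/@href', $post)->item(0);
        if ($anchor) {
            $add($item, 'link', $anchor->nodeValue);
        }
        $add($item, 'description', trim($post->textContent));
    }
    return $rss->saveXML();
}
```

Supporting another theme would mostly mean swapping in different XPath expressions for the post container and title.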