Archive for the 'PHP' Tag  

Regex from the dark lagoon

November 13th, 2013

Filed under: Computers and Technology , Work/Professional

As a software developer or system admin, have you ever encountered regular expressions that are just a bit too hard to understand? Kind of frustrating, right? As a rule, regular expressions are relatively easy to write, but pretty hard to read even if you know what they are supposed to do. Then there is this bugger:


This is the most complex regex I’ve ever had need to write and I just had to share. Can you guess what it might do? 😉

Continue Reading »

Mirroring a Subversion repository on Github

December 5th, 2010

Filed under: Computers and Technology , Work/Professional

For the past few months I have been doing a lot of work on the phpCAS library, mostly to improve the community trunk of phpCAS so that I wouldn’t have to maintain our own custom fork with support for the CAS attribute format we use at Middlebury College. The phpCAS project lead, Joachim Fritschi, has been great to work with and I’ve had a blast helping out with the project.

The tooling has involved a few challenges, however, since Jasig (the organization that hosts the CAS and phpCAS projects) uses Subversion for its source-code repositories and we use Git for all of our projects. Now, I could just suck it up and use Subversion when doing phpCAS development, but there are a few reasons I don’t:

  1. We make use of Git submodules to include phpCAS along with the source-code of our applications, necessitating the use of a public Git repository that includes phpCAS.
  2. The git-svn tools allow me to use Git on my end to work with a Subversion repository, which is great because…
  3. I find that Git’s fast history browsing and searching make troubleshooting and bug fixing much easier than any other tools I’ve used.

For the past two years I have been using git-svn to work with the phpCAS repository and every so often pushing changes up to a public Git repository on GitHub. Our applications reference this repository as a submodule when they need to make use of phpCAS. Now that I’ve been doing more work on phpCAS (and am more interested in keeping our applications using up-to-date versions), I’ve decided to automate the process of mirroring the Subversion repository on GitHub. Read on for details of how I’ve set this up and the scripts for keeping the mirror in sync.

Continue Reading »

Adding reverse-proxy caching to PHP applications

June 14th, 2010

Filed under: Computers and Technology , Work/Professional

Note: This is a cross-post of documentation I am writing about Lazy Sessions.

Why use reverse-proxy caching?

For most public-facing web applications, the large majority of traffic comes from anonymous, non-authenticated users. Even with internal data caches and other good optimizations, a PHP application executes a large amount of code to generate each page, even when the content of that page will be the same for many users. Code and query optimization are very important to improving the experience for all users of a web application, but even the most basic “Hello World” script will top out at about 3k requests/second due to the overhead of Apache and PHP, and many real applications top out at less than 200 requests/second. Varnish, a light-weight proxy server that can run on the same host as the web server, can cache pages in memory and serve them at rates of more than 10k requests/second with thousands of concurrent connections.

While the point of web-applications is to have content be dynamic and easily changeable, for most applications and most of the anonymous users, receiving content that is slightly stale (cached for 5 minutes or something similar) isn’t a big deal. Sure, visitors to your blog might not see the latest post for a few minutes, but they will get their response in 4 milliseconds rather than 2 seconds.

Should your site get posted on Slashdot, a caching reverse-proxy server will give anonymous visitor #2 and up the same page from cache (until expiration), while authenticated users continue to have their requests passed through to the Apache/PHP back-end. Everyone wins.
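As a rough sketch of the anonymous-versus-authenticated split described above (this is not the Lazy Sessions code itself), an application could choose its caching headers based on whether the request carries a session cookie. The function name `cacheHeaders()`, the `PHPSESSID` cookie name, and the 5-minute lifetime are assumptions made for illustration.

```php
<?php
/**
 * Pick the Cache-Control header for a request.
 *
 * Assumption: a request is treated as anonymous when it carries no
 * session cookie, so its response can be shared by a reverse proxy.
 */
function cacheHeaders(array $cookies, string $sessionName = 'PHPSESSID', int $maxAge = 300): array
{
    if (isset($cookies[$sessionName])) {
        // Session-bearing request: keep the response out of shared caches.
        return ['Cache-Control: private, no-cache'];
    }

    // Anonymous request: allow the proxy to reuse the page for $maxAge seconds.
    return ['Cache-Control: public, max-age=' . $maxAge];
}
```

In a page script this might be used as `foreach (cacheHeaders($_COOKIE) as $h) { header($h); }` before any output is sent. The important part is that anonymous visitors never receive a session cookie in the first place, otherwise every request would look authenticated to the proxy.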

Continue Reading »

High-availability Drupal — File-handling

September 9th, 2009

Filed under: Computers and Technology , Work/Professional

One of the requirements in the migration of our web sites to Drupal is that we create a robust and redundant platform that can stay running or degrade gracefully when hardware or software problems inevitably arise. While our sites get heavy use from our communities and the public, our traffic numbers are nowhere near those of a top-1000 site and could comfortably run off of one machine that ran both the database and web server.

Single Machine Configuration

This simple configuration however has the major weakness that any hiccups in the hardware or software of the machine will likely take the site offline until the issues can be addressed. In order to give our site a better chance at staying up as failures occur, we separate some of the functional pieces of the site onto discrete machines and then ensure that each function is redundant or fail-safe. This post and the next will detail a few of the techniques we have used to build a robust site.

Continue Reading »

Twitter Export Script

October 13th, 2008

Filed under: Computers and Technology , Software


I have been using Twitter as a log of my daily doings and wished to export my time-line for reformatting into a calendar format. Unfortunately TweetDumpr just retrieves the list of Tweets using a single fetch request, which is limited by the Twitter API to a maximum of 200 Tweets. (Update: apparently TweetDumpr can get more than 200 Tweets. It just didn’t say so in its description.)

I wanted to export all 600+ of my tweets, so I wrote the following little PHP script to accomplish this. I have not yet tested it with many concurrent users or added a form to select which user to update. Until I do so, I won’t be providing it as an end-user service. You are free to put it on your own machine and use it though.


<?php
/**
 * This script will allow the export of complete user time-lines from the twitter
 * service. It joins together all pages of status updates into one large XML block
 * that can then be reformatted/processed with other tools.
 *
 * @since 10/13/08
 * @copyright Copyright © 2008, Adam Franco
 * @license GNU General Public License (GPL)
 */

$user = 'afranco_work';	// Replace this with your user name.

header('Content-type: text/plain');

$allDoc = new DOMDocument;
$root = $allDoc->appendChild($allDoc->createElement('statuses'));
$root->setAttribute('type', 'array');

$page = 1;
do {
	// Count the status updates found on this page; zero means we are past the last page.
	$numStatus = 0;

	$pageDoc = new DOMDocument;
	$res = @$pageDoc->load(''.$user.'.xml?page='.$page);
	if (!$res) {
		print "\n\n**** Error loading page $page ****";
		break;
	}

	// Copy each status element from this page into the combined document.
	foreach ($pageDoc->getElementsByTagName('status') as $status) {
		$root->appendChild($allDoc->importNode($status, true));
		$numStatus++;
	}

	print "\nLoaded page $page with $numStatus status updates.";

	$page++;

} while ($numStatus);

print "\nDone loading timeline.";
print "\n\n\n";

print $allDoc->saveXml();

Usage (assuming PHP is installed)

  1. Save the code above on your machine as twitter_export.php
  2. Edit the code to change the $user variable to be your own Twitter username
  3. From the command line run php twitter_export.php
  4. Copy/paste the XML output into a file for safe keeping and further processing
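Once the XML has been saved, the calendar-style reformatting mentioned at the top of this post could start with something like the sketch below, which groups status texts by day. It assumes each `status` element holds a `created_at` value in the form `Mon Oct 13 14:22:01 +0000 2008` and a `text` element; the function name `groupStatusesByDay()` is made up for this example, and the progress lines printed by the script need to be trimmed off before the XML is parsed.

```php
<?php
/**
 * Group the text of each status update by calendar day.
 *
 * Assumption: every <status> has <created_at> and <text> children, with
 * created_at formatted like "Mon Oct 13 14:22:01 +0000 2008".
 *
 * @return array<string, string[]> Map of 'YYYY-MM-DD' => list of status texts.
 */
function groupStatusesByDay(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $days = [];
    foreach ($doc->getElementsByTagName('status') as $status) {
        $created = $status->getElementsByTagName('created_at')->item(0);
        $text = $status->getElementsByTagName('text')->item(0);
        if ($created === null || $text === null) {
            continue; // Skip entries that are missing either field.
        }

        $date = DateTime::createFromFormat('D M d H:i:s O Y', trim($created->textContent));
        if ($date === false) {
            continue; // Skip timestamps in an unexpected format.
        }

        $days[$date->format('Y-m-d')][] = $text->textContent;
    }

    ksort($days); // Oldest day first.
    return $days;
}
```

Within each day the updates stay in the order they appear in the export, so they may need reversing if the time-line is newest-first.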

Outside-In: Application Interoperability Using an OSID-Based Framework

June 25th, 2008

Filed under: Work/Professional

This post describes an interoperability demonstration given at OpeniWorld Europe 2008 in Lyon, France.


Segue and Concerto are two curricular applications built upon Harmoni, an Open Service Interface Definition-based (OSID) service-oriented application framework. This demonstration will show how website content created in Segue is stored as OSID Assets in Harmoni’s OSID Repository. Similarly, the demonstration will show how multimedia assets created in Concerto can be stored in the same repository. Interoperability will be demonstrated as each application is used to view and make real-time modifications to the OSID Assets created using the other application, while at the same time respecting the authorizations given to those assets. Additionally, an OSID Repository to OAI-PMH gateway will be shown providing the LibraryFind meta-search tool with access to the metadata for content created in Segue, Concerto, and a lightweight, read-only OSID Repository.

Software Demonstrated:

Segue 2.0 – Beta 20

June 9th, 2008

Filed under: Work/Professional

Another week, another Segue 2 beta. This week’s installation brings visitor registration, a few new themes from Alex, theme migration from Segue 1, and a bunch of little bug fixes.

Visitor registration brings with it a few interesting challenges. As in Segue 1, we want (and need) to be able to allow people outside of the Middlebury community to join in on public discussions hosted in Segue. As well, Middlebury users often need to give access to restricted parts of their sites to people off-campus with whom they are collaborating. Our visitor registration system therefore needs to be easy for registrants to use, keep out spammers, and enable searches for visitor accounts by community users.

To keep out spammers, the visitor registration form uses reCAPTCHA to try to verify that a human is sitting at the browser. There are other CAPTCHA systems out there, but I like the philosophy and approach of reCAPTCHA. Starting with words that OCR software had trouble reading seems like a good idea. After the registration form is filled out, Segue sends an email to the address entered with a unique registration code. Until the link in the email is clicked on (and hence the address verified) the account is locked.

To enable easy searching of visitor accounts, visitors are asked to enter their name. While there are a few restrictions on names, these are user-choosable. To provide some measure of differentiation between verified institution accounts and visitor accounts, visitor accounts have the user-chosen name followed by their email domain name in parentheses, e.g.:

Adam Franco (

I weighed including the entire email address as that is the only verified information we have about the visitor accounts, but I’d rather not open that information up for harvesting by spammers. If abuse becomes an issue, the visitor registration system also supports both black-lists and white-lists of email domains.
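To make the naming and domain-list ideas concrete, here is a minimal sketch. It is not Segue’s actual code: the function names, the rule that a non-empty white-list restricts registration to the listed domains, and the example addresses are all assumptions for illustration.

```php
<?php
/**
 * Extract the lower-cased domain from an email address.
 */
function emailDomain(string $email): string
{
    $pos = strrpos($email, '@');
    if ($pos === false) {
        throw new InvalidArgumentException('Not an email address: ' . $email);
    }
    return strtolower(substr($email, $pos + 1));
}

/**
 * Build the public display name: the chosen name followed by the email
 * domain in parentheses, so the full address is never exposed.
 */
function visitorDisplayName(string $name, string $email): string
{
    return $name . ' (' . emailDomain($email) . ')';
}

/**
 * Check an address against domain lists.
 * Assumption: the black-list always wins, and a non-empty white-list
 * means only the listed domains may register.
 */
function isDomainAllowed(string $email, array $whiteList = [], array $blackList = []): bool
{
    $domain = emailDomain($email);
    if (in_array($domain, $blackList, true)) {
        return false;
    }
    return $whiteList === [] || in_array($domain, $whiteList, true);
}
```

With these helpers, `visitorDisplayName('Jane Doe', 'jane@example.org')` yields `Jane Doe (example.org)`, which keeps the mailbox part of the address away from harvesters.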

Segue 2 – The home stretch begins.

May 19th, 2008

Filed under: Work/Professional

We’ve recently announced our migration plans to the campus: We’ll be rolling out Segue 2 in mid-August for production use in the fall semester.

I’ve now been working on Segue 2 directly or indirectly for 5 years, since June 2003. It has been a long road and it is wonderful to finally be cresting the last rise. That said, as the feature-request tracker indicates, we still have a lot to do over the next 12 weeks.

This past week I rebuilt the theming system for the 4th (and last before production) time. The challenge with the theming system is that we wanted to enable end-users to choose from a few straight-forward options for things like ‘overall color scheme’, ‘font size’, and ‘corner treatment’, not all of which map cleanly to CSS properties. As well, to enable more powerful themes, we needed to let theme developers wrap each content type with HTML tags in order to get some effects that are just not possible with plain CSS when the dimensions of the element are not known.

Our first three theming implementations involved different PHP classes for each theme with methods for setting various options. Each implementation had its own strengths and weaknesses, but they were all hideously complex and required theme developers to know PHP in order to do more than change the CSS. The new theme implementation scraps all of that complexity and defines themes as a set of CSS files and HTML templates, with associated images. An extension to this simple base adds an option listing (defined in XML) that enables placeholders in the CSS and HTML templates to be replaced with values from end-user-choosable options.
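The placeholder replacement can be pictured with a small sketch like the following. The `[[name]]` placeholder syntax, the option names, and the function `applyThemeOptions()` are invented for this example and are not necessarily what Segue uses.

```php
<?php
/**
 * Replace [[option_name]] placeholders in a CSS or HTML template with
 * the values the end-user chose.
 *
 * Assumption: the double-square-bracket syntax is hypothetical.
 */
function applyThemeOptions(string $template, array $options): string
{
    $replacements = [];
    foreach ($options as $name => $value) {
        $replacements['[[' . $name . ']]'] = (string) $value;
    }

    // Nothing to replace: return the template untouched.
    if ($replacements === []) {
        return $template;
    }

    // strtr() replaces all placeholders in a single pass.
    return strtr($template, $replacements);
}

$css = '.header { background: [[primary_color]]; font-size: [[font_size]]; }';
echo applyThemeOptions($css, ['primary_color' => '#336699', 'font_size' => '14px']), "\n";
```

Because the theme is just text with placeholders, a theme developer only needs to know CSS and HTML, and the option listing decides which values are offered for each placeholder.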

With the new theming system in place in development, Alex has set to work building the first three of the themes that will be distributed with Segue (Rounded Corners, Shadow Box, and Tabs), while I’ve been finishing up the user-interfaces for choosing theme options and enabling more advanced users to customize the theme CSS and HTML in their web-browser. So far Alex and I are pretty happy with the new theming system, and its simplicity should give it much longer legs than our previous attempts.

While it won’t make it to production, I eventually plan to have a theme-gallery that users can choose to publish their designs to for use by the rest of the community.

Up Next
With theming out of the way, the following are some of the next areas I’ll be working on in addition to fixing bugs and working out smaller kinks:

  • Templates – starting points for sites
  • Enabling embedded videos from trusted sites (e.g. YouTube, Vimeo)
  • Visitor Registration
  • Copy/Move tools for Classic Mode
  • Display of RSS feeds

Still a lot to do, but with each addition Segue 2 gets much closer to being able to take over as the primary course website system.

WordPress Enclosure Adder

September 6th, 2007

Filed under: Computers and Technology , Software

I’ve recently developed a small PHP script, the WPEnclosureAdder (source | try), that goes through each item in an RSS feed, looks for links to YouTube or GoogleVideo videos, and then adds enclosure tags for the videos. If multiple videos are found embedded in a post, then that post is duplicated in the feed for each additional URL to provide compatibility with the many RSS readers/video-podcast viewers that expect a single enclosure per post.
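The core idea can be sketched roughly as below. This is a simplified illustration rather than the WPEnclosureAdder source: the search pattern only covers one form of YouTube link, and the `type` and `length` values on the enclosure are placeholder assumptions.

```php
<?php
// Hypothetical search pattern; the real script's search strings are configurable.
const VIDEO_PATTERN = '#https?://(?:www\.)?youtube\.com/watch\?v=[\w-]+#';

/**
 * Add one <enclosure> per video link, duplicating an item for each
 * additional video so every item carries a single enclosure.
 */
function addEnclosures(string $rssXml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($rssXml);

    // Copy the live node list so inserting duplicates does not disturb iteration.
    $items = iterator_to_array($doc->getElementsByTagName('item'), false);
    foreach ($items as $item) {
        preg_match_all(VIDEO_PATTERN, $item->textContent, $matches);
        $urls = array_values(array_unique($matches[0]));

        $pristine = $item->cloneNode(true); // Copy taken before any enclosure is added.
        $last = $item;
        foreach ($urls as $i => $url) {
            $target = $item;
            if ($i > 0) {
                // Extra video: insert a duplicate of the post right after the previous copy.
                $target = $pristine->cloneNode(true);
                $last->parentNode->insertBefore($target, $last->nextSibling);
                $last = $target;
            }

            $enclosure = $doc->createElement('enclosure');
            $enclosure->setAttribute('url', $url);
            $enclosure->setAttribute('length', '0');        // Assumed: real size unknown.
            $enclosure->setAttribute('type', 'video/x-flv'); // Assumed MIME type.
            $target->appendChild($enclosure);
        }
    }

    return $doc->saveXML();
}
```

Items without any matching link pass through unchanged, so the output remains a valid superset of the original feed.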

I wrote this script because I have recently been making heavy use of Miro (formerly known as “The Democracy Player”) to download videos from YouTube in order to watch them off-line. Miro also provides a nice UI for aggregating videos and remembers my spot when I go back to watching later (nice for long documentaries). Miro, however, expects links to videos in RSS enclosure tags, something that WordPress (and probably other blogging software) doesn’t do for embedded videos.

Throw Away Your Telescreen is a video blog done by one of my favorite geo-political bloggers, Dave on Fire, and a few others. In it they link out to the most interesting “documentaries, lectures, and interviews that follow a different editorial line” from the corporate press. I highly recommend all of the videos on it that I have seen.

Throw Away Your Telescreen has all the makings of an indie-news channel, perfect for Miro which was developed to encourage participatory media and culture. The only thing missing was to get the videos embedded in Throw Away Your Telescreen’s posts in such a way that Miro can find them. With the WPEnclosureAdder, this has now been done. Use this feed to view Throw Away Your Telescreen in Miro.

More about the WPEnclosureAdder:

  • View the source-code of the latest version. (save-as to download)
  • License: GNU General Public License (GPL) version 3 or later
  • Requirements (for hosting it yourself): PHP version 5.2 or later
  • Git Repository:

I wrote this script with Throw Away Your Telescreen in mind, but it should work with any other WordPress blog, and probably with RSS feeds generated from other blogging tools. To point it at another blog’s RSS feed, enter the feed URL in the form below:

Using my version will use my default search strings for YouTube and GoogleVideo videos. If you would like to change what is being searched for, please download the script, change the configuration, and host it on your own website. I have licensed the WPEnclosureAdder under the GNU General Public License (GPL) version 3 or later, so you are free to copy and modify this script as per the terms of that license.

KML Joiner

August 29th, 2007

Filed under: Computers and Technology , Software

As of a few days ago, I am now able to generate KML versions of Flickr photosets for viewing in Google Earth/Maps. With that taken care of, I also want to easily combine these KML documents of images together with other KML files that show additional information, such as paths traveled, points of interest, etc.

To accomplish this task, I have written a new script, the KML Joiner, that combines any KML documents on the web into a single (referenced) KML document. (try it out)

More Detail: for those interested in KML
The resulting document is a collection of network links, each of which points to one of the KML URLs specified. Doing this rather than combining their text together into a static KML document prevents style collisions and allows changes in the source data to propagate to the combined document.

Refresh intervals can optionally be specified for every source document allowing for a server-friendly combination of static data with rapidly changing data. By default, no refresh interval is specified, making the linked documents load only once when first accessed.
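A minimal sketch of generating such a document follows. It is not the KML Joiner source: the function names, the shape of the `$sources` array, the use of a `Folder` as the container, and the KML 2.2 namespace are assumptions for illustration.

```php
<?php
// Assumed namespace; the KML version targeted by the real script may differ.
const KML_NS = 'http://www.opengis.net/kml/2.2';

/**
 * Create a namespaced KML element, optionally with text content.
 */
function kmlElement(DOMDocument $doc, string $name, ?string $text = null): DOMElement
{
    $element = $doc->createElementNS(KML_NS, $name);
    if ($text !== null) {
        // A text node escapes characters such as "&" in URLs correctly.
        $element->appendChild($doc->createTextNode($text));
    }
    return $element;
}

/**
 * Build a KML document holding one NetworkLink per source.
 *
 * Each source is ['name' => string, 'url' => string, 'refresh' => int seconds (optional)].
 */
function joinKml(array $sources): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $kml = $doc->appendChild(kmlElement($doc, 'kml'));
    $folder = $kml->appendChild(kmlElement($doc, 'Folder'));

    foreach ($sources as $source) {
        $networkLink = $folder->appendChild(kmlElement($doc, 'NetworkLink'));
        $networkLink->appendChild(kmlElement($doc, 'name', $source['name']));

        $link = $networkLink->appendChild(kmlElement($doc, 'Link'));
        $link->appendChild(kmlElement($doc, 'href', $source['url']));

        // Without a refresh interval the linked document loads only once.
        if (!empty($source['refresh'])) {
            $link->appendChild(kmlElement($doc, 'refreshMode', 'onInterval'));
            $link->appendChild(kmlElement($doc, 'refreshInterval', (string) $source['refresh']));
        }
    }

    return $doc->saveXML();
}
```

Since each source stays in its own linked document, its styles cannot clash with those of the other sources, and only the rapidly changing sources need a refresh interval.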


View the KML Joiner with fields filled in that generates the map below.

The map above is of the trip mentioned in a previous blog post, but this time the data sources (1. a static KML file with the path and house placemark, 2. a dynamic KML document generated with my Photo set to KML script) were joined together with the KML Joiner script instead of being manually put together with a text editor.

You are welcome to use this script hosted on my site, or you can download it and run it on your own computer/webserver.

This script is available under the GNU General Public License (GPL) version 3 or later. (Source Code)

Please post any suggestions for fixes or changes. Thanks!
