Archive for the 'Work/Professional' Category  

Posts on topics related to my work as a software developer at Middlebury College.

Regex from the dark lagoon

November 13th, 2013

Filed under: Computers and Technology , Work/Professional


As a software developer or system admin, have you ever encountered regular expressions that are just a bit too hard to understand? Kind of frustrating, right? As a rule, regular expressions are relatively easy to write but pretty hard to read, even when you know what they are supposed to do. Then there is this bugger:

/^(?:([0-9]{4})[\-\/:]?(?:((?:0[1-9])|(?:1[0-2]))[\-\/:]?(?:((?:0[1-9])|(?:(?:1|2)[0-9])
|(?:3[0-1]))[\sT]?(?:((?:[0-1][0-9])|(?:2[0-4]))(?::?([0-5][0-9])?(?::?([0-5][0-9](?:\.[0-9]+)?)?
(Z|(?:([+\-])((?:[0-1][0-9])|(?:2[0-4])):?([0-5][0-9])?))?)?)?)?)?)?)?$/x

This is the most complex regex I’ve ever had need to write and I just had to share. Can you guess what it might do? 😉

Continue Reading »

Git Tip: Grouping feature-branch commits when merging.

December 12th, 2010

Filed under: Computers and Technology , Work/Professional


Let’s say you are working on a large feature or update that requires a bunch of commits to complete. You finish up with your work and are then ready to merge it onto your master branch.

For example, here is the history of my Drupal repository after some work updating the CAS module to the latest version (and to support the new version of phpCAS):

As you can see, I have a number of commits, followed by a merge that brings in the new module code, followed by some more commits.

Now, if I merge my feature branch (master-cas3-simple) into the master via

git merge master-cas3-simple

then the history will look like this:

While the history is all there, it isn’t obvious that all of the commits beyond “Convert MS Word quote…” are a single unit of work. They all kind of blend together because git performed a “fast-forward” merge. Usually fast-forward merges are helpful since they keep the history from being cluttered with hundreds of unnecessary merge commits, but in this case we are losing the context of these commits being a single unit of work.

To preserve the grouping of these commits together I can instead force the merge operation to create a merge commit (and even append a message) by using the --no-ff option to git merge:

git merge --no-ff -m "Upgraded CAS support to cas-6.x-3.x-dev and phpCAS 1.2.0 RC2.5" master-cas3-simple

This results in the history below:

As you can see, merging with the --no-ff option creates a merge commit which very obviously delineates the work on this feature. If we decided that we wanted to roll back this feature, it would be much easier to identify the starting point before the feature was merged.
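If you want to inspect the result without a GUI, git’s graph view makes the merge “bubble” around the feature commits visible, and the merge commit itself gives you a single point to revert should the feature ever need to come out. A rough sketch (the commit id is a placeholder):

# show the branch structure; the --no-ff merge encloses the feature commits
git log --graph --oneline master

# roll the whole feature back later with a single revert of the merge commit,
# keeping the first parent (master) as the mainline
git revert -m 1 <merge-commit-sha>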

Thanks to Vincent Driessen for turning me onto the utility of the --no-ff merge option via his post “A successful Git branching model”.

Mirroring a Subversion repository on Github

December 5th, 2010

Filed under: Computers and Technology , Work/Professional


For the past few months I have been doing a lot of work on the phpCAS library, mostly to improve the community trunk of phpCAS so that I wouldn’t have to maintain our own custom fork with support for the CAS attribute format we use at Middlebury College. The phpCAS project lead, Joachim Fritschi, has been great to work with and I’ve had a blast helping out with the project.

The tooling has involved a few challenges, however, since Jasig (the organization that hosts the CAS and phpCAS projects) uses Subversion for its source-code repositories and we use Git for all of our projects. Now, I could just suck it up and use Subversion when doing phpCAS development, but there are a few reasons I don’t:

  1. We make use of Git submodules to include phpCAS along with the source-code of our applications, necessitating the use of a public Git repository that includes phpCAS.
  2. The git-svn tools allow me to use git on my end to work with a Subversion repository, which is great because…
  3. I find that Git’s fast history browsing and searching make troubleshooting and bug fixing much easier than any other tools I’ve used.

For the past two years I have been using git-svn to work with the phpCAS repository and every so often pushing changes up to a public Git repository on GitHub. Our applications reference this repository as a submodule when they need to make use of phpCAS. Now that I’ve been doing more work on phpCAS (and am more interested in keeping our applications using up-to-date versions), I’ve decided to automate the process of mirroring the Subversion repository on GitHub. Read on for details of how I’ve set this up and the scripts for keeping the mirror in sync.
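The full setup and sync scripts are in the rest of the post; as a minimal sketch, the mirroring boils down to a git-svn clone that periodically pulls new Subversion revisions and pushes them to GitHub. The URLs and remote name below are placeholders, not the actual Jasig or GitHub addresses:

# one-time setup: clone the Subversion repository (assuming the standard trunk/branches/tags layout)
git svn clone --stdlayout https://svn.example.org/phpcas phpcas-mirror
cd phpcas-mirror
git remote add github git@github.com:example/phpCAS-mirror.git

# run periodically (e.g. from cron): rebase onto new Subversion revisions, then push
git svn rebase
git push github master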

Continue Reading »

BASH tip: Top web pages

October 14th, 2010

Filed under: Computers and Technology , Work/Professional


Here is a quick command to generate a list of the top pages in the Apache web-server’s access log:

gawk '{ print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -nr | head -n 20

Parts of the command explained:

  1. gawk '{ print $7}' — return only the 7th [white-space delimited] column of text from the access log, which happens to be the path requested.
  2. sort — sort the lines of the output.
  3. uniq -c — condense the output to unique lines, prepending each line with the number of times that line occurs.
  4. sort -nr — sort the resulting lines numerically in reverse order.
  5. head -n 20 — chop off all but the first 20 lines.

The result should look something like this:

  83361 /
  49582 /feed
  39616 /robots.txt
  36265 /favicon.ico
  17048 /?feed=rss2
  10798 /archives/3
  10036 /wp-content/uploads/2007/05/img_7870_header.jpg
   9913 /wp-includes/images/smilies/icon_smile.gif
   9425 /wp-comments-post.php
   8274 /feed/
   7508 /archives/category/work/feed
   7367 /archives/88
   7312 /photos/10_small/IMG_3023.JPG.jpg
   7175 /photos/10_small/IMG_3028.JPG.jpg
   7151 /photos/10_small/IMG_3024.JPG.jpg
   7096 /photos/10_small/IMG_3026.JPG.jpg
   6381 /photosetToKML.php?set=72157594417350372&size=small
   6253 /qtvr/2007-04-05_back_deck_snow%20-%2010000x5000%20-%20SLIN%20-%20Blended%20Layer0002.jpg
   5798 /photosetToKML.php
   4344 /archives/category/photography
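The same pattern works for other columns of the log. For example, in the common and combined log formats the ninth field is the HTTP status code, so a couple of small variations (same assumed log path) give a quick breakdown of errors:

# count responses by status code
gawk '{ print $9 }' /var/log/httpd/access_log | sort | uniq -c | sort -nr

# or list the paths that are returning 404s
gawk '$9 == 404 { print $7 }' /var/log/httpd/access_log | sort | uniq -c | sort -nr | head -n 20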

Adding reverse-proxy caching to PHP applications

June 14th, 2010

Filed under: Computers and Technology , Work/Professional


Note: This is a cross-post of documentation I am writing about Lazy Sessions.

Why use reverse-proxy caching?

For most public-facing web applications, the significant majority of traffic comes from anonymous, non-authenticated users. Even with a variety of internal data-cache mechanisms and other good optimizations, a large amount of code still has to execute to generate a page, even if the content of that page will be the same for many users. Code and query optimization are very important to improving the experience for all users of a web application, but even the most basic “Hello World” script will top out at about 3k requests/second due to the overhead of Apache and PHP — many real applications top out at less than 200 requests/second. Varnish, a light-weight proxy server that can run on the same host as the web server, can cache pages in memory and serve them at rates of more than 10k requests/second with thousands of concurrent connections.

While the point of web-applications is to have content be dynamic and easily changeable, for most applications and most of the anonymous users, receiving content that is slightly stale (cached for 5 minutes or something similar) isn’t a big deal. Sure, visitors to your blog might not see the latest post for a few minutes, but they will get their response in 4 milliseconds rather than 2 seconds.

Should your site get posted on Slashdot, a caching reverse-proxy server will give anonymous visitor #2 and up the same page from cache (until expiration), while authenticated users continue to have their requests passed through to the Apache/PHP back-end. Everyone wins.
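A quick way to confirm that the cache is doing its job is to look at the response headers from the proxy. With a mostly default Varnish setup, a repeat request for an anonymous page typically comes back with a non-zero Age header and an X-Varnish header containing two request IDs, while cookie-carrying (logged-in) requests keep passing through to Apache/PHP. The URL below is a placeholder:

curl -sI http://www.example.org/ | grep -iE '^(age|x-varnish|cache-control)'
# a second request within the cache TTL should show Age greater than zero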

Continue Reading »

Importing users into Bugzilla

March 8th, 2010

Filed under: Computers and Technology , Software , Work/Professional


For the past 6 months our Web Application Development work-group has been using Bugzilla as our issue tracker with quite a bit of success. While it has its warts, Bugzilla seems like a pretty decent issue-tracking system and is flexible enough to fit into a variety of different work-flows. One very important feature of Bugzilla is support for LDAP authentication. This enables any Middlebury College user to log in and report a bug using their standard campus credentials.

While LDAP authentication works great, there is one problem: If a person has never logged into our Bugzilla, we can’t add them to the CC list of an issue. This is important for us because issues usually don’t get submitted directly to the bug tracker, but rather come in via calls, emails, tweets, and face-to-face meetings. We are then left to submit issues to Bugzilla ourselves to keep track of our to-do items. Ideally we’d add the original reporter to the bug’s CC list so that they will automatically be notified as we make progress on the issue, but their Bugzilla account must exist before we can add them to the bug.

Searching around the internet I wasn’t able to find anything about how to import LDAP users (or any kind of users) into Bugzilla, though I was able to find some basic instructions on how to create a single user via Bugzilla’s Perl API. To improve on the lack of user-import support I’ve created a Perl script that creates users from lines in a tab-delimited text file (create_users.pl) as well as a companion PHP script that will export an appropriately-formatted list of users from an Active Directory (LDAP) server (export_users.php).
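The exact options are described in the scripts themselves, but the overall flow is to export the directory to a tab-delimited file and then feed that file to the Perl importer. A hypothetical invocation might look something like this (file name and arguments are illustrative only):

# hypothetical example; check each script's own documentation for the real options
php export_users.php > new_users.tsv
perl create_users.pl new_users.tsv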

Continue Reading »

High-availability Drupal — File-handling

September 9th, 2009

Filed under: Computers and Technology , Work/Professional


One of the requirements in the migration of our web sites to Drupal is that we create a robust and redundant platform that can stay running or degrade gracefully when hardware or software problems inevitably arise. While our sites get heavy use from our communities and the public, our traffic numbers are nowhere near those of a top-1000 site, and the sites could comfortably run off of one machine running both the database and the web server.

[Diagram: Single Machine Configuration]


This simple configuration, however, has the major weakness that any hiccup in the hardware or software of the machine will likely take the site offline until the issue can be addressed. In order to give our site a better chance at staying up as failures occur, we separate some of the functional pieces of the site onto discrete machines and then ensure that each function is redundant or fail-safe. This post and the next will detail a few of the techniques we have used to build a robust site.

Continue Reading »

Setting up CAS development on OS X

June 19th, 2009

Filed under: Work/Professional


Central Authentication Service (CAS) is a single-sign-on system for web applications, written in Java, that we have begun to deploy here at Middlebury College. Web applications communicate with it by forwarding users to the central login page and then checking the responses via a web-service protocol.

A few months ago Ian and I got CAS installed on campus and began updating applications to work with it rather than maintaining their own internal connections to the Active Directory server. Throughout this process we ran into a few challenges (such as returning attributes with the authentication-success response) and a bug in CAS, but we worked through these and got CAS up and running successfully.

We are now at a point where we need to do some customizations to our CAS installation to deal with changes to the group structure in the Active Directory. As well, the bug I reported was apparently fixed in a new CAS version, an improvement I need to test before we update our production installation. Both of these require a bit more poking at CAS than we can do safely in our production environment, so I am now embarking on the process of setting up a Java/Tomcat development environment on my PC. I’m documenting this process here both for my own benefit (when I have to set this up again on my laptop) and in case it helps anyone else.

Read on for my step-by-step instructions for setting up a CAS development environment on OS X.
Continue Reading »

Git Tip of the Day: Stage Hunks

January 13th, 2009

Filed under: Computers and Technology , Work/Professional


One of the great things about the Git version-control system is the ability to incrementally commit your changes on a private branch, keeping a step-by-step record of your thought and writing process on a fix or a feature, and then merge the completed work onto your main [or public] branch once the feature or fix is all done and tested. By keeping an incremental log of your changes, rather than committing one giant set of changes to 30 files, it becomes much easier to figure out later why a certain line was changed when a bug is discovered in it.

One thing that often happens to me though, is that I work for about a half hour to an hour trying to get a new piece of code working and in the process make several sets of changes to one file that are only loosely related.

Let’s say that I am fixing a bug in my ‘MediaLibrary’ class and, while doing so, notice some spelling mistakes in the comments, which I fix. Now my one file has two changes: the bug fix and the spelling fix. Rather than committing both changes together with one comment describing both, I can highlight one of the changes in git-gui and select the “Stage Hunk for Commit” option.

[Screenshot: Staging a Hunk of code in git-gui]

With that one hunk staged I can now commit with a message applicable to that change. Other changes can then be staged and committed with their own messages resulting in a very understandable history of changes.

“Stage Hunk for Commit” can also be used to commit important changes while not including debugging lines inserted in your code.
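For those who prefer the command line, the same hunk-by-hunk staging is available through git add --patch. A quick sketch (the file name and commit message are just examples):

# interactively choose which hunks in the file to stage (y = stage, n = skip, s = split the hunk)
git add --patch MediaLibrary.class.php
git commit -m "Fix media lookup bug in MediaLibrary"

Anything left unstaged, such as debugging lines, can go into a later commit or be discarded.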

Outside-In: Application Interoperability Using an OSID-Based Framework

June 25th, 2008

Filed under: Work/Professional


This post describes an interoperability demonstration given at OpeniWorld Europe 2008 in Lyon, France.

Abstract

Segue and Concerto are two curricular applications built upon Harmoni, an Open Service Interface Definition (OSID)-based, service-oriented application framework. This demonstration will show how website content created in Segue is stored as OSID Assets in Harmoni’s OSID Repository. Similarly, the demonstration will show how multimedia assets created in Concerto can be stored in the same repository. Interoperability will be demonstrated as each application is used to view and make real-time modifications to the OSID Assets created using the other application, while at the same time respecting the authorizations given to those assets. Additionally, an OSID Repository to OAI-PMH gateway will be shown providing the LibraryFind meta-search tool with access to the metadata for content created in Segue, Concerto, and a lightweight, read-only OSID Repository.

Software Demonstrated:
