<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AdamFranco.com &#187; Work/Professional</title>
	<atom:link href="http://www.adamfranco.com/category/work/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.adamfranco.com</link>
	<description>Musings, projects, software, and photography.</description>
	<lastBuildDate>Thu, 06 Oct 2011 19:54:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Git Tip: Grouping feature-branch commits when merging.</title>
		<link>http://www.adamfranco.com/2010/12/12/git-tip-grouping-feature-branch-commits-when-merging/</link>
		<comments>http://www.adamfranco.com/2010/12/12/git-tip-grouping-feature-branch-commits-when-merging/#comments</comments>
		<pubDate>Sun, 12 Dec 2010 18:32:02 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Computers and Technology]]></category>
		<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[Git]]></category>
		<category><![CDATA[source-control]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=478</guid>
		<description><![CDATA[Let&#8217;s say you are working on a large feature or update that requires a bunch of commits to complete. You finish up with your work and are then ready to merge it onto your master branch. For example, here is the history of my drupal repository after some work updating the cas module to the [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s say you are working on a large feature or update that requires a bunch of commits to complete. You finish up with your work and are then ready to merge it onto your master branch.</p>
<p>For example, here is the history of my drupal repository after some work updating the cas module to the latest version (and to support the new version of <a href="https://wiki.jasig.org/display/CASC/phpCAS">phpCAS</a>):<br />
<a href="http://www.adamfranco.com/files/2010/12/git-merge-0.png"><img class="aligncenter size-full wp-image-479" title="git-merge-0" src="http://www.adamfranco.com/files/2010/12/git-merge-0.png" alt="" width="100%" /></a></p>
<p>As you can see, I have a number of commits, followed by a merge in with the new module code, followed by some more commits.</p>
<p>Now, if I merge my feature branch (<code>master-cas3-simple</code>) into the <code>master</code> via</p>
<pre>git merge master-cas3-simple</pre>
<p>then the history will look like this:<br />
<a href="http://www.adamfranco.com/files/2010/12/git-merge-ff.png"><img class="aligncenter size-full wp-image-482" title="git-merge-ff" src="http://www.adamfranco.com/files/2010/12/git-merge-ff.png" alt="" width="100%" /></a></p>
<p>While the history is all there, it isn&#8217;t obvious that all of the commits beyond &#8220;Convert MS Word quote&#8230;&#8221; are a single unit of work. They all blend together because git performed a &#8220;fast-forward&#8221; merge: it simply moved <code>master</code> forward to the tip of the feature branch without recording a merge commit. Usually fast-forward merges are helpful since they keep the history from being cluttered with hundreds of unnecessary merge commits, but in this case we are losing the context of these commits being a unit of work.</p>
<p>To preserve the grouping of these commits together I can instead force the merge operation to create a merge commit (and even append a message) by using the <code>--no-ff</code> option to <code>git merge</code>:</p>
<pre>git merge --no-ff -m "Upgraded CAS support to cas-6.x-3.x-dev and phpCAS 1.2.0 RC2.5" master-cas3-simple</pre>
<p>This results in the history below:<br />
<a href="http://www.adamfranco.com/files/2010/12/git-merge-no-ff.png"><img class="aligncenter size-full wp-image-484" title="git-merge-no-ff" src="http://www.adamfranco.com/files/2010/12/git-merge-no-ff.png" alt="" width="100%" /></a></p>
<p>As you can see, merging with the <code>--no-ff</code> option creates a merge commit which very obviously delineates work on this feature. If we later decided to roll back this feature, it would be much easier to find the starting point from before the feature was merged.</p>
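<p>To try this without touching a real project, here is a minimal sketch that builds a scratch repository and merges a feature branch with <code>--no-ff</code>. The repository location, the <code>feature</code> branch name, and the commit messages are all made up for illustration, and it assumes <code>git</code> is installed.</p>

```shell
#!/bin/sh
# Sketch: merge a feature branch with --no-ff in a scratch repository.
# The branch name "feature" and all commit messages are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

git commit -q --allow-empty -m "Initial commit"
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

# Two commits on a feature branch.
git checkout -q -b feature
git commit -q --allow-empty -m "Feature: part 1"
git commit -q --allow-empty -m "Feature: part 2"

# Merge with --no-ff so the feature's commits stay grouped under a merge commit.
git checkout -q "$main"
git merge -q --no-ff -m "Merge feature as a single unit of work" feature

# The tip of the main branch is now a merge commit, so it has two parents.
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "parents of the newest commit: $parents"

# --first-parent follows only the main line, one entry per merged unit of work.
git log --first-parent --format=%s
```

<p>Without <code>--no-ff</code> the same merge would be a fast-forward, and the newest commit on the main branch would simply be &#8220;Feature: part 2&#8221; with a single parent.</p>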
<div class='attribution'>
Thanks to Vincent Driessen for turning me on to the utility of the <code>--no-ff</code> merge option via his post &#8220;<a href="http://nvie.com/posts/a-successful-git-branching-model/">A successful Git branching model</a>&#8221;.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2010/12/12/git-tip-grouping-feature-branch-commits-when-merging/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mirroring a Subversion repository on Github</title>
		<link>http://www.adamfranco.com/2010/12/05/mirroring-a-subversion-repository-on-github/</link>
		<comments>http://www.adamfranco.com/2010/12/05/mirroring-a-subversion-repository-on-github/#comments</comments>
		<pubDate>Sun, 05 Dec 2010 05:30:12 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Computers and Technology]]></category>
		<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[Git]]></category>
		<category><![CDATA[git-svn]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[source-control]]></category>
		<category><![CDATA[Subversion]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=445</guid>
		<description><![CDATA[For the past few months I have been doing a lot of work on the phpCAS library, mostly to improve the community trunk of phpCAS so that I wouldn&#8217;t have to maintain our own custom fork with support for the CAS attribute format we use at Middlebury College. The phpCAS project lead, Joachim Fritschi, has [...]]]></description>
			<content:encoded><![CDATA[<p>For the past few months I have been doing a lot of <a href="http://www.ohloh.net/p/phpcas/contributors/290013371731193">work</a> on the <a href="https://wiki.jasig.org/display/CASC/phpCAS">phpCAS library</a>, mostly to improve the community trunk of phpCAS so that I wouldn&#8217;t have to maintain our own custom fork with support for the <a href="https://issues.jasig.org/browse/PHPCAS-88">CAS attribute</a> format we use at Middlebury College. The phpCAS project lead, Joachim Fritschi, has been great to work with and I&#8217;ve had a blast helping out with the project.</p>
<p>The tooling has involved a few challenges, however, since <a href="http://www.jasig.org/">Jasig</a> (the organization that hosts the <a href="http://www.jasig.org/cas">CAS</a> and phpCAS projects) uses <a href="http://subversion.apache.org/">Subversion</a> for its source-code repositories and we use <a href="http://git-scm.com/">Git</a> for all of our projects. Now, I could just suck it up and use Subversion when doing phpCAS development, but there are a few reasons I don&#8217;t:</p>
<ol>
<li>We make use of <a href="http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#submodules">Git submodules</a> to include phpCAS along with the source-code of our applications, necessitating the use of a public Git repository that includes phpCAS.</li>
<li>The <a href="http://www.kernel.org/pub/software/scm/git/docs/git-svn.html">git-svn</a> tools allow me to use git on my end to work with a Subversion repository, which is great because&#8230;</li>
<li>I find that Git&#8217;s fast history browsing and searching make troubleshooting and bug fixing much easier than any other tools I&#8217;ve used.</li>
</ol>
<p>For the past two years I have been using git-svn to work with the phpCAS repository and every so often pushing changes up to a <a href="https://github.com/adamfranco/phpcas/">public Git repository on GitHub</a>. Our applications reference this repository as a submodule when they need to make use of phpCAS. Now that I&#8217;ve been doing more work on phpCAS (and am more interested in keeping our applications using up-to-date versions), I&#8217;ve decided to automate the process of mirroring the Subversion repository on GitHub. Read on for details of how I&#8217;ve set this up and the scripts for keeping the mirror in sync.</p>
<p><span id="more-445"></span></p>
<p>On my development server I have a git repository I&#8217;ve cloned from the Jasig Subversion repository via:</p>
<pre>git svn clone --stdlayout https://source.jasig.org/cas-clients/phpcas/</pre>
<p>I use this repository for my phpCAS development and am continually using <code>git svn rebase</code> and <code>git svn dcommit</code> to update my branches from Subversion and commit changes back to the Subversion repository.</p>
<p>My goal was to fetch from Subversion and push all of the branches and tags from the Subversion repository to GitHub while ignoring any private branches or un-dcommitted changes I might have kicking about my development repository.</p>
<p><strong>Step 1: Add the GitHub repository as a remote</strong></p>
<pre>git remote add github git@github.com:adamfranco/phpcas.git</pre>
<p><strong>Step 2: Fetch the latest changes from the svn repository</strong><br />
To do this, I just needed to run <code>git svn fetch</code> to import commits from the Subversion repository into my Git repository.</p>
<p><strong>Step 3: Make Git tags for Subversion tag-branches</strong><br />
I may be doing something wrong, but it seems that Subversion tags come through <code>git svn</code> as git branches rather than as git &#8220;tag&#8221; objects. Basically they are a branch with a single commit that just adds the tag message, but no content change. Using <code>git show</code> I found I could grab the parent id, message, and other metadata from the &#8220;tag-branch&#8221;, then feed that into <code>git tag</code> to create actual tag objects in the git repository.</p>
<p style="text-align: center;"><a href="http://www.adamfranco.com/files/2010/12/git-svn_tags.png"><img class="aligncenter " title="git-svn_tags" src="http://www.adamfranco.com/files/2010/12/git-svn_tags.png" alt="" width="100%" /></a></p>
<p><strong>Step 4: Push Subversion branches and newly created tags to GitHub</strong><br />
When called with no parameters, <code>git push</code> will push all branches that have matching names in both the source and the destination repository. This wasn&#8217;t going to work for me since I want to automatically push only the branch-state that exists in Subversion (not any un-dcommitted changes in my Git repository) and want to create mirrors of any new branches that appear in the Subversion repository. To accomplish this I needed to specify every branch individually. I determined the list of branches via:</p>
<pre>git branch -r | grep -v '/' | grep -v trunk</pre>
<p>then looped through them and appended them to the hard-coded mapping between the svn &#8220;trunk&#8221; and the GitHub &#8220;master&#8221;:</p>
<pre>$cmd = 'git push --tags github refs/remotes/trunk:refs/heads/master ';
foreach ($svnBranches as $branch) {
	$cmd .= 'refs/remotes/'.$branch.':refs/heads/'.$branch.' ';
}</pre>
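<p>The same refspec-building loop can be sketched in plain shell, which makes it easy to print the command and check it before anything is pushed. <code>build_push_cmd</code> and the two branch names are hypothetical; in practice the list would come from the <code>git branch -r</code> pipeline above.</p>

```shell
#!/bin/sh
# Sketch: build the "git push" command from a list of svn branch names.
# build_push_cmd is a hypothetical helper; the real list would come from
#   git branch -r | grep -v '/' | grep -v trunk

build_push_cmd() {
    # trunk always maps to master; every other branch keeps its own name.
    cmd="git push --tags github refs/remotes/trunk:refs/heads/master"
    for branch in "$@"; do
        cmd="$cmd refs/remotes/$branch:refs/heads/$branch"
    done
    printf '%s\n' "$cmd"
}

# Example with two made-up branch names. Nothing is pushed; the command is only printed.
push_cmd=$(build_push_cmd 1.1.x feature-proxy)
echo "$push_cmd"
```

<p>Each <code>source:destination</code> pair pushes the state git-svn recorded under <code>refs/remotes/</code>, so local branches with un-dcommitted work are never involved.</p>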
<p><strong>All together: <code>update_github_phpcas</code></strong><br />
Below is a script which performs the tasks above. I&#8217;ve added it to my crontab so that it runs every half-hour and keeps my GitHub repository in-sync with the Jasig Subversion repository.</p>
<div style="display: block; border: 1px dotted; padding: 5px;">
<pre>
<code>#!/usr/local/bin/php
&lt;?php
/**
 * Script to mirror a Subversion repository on GitHub or another public Git repository.
 *
 * Author: Adam Franco (afranco@middlebury.edu)
 * Date: 2010-12-04
 * License: GNU General Public License (GPL) version 2 or later.
 */

chdir('/home/afranco/private_html/phpcas/');

// Fetch from svn.
`git svn fetch`;

// Look up all of the svn branches (remote refs without a "/", excluding trunk).
$svnBranches = explode("\n", trim(`git branch -r | grep -v '/' | grep -v trunk`));
// Drop empty entries so that an empty branch list doesn't produce a bogus refspec.
$svnBranches = array_filter(array_map('trim', $svnBranches), 'strlen');

// Add all of our branches to the list of those to push.
$cmd = 'git push --tags github refs/remotes/trunk:refs/heads/master ';
foreach ($svnBranches as $branch) {
	$cmd .= 'refs/remotes/'.$branch.':refs/heads/'.$branch.' ';
}

// Ensure that Git tags are created for every SVN tag branch.
$svnTags = explode("\n", trim(`git branch -r | grep 'tags/'`));
$svnTags = array_filter(array_map('trim', $svnTags), 'strlen');
foreach ($svnTags as $svnTag) {
	$ref = "refs/remotes/$svnTag";
	// -s suppresses the diff so that only the requested format is returned.
	$parent = trim(shell_exec("git show -s --format=\"format:%P\" $ref"));

	// If there are no tags on the parent of the tag branch, add one.
	if (!strlen(trim(`git tag --contains $parent`))) {
		$message = shell_exec("git show -s --format=\"format:%s%ntagged by %aN on %aD\" $ref");
		$date = shell_exec("git show -s --format=\"format:%ai\" $ref");
		$tagName = str_replace('tags/', '', $svnTag);

		// Quote the values so that messages containing quotes don't break the command.
		$tagCmd = 'GIT_COMMITTER_DATE='.escapeshellarg(trim($date)).' git tag -a -m '.escapeshellarg(trim($message)).' '.escapeshellarg($tagName).' '.$parent;
#		print $tagCmd ."\n";
#		print "Creating tag $tagName\n";
		`$tagCmd`;
	}
}

#print $cmd;
#print "\n";

$output = `$cmd 2&gt;&amp;1`;

// Stay silent (so cron sends no mail) unless something was actually pushed or failed.
if (trim($output) != 'Everything up-to-date')
	print $output."\n";
</code>
</pre>
</div>
<p>I think that this script should work with very few changes for mirroring any Subversion repository as a Git repository.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2010/12/05/mirroring-a-subversion-repository-on-github/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>BASH tip: Top web pages</title>
		<link>http://www.adamfranco.com/2010/10/14/bash-tip-top-web-pages/</link>
		<comments>http://www.adamfranco.com/2010/10/14/bash-tip-top-web-pages/#comments</comments>
		<pubDate>Thu, 14 Oct 2010 16:00:04 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Computers and Technology]]></category>
		<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[BASH]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[web-development]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=431</guid>
		<description><![CDATA[Here is a quick command to generate a list of the top pages in the Apache web-server&#8217;s access log: gawk '{ print $7}' /var/log/httpd/access_log &#124; sort &#124; uniq -c &#124; sort -nr &#124; head -n 20 Parts of the command explained: gawk '{ print $7}' &#8212; return only the 7th [white-space delimited] column of text [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a quick command to generate a list of the top pages in the Apache web-server&#8217;s access log:</p>
<p><code>gawk '{ print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -nr | head -n 20</code></p>
<p>Parts of the command explained:</p>
<ol>
<li><code>gawk '{ print $7}' </code> &#8212; return only the 7th [white-space delimited] column of text from the access log, which happens to be the path requested.</li>
<li><code>sort </code> &#8212; sort the lines of the output.</li>
<li><code>uniq -c </code> &#8212; condense the output to unique lines, prepending each line with the number of times that line occurs.</li>
<li><code>sort -nr </code>&#8211; sort the resulting lines numerically in reverse order.</li>
<li><code>head -n 20 </code> &#8212; chop off all but the first 20 lines.</li>
</ol>
<p>The result should look something like this:</p>
<pre>  83361 /
  49582 /feed
  39616 /robots.txt
  36265 /favicon.ico
  17048 /?feed=rss2
  10798 /archives/3
  10036 /wp-content/uploads/2007/05/img_7870_header.jpg
   9913 /wp-includes/images/smilies/icon_smile.gif
   9425 /wp-comments-post.php
   8274 /feed/
   7508 /archives/category/work/feed
   7367 /archives/88
   7312 /photos/10_small/IMG_3023.JPG.jpg
   7175 /photos/10_small/IMG_3028.JPG.jpg
   7151 /photos/10_small/IMG_3024.JPG.jpg
   7096 /photos/10_small/IMG_3026.JPG.jpg
   6381 /photosetToKML.php?set=72157594417350372&#038;size=small
   6253 /qtvr/2007-04-05_back_deck_snow%20-%2010000x5000%20-%20SLIN%20-%20Blended%20Layer0002.jpg
   5798 /photosetToKML.php
   4344 /archives/category/photography</pre>
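<p>To see each stage at work without a real server log, here is a small sketch that runs the same pipeline against a six-line, made-up access log. It uses plain <code>awk</code> instead of <code>gawk</code> (an assumption on my part; either should behave the same for this one-liner), and the IP addresses and paths are invented.</p>

```shell
#!/bin/sh
# Sketch: run the top-pages pipeline against a tiny, made-up access log.
# Uses plain awk rather than gawk; the log entries below are invented.

log=$(mktemp)
printf '%s\n' \
  '10.0.0.1 - - [14/Oct/2010:12:00:00 +0000] "GET / HTTP/1.1" 200 512' \
  '10.0.0.2 - - [14/Oct/2010:12:00:01 +0000] "GET /feed HTTP/1.1" 200 128' \
  '10.0.0.3 - - [14/Oct/2010:12:00:02 +0000] "GET / HTTP/1.1" 200 512' \
  '10.0.0.1 - - [14/Oct/2010:12:00:03 +0000] "GET /robots.txt HTTP/1.1" 404 0' \
  '10.0.0.4 - - [14/Oct/2010:12:00:04 +0000] "GET / HTTP/1.1" 200 512' \
  '10.0.0.2 - - [14/Oct/2010:12:00:05 +0000] "GET /feed HTTP/1.1" 200 128' \
  > "$log"

# Column 7 is the requested path; count identical paths and list the most frequent first.
top_pages=$(awk '{ print $7 }' "$log" | sort | uniq -c | sort -nr | head -n 20)
echo "$top_pages"
rm -f "$log"
```

<p>The first <code>sort</code> matters because <code>uniq -c</code> only collapses adjacent identical lines.</p>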
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2010/10/14/bash-tip-top-web-pages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding reverse-proxy caching to PHP applications</title>
		<link>http://www.adamfranco.com/2010/06/14/adding-reverse-proxy-caching-to-php-applications/</link>
		<comments>http://www.adamfranco.com/2010/06/14/adding-reverse-proxy-caching-to-php-applications/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 16:03:57 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Computers and Technology]]></category>
		<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[reverse-proxy]]></category>
		<category><![CDATA[Varnish]]></category>
		<category><![CDATA[web-development]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=426</guid>
		<description><![CDATA[Note: This is a cross-post of documentation I am writing about Lazy Sessions. Why use reverse-proxy caching? For most public-facing web applications, the significant majority of their traffic is anonymous, non-authenticated users. Even with a variety of internal data-cache mechanisms and other good optimizations, a large amount of code execution goes into executing a PHP [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: This is a cross-post of <a href="http://wiki.github.com/adamfranco/lazy_sessions/adding-reverse-proxy-caching-to-php-applications">documentation I am writing about Lazy Sessions</a>.</em></p>
<h1>Why use reverse-proxy caching?</h1>
<p>For most public-facing web applications, the significant majority of their traffic is anonymous, non-authenticated users. Even with a variety of internal data-cache mechanisms and other good optimizations, a large amount of code execution goes into executing a <span class="caps">PHP</span> application to generate a page even if the content of this page will be the same for many users. Code and query optimization are very important to improving the experience for all users of a web application, but even the most basic &ldquo;Hello World&rdquo; script will top out at about 3k requests/second due to the overhead of Apache and <span class="caps">PHP</span> &mdash; many real applications top out at less than 200 requests/second. Varnish, a light-weight proxy-server that can run on the same host as the webserver, can cache pages in memory and can serve them at rates of more than 10k requests/second with thousands of concurrent connections.</p>
<p>While the point of web-applications is to have content be dynamic and easily changeable, for most applications and most of the anonymous users, receiving content that is slightly stale (cached for 5 minutes or something similar) isn&rsquo;t a big deal. Sure, visitors to your blog might not see the latest post for a few minutes, but they will get their response in 4 milliseconds rather than 2 seconds.</p>
<p>Should your site get posted on Slashdot, a caching reverse-proxy server will give anonymous visitor #2 and up the same page from cache (until expiration), while authenticated users continue to have their requests passed through to the Apache/<span class="caps">PHP</span> back-end. Everyone wins.</p>
<p><span id="more-426"></span></p>
<h1>Caveats</h1>
<p>Before we get into how to set this up, you should be aware of a few caveats (in addition to increased complexity) that come with this scheme.</p>
<h2>1. Stale Content</h2>
<p>Ideally, pages would always be served from the cache for as long as they don&rsquo;t change, then the application would expire pages when they are changed on the back-end. Varnish has an <span class="caps">API</span> that supports this behavior and the <a href="http://drupal.org/project/Varnish">Drupal Varnish module</a> is being developed to do this dynamic cache-clearing for Drupal sites, but overall, dynamic cache clearing is much more difficult to set up than time-based cache expiration.</p>
<p>When using time-based cache expiration, the challenge is to balance the needs for content freshness (shorter cache lifetimes) against the efficiency of cache hits (longer cache lifetimes will result in more clients using the cached versions). For content that doesn&rsquo;t need to be up-to-the-minute fresh, a cache lifetime of around 5 minutes might be a good starting point. If the content only changes daily at a certain time, a fixed expiration time (shortly after the data sync) might be appropriate.</p>
<h2>2. Cookie Use</h2>
<p>If your application only uses the cookie set by PHP&rsquo;s <code>session_start()</code> function, then <code>lazy_sessions.php</code> should work transparently without modification of either that include file or your application (other than including the file). If your application sets other cookies, then these will prevent the reverse-proxy from caching the affected responses unless you specifically exclude those cookies in the reverse-proxy server&rsquo;s configuration.</p>
<h2>3. Data Caching in the <code>$_SESSION</code></h2>
<p>If you use the <code>$_SESSION</code> array as a data cache on anonymous requests, then these anonymous requests will be given a session cookie and their requests won&rsquo;t be served from the reverse-proxy&rsquo;s cache. Rather than using the <code>$_SESSION</code> array for non-user-specific data, cache such data with <span class="caps">APC</span> or memcached. This also has the advantage of such non-user-specific data not having to be rebuilt for every new client.</p>
<h2>4. <code>flush()</code> and output buffering</h2>
<p>The default <span class="caps">PHP</span> session handling mechanism adds the session cookie to the response headers right when <code>session_start()</code> is called and writes the data off to the file-system after the script exits and the data has been sent. This default behavior ensures that users will always get a session cookie and saves the session data as the final processing step after all class destructors have been called.</p>
<p>Since we don&rsquo;t want to always set a session cookie, we need to remove the <code>Set-Cookie</code> header before headers are sent to the client. Output buffering with <code>ob_start()</code> will ensure that we have a chance to decide to clear the <code>Set-Cookie</code> header at script shutdown.</p>
<p>In some cases (such as incrementally sending large binary files) we want to send the content body (and therefore also the headers) before the script exits using the <code>flush()</code> function. To ensure that the session cookie is properly removed, <code>session_write_close()</code> must be called before <code>flush()</code> or any other code that causes headers to be sent.</p>
<h1>Implementation</h1>
<p>Implementing reverse-proxy caching has three steps: <span class="caps">PHP</span> changes to enable lazy sessions, <span class="caps">PHP</span> changes to set cache-controlling headers, and finally the reverse-proxy server setup. For this example I&rsquo;ll use the Varnish reverse-proxy server, but others could be used instead.</p>
<h2>1. <span class="caps">PHP</span>: Lazy Sessions</h2>
<p>The first thing that needs to happen to make anonymous requests cache-able in an application that uses sessions is to ensure that sessions are only started when there is session data to be stored. By default, PHP&rsquo;s session handling mechanisms add a session cookie to the response header and store a session data file on the server on every page-load that calls <code>session_start()</code>. While this behavior makes it easy to write applications that use sessions, it effectively means that there is no way to differentiate between responses that are for a particular user and those that could be for many users.</p>
<p>Including the <a href="http://github.com/adamfranco/lazy_sessions/blob/master/lazy_sessions.php"><code>lazy_sessions.php</code> file</a> before <code>session_start()</code> is called will override the default session-handling mechanism with one that checks to see if there is any data in the <code>$_SESSION</code> array before sending the user a <code>Set-Cookie</code> header and storing a session file:</p>
<pre>
<code>&lt;?php

// Include files or other pre-session_start code

require_once('lazy_sessions/lazy_sessions.php');
session_start();

// The rest of the application code.
?&gt;
</code>
</pre>
<p>If your application needs to flush content and thereby send headers before script shutdown (such as incrementally sending file data), call <code>session_write_close()</code> if <code>session_start()</code> has been called for that script:</p>
<pre>
<code>&lt;?php

// Include files or other pre-session_start code

require_once('lazy_sessions/lazy_sessions.php');
session_start();

// other application code.

// If session_write_close() is not called before flushing, then the Set-Cookie
// header will be sent before our custom session handler has a chance to determine
// if a session is even needed.
session_write_close();

print "Hello";
flush();
print " World.";
flush();

?&gt;
</code>
</pre>
<h2>2. <span class="caps">PHP</span>: Cache-Control headers</h2>
<p>Now that we have our cookies straightened out, we need to ensure that our <span class="caps">PHP</span> scripts respond with <span class="caps">HTTP</span> headers that indicate that downstream clients such as our reverse-proxy and the user&rsquo;s browser are allowed to cache anonymous pages. There are a number of different <a href="http://wiki.github.com/adamfranco/lazy_sessions/cache-controlling-headers">Cache-Controlling Headers</a> that may affect whether a particular cache may store a given response. By default, <span class="caps">PHP</span> sets all of these headers to indicate that no caches may store any pages, ensuring that they are dynamic.</p>
<pre>
<code>&lt;?php

// If the session data is empty, then we could assume that there is no per-user data
// and that the response can be cached.
if (!count($_SESSION)) {

// Alternatively, we could check an application-specific value (such as a user-id)
// to determine if the response is for a particular user.
// if (!isset($_SESSION['user_id'])) {

// Cache for 5 minutes
$maxAge = 300;

header('Expires: '.gmdate('D, d M Y H:i:s', time() + $maxAge).' GMT', true);
header('Cache-Control: public, max-age='.$maxAge, true);
header('Pragma: ', true);
}

header('Vary: Cookie,Accept-Encoding', true);
</code>
</pre>
<p>The two most important headers with regard to caching with Varnish are the following:</p>
<h3>The <code>Cache-Control</code> header.</h3>
<p>The <code>Cache-Control: public, max-age=300</code> header indicates to any clients (such as the Varnish caching proxy) that this response can be cached in public caches valid for many downstream clients. The <code>max-age</code> portion of the header indicates that the cache may store this response for 300 seconds.</p>
<p>As I understand it (and I may be wrong), Varnish only looks at the <code>max-age</code> portion of the <code>Cache-Control</code> header when determining how long to store a response. Apparently it ignores the <code>Expires</code> header for its cache-expiration purposes, though this header is passed on to downstream clients.</p>
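<p>To make the <code>max-age</code> portion concrete, the value can be pulled out of a <code>Cache-Control</code> header with a small shell function. This is only an illustration of which part of the header is being discussed; the <code>max_age</code> helper is hypothetical and says nothing about how Varnish parses headers internally:</p>
<pre>
<code>#!/bin/sh

# Hypothetical helper: print the max-age value (in seconds) found in a
# Cache-Control header value, or nothing if max-age is absent.
max_age() {
    printf '%s\n' "$1" | sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p'
}

# Prints 300 for the header sent by the PHP example above.
max_age "public, max-age=300"
</code>
</pre>
<p>For a header value without a <code>max-age</code> portion, such as <code>no-store, no-cache</code>, the function prints nothing.</p>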
<h3>The <code>Vary</code> header</h3>
<p>The <code>Vary: Cookie,Accept-Encoding</code> header tells Varnish (and in-browser caches) that they should not respond with the cached version of a response if the request includes a cookie or a different cookie from the request that previously had its response cached. Similarly, if one client says that it accepts gzip encoding via an <code>Accept-Encoding: gzip</code> request header, then the cached response may be compressed with gzip and should not be sent in response to requests from clients that do not state that they accept gzip encoding.</p>
<p>While Varnish&rsquo;s behavior is to never cache or respond from cache when cookies are present, without the <code>Vary: Cookie</code> response header, browsers or other downstream caches may respond with a cached response valid for only anonymous users even though a cookie is now present.</p>
<p>See my notes on <a href="http://wiki.github.com/adamfranco/lazy_sessions/cache-controlling-headers">Cache-Controlling Headers</a> for more details about other headers and how they affect the Varnish cache and in-browser caches.</p>
<h2>3. Varnish (Reverse-Proxy) Configuration</h2>
<p>The <code>/etc/varnish/default.vcl</code> config file controls how Varnish handles requests and responses, in particular whether or not it should cache them. Below are the contents of my <code>default.vcl</code> file.</p>
<div>
<p><strong>Notes:</strong></p>
<ol>
<li>The backend portion is the default; you will probably want to modify it to point at your own backend hosts and ports.</li>
<li>The <code>vcl_recv</code> and <code>vcl_hash</code> sections come directly from the <a href="https://wiki.fourkitchens.com/display/PF/Configure+Varnish+for+Pressflow?focusedCommentId=15335604">Pressflow wiki</a> and are set up to allow requests that include Google Analytics cookies to be cached while not caching requests that include other cookies.</li>
<li>The <code>vcl_fetch</code> section is the default with my addition of the lines to unset empty Set-Cookie headers that can&rsquo;t be removed from within <span class="caps">PHP</span> &lt; 5.3.</li>
</ol>
</div>
<pre>
<code>
backend default {
.host = "127.0.0.1";
.port = "80";
}

sub vcl_recv {
// Remove has_js and Google Analytics __* cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
// Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
// Remove empty cookies.
if (req.http.Cookie ~ "^\s*$") {
unset req.http.Cookie;
}

// Cache all requests by default, overriding the
// standard Varnish behavior.
// if (req.request == "GET" || req.request == "HEAD") {
//   return (lookup);
// }
}

sub vcl_hash {
if (req.http.Cookie) {
set req.hash += req.http.Cookie;
}
}

sub vcl_fetch {
if (!beresp.cacheable) {
	return (pass);
}

// If using PHP &lt; 5.3 there is no way to fully delete headers, so empty
// Set-Cookie headers may be in the response. Ignore these empty headers.
if (beresp.http.Set-Cookie ~ "^\s*$") {
	unset beresp.http.Set-Cookie;
}

if (beresp.http.Set-Cookie) {
	return (pass);
}
return (deliver);
}
</code>
</pre>
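<p>The cookie-stripping expressions in <code>vcl_recv</code> can be tried out without a running Varnish. The sketch below approximates them with <code>sed</code>; the <code>strip_cookies</code> function name is hypothetical, <code>[[:space:]]</code> stands in for <code>\s</code>, and since <code>sed</code> uses a different regular-expression engine than Varnish the result should be treated as an approximation:</p>
<pre>
<code>#!/bin/sh

# Approximate the two substitutions from vcl_recv:
#   1. remove has_js and Google Analytics __* cookies
#   2. remove a leading ";" left behind by step 1
strip_cookies() {
    printf '%s\n' "$1" | sed -E 's/(^|;[[:space:]]*)(__[a-z]+|has_js)=[^;]*//g; s/^;[[:space:]]*//'
}

# Prints: PHPSESSID=abc
strip_cookies "__utma=123.456; has_js=1; PHPSESSID=abc"
</code>
</pre>
<p>A request that carries only analytics cookies ends up with an empty string, which corresponds to the case where <code>vcl_recv</code> unsets the <code>Cookie</code> header entirely.</p>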
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2010/06/14/adding-reverse-proxy-caching-to-php-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Importing users into Bugzilla</title>
		<link>http://www.adamfranco.com/2010/03/08/importing-users-into-bugzilla/</link>
		<comments>http://www.adamfranco.com/2010/03/08/importing-users-into-bugzilla/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 04:11:45 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Computers and Technology]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[Bugzilla]]></category>
		<category><![CDATA[import]]></category>
		<category><![CDATA[LDAP]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=374</guid>
		<description><![CDATA[For the past 6 months our Web Application Development work-group has been using Bugzilla as our issue tracker with quite a bit of success. While it has its warts, Bugzilla seems like a pretty decent issue-tracking system and is flexible enough to fit into a variety of different work-flows. One very important feature of Bugzilla is [...]]]></description>
			<content:encoded><![CDATA[<p>For the past 6 months our <a href="http://go.middlebury.edu/webservices">Web Application Development work-group</a> has been using Bugzilla as our issue tracker with quite a bit of success. While it has its warts, Bugzilla seems like a pretty decent issue-tracking system and is flexible enough to fit into a variety of different work-flows. One very important feature of Bugzilla is support for LDAP authentication. This enables any Middlebury College user to log in and report a bug using their standard campus credentials.</p>
<p>While LDAP authentication works great, there is one problem: If a person has never logged into our Bugzilla, we can&#8217;t add them to the CC list of an issue. This is important for us because issues usually don&#8217;t get submitted directly to the bug tracker, but rather come in via calls, emails, tweets, and face-to-face meetings. We are then left to submit issues to Bugzilla ourselves to keep track of our to-do items. Ideally we&#8217;d add the original reporter to the bug&#8217;s CC list so that they will automatically be notified as we make progress on the issue, but their Bugzilla account must exist before we can add them to the bug.</p>
<p>Searching the internet, I wasn&#8217;t able to find anything about how to import LDAP users (or any kind of users) into Bugzilla, though I was able to find some <a href="http://groups.google.com/group/mozilla.support.bugzilla/browse_thread/thread/165d4fc1a8b4ad82/b1e31ad20bfef3f0">basic instructions</a> on how to create a single user via Bugzilla&#8217;s Perl API. To make up for the lack of user-import support I&#8217;ve created a Perl script that creates users from lines in a tab-delimited text file (<code>create_users.pl</code>) as well as a companion PHP script that will export an appropriately-formatted list of users from an Active Directory (LDAP) server (<code>export_users.php</code>).</p>
<p><span id="more-374"></span><br />
<a href='http://www.adamfranco.com/files/2010/03/BugzillaImport.zip'>BugzillaImport.zip</a> &#8212; Unzip in your Bugzilla directory, run via the command line. See below for examples.</p>
<h1>File Listings:</h1>
<h2>create_users.pl</h2>
<p>This script can safely be run repeatedly. Only new users not already in Bugzilla will be added; users matching existing email addresses will be skipped.</p>
<pre>#!/usr/bin/env perl
##########################################################
# This is a basic script to import users into Bugzilla.
#
# Users can be imported from tab-delimited text files or
# tab-delimited lines piped to STDIN. Lines should have 3
# columns: login	email	name
#
#
# Author:
#	Adam Franco (afranco@middlebury.edu)
# Date:
#	2010-03-08
# URL:
#	http://www.adamfranco.com/archives/374
# License:
#   The contents of this file are subject to the Mozilla Public
#   License Version 1.1 (the "License"); you may not use this file
#   except in compliance with the License. You may obtain a copy of
#   the License at http://www.mozilla.org/MPL/
#
#   Software distributed under the License is distributed on an "AS
#   IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
#   implied. See the License for the specific language governing
#   rights and limitations under the License.
##########################################################

use FindBin qw($Bin);
BEGIN {
    push @INC,$Bin;
    push @INC,$Bin."/lib";
    push @INC,$Bin."/lib/x86_64-linux-thread-multi";
}
use Bugzilla;
use Bugzilla::User;
use Error qw(:try);

sub usage {
    print "
Usage:
    $0 ListOfUsers1.txt [ListOfUsers2.txt [...]]
    $0 < ListOfUsers.txt

The ListOfUsers can be passed as either a file argument or passed to STDIN.

The ListOfUsers must be tab-delimited with the following columns:
login   email   name

";
    exit 1;
}

foreach (@ARGV) {
    if ($_ =~ /^(-h|--help)$/) {
        usage();
    }
}

my $lines = 0;
my $users = 0;
my $usersAdded = 0;
while (<>) {
    chomp; # Remove the trailing new-line.
    my($login, $email, $name) = split(/\t/, $_);

    if ($login &#038;&#038; $email &#038;&#038; $name &#038;&#038; $login =~ /[a-z0-9]+/ &#038;&#038;  $email =~ /[a-z0-9]+.*@.*[a-z0-9]+/ &#038;&#038; $name =~ /[a-z]+/) {
        if (is_available_username($email)) {
            try {
                my $user = Bugzilla::User->create({
                    login_name    => $email,
                    realname      => $name,
                    cryptpassword => '*',
                    disable_mail  => 0,
                    extern_id     => $login
                });
                print "Account for " . $user->login . " was created.\n";
                $usersAdded++;
            } catch Error with {
                my $ex = shift;
                my $error = "Error: $ex";
                $error =~ s/\n|\r/ /g;
                print $error."\n";
            };
        }

        $users++;
    }
    $lines++;
    close (ARGV) if (eof);
}

if (!$lines) {
    print "No input lines given.\n\n";
    usage();
}

print "\n$lines lines evaluated, $users user records checked, $usersAdded users added.\n";

exit 0;
</pre>
<h2>export_users.php</h2>
<pre>#!/usr/bin/env php
&lt;?php
##########################################################
# This is a basic script to export users from an
# MS Active Directory via LDAP in the format required
# by create_users.pl.
#
# Authors:
#	Adam Franco (afranco@middlebury.edu)
#	Ian McBride (imcbride@middlebury.edu)
# Date:
#	2010-03-08
# URL:
#	http://www.adamfranco.com/archives/374
# License:
#   The contents of this file are subject to the Mozilla Public
#   License Version 1.1 (the "License"); you may not use this file
#   except in compliance with the License. You may obtain a copy of
#   the License at http://www.mozilla.org/MPL/
#
#   Software distributed under the License is distributed on an "AS
#   IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
#   implied. See the License for the specific language governing
#   rights and limitations under the License.
##########################################################

$ldaphost = "ldap.example.com";
$ldapport = 389;
$ldapuser = "username";
$ldappass = "password";
$baseDN = "DC=example,DC=com";

$connection = ldap_connect($ldaphost, $ldapport);

if (!$connection) die();

if (ldap_set_option($connection, LDAP_OPT_PROTOCOL_VERSION,3) === FALSE) die();

if (ldap_set_option($connection, LDAP_OPT_REFERRALS,0) === FALSE) die();

$bind = ldap_bind($connection, $ldapuser, $ldappass);

if (!$bind) die();

$filter = "(&#038;(objectClass=User)(!(objectClass=Computer)))";

$search = ldap_search($connection, $baseDN, $filter, array("samaccountname", "mail", "givenname", "sn"));

$entries = ldap_get_entries($connection, $search);

print "samaccountname\temail\tname\n";

foreach($entries as $entry) {
  // ldap_get_entries() also returns a 'count' element; skip it.
  if (!is_array($entry)) continue;
  if(isset($entry['samaccountname'])) {
    print iconv('UTF-8', 'UTF-8//IGNORE', $entry['samaccountname'][0]);
  }
  print "\t";

  if(isset($entry['mail'])) {
    print iconv('UTF-8', 'UTF-8//IGNORE', $entry['mail'][0]);
  }
  print "\t";

  $name = '';
  if(isset($entry['givenname'])) {
    $name .= iconv('UTF-8', 'UTF-8//IGNORE', $entry['givenname'][0]);
  }
  $name .= ' ';
  if(isset($entry['sn'])) {
    $name .= iconv('UTF-8', 'UTF-8//IGNORE', $entry['sn'][0]);
  }
  print trim($name);

  print "\n";
}
</pre>
<h1>Example Usage</h1>
<p>After unzipping the scripts in your Bugzilla directory you can use the <code>create_users.pl</code> script right away. To use <code>export_users.php</code> you will need to edit it and add your LDAP server configuration.<br />
<code>[root@hostname /var/www/htdocs/bugzilla/]# ./export_users.php | ./create_users.pl</code></p>
<p>If you&#8217;d rather import users from another source, simply create one or more tab-delimited text files that have the following columns:<br />
login&nbsp;&nbsp;&nbsp;&nbsp;email&nbsp;&nbsp;&nbsp;&nbsp;name<br />
<code>[root@hostname /var/www/htdocs/bugzilla/]# ./create_users.pl users.txt otherusers.txt</code></p>
<p>You can pipe tab-delimited data to the script as well:<br />
<code>[root@hostname /var/www/htdocs/bugzilla/]# head -n 20 users.txt | ./create_users.pl</code></p>
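<p>Before importing a hand-made file, it can help to check how many lines are in the expected shape. The following is a rough sketch: the <code>count_valid_rows</code> function is hypothetical, and its checks (three non-empty tab-separated columns, an <code>@</code> in the email column) only approximate the pattern matching that <code>create_users.pl</code> performs itself:</p>
<pre>#!/bin/sh

# Hypothetical pre-flight check: count the rows that have exactly three
# tab-separated columns, a non-empty login and name, and an "@" in the email.
count_valid_rows() {
    awk -F '\t' '
        NF != 3 { next }
        $1 == "" || $3 == "" { next }
        $2 !~ /@/ { next }
        { n++ }
        END { print n + 0 }
    ' "$@"
}

# Usage (file names as in the examples above):
#   count_valid_rows users.txt otherusers.txt
</pre>
<p>Comparing this count with the &#8220;user records checked&#8221; number that <code>create_users.pl</code> prints gives a quick sanity check on the input file.</p>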
<h2>Update:</h2>
<ul>
<li>Changed the license statement to the MPL to be compatible with the rest of Bugzilla</li>
<li>Changed the password to &#8216;*&#8217; based on Max&#8217;s suggestion</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2010/03/08/importing-users-into-bugzilla/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>High-availability Drupal &#8212; File-handling</title>
		<link>http://www.adamfranco.com/2009/09/09/high-availability-drupal-file-handling/</link>
		<comments>http://www.adamfranco.com/2009/09/09/high-availability-drupal-file-handling/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 21:18:27 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Computers and Technology]]></category>
		<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=266</guid>
		<description><![CDATA[One of the requirements in the migration of our web sites to Drupal is that we create a robust and redundant platform that can stay running or degrade gracefully when hardware or software problems inevitably arise. While our sites get heavy use from our communities and the public, our traffic numbers are nowhere near [...]]]></description>
			<content:encoded><![CDATA[<p>One of the requirements in the migration of our <a href="http://www.middlebury.edu/">web</a> <a href="http://www.miis.edu">sites</a> to <a href="http://drupal.org/">Drupal</a> is that we create a robust and redundant platform that can stay running or degrade gracefully when hardware or software problems inevitably arise. While our sites get heavy use from our communities and the public, our traffic numbers are nowhere near those of a top-1000 site, and the sites could comfortably run on a single machine hosting both the database and the web-server.<br />
<div id="attachment_297" class="wp-caption aligncenter" style="width: 541px"><img src="http://www.adamfranco.com/files/2009/09/1-SingleMachine.jpg" alt="Single Machine Configuration" title="Single Machine Configuration" width="531" height="332" class="size-full wp-image-297" /><p class="wp-caption-text">Single Machine Configuration</p></div><br />
This simple configuration however has the major weakness that any hiccups in the hardware or software of the machine will likely take the site offline until the issues can be addressed. In order to give our site a better chance at staying up as failures occur, we separate some of the functional pieces of the site onto discrete machines and then ensure that each function is redundant or fail-safe. This post and the next will detail a few of the techniques we have used to build a robust site.</p>
<p><span id="more-266"></span></p>
<h2>Pull out the database, use multiple web-servers</h2>
<p>The two main components of Drupal (and most similar web applications) are the web-server, which handles PHP execution and file-serving, and the MySQL database, which stores all data with the exception of uploaded files. By putting the database on a separate machine we can have multiple machines acting as front-end web-servers, all of them reading from and writing to the same database. In this way, it doesn&#8217;t matter which web-server handles a given request as they will all get the same information out of the database. With two or more web-servers, our platform gains some redundancy since one web-server can fail while the second keeps handling requests.</p>
<p>With both web-servers pointing at the same database server, the database server still remains a single point of failure. Database clustering can alleviate this problem, but will be the subject of a future post.</p>
<h2>Multiple web-server challenges</h2>
<p>This redundancy does come at a cost in complexity, however, since we need to ensure that any uploaded files are available on both web-servers. There seem to be <a href="http://groups.drupal.org/node/1648">two primary ways</a> of tackling this problem (without resorting to costly and complex distributed file-system tools). The first is to use rsync to copy files between the web-servers every few minutes.<br />
<div id="attachment_299" class="wp-caption aligncenter" style="width: 610px"><img src="http://www.adamfranco.com/files/2009/09/2a-Two-Web-servers-rsync.jpg" alt="Two web servers with rsync" title="2a - Two Web servers - rsync" width="600" class="size-full wp-image-299" /><p class="wp-caption-text">Two web servers with rsync</p></div><br />
While this is reasonably simple to set up between two web-servers, it comes with significant downsides:</p>
<ul>
<li>Files cannot be deleted in the sync as newly-added files will exist on only one web-server. Since the sync is two-way, there is no way for the rsync processes to tell the difference between a new file and a deleted file.</li>
<li>Requests that come to the &#8220;other&#8221; web-server will not be able to access new files until the sync happens.</li>
<li>If additional web-servers are added, the sync process needs to be updated on every existing web-server to include the new web-server.</li>
</ul>
<p>The other alternative is to store uploaded files on a separate file-server, whose upload directory is mounted on each web-server using NFS. This method eliminates the synchronization problems, since all web-servers are essentially writing to the same directory.<br />
<div id="attachment_300" class="wp-caption aligncenter" style="width: 610px"><img src="http://www.adamfranco.com/files/2009/09/2b-Two-Web-servers-nfs.jpg" alt="Two web servers with NFS" title="2b - Two Web servers - nfs" width="600" class="size-full wp-image-300" /><p class="wp-caption-text">Two web servers with NFS</p></div><br />
On top of the complexity of adding a fourth machine (the file-server) to our mix, this method also leaves us with the file-server as a single point of failure &#8212; were it to go down, no uploaded files would be accessible.</p>
<h2>Best of both worlds</h2>
<p>In order to better solve this problem, the approach we took is to go the NFS route, but augment it with a backup copy of the files stored on the local file-system of each web-server. Every ten minutes or so a script (<a href='http://www.adamfranco.com/files/2009/09/sync_files.sh'>sync_files.sh</a>) runs that checks to see if the shared NFS directory is available, and if so syncs the uploaded-files to a backup location on the web-server&#8217;s file-system. This backup copy has its permissions set so that the Apache process cannot write to it, preventing synchronization problems if the shared NFS directory goes offline and we need to serve files out of the backup copy.<br />
<div id="attachment_302" class="wp-caption aligncenter" style="width: 610px"><img src="http://www.adamfranco.com/files/2009/09/3-Two-Web-servers-nfs+backup1.jpg" alt="Two web servers with NFS and local backup copies." title="3 - Two Web servers - nfs+backup" width="600" class="size-full wp-image-302" /><p class="wp-caption-text">Two web servers with NFS and local backup copies.</p></div><br />
A second script (<a href='http://www.adamfranco.com/files/2009/09/check_link.sh'>check_link.sh</a>) runs every minute and checks to see if the shared NFS directory is available. If it is offline, this script changes the symbolic link of our &#8220;files&#8221; directory so that Drupal will now use the read-only backup copy for its files. If the NFS directory comes back online, this script will again update the symbolic link to point at our writable shared NFS directory.</p>
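<p>The link-switching idea can be sketched in a few lines of shell. This is not the linked <code>check_link.sh</code>, only an illustration of the technique: the <code>switch_files_link</code> function and the <code>.nfs_alive</code> marker file used to decide whether the share is usable are assumptions made for this example:</p>
<pre>#!/bin/sh

# Point LINK at SHARED when the share looks usable, otherwise at BACKUP.
# "Usable" here means a marker file exists on the share (an assumption).
switch_files_link() {
    link=$1 shared=$2 backup=$3

    if [ -e "$shared/.nfs_alive" ]; then
        target=$shared
    else
        target=$backup
    fi

    # Only touch the link when its target actually needs to change.
    if [ "$(readlink "$link")" != "$target" ]; then
        ln -sfn "$target" "$link"
    fi
}

# Usage (example paths; adjust to your own layout):
#   switch_files_link /srv/files /mnt/files /srv/files_read_only
</pre>
<p>Run from cron every minute, a function like this leaves the link alone during normal operation and only rewrites it when the share disappears or comes back.</p>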
<p>An important consideration in this setup is that the NFS share is mounted in &#8216;soft&#8217; mode so that file-access errors will time out quickly and allow for a timely switch-over to our backup files.</p>
<div class="wp-caption alignnone" style="width: 610px">
<pre>files.example.edu:/images       /mnt/files     nfs     soft    0 0</pre>
<p class="wp-caption-text">An example 'soft' mount line in /etc/fstab</p></div>
<p>If the default &#8216;hard&#8217; NFS mount is used, the check_link processes will hang indefinitely while trying to communicate with the file-server and never switch to our backup files.</p>
<p>Here is an example layout on the web-server to accomplish this setup:</p>
<pre style='width: 100%'># The scripts that will be run by cron:
/usr/local/bin/check_link.sh  # Run every minute
/usr/local/bin/sync_files.sh   # Run every 10 minutes

# The mounted NFS share:
/mnt/files/

# The backup copy of files:
/srv/files_read_only/

# The 'files' symbolic link, pointing normally at the NFS share:
/srv/files/ => /mnt/files/
# On NFS failure, this link will be switched to the backup directory:
/srv/files/ => /srv/files_read_only/

# The Drupal code directory:
/srv/drupal/
# The files directory for a site is a link into the switched files link
/srv/drupal/sites/www.example.com/files/ => /srv/files/www.example.com/files/
</pre>
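<p>The two schedules in the layout above could be expressed as a system crontab fragment along these lines. The file name is hypothetical, and running the scripts as <code>root</code> is an assumption; use whichever account is allowed to change the link and write the backup copy:</p>
<pre style='width: 100%'># /etc/cron.d/drupal-files (hypothetical file name)
# minute hour day-of-month month day-of-week user command
* * * * *     root  /usr/local/bin/check_link.sh
*/10 * * * *  root  /usr/local/bin/sync_files.sh
</pre>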
<p>By mounting the shared NFS directory, keeping a read-only local copy of the files, and monitoring the state of the NFS directory we gain the following benefits:</p>
<ul>
<li>No problems with synchronization as all web-servers share the same remote filesystem.</li>
<li>Synchronization of the local backup copies is not a problem as this is always a one-way sync rather than a two-way sync between different web-servers.</li>
<li>While the NFS file-server is still a single point of failure, read access to the uploaded files (via the backup copy) will be restored after a maximum of one minute plus the NFS time-out (2 minutes by default for &#8216;soft&#8217; mounts).</li>
<li>The web-servers don&#8217;t need to know about each other, easing configuration if additional web-servers are added.</li>
</ul>
<p>This configuration adds an extra machine to the platform mix and a bit of complexity, but it makes normal operation robust (instant file availability to all web-servers) and allows for graceful degradation (file-access becomes read-only) if the file-server goes down.</p>
<p><em>Many thanks to our system administrator, Mark Pyfrom, for all of his help in developing and testing this platform.</em></p>
<p><em>* Update on 2009-09-10: added note about &#8216;soft&#8217; NFS mounts and an example file-system layout.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2009/09/09/high-availability-drupal-file-handling/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Setting up CAS development on OS X</title>
		<link>http://www.adamfranco.com/2009/06/19/setting-up-a-cas-development-on-os-x/</link>
		<comments>http://www.adamfranco.com/2009/06/19/setting-up-a-cas-development-on-os-x/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 20:57:12 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[CAS]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[single sign on]]></category>
		<category><![CDATA[Tomcat]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=146</guid>
		<description><![CDATA[Central Authentication Service (CAS) is a single-sign-on system for web applications written in Java that we have begun to deploy here at Middlebury College. Web applications communicate with it by forwarding users to the central login page and then checking the responses via a web-service protocol. A few months ago Ian and I got CAS [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.jasig.org/cas/">Central Authentication Service (CAS)</a> is a Java-based single-sign-on system for web applications that we have begun to deploy here at Middlebury College. Web applications communicate with it by forwarding users to the central login page and then checking the responses via a web-service protocol.</p>
<p>A few months ago Ian and I got CAS installed on campus and began updating applications to work with it rather than maintaining their own internal connections to the Active Directory server. Throughout this process we ran into a few challenges (such as returning attributes with the authentication-success response) and a bug in CAS, but we worked through these and got CAS up and running successfully.</p>
<p>We are now at a point where we need to do some customizations to our CAS installation to deal with changes to the group structure in the Active Directory. As well, the bug I reported was apparently fixed in a new CAS version, an improvement I need to test before we update our production installation. Both of these require a bit more poking at CAS than we can do safely in our production environment, so I am now embarking on the process of setting up a Java/Tomcat development environment on my PC. I&#8217;m documenting this process here both for my own benefit (when I have to set this up again on my laptop) and in case it helps anyone else.</p>
<p>Read on for my step-by-step instructions for setting up a CAS development environment on OS X.<br />
<span id="more-146"></span></p>
<p>Since I&#8217;ve recently been successful using <a href="http://www.macports.org/">MacPorts</a> to install <a href="http://git-scm.com/">Git</a> (a source-code-management tool), I decided to use MacPorts to install <a href="http://tomcat.apache.org/">Apache Tomcat</a>, <a href="http://maven.apache.org/">Maven</a>, and the other required software.</p>
<h1>Part One: Get a default CAS installation up and running</h1>
<h2>Install the Apache Tomcat server using MacPorts </h2>
<pre>sudo port install tomcat5</pre>
<p>MacPorts may fail to download a number of packages correctly, resulting in an error like the following:</p>
<pre>--->  Verifying checksum(s) for servlet24-api
Error: Checksum (md5) mismatch for apache-tomcat-5.5.25-src.tar.gz
Error: Target org.macports.checksum returned: Unable to verify file checksums</pre>
<p>The easiest fix I found was to go to the Apache website and download the file directly. <a href="http://www.google.com/search?q=apache-tomcat-5.5.25-src.tar.gz">Googling</a> will find it. Once you&#8217;ve downloaded the file, find the corrupted one on your file-system with:</p>
<pre>find /opt/local/var/macports -name "apache-tomcat-5.5.25-src.tar.gz"</pre>
<p>Then replace the corrupted version with the directly downloaded one and retry the installation with MacPorts:</p>
<pre>sudo mv /Users/afranco/Downloads/apache-tomcat-5.5.25-src.tar.gz /opt/local/var/macports/distfiles/servlet24-api/</pre>
<p>You will have to do this process several times for different packages that fail to download.</p>
<p>You will likely also need to install the <a href="http://dev.mysql.com/downloads/connector/j/3.1.html">mysql-connector-java</a>. Download the zip archive and copy the .jar file inside to <code>/Library/Java/Extensions/</code>.</p>
<h2>Install Maven</h2>
<p>As with Tomcat, try installing with MacPorts and replace any failed downloads.</p>
<pre>sudo port install maven2</pre>
<p>Also, make sure that <code>/opt/local/bin/</code> is at the front of the search path in your <code>~/.bash_profile</code>. If not, the built-in mvn command may take precedence.</p>
<pre>mvn -v
Apache Maven 2.1.0</pre>
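<p>The search-path change could look like the following lines in <code>~/.bash_profile</code>. This is a minimal sketch that assumes MacPorts is installed in its default <code>/opt/local</code> prefix:</p>
<pre># Put the MacPorts bin directory first so that its mvn is found
# before any other mvn on the system.
export PATH="/opt/local/bin:$PATH"

# Show which mvn the shell will now run.
command -v mvn || echo "mvn not found on PATH"
</pre>
<p>If the path printed is not under <code>/opt/local/bin/</code>, another entry earlier in the search path is still taking precedence.</p>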
<h2>Download CAS</h2>
<p>I cloned the repository using Git so that I can easily maintain my private branches, but you can download it directly or check it out with Subversion.</p>
<pre>git svn clone  https://source.jasig.org/cas3 --trunk=trunk --branches=branches --tags=tags</pre>
<h2>Build CAS</h2>
<p>Building CAS is accomplished by cd&#8217;ing to the directory in which you downloaded CAS and running:</p>
<pre>mvn package install</pre>
<p>Because some of the tests rely on network particulars, I can&#8217;t get the build to work unless I skip tests by instead using:</p>
<pre>mvn -Dmaven.test.skip=true package install</pre>
<h2>Install CAS</h2>
<p>Copy the CAS war and files to the Tomcat webapps directory:</p>
<pre>sudo cp -R cas-server-webapp/target/cas-server-webapp-3.x.x  /opt/local/share/java/tomcat5/webapps/cas
sudo cp -R cas-server-webapp/target/cas.war  /opt/local/share/java/tomcat5/webapps/cas.war</pre>
<h2>Start Tomcat/CAS</h2>
<p>Start Tomcat:</p>
<pre>sudo tomcatctl start</pre>
<p>Point your browser at <a href="http://localhost:8080/cas/">http://localhost:8080/cas/</a> and you should see the CAS login page.</p>
<h1>Part Two: Customizing CAS</h1>
<h2>Create the customization overlay</h2>
<p>Following the <a href="http://www.ja-sig.org/wiki/display/CASUM/Maintaining+local+customizations+using+Maven+2">instructions for maintaining local customizations</a> on the CAS wiki, I created a <code>cas-server-midd</code> subdirectory in my CAS-source directory and added a <code>pom.xml</code> file based on the example. Running maven properly generates a war file:</p>
<pre>cd cas3/cas-server-midd/

mvn -Dmaven.test.skip=true package install

ls target/
	cas
	cas-server-midd-3.3.3-SNAPSHOT
	cas-server-webapp-3.3.3-SNAPSHOT
	cas.war
	maven-archiver
	pom-transformed.xml
	war

sudo rm -R  /opt/local/share/java/tomcat5/webapps/cas

sudo cp -R target/cas  /opt/local/share/java/tomcat5/webapps/cas

sudo cp target/cas.war  /opt/local/share/java/tomcat5/webapps/cas.war

sudo tomcatctl restart
</pre>
<p>Note that there is a <code>target/cas/</code> directory in addition to the <code>target/cas-server-midd-3.x.x</code> and <code>target/cas-server-webapp-3.x.x</code> directories. In this case, the <code>target/cas/</code> one is the one we want to copy to the <code>tomcat5/webapps</code> directory. Refreshing your browser should still show the login page and not an error.</p>
<h2>Adding Customizations</h2>
<p>With the overlay directory structure in place, any files you add to the overlay directory (in my case <code>cas3/cas-server-midd/src/...</code>) will be used instead of the versions in <code>cas3/cas-server-webapp/src/...</code></p>
<p>Using the overlay has the same result as editing the files in <code>cas3/cas-server-webapp/src/...</code>, but keeps the changes in their own directory structure and allows the overlay to only contain files with modifications. All other files have their defaults used.</p>
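<p>As a rough sketch of what adding a file to the overlay can look like, the helper below copies one file from the stock webapp source tree into the overlay at the same relative path, ready to be edited. The function name <code>overlay_file</code> and its arguments are my own invention for illustration, not part of CAS or Maven.</p>

```shell
# Hypothetical helper (not part of CAS or Maven): copy one file from the
# stock webapp source tree into the overlay, keeping its relative path so
# the overlay build uses it in place of the default version.
# Usage: overlay_file WEBAPP_SRC OVERLAY_SRC RELATIVE_PATH
overlay_file() {
    webapp_src=$1
    overlay_src=$2
    rel_path=$3

    # Create the matching directory structure inside the overlay.
    mkdir -p "$overlay_src/$(dirname "$rel_path")" || return 1

    # Copy the default file; edit the copy in the overlay afterwards.
    cp "$webapp_src/$rel_path" "$overlay_src/$rel_path"
}
```

<p>For example, <code>overlay_file cas3/cas-server-webapp/src cas3/cas-server-midd/src main/webapp/WEB-INF/view/jsp/default/ui/casLoginView.jsp</code> would copy the login view into the overlay, assuming that is where the JSP lives in your CAS version.</p>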
<h2>Build and Install the customizations</h2>
<p>Run the Maven build process again and copy the result from our overlay target to the Tomcat directory:</p>
<pre>cd cas3/cas-server-midd/
mvn -Dmaven.test.skip=true package install
sudo rm -R  /opt/local/share/java/tomcat5/webapps/cas
sudo cp -R target/cas  /opt/local/share/java/tomcat5/webapps/cas
sudo cp target/cas.war  /opt/local/share/java/tomcat5/webapps/cas.war
sudo tomcatctl restart
</pre>
<p>As you go, make more changes on the overlay and rebuild CAS. Lather, rinse, repeat.</p>
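<p>Since the same commands get repeated on every cycle, they can be wrapped in a small shell function. This is only a sketch: <code>run</code>, <code>redeploy_cas</code>, and the <code>DRY_RUN</code> variable are names I made up, and the default paths are the ones used in this post.</p>

```shell
# Hypothetical wrapper around the build-and-install commands above.
# Set DRY_RUN=1 to print each command instead of running it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild the overlay and copy the result into Tomcat's webapps directory.
# The body runs in a subshell so the caller's working directory is untouched.
# Usage: redeploy_cas [OVERLAY_DIR] [TOMCAT_WEBAPPS_DIR]
redeploy_cas() (
    overlay_dir=${1:-cas3/cas-server-midd}
    webapps=${2:-/opt/local/share/java/tomcat5/webapps}

    # Stop at the first failing step rather than deploying a broken build.
    run cd "$overlay_dir" || exit 1
    run mvn -Dmaven.test.skip=true package install || exit 1
    run sudo rm -R "$webapps/cas" || exit 1
    run sudo cp -R target/cas "$webapps/cas" || exit 1
    run sudo cp target/cas.war "$webapps/cas.war" || exit 1
    run sudo tomcatctl restart
)
```

<p>Running <code>DRY_RUN=1; redeploy_cas</code> first is a cheap way to check the paths before anything is deleted.</p>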
<p>Simply adding files to the overlay should be able to support any configuration or JSP (themes, etc.) changes needed. I&#8217;ll make another post once I figure out where to put class-files for adding customized Principal-Resolver classes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2009/06/19/setting-up-a-cas-development-on-os-x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Git Tip of the Day: Stage Hunks</title>
		<link>http://www.adamfranco.com/2009/01/13/git-tip-of-the-day-stage-hunks/</link>
		<comments>http://www.adamfranco.com/2009/01/13/git-tip-of-the-day-stage-hunks/#comments</comments>
		<pubDate>Tue, 13 Jan 2009 16:19:43 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Computers and Technology]]></category>
		<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[Git]]></category>
		<category><![CDATA[source-control]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=91</guid>
		<description><![CDATA[One of the great things about the Git version-control system is the ability to incrementally commit your changes on a private branch to keep a step-by-step record of your thought and writing process on a fix or a feature, and then merge the completed work onto your main [or public] branch after your feature or [...]]]></description>
			<content:encoded><![CDATA[<p>One of the great things about the <a href="http://git-scm.com/">Git</a> version-control system is the ability to incrementally commit your changes on a <a href="http://amarok.kde.org/wiki/Development/Git#Branching">private branch</a> to keep a step-by-step record of your thought and writing process on a fix or a feature, and then merge the completed work onto your main [or public] branch after your feature or fix is all done and tested. By keeping an incremental log of your changes &#8212; rather than just committing one giant set of code with changes to 30 files &#8212; it becomes much easier to know why a certain line was changed in the future when bugs are discovered with it.</p>
<p>One thing that often happens to me, though, is that I work for about a half hour to an hour trying to get a new piece of code working and in the process make several sets of changes to one file that are only loosely related.</p>
<p>Let&#8217;s say that I am fixing a bug in my &#8216;MediaLibrary&#8217; class and while doing so notice some spelling mistakes in some comments, which I also fix. Now my one file has two changes: my bug fix and the spelling fix. Rather than committing both changes together with one comment describing both, I can highlight one of the changes in <a href="http://www.kernel.org/pub/software/scm/git/docs/git-gui.html">git-gui</a> and select the &#8220;Stage Hunk for Commit&#8221; option.</p>
<p><a href='http://www.adamfranco.com/files/2009/01/stagehunk1.jpg'><img src="http://www.adamfranco.com/files/2009/01/stagehunk1.jpg" alt="Screen-shot of Staging a Hunk of code" title="Git-GUI: Stage Hunk" width="500" height="361" class="alignnone size-full wp-image-93" /></a></p>
<p>With that one hunk staged I can now commit with a message applicable to that change. Other changes can then be staged and committed with their own messages resulting in a very understandable history of changes.</p>
<p>&#8220;Stage Hunk for Commit&#8221; can also be used to commit important changes while not including debugging lines inserted in your code.</p>
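<p>The same thing can be done without git-gui by using <code>git add --patch</code>, which walks through each hunk and asks whether to stage it. The script below is a self-contained sketch in a throwaway repository; the file name, its contents, and the commit messages are all made up for the demonstration, and the answers are piped in only so that it runs unattended.</p>

```shell
# Throwaway repository; every name below is made up for this demonstration.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

# A 20-line file, committed as the starting point.
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > MediaLibrary.txt
git add MediaLibrary.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Initial version"

# Two unrelated edits, far enough apart that Git reports them as separate hunks.
sed -e 's/^line 2$/line 2 bug fix/' -e 's/^line 19$/line 19 spelling fix/' \
    MediaLibrary.txt > MediaLibrary.tmp
mv MediaLibrary.tmp MediaLibrary.txt

# `git add --patch` asks about each hunk in turn; answering "y" then "n"
# stages the first hunk and leaves the second in the working tree.
printf 'y\nn\n' | git add --patch MediaLibrary.txt > /dev/null

# Commit only the staged hunk with a message that describes just that change.
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Fix bug in MediaLibrary"
```

<p>In everyday use you would simply run <code>git add --patch</code> (or <code>git add -p</code>) in your own repository and answer the prompts by hand.</p>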
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2009/01/13/git-tip-of-the-day-stage-hunks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Outside-In:  Application Interoperability Using an OSID-Based Framework</title>
		<link>http://www.adamfranco.com/2008/06/25/outside-in-application-interoperability-using-an-osid-based-framework/</link>
		<comments>http://www.adamfranco.com/2008/06/25/outside-in-application-interoperability-using-an-osid-based-framework/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 03:59:04 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[Harmoni]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Segue]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=76</guid>
		<description><![CDATA[This post describes an interoperability demonstration given at OpeniWorld Europe 2008 in Lyon, France. Abstract Segue and Concerto are two curricular applications built upon Harmoni, an Open Service Interface Definition-based (OSID) service-oriented application framework. This demonstration will show how website content created in Segue is stored as OSID Assets in Harmoni’s OSID Repository. Similarly, the [...]]]></description>
			<content:encoded><![CDATA[<p><em>This post describes an interoperability demonstration given at Open<span style='color: #a00'>i</span>World Europe 2008 in Lyon, France.</em></p>
<p><strong>Abstract </strong></p>
<p>Segue and Concerto are two curricular applications built upon Harmoni, an Open Service Interface Definition-based (OSID) service-oriented application framework. This demonstration will show how website content created in Segue is stored as OSID Assets in Harmoni’s OSID Repository. Similarly, the demonstration will show how multimedia assets created in Concerto can be stored in the same repository. Interoperability will be demonstrated as each application is used to view and make real-time modifications to the OSID Assets created using the other application, while at the same time respecting the authorizations given to those assets. Additionally, an OSID Repository to OAI-PMH gateway will be shown providing the LibraryFind meta-search tool with access to the metadata for content created in Segue, Concerto, and a lightweight, read-only OSID Repository.</p>
<ul>
<li><strong>Companion Paper: </strong> <a href='http://www.adamfranco.com/files/2008/06/openiworld-europe-2008-paper.pdf'>PDF (76 KB)</a></li>
<li><strong>Presentation Slides: </strong><a href='http://www.adamfranco.com/files/2008/06/openiworld-europe-2008-slides.pdf'>PDF (7.4 MB)</a></li>
</ul>
<p><strong>Software Demonstrated:</strong></p>
<ul>
<li><a href="http://harmoni.sf.net">Harmoni Application Framework</a> (Middlebury College)</li>
<li><a href="http://segue.sf.net">Segue</a> version 2 (Middlebury College)</li>
<li><a href="http://concerto.sf.net">Concerto</a> (Middlebury College)</li>
<li><a href="http://www.libraryfind.org">LibraryFind</a> (Oregon State University)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2008/06/25/outside-in-application-interoperability-using-an-osid-based-framework/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Segue 2.0 &#8211; Beta 20</title>
		<link>http://www.adamfranco.com/2008/06/09/segue-20-beta-20/</link>
		<comments>http://www.adamfranco.com/2008/06/09/segue-20-beta-20/#comments</comments>
		<pubDate>Tue, 10 Jun 2008 03:23:52 +0000</pubDate>
		<dc:creator>Adam</dc:creator>
				<category><![CDATA[Work/Professional]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Segue]]></category>

		<guid isPermaLink="false">http://www.adamfranco.com/?p=75</guid>
		<description><![CDATA[Another week, another Segue 2 beta. This week&#8217;s installation brings visitor registration, a few new themes from Alex, theme migration from Segue 1, and a bunch of little bug fixes. Visitor registration brings with it a few interesting challenges. As in Segue 1, we want (and need) to be able to allow people outside of [...]]]></description>
			<content:encoded><![CDATA[<p>Another week, another Segue 2 beta. This week&#8217;s installation brings visitor registration, a few new themes from Alex, theme migration from Segue 1, and a bunch of little bug fixes.</p>
<p>Visitor registration brings with it a few interesting challenges. As in Segue 1, we want (and need) to be able to allow people outside of the Middlebury community to join in on public discussions hosted in Segue. As well, Middlebury users often need to give access to restricted parts of their sites to people off-campus with whom they are collaborating. Our visitor registration system therefore needs to be easy for registrants to use, keep out spammers, and enable community users to search for visitor accounts.</p>
<p>To keep out spammers, the visitor registration form uses <a href="http://recaptcha.net/">reCAPTCHA</a> to try to verify that a human is sitting at the browser. There are other CAPTCHA systems out there, but I like the philosophy and approach of reCAPTCHA. Starting with words that OCR software had trouble reading seems like a good idea. After the registration form is filled out, Segue sends an email to the address entered with a unique registration code. Until the link in the email is clicked on (and hence the address verified) the account is locked.</p>
<p>To enable easy searching of visitor accounts, visitors are asked to enter their name. While there are a few restrictions on names, these are user-choosable. To provide some measure of differentiation between verified institution accounts and visitor accounts, visitor accounts have the user-chosen name followed by their email domain name in parentheses, e.g.:</p>
<blockquote><p>Adam Franco (gmail.com)</p></blockquote>
<p>I weighed including the entire email address as that is the only verified information we have about the visitor accounts, but I&#8217;d rather not open that information up for harvesting by spammers. If abuse becomes an issue, the visitor registration system also supports both black-lists and white-lists of email domains.</p>
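<p>To illustrate the naming rule only: Segue itself is written in PHP, and this is not its code. The shell sketch below builds a display name from a chosen name and an email address. The function name <code>visitor_display_name</code> and the sample values are hypothetical.</p>

```shell
# Hypothetical illustration of the naming rule (not Segue's actual code):
# the user-chosen name followed by the email domain in parentheses.
# Usage: visitor_display_name NAME EMAIL
visitor_display_name() {
    name=$1
    email=$2

    # ${email##*@} strips everything up to and including the last "@",
    # leaving only the domain, so the full address is never displayed.
    printf '%s (%s)\n' "$name" "${email##*@}"
}

visitor_display_name "Jane Doe" "jane@example.org"   # prints: Jane Doe (example.org)
```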
]]></content:encoded>
			<wfw:commentRss>http://www.adamfranco.com/2008/06/09/segue-20-beta-20/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

