Mirroring a Subversion repository on GitHub

For the past few months I have been doing a lot of work on the phpCAS library, mostly to improve the community trunk of phpCAS so that I wouldn’t have to maintain our own custom fork with support for the CAS attribute format we use at Middlebury College. The phpCAS project lead, Joachim Fritschi, has been great to work with and I’ve had a blast helping out with the project.

The tooling has involved a few challenges, however, since Jasig (the organization that hosts the CAS and phpCAS projects) uses Subversion for its source-code repositories and we use Git for all of our projects. Now, I could just suck it up and use Subversion when doing phpCAS development, but there are a few reasons I don’t:

  1. We make use of Git submodules to include phpCAS along with the source-code of our applications, necessitating the use of a public Git repository that includes phpCAS.
  2. The git-svn tools allow me to use git on my end to work with a Subversion repository, which is great because…
  3. I find that Git’s fast history browsing and searching make troubleshooting and bug fixing much easier than any other tools I’ve used.

For the past two years I have been using git-svn to work with the phpCAS repository and every so often pushing changes up to a public Git repository on GitHub. Our applications reference this repository as a submodule when they need to make use of phpCAS. Now that I’ve been doing more work on phpCAS (and am more interested in keeping our applications using up-to-date versions), I’ve decided to automate the process of mirroring the Subversion repository on GitHub. Read on for details of how I’ve set this up and the scripts for keeping the mirror in sync.

On my development server I have a git repository I’ve cloned from the Jasig Subversion repository via:

git svn clone --stdlayout https://source.jasig.org/cas-clients/phpcas/

I use this repository for my phpCAS development and am continually using git svn rebase and git svn dcommit to update my branches from Subversion and commit changes back to the Subversion repository.

My goal was to fetch from Subversion and push all of the branches and tags from the Subversion repository to GitHub while ignoring any private branches or un-dcommitted changes I might have kicking about my development repository.

Step 1: Add the GitHub repository as a remote

git remote add github git@github.com:adamfranco/phpcas.git

Step 2: Fetch the latest changes from the svn repository
To do this, I just needed to run git svn fetch to import commits from the Subversion repository into my Git repository.

Step 3: Make Git tags for Subversion tag-branches
I may be doing something wrong, but it seems that Subversion tags come through git svn as git branches rather than as git “tag” objects. Basically they are a branch with a single commit that just adds the tag message, but no content change. Using git show I found I could grab the parent id, message, and other metadata from the “tag-branch”, then feed that into git tag to create actual tag objects in the git repository.
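
As a rough illustration of the approach described above, here is a minimal shell sketch for a single tag-branch. The function names tag_name_for and make_tag_from_branch are hypothetical, and the sketch assumes the git-svn refs sit directly under refs/remotes/ (for example refs/remotes/tags/1.3.0), as they do in the repository described here; newer git-svn versions default to an origin/ prefix, so the paths may need adjusting.

```shell
#!/bin/sh
# Sketch only: turn one git-svn "tag-branch" into an annotated Git tag.
# Assumes refs such as refs/remotes/tags/1.3.0 (no origin/ prefix).

# Strip the leading "tags/" from a tag-branch name: tags/1.3.0 -> 1.3.0
tag_name_for() {
    printf '%s\n' "${1#tags/}"
}

# Create an annotated tag on the parent of the tag-branch's single commit.
# $1 is a tag-branch name such as tags/1.3.0 (a made-up example).
make_tag_from_branch() {
    ref="refs/remotes/$1"
    # -s suppresses the diff so only the requested fields are printed.
    parent=$(git show -s --format=%P "$ref")
    message=$(git show -s --format='%s%ntagged by %aN on %aD' "$ref")
    date=$(git show -s --format=%ai "$ref")
    # Backdate the tag to the date of the original svn tagging commit.
    GIT_COMMITTER_DATE="$date" git tag -a -m "$message" "$(tag_name_for "$1")" "$parent"
}
```

Inside the git-svn clone, running make_tag_from_branch tags/1.3.0 would then create a tag named 1.3.0 on the commit the svn tag was copied from.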

Step 4: Push Subversion branches and newly created tags to GitHub
When called with no parameters, git push will push all branches that have matching names in both the source and the destination repository (this “matching” behavior was the default before Git 2.0). This wasn’t going to work for me, since I only want to automatically push the branch state that exists in Subversion (not any un-dcommitted changes in my Git repository) and I want to create mirrors of any new branches that appear in the Subversion repository. To accomplish this I needed to specify every branch individually. I determined the list of branches via:

git branch -r | grep -v '/' | grep -v trunk

then looped through them and appended them to the hard-coded mapping between the svn “trunk” and the GitHub “master”:

$cmd = 'git push --tags github refs/remotes/trunk:refs/heads/master ';
foreach ($svnBranches as $branch) {
	$cmd .= 'refs/remotes/'.$branch.':refs/heads/'.$branch.' ';
}
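
The same refspec list can also be assembled in plain shell, which makes it easy to see what the loop produces. This is only a sketch: build_refspecs is a hypothetical helper name, the branch names in the example are made up, and it assumes branch names arrive one per line (possibly indented, as git branch -r prints them) and live directly under refs/remotes/.

```shell
#!/bin/sh
# Sketch only: read svn branch names on stdin and emit one refspec per line.
# The hard-coded trunk -> master mapping always comes first.
build_refspecs() {
    printf '%s\n' 'refs/remotes/trunk:refs/heads/master'
    # read strips the leading whitespace that `git branch -r` adds.
    while read -r branch; do
        [ -n "$branch" ] || continue
        printf 'refs/remotes/%s:refs/heads/%s\n' "$branch" "$branch"
    done
}

# Example with sample input (these branch names are invented):
printf '  1.2.x\n  feature\n' | build_refspecs
```

In a real run, the output of the git branch -r pipeline above would be piped into build_refspecs, and the resulting lines passed as arguments to git push --tags github.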

All together: update_github_phpcas
Below is a script which performs the tasks above. I’ve added it to my crontab so that it runs every half-hour and keeps my GitHub repository in sync with the Jasig Subversion repository.


#!/usr/local/bin/php
<?php
/**
 * Script to mirror a Subversion repository on GitHub or another public Git repository.
 *
 * Author: Adam Franco (afranco@middlebury.edu)
 * Date: 2010-12-04
 * License: GNU General Public License (GPL) version 2 or later.
 */

chdir('/home/afranco/private_html/phpcas/');

// Fetch from svn.
`git svn fetch`;

// Look up all of the svn branches (remote refs without a slash, excluding trunk).
$svnBranches = explode("\n", trim((string) `git branch -r | grep -v '/' | grep -v trunk`));
$svnBranches = array_filter(array_map('trim', $svnBranches), 'strlen');

// Add all of our branches to the list of those to push.
$cmd = 'git push --tags github refs/remotes/trunk:refs/heads/master ';
foreach ($svnBranches as $branch) {
	$cmd .= 'refs/remotes/'.$branch.':refs/heads/'.$branch.' ';
}

// Ensure that Git tags are created for every SVN tag branch.
$svnTags = explode("\n", trim((string) `git branch -r | grep 'tags/'`));
$svnTags = array_filter(array_map('trim', $svnTags), 'strlen');
foreach ($svnTags as $svnTag) {
	$ref = "refs/remotes/$svnTag";
	// -s suppresses the diff so that only the requested format is returned.
	$parent = trim((string) shell_exec("git show -s --format=\"format:%P\" $ref"));

	// If there are no tags on the parent of the tag branch, add one.
	if (!strlen(trim((string) `git tag --contains $parent`))) {
		$message = trim((string) shell_exec("git show -s --format=\"format:%s%ntagged by %aN on %aD\" $ref"));
		$date = trim((string) shell_exec("git show -s --format=\"format:%ai\" $ref"));
		$tagName = str_replace('tags/', '', $svnTag);

		// Quote every argument so that messages containing quotes or spaces are safe.
		$tagCmd = 'GIT_COMMITTER_DATE='.escapeshellarg($date)
			.' git tag -a -m '.escapeshellarg($message)
			.' '.escapeshellarg($tagName).' '.escapeshellarg($parent);
		# print $tagCmd."\n";
		# print "Creating tag $tagName\n";
		`$tagCmd`;
	}
}

# print $cmd;
# print "\n";

// Push, capturing stderr as well so that errors show up in the cron output.
$output = (string) `$cmd 2>&1`;

// Stay quiet when there is nothing new, so cron only reports real activity.
if (trim($output) != 'Everything up-to-date') {
	print $output."\n";
}

I think that this script should work with very few changes for mirroring any Subversion repository as a Git repository.

4 Comments

  1. Woops!
    I take it back.
    It must have been me.

  2. I’m not sure I understand your loop over the svn tags. Why not do something like:

    foreach ($svnBranches as $svnTag) {
    $cmd .= 'refs/remotes/'.$svnTag.':refs/'.$svnTag.' ';
    }

    Or is there something subtle that I’m missing?

  3. Tim, the subtlety is that svn “tags” seem to come through git-svn as git branches off of the trunk (or other svn branch) that contain a single commit with no changes. For my purposes I wanted to create an actual git tag object that has the comment of that vestigial commit, but is located at the point on the parent branch where the “tag” forked off.

    I have added a screen shot from gitk to step #3 that shows my repository with the svn tag branches and the git tag objects that were created by the script. I hope this makes it a little more clear what is going on.

  4. Thanks for your post, very useful. I have listed a couple of other resources in a short collection of useful links about Git. It might be useful for people who are starting out with Git. Check it out at: http://blog.i-evaluation.com/2011/11/09/introduction-to-git-and-github
