Migrating from Gitolite to GitHub Enterprise


Recently, we performed a mass migration of our git repositories from Gitolite to GitHub Enterprise.

We had found that Gitolite required a high level of maintenance, and its configuration complexity weighed heavily on the team looking after it. We were running a rather old version with some serious security flaws, on out-of-date snowflake servers. The biggest issue, though, was that developers had to ask another team to create repos, change permissions and so on, which added unnecessary delay and caused blockages.

After reviewing multiple options, we decided to migrate to GitHub Enterprise (GitHub), which runs as an on-premises VMware appliance. We chose it because most developers are already familiar with github.com, and because GitHub is better supported by third-party tools. The move lets developers create repositories and perform most common tasks as self-service, rather than relying on another team.

As this migration does not appear to be very common, this post shares some detail about the steps that were required.

Remove large files from all git repositories

GitHub, quite rightly, has a file size limit of 100 MB: git is designed as a source code management tool rather than a place to store large binaries, and it performs badly with large files. GitHub rejects pushes that contain files over this limit, so we had to clean up the source repos before importing. We ran the fantastic open source tool BFG Repo-Cleaner against our Gitolite repos, and made a few code changes to move dependencies out of the repos and into NuGet packages and/or Artifactory.

Unfortunately, we could not find an easy way to locate these large files without running BFG or attempting an import into GitHub. In the end, we did some basic searching for large files in the working copies (which produced quite a few false positives), then took the import approach and dealt with the errors as they came up.
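The working-copy search can be approximated with a few lines of Python. This is a minimal sketch rather than the script we actually used: the function name and the choice to skip the .git directory are assumptions made for illustration.

```python
import os

LIMIT_BYTES = 100 * 1024 * 1024  # the 100 MB per-file limit mentioned above


def find_large_files(root, limit=LIMIT_BYTES):
    """Return (path, size) pairs for files under root that exceed limit."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: skip git's own storage, since packfiles are often
        # large without any single tracked file being over the limit.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            if size > limit:
                hits.append((path, size))
    return sorted(hits)
```

A scan like this only sees the current checkout. Large files that were committed and later deleted still live in the history, which is one reason the import attempt remained the final check.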

Side note: On the day we finished these changes, GitHub announced Large File Storage. Such is life.

Map repositories to teams

This was a rather boring admin task for each team: going through a list of 800-odd repositories and marking the ones they owned. Some teams had previously used owners files to mark code ownership, which helped, but the practice wasn’t widespread. Repositories left without owners were assigned to a new team, “UnclaimedRepos”.

Create the “Organisations” in GitHub

We manually created the organisations in GitHub, and manually added LDAP groups for the owning teams. Boring, but much quicker than automating it for the number of organisations involved.

Automatic Import

We wrote a basic script that:

  1. Looped through the repositories in Gitolite retrieved via
    ssh git@gitolite.thetrainline.local
    
  2. Looked up the owning team from the repository-to-team mapping created earlier
  3. Used the REST API to create the repository in GitHub.
    curl -X POST -u USER:PASSWORD --data '{"name": "REPONAME"}' https://github.thetrainline.local/api/v3/orgs/ORGANISATION/repos
    
  4. Cloned the repo locally
    git clone --bare git@gitolite.thetrainline.com:REPONAME
    
  5. Pushed the local clone to the new destination
    git push --mirror git@github.thetrainline.com:ORGANISATION/REPONAME.git
    

The process was idempotent, so we could run it repeatedly. We ran it several times before the final cut-over, and then one last time to make sure we had the latest code.
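The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not our original script: the function names, the `repo_owners` dictionary (repository name to organisation) and the `auth_header` parameter are hypothetical, and the host names are copied as written from the commands above. It also assumes the API answers 422 when a repository already exists, which is what makes a re-run harmless.

```python
import json
import subprocess
import tempfile
import urllib.error
import urllib.request

API = "https://github.thetrainline.local/api/v3"  # from the curl example
OLD = "git@gitolite.thetrainline.com"             # from the clone example
NEW = "git@github.thetrainline.com"               # from the push example


def create_repo_request(org, name, api=API):
    """Build the POST that creates repository `name` in organisation `org`."""
    body = json.dumps({"name": name}).encode()
    return urllib.request.Request(
        f"{api}/orgs/{org}/repos",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def git_commands(org, name, workdir):
    """Return the clone and push invocations for one repository, in order."""
    local = f"{workdir}/{name}.git"
    return [
        ["git", "clone", "--bare", f"{OLD}:{name}", local],
        ["git", "--git-dir", local, "push", "--mirror",
         f"{NEW}:{org}/{name}.git"],
    ]


def migrate(repo_owners, auth_header,
            run=subprocess.check_call, send=urllib.request.urlopen):
    """Create, clone and mirror-push every repository in repo_owners."""
    for name, org in sorted(repo_owners.items()):
        request = create_repo_request(org, name)
        request.add_header("Authorization", auth_header)
        try:
            send(request)
        except urllib.error.HTTPError as err:
            # Assumed: 422 means the repository already exists (a re-run).
            if err.code != 422:
                raise
        # A fresh scratch directory per repository keeps re-runs clean.
        with tempfile.TemporaryDirectory() as workdir:
            for command in git_commands(org, name, workdir):
                run(command)
```

Passing `run` and `send` as parameters is a design choice for the sketch: it lets the loop be exercised with stand-ins before it is pointed at real servers.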

Once the code was migrated, we modified the gitolite-admin repo (where Gitolite stores its config) to mark all repos as read-only and to remove access entirely for our build agents, which caught any missed references to the old server. We also wrote some basic automation scripts to repoint our continuous integration (CI) servers at the new location: TeamCity via its REST API, and Go via config.xml modification.
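The core of that CI automation is a URL rewrite from the old server to the new organisation-based layout. The sketch below shows the idea on plain text; the function name, the `repo_owners` mapping and the exact URL shapes are assumptions for illustration, and how the result is written back (REST call or config file) is left out.

```python
import re

# Matches old-style remotes such as git@gitolite.thetrainline.com:payments.git
OLD_URL = re.compile(r"git@gitolite\.thetrainline\.com:([\w./-]+)")


def rewrite_urls(text, repo_owners):
    """Replace Gitolite remotes in text with their GitHub equivalents.

    repo_owners maps repository name -> owning organisation. Remotes for
    repositories missing from the mapping are left untouched so they can
    be spotted and fixed by hand.
    """
    def replace(match):
        repo = match.group(1)
        if repo.endswith(".git"):
            repo = repo[:-4]
        org = repo_owners.get(repo)
        if org is None:
            return match.group(0)
        return f"git@github.thetrainline.com:{org}/{repo}.git"

    return OLD_URL.sub(replace, text)
```

Leaving unknown repositories unchanged pairs well with removing build agent access on the old server: anything that was missed fails loudly instead of silently building stale code.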

The final step was to update the Chef automation for our build agents so that the known_hosts file accepted the public key of the new server. After that, everything pretty much just worked.
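Our change was made in Chef, but the underlying operation is small enough to show in Python: add the new server's entry to known_hosts only if it is not already there. The function name and the example entry are hypothetical.

```python
from pathlib import Path


def ensure_known_host(known_hosts, entry):
    """Append entry to the known_hosts file unless it is already present.

    Returns True when the file was changed, False when nothing was needed,
    so repeated runs converge on the same file contents.
    """
    path = Path(known_hosts)
    lines = path.read_text().splitlines() if path.exists() else []
    wanted = entry.strip()
    if wanted in (line.strip() for line in lines):
        return False
    lines.append(wanted)
    path.write_text("\n".join(lines) + "\n")
    return True
```

For example, `ensure_known_host("/home/build/.ssh/known_hosts", "github.thetrainline.com ssh-rsa AAAA...")` would add the line on the first run and do nothing afterwards; both the path and the key text here are placeholders.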

Issues

Overall, this was a relatively smooth migration, though there were definitely a few issues on the way.

Some repos had been used in… “interesting ways”. For reasons lost in the mists of time, some teams were using git as a transport mechanism to ship deployments into our production environment. This worked well with Gitolite (where we had git replication to our production data centre), but became a problem with GitHub, as we didn’t want to open the firewalls to GitHub from all environments. As only a small number of repos were affected, we decided it was best to modify these applications to use the same deployment process as the rest of our applications, which had the additional benefit of less cognitive overhead when switching between projects.

We are wasting user licences on CI servers. At the moment, each of our CI servers has its own SSH key, which means it is set up as a full user in GitHub. We need to move over to deploy keys, which should be relatively simple.
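Deploy keys are registered per repository through the REST API, so the switch could be scripted along these lines. This is a sketch of something we have not yet done: the function name, key title and key text are hypothetical, and authentication is omitted.

```python
import json
import urllib.request

API = "https://github.thetrainline.local/api/v3"  # from the curl example


def deploy_key_request(org, repo, title, public_key, read_only=True, api=API):
    """Build the POST that attaches a deploy key to one repository."""
    body = json.dumps(
        {"title": title, "key": public_key, "read_only": read_only}
    ).encode()
    return urllib.request.Request(
        f"{api}/repos/{org}/{repo}/keys",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

One thing to plan for: a given deploy key can only be attached to a single repository, so a CI server that builds many repositories needs a key per repository rather than one shared key.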

GitHub does not support post-receive hooks. We previously had some post-receive hooks set up in Gitolite to perform various tasks when code was pushed (e.g. validating commit messages). Unfortunately, these are not supported in GitHub, so we’ve had to temporarily remove those actions. A potential approach here is to move these validations into the CI pipeline.
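As an idea of what a CI-side check might look like, the sketch below validates commit subjects in a revision range. The rule itself (a ticket reference such as "ABC-123: " at the start of the subject) is invented for the example and is not our actual policy; the function names are likewise hypothetical.

```python
import re
import subprocess

# Hypothetical rule: subject must start with a ticket reference, "ABC-123: ..."
RULE = re.compile(r"^[A-Z]+-\d+: \S")


def invalid_subjects(subjects, rule=RULE):
    """Return the commit subjects that do not satisfy the rule."""
    return [subject for subject in subjects if not rule.match(subject)]


def subjects_in_range(rev_range):
    """List commit subjects in a range such as 'origin/master..HEAD'."""
    output = subprocess.check_output(
        ["git", "log", "--format=%s", rev_range], text=True
    )
    return output.splitlines()


def check(rev_range):
    """Return a process exit code: 0 if every subject passes, 1 otherwise."""
    bad = invalid_subjects(subjects_in_range(rev_range))
    for subject in bad:
        print(f"Invalid commit message: {subject}")
    return 1 if bad else 0
```

Unlike a server-side hook, a CI step cannot stop the push from landing. It can only fail the build afterwards, so the feedback arrives later than it did with Gitolite.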

Change of permissions model. Previously, we had a relatively open access model with Gitolite. As we moved to GitHub, we changed to more of a team ownership model, with commit access only available to the owning team. This works most of the time, but we have some repos with truly shared code ownership. This has highlighted spots where we need to draw better boundaries between our applications to remove cross-team dependencies. The issue has also shown up as teams going straight for “please grant me access” rather than considering that the repo may be in the wrong organisation, that the code boundaries may be wrong, or that they could simply raise a pull request.

Summary

Overall, the move to GitHub has been well received, and has increased developer productivity. It has vastly increased code visibility, as well as encouraging code ownership and even conversations about the code – both leading to higher quality.

 
