Using rsync as a low-cost alternative to svnsync or WANdisco for Subversion synchronisation


We needed to replicate a Subversion artifacts repository to a remote location, but we found svnsync to be unreliable, and WANdisco’s offering was outside our budget. We therefore developed a simple Subversion replication mechanism using post-commit hooks, message queues, and rsync; this has proven highly reliable and meets our needs well. Here is how we did it.

Background

For various reasons (now forgotten in the mists of time) we use a Subversion repository for storing some of our deployable artifacts. We also need a read-only copy of these artifacts in a second location accessible from our Production servers, and so we need a way of replicating the artifacts to the secondary location.

We tried using the native Subversion svnsync tool to synchronise the data from the master to the secondary Subversion instance, but found it to be quite brittle, as it would fail almost every day due to various network gremlins and bandwidth issues. For instance, we would get unhelpful log messages like these from svnsync:

[Wed Mar 28 12:40:34 2012] [error] [client 10.xx.x.xx] Provider encountered an error while streaming a REPORT response.  [500, #0]
[Wed Mar 28 12:40:34 2012] [error] [client 10.xx.x.xx] Problem replaying revision  [500, #106]
[Wed Mar 28 12:40:34 2012] [error] [client 10.xx.x.xx] Error writing base64 data: Unknown error  [500, #106]

The workaround involved deleting a synchronisation lock on the secondary repository – painful:

svn propdel svn:sync-lock --revprop -r 0 http://10.xx.x.xx/svn/repo

We then investigated whether WANdisco’s Subversion Multisite would do what we wanted, but the cost model was not right for us so we had to think again.

Details

What we came up with was a simple solution comprising:

  • svn post-commit hooks
  • message queues
  • rsync
  • Subversion’s FSFS data store

At a high level, commits into the source (or master) repository trigger a post-commit hook to run, which grabs details of the commit, and drops these onto a message queue. At some point later, a service (daemon) picks up messages from the queue, and runs rsync to copy across the Subversion data on the filesystem to the secondary location:

[Diagram: rsync-subversion-replication]

The workflow is:

  1. New svn commit arrives at the master Subversion repo
  2. The post-commit hook fires
  3. Details of the commit are placed on queue using info from the post-commit hook
  4. A Windows Service watches the queue
  5. The Service fires rsync for each new message on the queue
  6. rsync copies the relevant repository data to the secondary Subversion server
  7. The secondary repository is available for use immediately
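The workflow above can be sketched in a few lines of Python. This is an illustration only, not the actual postcommit.exe or Windows Service: an in-memory deque stands in for the message queue, and the names CommitMessage, on_post_commit and process_queue are hypothetical. The replicate callable is a placeholder for the rsync step.

```python
from collections import deque
from typing import Callable, Deque, NamedTuple


class CommitMessage(NamedTuple):
    repo_path: str  # repository path passed to the post-commit hook
    revision: int   # revision number created by the commit


def on_post_commit(queue: Deque[CommitMessage], repo_path: str, revision: int) -> None:
    """Steps 2-3: the hook only records the commit; it copies nothing itself."""
    queue.append(CommitMessage(repo_path, revision))


def process_queue(queue: Deque[CommitMessage],
                  replicate: Callable[[CommitMessage], bool]) -> int:
    """Steps 4-6: replicate each queued commit, oldest first.

    A message is removed only after replicate() reports success, so a failed
    transfer leaves it (and everything behind it) queued for the next pass.
    Returns the number of messages processed in this pass.
    """
    done = 0
    while queue:
        if not replicate(queue[0]):  # peek; do not remove yet
            break
        queue.popleft()
        done += 1
    return done
```

Because the hook and the copy are decoupled by the queue, a commit never waits on the network; the cost is that the secondary lags the master by at least one polling interval.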

The approach was very tailored to the problem but, as we’ll see below, has some useful properties, particularly around replication reliability and resilience.

Subversion Post-Commit Hook

[Figure: Post-commit hook in the source Subversion repo]

We made use of the post-commit hook in Subversion in order to call custom code just after new artifacts have been committed into Subversion at the master location.

When Subversion fires the post-commit hook, it passes two parameters:

  1. Repository path
  2. Revision number created by the commit

We can make use of these in order to place enough information on a queue to allow rsync to push the relevant files to the secondary repository. We can use this method because we’re using Subversion’s FSFS data storage (on the filesystem). In the post-commit hook (here called post-commit.cmd as we’re running on Windows), we pass in the two parameters provided by Subversion:

postcommit.exe %1 %2

Our custom tool postcommit.exe simply takes the parameters and creates a message on a local message queue with these details:

18/06/2012 07:22:38 sending D:\svn_repository\deployment_maps 85 to .\private$\svnRsync
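The source of postcommit.exe is not shown here, so the following Python is only a guess at its shape: read the two hook parameters, check them, and log what is being sent. The function names are hypothetical; the queue name and timestamp format are copied from the log line above. The actual enqueue call is deliberately left out.

```python
from datetime import datetime

QUEUE_PATH = r".\private$\svnRsync"  # queue name as it appears in the log line


def parse_hook_args(argv):
    """Return (repo_path, revision) from the post-commit parameters.

    Only the first two arguments are used; any extras are ignored.
    """
    if len(argv) < 2:
        raise ValueError("expected: <repository path> <revision>")
    repo_path, revision = argv[0], argv[1]
    return repo_path, int(revision)  # int() rejects a non-numeric revision


def format_log_line(repo_path, revision, now):
    """Format a log line in the same shape as the one shown above."""
    stamp = now.strftime("%d/%m/%Y %H:%M:%S")
    return f"{stamp} sending {repo_path} {revision} to {QUEUE_PATH}"
```

For example, format_log_line(r"D:\svn_repository\deployment_maps", 85, datetime(2012, 6, 18, 7, 22, 38)) reproduces the log line above.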

Message Queue

We used MSMQ as a message queue (as the server runs Windows Server 2008). Each message on the queue simply contains the Subversion revision ID (as the label) plus the filesystem-local repository path (inside the message body):

[Figure: rsync-subversion-MSMQ]

This makes it easy to diagnose problems, because all the information needed is in the message queue.
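As a small illustration of that diagnosability, pending messages can be grouped by repository to show exactly which revisions are still waiting. The sketch below is Python and assumes the messages have already been read off the queue as (label, body) pairs; summarise_backlog is a hypothetical helper, not part of our tooling.

```python
from collections import defaultdict


def summarise_backlog(messages):
    """Group pending (label, body) pairs by repository path.

    label is the revision ID and body is the repository path, as described
    above. Returns {repo_path: sorted list of pending revisions}.
    """
    pending = defaultdict(list)
    for label, body in messages:
        pending[body].append(int(label))
    return {repo: sorted(revs) for repo, revs in pending.items()}
```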

Triggering rsync

We wrote a simple Windows Service to inspect the message queue at regular intervals and trigger an rsync operation based on the message or messages in the queue. Typically, the queue processor encounters only a single message, and so rsyncs just a single revision:

18/06/2012 11:14:56 got a message d/svn_repository/deployment_maps 85

Here, the queue processor found a message in the queue with Subversion revision ID 85 and a repository path d/svn_repository/deployment_maps (the format of the path is made rsync-friendly, with forward path separators instead of the usual Windows-style backslashes). This is sufficient for the queue processor to know which repository to replicate, and which revision, although in practice, rsync simply pushes all new files on the master to the secondary.
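The path conversion and the rsync call might look roughly like this in Python. Several details are assumptions rather than facts from our setup: the /cygdrive/ prefix presumes a Cygwin build of rsync, the -av flags (archive mode, verbose) are a common choice rather than the ones we necessarily used, and the destination is whatever remote target the caller supplies.

```python
def to_rsync_path(windows_path: str) -> str:
    """Turn D:\\svn_repository\\deployment_maps into d/svn_repository/deployment_maps."""
    drive, colon, rest = windows_path.partition(":")
    if not colon:
        raise ValueError(f"no drive letter in {windows_path!r}")
    return drive.lower() + rest.replace("\\", "/")


def build_rsync_command(rsync_path: str, destination: str) -> list:
    """Build the argument list for a one-way push of a whole repository.

    The source has no trailing slash, so rsync copies the directory itself
    (deployment_maps/...) into the destination.
    """
    source = "/cygdrive/" + rsync_path  # assumption: Cygwin-style drive mapping
    return ["rsync", "-av", source, destination]
```

The revision number plays no part in the command: since rsync only sends what differs, pushing the whole repository directory picks up every new revision file in one go.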

Putting it all together

One of the really nice properties of this replication scheme is that, because it uses an asynchronous message queue, it is resilient to transient failures in the rsync mechanism, including network link failures. As soon as the transient error goes away, queue processing resumes, and replication continues, starting with a large ‘burst’ to process all the messages:

18/06/2012 11:10:09 got a message d/svn_repository/deployment_maps 2937
18/06/2012 11:10:26 sending incremental file list
18/06/2012 11:10:27 deployment_maps/README.txt
18/06/2012 11:10:27 deployment_maps/format
18/06/2012 11:10:27 deployment_maps/conf/authz
18/06/2012 11:10:27 deployment_maps/conf/passwd
18/06/2012 11:10:27 deployment_maps/conf/svnserve.conf
...
18/06/2012 11:10:27 deployment_maps/db/revprops/0/0
18/06/2012 11:10:27 deployment_maps/db/revprops/0/1
18/06/2012 11:10:27 deployment_maps/db/revprops/0/10
18/06/2012 11:10:27 deployment_maps/db/revprops/0/100
...
18/06/2012 11:14:32 deployment_maps/locks/db-logs.lock
18/06/2012 11:14:32 deployment_maps/locks/db.lock
18/06/2012 11:14:50 sent 2740922 bytes  received 4064177 bytes  24260.60 bytes/sec
18/06/2012 11:14:50 total size is 4263615246  speedup is 626.53

Here we can see an entire repository being replicated to the secondary server, including the repository settings (svnserve.conf and authz files). The final two lines are the summary info from rsync showing a successful (if large!) transfer of files.
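The numbers in the summary can be cross-checked. rsync reports speedup as the total size of the files divided by the bytes actually sent plus received, and the transfer rate gives a rough elapsed time. The short Python sketch below (parse_summary is a hypothetical helper) does that arithmetic on the two summary lines from the log.

```python
import re

SUMMARY = (
    "sent 2740922 bytes  received 4064177 bytes  24260.60 bytes/sec\n"
    "total size is 4263615246  speedup is 626.53"
)


def parse_summary(text):
    """Recompute speedup and approximate duration from rsync's summary lines."""
    match = re.search(r"sent (\d+) bytes\s+received (\d+) bytes", text)
    sent, received = int(match.group(1)), int(match.group(2))
    total = int(re.search(r"total size is (\d+)", text).group(1))
    rate = float(re.search(r"([\d.]+) bytes/sec", text).group(1))
    transferred = sent + received
    return {
        "transferred": transferred,         # bytes that crossed the wire
        "speedup": total / transferred,     # total size / bytes transferred
        "seconds": transferred / rate,      # approximate elapsed time
    }
```

For this log the result is about 6.8 MB transferred against a total size of roughly 4.26 GB, a speedup of 626.53, and about 280 seconds elapsed, which agrees with the timestamps (11:10:09 to 11:14:50). A speedup this high suggests that most of the data was already present on the secondary when this run started.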

We now have a super-reliable and resilient one-way replication scheme for Subversion which is hugely better than svnsync, and much cheaper than WANdisco, although admittedly, our solution is very simple!

4 thoughts on “Using rsync as a low-cost alternative to svnsync or WANdisco for Subversion synchronisation”

    • Thanks Pavel!

      We’ve since migrated our artifact storage to Artifactory – this handles replication much better than the solution above. Not to mention, storing binaries in a system designed to hold source code was not the greatest of ideas.

  1. How did you handle the risk of temporary head corruption during the rsync?

    The repo/db/current file contains the head revision ID. If this file contains, say, 500, and rsync transfers it before the rev and revprop files repo/db/revs/0/500 and repo/db/revprops/0/500, the mirror will temporarily be unreadable.

    The problem may of course resolve itself quite quickly, unless you experience the kind of transient network issues you describe, but it could break builds and create administrative overhead in checking whether the mirror is actually broken or not.

    Even svnsync seems better here, given that it applies each revision as an atomic Subversion operation.

    • It had the unfortunate side effect of locking the destination svn repository. Not 100% sure of how it manifested, but I know that it caused a bit of pain.

      For what we were using it for, it worked fairly well for quite a while. However, we’ve now moved over to Artifactory and deprecated this solution.
