We needed to replicate a Subversion artifacts repository to a remote location, but we found svnsync to be unreliable, and WANdisco’s offering was outside our budget. We therefore developed a simple Subversion replication mechanism using post-commit hooks, message queues, and rsync; this has proven highly reliable and meets our needs well. Here is how we did it.
For various reasons (now forgotten in the mists of time) we use a Subversion repository for storing some of our deployable artifacts. We also need a read-only copy of these artifacts in a second location accessible from our Production servers, and so we need a way of replicating the artifacts to the secondary location.
We tried using the native Subversion svnsync tool to synchronise the data from the master to the secondary Subversion instance, but found it to be quite brittle, as it would fail almost every day due to various network gremlins and bandwidth issues. For instance, we would get unhelpful log messages like these from svnsync:
[Wed Mar 28 12:40:34 2012] [error] [client 10.xx.x.xx] Provider encountered an error while streaming a REPORT response. [500, #0]
[Wed Mar 28 12:40:34 2012] [error] [client 10.xx.x.xx] Problem replaying revision [500, #106]
[Wed Mar 28 12:40:34 2012] [error] [client 10.xx.x.xx] Error writing base64 data: Unknown error [500, #106]
The workaround involved deleting a synchronisation lock on the secondary repository – painful:
svn propdel svn:sync-lock --revprop -r 0 http://10.xx.x.xx/svn/repo
We then investigated whether WANdisco’s Subversion Multisite would do what we wanted, but the cost model was not right for us so we had to think again.
What we came up with was a simple solution comprising:
- svn post-commit hooks
- message queues
- Subversion’s FSFS data store
- rsync
At a high level, each commit to the source (or master) repository triggers a post-commit hook, which captures details of the commit and drops them onto a message queue. Some time later, a service (daemon) picks up messages from the queue and runs rsync to copy the repository’s on-disk data to the secondary location.
The workflow is:
- New svn commit arrives at the master Subversion repo
- The post-commit hook fires
- Details of the commit are placed on a queue using info from the post-commit hook
- A Windows Service watches the queue
- The Service fires rsync for each new message on the queue
- rsync copies the relevant repository data to the secondary Subversion server
- The secondary repository is available for use immediately
The approach was closely tailored to our problem but, as we’ll see below, it has some useful properties, particularly around replication reliability and resilience.
Subversion Post-Commit Hook
We made use of the post-commit hook in Subversion in order to call custom code just after new artifacts have been committed into Subversion at the master location.
When Subversion fires the post-commit hook, it passes two parameters:
- Repository path
- Revision number created by the commit
These give us enough information to put on a queue so that rsync can push the relevant files to the secondary repository. This works because we’re using Subversion’s FSFS data store, which keeps the repository as ordinary files on the filesystem. In the post-commit hook (here called post-commit.cmd, as we’re running on Windows), we pass the two parameters provided by Subversion to our custom tool:
postcommit.exe %1 %2
Our custom tool postcommit.exe simply takes the parameters and creates a message on a local message queue with these details:
18/06/2012 07:22:38 sending D:\svn_repository\deployment_maps 85 to .\private$\svnRsync
We used MSMQ as the message queue (as the server runs Windows Server 2008). Each message on the queue simply contains the Subversion revision ID (as the label) plus the filesystem-local repository path (inside the message body).
This makes it easy to diagnose problems, because all the information needed is in the message queue.
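To illustrate the message shape described above, here is a minimal Python sketch. It is not the actual postcommit.exe code: the `ReplicationMessage` class, the `build_message` function, and the in-memory list standing in for MSMQ are all hypothetical names chosen for the example. The only details taken from our setup are the two hook arguments and the label/body split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationMessage:
    """Hypothetical message shape: revision as label, repository path as body."""
    label: str
    body: str


def build_message(argv: list[str]) -> ReplicationMessage:
    """Build a message from the two arguments Subversion passes to post-commit."""
    if len(argv) != 2:
        raise ValueError("expected: <repository path> <revision>")
    repo_path, revision = argv
    if not revision.isdigit():
        raise ValueError(f"revision is not a number: {revision!r}")
    return ReplicationMessage(label=revision, body=repo_path)


# Stand-in for the real queue (an assumption; the real tool sends to MSMQ).
queue: list[ReplicationMessage] = []
queue.append(build_message([r"D:\svn_repository\deployment_maps", "85"]))
```

Because each message carries both the repository path and the revision, inspecting the queue is enough to see which commits are still waiting to be replicated.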
We wrote a simple Windows Service to inspect the message queue at regular intervals and trigger an rsync operation based on the message or messages in the queue. Typically, the queue processor encounters only a single message, and so rsyncs just a single revision:
18/06/2012 11:14:56 got a message d/svn_repository/deployment_maps 85
Here, the queue processor found a message in the queue with Subversion revision ID 85 and a repository path d/svn_repository/deployment_maps (the format of the path is made rsync-friendly, with forward path separators instead of the usual Windows-style backslashes). This is sufficient for the queue processor to know which repository to replicate, and which revision, although in practice, rsync simply pushes all new files on the master to the secondary.
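The path conversion can be sketched in a few lines of Python. This is an illustration only, assuming the rule is simply "lower-case the drive letter, drop the colon, and join with forward slashes", which is what the log lines above suggest; `to_rsync_path` is a hypothetical name, not a function from our service.

```python
from pathlib import PureWindowsPath


def to_rsync_path(windows_path: str) -> str:
    """Convert a Windows path to the forward-slash form seen in the logs.

    Assumed rule: drive letter lower-cased, colon dropped, components
    joined with "/".
    """
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":").lower()
    # parts[0] is the anchor (for example "D:\\") when one is present.
    rest = path.parts[1:] if path.anchor else path.parts
    return "/".join([drive, *rest]) if drive else "/".join(rest)


converted = to_rsync_path(r"D:\svn_repository\deployment_maps")
```

Using `PureWindowsPath` means the sketch parses Windows paths the same way on any platform, so it can be tried without a Windows machine.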
Putting it all together
One of the really nice properties of this replication scheme is that, because it uses an asynchronous message queue, it is resilient to transient failures in the rsync mechanism, including network link failures. As soon as the transient error goes away, queue processing resumes, and replication continues, starting with a large ‘burst’ to process all the messages:
18/06/2012 11:10:09 got a message d/svn_repository/deployment_maps 2937
18/06/2012 11:10:26 sending incremental file list
18/06/2012 11:10:27 deployment_maps/README.txt
18/06/2012 11:10:27 deployment_maps/format
18/06/2012 11:10:27 deployment_maps/conf/authz
18/06/2012 11:10:27 deployment_maps/conf/passwd
18/06/2012 11:10:27 deployment_maps/conf/svnserve.conf
...
18/06/2012 11:10:27 deployment_maps/db/revprops/0/0
18/06/2012 11:10:27 deployment_maps/db/revprops/0/1
18/06/2012 11:10:27 deployment_maps/db/revprops/0/10
18/06/2012 11:10:27 deployment_maps/db/revprops/0/100
...
18/06/2012 11:14:32 deployment_maps/locks/db-logs.lock
18/06/2012 11:14:32 deployment_maps/locks/db.lock
18/06/2012 11:14:50 sent 2740922 bytes received 4064177 bytes 24260.60 bytes/sec
18/06/2012 11:14:50 total size is 4263615246 speedup is 626.53
Here we can see an entire repository being replicated to the secondary server, including the repository settings (svnserve.conf and authz files). The final two log entries are the summary info from rsync, showing a successful (if large!) transfer of files.
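The resilience comes from removing a message only after rsync has succeeded. The Python sketch below shows that idea under stated assumptions: a `deque` stands in for MSMQ, `drain` and `fake_rsync` are hypothetical names, and the revision numbers are illustrative. It is not the code of our Windows Service.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Message = Tuple[str, int]  # (rsync-friendly repository path, revision)


def drain(queue: Deque[Message], replicate: Callable[[Message], bool]) -> int:
    """Process messages oldest-first and stop at the first failure.

    A message is removed only after replicate() reports success, so a
    failed message stays at the head of the queue for the next poll.
    Returns the number of messages processed.
    """
    processed = 0
    while queue:
        message = queue[0]      # peek without removing
        if not replicate(message):
            break               # transient failure: retry on the next poll
        queue.popleft()
        processed += 1
    return processed


# Simulated network link: down for the first poll, then restored.
link = {"up": False}


def fake_rsync(message: Message) -> bool:
    """Stand-in for running rsync; succeeds only while the link is up."""
    return link["up"]


pending: Deque[Message] = deque([
    ("d/svn_repository/deployment_maps", 85),  # illustrative values
    ("d/svn_repository/deployment_maps", 86),
])
first_poll = drain(pending, fake_rsync)    # link down: nothing is removed
link["up"] = True
second_poll = drain(pending, fake_rsync)   # link restored: backlog clears in one burst
```

Stopping at the first failure keeps messages in commit order, and nothing is lost while the link is down; the backlog is simply processed on the first poll after it recovers.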
We now have a super-reliable and resilient one-way replication scheme for Subversion which is hugely better than svnsync, and much cheaper than WANdisco, although admittedly, our solution is very simple!