At thetrainline.com we have been using Opscode Chef since 2012 for automation of parts of our server infrastructure, particularly the build (CI) machines. We recently had several days of intensive Chef training with Chef expert Stephen Nelson-Smith (@LordCope on Twitter) of Atalanta Systems, spread over a few weeks, in preparation for scaling up our use of Chef. Here are some details of how we structured the Chef training, and what we got out of it (and why).
The first thing to say is that Stephen Nelson-Smith is excellent at delivering technical training. Having delivered some technical training myself in the past, I know how difficult it is to keep training sessions moving and attendees engaged, and Stephen is very effective at both.
Hands-on Classroom Sessions
Most of the attendees had never used Chef before (some had not used Ruby either), so we needed some ‘ground-up’ classroom sessions to cover the fundamentals of Chef. The classroom sessions were strongly focussed on hands-on learning, with each attendee eventually configuring two cloud-hosted servers using a Hosted Chef account. As a predominantly Windows-based organisation, we were fortunate that Stephen is an expert in using Chef on Windows as well as on Linux, so our classroom examples ran on the familiar Windows Server 2008 R2.
Pairing to Build Real Stuff
However, we knew that classroom sessions alone would not give attendees a real sense of using Chef in our server environments, so we arranged two days of pairing after the classroom days for people to create real infrastructure from our backlog. This allowed people to try out their classroom knowledge on infrastructure automation tasks which would result in real servers running in our build environment.
A third kind of session was the short whiteboard meeting, held with various people who have an interest in infrastructure: the head of security, the head of IT operations, our lead software architects, etc. These meetings helped us to flesh out and test the ideas and practices which we’ve been using with Chef so far, and to compare them with the extensive experience which Stephen has gained through engagements with Atalanta Systems.
Chef Learning Outcomes
During the classroom sessions, we kept to a strict rhythm of 25 minutes’ work, and 5 minutes’ break, governed by Stephen’s iPad countdown timer (this is known as the ‘pomodoro technique’). During the breaks we were forbidden from discussing the training, and during the work sessions, forbidden from doing anything except the training. Although the ‘enforced breaks’ were a little strange at first, I think most people soon appreciated how effective the group became when everyone was focussed on the training tasks, and we settled into the pattern of focus/relax/focus/relax, which helped new material and concepts to sink in mentally.
On the first day we covered the core concepts such as Node, Run List, Resource, Provider, Recipe, Cookbook, and how all these interact. By the end of the second day, we were installing and configuring IIS from a Chef recipe, using existing Resources and Providers, and becoming comfortable with the knife tool.
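To make the relationship between these concepts concrete, here is a toy model in plain Ruby. It is only a sketch and deliberately does not use Chef's real API: the `Resource` struct, `FeatureProvider`, `COOKBOOKS`, `Node` and `converge_node` are all hypothetical names invented for illustration, and the `IIS-WebServerRole` feature name is just an example value. It shows a node's run list naming a recipe, the recipe declaring resources, and a provider converging each resource only when the node is not already in the desired state.

```ruby
# Toy model only: these classes are hypothetical and are NOT Chef's real API.

# A resource declares the desired state of one piece of the system.
Resource = Struct.new(:type, :name, :action)

# A provider knows how to bring a resource into its desired state.
class FeatureProvider
  def initialize(installed)
    @installed = installed # names already present on the (pretend) node
  end

  # Returns true if a change was made, false if already converged.
  def converge(resource)
    return false if @installed.include?(resource.name)

    @installed << resource.name
    true
  end
end

# A recipe is an ordered list of resources; a cookbook groups recipes by name.
COOKBOOKS = {
  'iis' => {
    'default' => [Resource.new(:feature, 'IIS-WebServerRole', :install)]
  }
}

# A node has a run list of "cookbook" or "cookbook::recipe" entries.
class Node
  attr_reader :run_list, :installed

  def initialize(run_list)
    @run_list = run_list
    @installed = []
  end
end

# Walk the run list in order and converge every resource in each recipe.
# Returns the names of the resources that actually changed on this run.
def converge_node(node)
  provider = FeatureProvider.new(node.installed)
  updated = []
  node.run_list.each do |entry|
    cookbook, recipe = entry.split('::')
    recipe ||= 'default' # a bare cookbook name means its default recipe
    COOKBOOKS.fetch(cookbook).fetch(recipe).each do |resource|
      updated << resource.name if provider.converge(resource)
    end
  end
  updated
end

node = Node.new(['iis'])
first_run = converge_node(node)  # the feature gets "installed"
second_run = converge_node(node) # nothing left to change
```

The second run changes nothing because the provider checks current state before acting, which is the idempotent behaviour that makes repeated runs safe.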
The following week, we had two full days to pair on real infrastructure tasks using Chef. At the moment, for our build infrastructure we’re using a single instance of the open source Chef server (version 10.14), so some of the nice new features in Chef 11 were not yet available to us, but we’ll be upgrading to Chef 11 over the next few months. We split into two Dev teams and tackled a set of tasks each, including:
- A new CI server for the Database team to use for pipelining their DB update scripts
- A new CI server for the Suntrap dev team for their cool new applications like Live Departure Boards
- A new server to host a set of tools and metrics-gathering applications
- Moving our installers/RPMs server (used by many Chef recipes) to a more powerful server with more disk space
As we ran into problems with our cookbooks, we tracked the issues in Jira. Ideally, we’d track them against the cookbook Git repos; we plan to use either Stash or GitLab in future to provide this combined repo-and-tickets feature.
The pairing helped to consolidate the classroom-based learning, and it certainly helps to focus the mind when you’re working on a real server which will provide services to your team or another team.
Advanced Classroom Session
The final session we had with Stephen was an advanced classroom session. This was less prescribed than the initial two-day classroom sessions, which was great, as we could vote on the subjects we wanted to cover during the day.
We worked in pairs and also all together (using tmux to share console sessions!), taking turns to write Chef recipes initially code-first, and then, after we covered the TDD techniques for Chef, test-first.
I have produced a clearer version of the diagram from this session at the start of this post, showing the different tools to use at each layer of the testing ‘onion’.
We saw how foodcritic and chefspec (which itself uses rspec) can be used for a really rapid test feedback cycle, because chefspec does a ‘mock converge’ on the node (i.e. chef-client does not run for these tests), keeping them rapid and local. For Chef integration tests, minitest comes in, running tests at the end of a chef-client run to validate actual node convergence.
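The difference between the two test layers can be sketched in plain Ruby. This is a toy model under stated assumptions, not the ChefSpec or minitest APIs: `compile_recipe`, `FakeNode`, `declares?` and `converged?` are hypothetical helpers, and the feature and service names are example values. The unit-style check only inspects what the recipe declares, without touching the node, while the integration-style check inspects the node after the resources have really been applied.

```ruby
# Toy model only: hypothetical names, not the ChefSpec or minitest APIs.

Resource = Struct.new(:type, :name, :action)

# "Compiling" a recipe just builds the list of declared resources.
def compile_recipe
  [
    Resource.new(:feature, 'IIS-WebServerRole', :install),
    Resource.new(:service, 'W3SVC', :start)
  ]
end

# A fake node whose state changes only when resources are really applied.
class FakeNode
  attr_reader :state

  def initialize
    @state = {}
  end

  def apply(resource)
    @state[resource.name] = resource.action
  end
end

# Unit-style check (the 'mock converge' layer): look at declarations only.
def declares?(resources, type, name, action)
  resources.any? { |r| r.type == type && r.name == name && r.action == action }
end

# Integration-style check: look at the node's state after a real run.
def converged?(node, name, action)
  node.state[name] == action
end

resources = compile_recipe
node = FakeNode.new

unit_ok = declares?(resources, :feature, 'IIS-WebServerRole', :install)
untouched = node.state.empty? # the unit check applied nothing to the node

resources.each { |r| node.apply(r) } # stands in for a real chef-client run
integration_ok = converged?(node, 'W3SVC', :start)
```

The unit-style check is fast because it never changes the node; the integration-style check is slower but confirms what actually happened on the machine.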
We’re looking forward to using Chef more over the coming months as we bring more of our infrastructure under code-defined policy and automation.