Continuous Deployment, Code Review and Pre-Tested Commits on Digg4
One of the exciting things about Digg4, from a development perspective, is continuous deployment - when developers fix a bug or add a new feature, there's no need to wait for a scheduled release. Instead, the change can go live right away. This is great - the turnaround time for a change drops dramatically. But it also opens up the possibility of broken changes going live, since there is no manual testing or signoff before a change is deployed. Figuring out how to balance the speed and agility of continuous deployment with the requirements for stability and reliability has been, and continues to be, a major challenge for us here. Over the last couple of months, we've rolled out a workflow which we believe achieves that balance without giving up too much of either stability or agility.
There are a few pieces to the workflow we've put in place. First, there are the tools we were already using before moving to continuous deployment: git for source control, Hudson for continuous integration and build management, Selenium for UI testing (via the ridiculously handy Sauce OnDemand service from Sauce Labs), and Puppet for managing what's running on our servers. The major addition to that stack has been Gerrit, a code review system for git, originating with Google's Android development group. Mandatory code review for all changes can seem like a pain, but there's nothing like having another developer actually look at your code to find the sort of design snafus that automated testing may not catch.
But that's not all we use Gerrit for. Alongside code review, Gerrit by default also provides a "Verified" flag. This is meant to be used to record that the change in question builds and tests properly. By combining this with Hudson, we are able to achieve pre-tested commits: no change can make it to our master branch without passing a battery of unit tests, successfully packaging into a Debian package, and in the case of our frontend code, installing on a test box and passing a battery of Selenium smoke tests. In order to make this work, we rely on the fantastic Gerrit Trigger plugin for Hudson, written and maintained by Robert Sandell and the rest of the Hudson team at Sony Ericsson.
The Gerrit Trigger plugin takes advantage of Gerrit's event streaming functionality to listen for the creation of new patchsets. Whenever it sees a new patchset come down the stream, it checks whether any Hudson projects are configured to build the Gerrit project the patchset belongs to, and if so, it launches a build of each such Hudson project, making sure that build pulls down the change we want to verify. When all running builds for a patchset finish, the Gerrit Trigger listener sends a message back to Gerrit with the result. The exact commands and values are configurable - in our workflow, we just use the Verified flag, with a -1 when the build fails and a +1 when it passes. We've also made a minor tweak to the Gerrit Trigger plugin to allow for rebuilding a patchset if the build failed for reasons other than legitimate test failures - the team at Sony Ericsson has a more formal solution for this coming in a future release of the plugin.
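To make that flow concrete, here is a minimal Python sketch of the decision logic only: which jobs to launch for an incoming event, and what Verified vote to send once they finish. It is not the plugin's actual code. The project-to-job mapping and job names are hypothetical, the event field names follow the JSON that Gerrit's stream-events command emits as we understand it, and the review command string is modeled on the kind of command the plugin can be configured to run, so treat all of these as assumptions.

```python
import json

# Hypothetical mapping: Gerrit project name -> Hudson jobs that verify it.
JOBS_BY_PROJECT = {
    "frontend": ["frontend-unit", "frontend-smoke"],
    "backend": ["backend-unit"],
}


def plan_builds(event_line, jobs_by_project=JOBS_BY_PROJECT):
    """Return (change number, patchset number, ref, jobs) for a new patchset.

    Returns None for any other event type, or when no job is configured
    for the project the patchset belongs to.
    """
    event = json.loads(event_line)
    if event.get("type") != "patchset-created":
        return None
    jobs = jobs_by_project.get(event["change"]["project"], [])
    if not jobs:
        return None
    patchset = event["patchSet"]
    return event["change"]["number"], patchset["number"], patchset["ref"], jobs


def verified_vote(results):
    """Map job results (True, False, or None while running) to a vote.

    Returns None until every build has finished, then +1 only if all passed.
    """
    if not results or any(r is None for r in results.values()):
        return None
    return 1 if all(results.values()) else -1


def review_command(change_number, patchset_number, vote):
    """Build the review command string (assumed form, not run here)."""
    return f"gerrit review {change_number},{patchset_number} --verified {vote:+d}"
```

For example, a patchset on the hypothetical "frontend" project would trigger both frontend jobs, and no vote would be sent until both had reported a result. A single failing job is enough to produce a -1.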
Why do we only run a subset of unit tests and Selenium tests in our pre-tested commit builds? First and foremost, due to time - we want the pre-tested commit process to be as fast as possible, since every version of every change will go through it. The next step in the continuous deployment process will run less frequently, so it can take a little longer and be more stringent.
Once a change has been code reviewed by another developer and verified by a Hudson build, it can be merged down to the master branch. When a change hits the master branch, a full build is run, with the full set of unit tests run against the code. Next, the code is packaged and installed onto an internal staging environment, and the full set of Selenium tests is run against that environment. If all those tests pass, we deploy the tested code to production.
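The gating described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual Hudson configuration: the Change record, the stage names, and the stage callables are all hypothetical stand-ins. It shows two ideas: a change is mergeable only when it has both a review approval and a Verified +1, and the post-merge stages run in order, stopping at the first failure so that nothing later (including the production deploy) runs.

```python
from dataclasses import dataclass


@dataclass
class Change:
    reviewed: bool  # approved by another developer
    verified: int   # Verified flag: -1 (failed), 0 (no vote yet), +1 (passed)


def can_merge(change):
    """A change may reach master only if it is both reviewed and verified."""
    return change.reviewed and change.verified == 1


def run_pipeline(stages):
    """Run (name, callable) stages in order, stopping at the first failure.

    Returns (succeeded, names of the stages that completed).
    """
    completed = []
    for name, stage in stages:
        if not stage():
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    # Hypothetical stages; each callable stands in for a real build step.
    stages = [
        ("full unit tests", lambda: True),
        ("package", lambda: True),
        ("install on staging", lambda: True),
        ("full Selenium suite", lambda: False),  # simulate a failing suite
        ("deploy to production", lambda: True),
    ]
    print(run_pipeline(stages))
```

Because the deploy is just the last stage in the list, a failure anywhere earlier means it is never reached.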
Now, is this perfect? Most definitely not. We end up having to manually rebuild more pre-tested commit builds than we'd like, due to instability in some of our tests or external factors, and there are plenty of other areas that could be tweaked to make the process smoother or more reliable. But overall, we've achieved quite a lot here - no change is going to make it onto our master branch without being reviewed by another developer and being built to verify that the tests covering that change pass. No change is going to make it onto our internal site without passing a full battery of unit tests. And nothing is going to go on from our internal site to production without passing an additional set of Selenium tests. Does this mean we're never going to introduce bugs to our live site? Of course not - but we're going to keep the number of bugs that hit the live site to a minimum, and we've made it easy and fast to get bug fixes live as well.