A Two Coffee Problem: November 2018

We all spend our days following the processes and practices of software development, and the final step of all our hard work is deployment to production.

Viewed in its simplest terms, this is merely replacing our previous efforts with our latest and greatest bits and bytes: we've completed all our testing and can deploy with 100% confidence. Whilst this is the perfection we are aiming for, no development team is ever beyond making mistakes.

Software development is a non-trivial discipline, and some of our deployments will unintentionally introduce problems that we have to deal with. With this in mind, what strategies can we use to make our lives easier?

Update in Place

The simplest possible approach is to deploy your new code to your existing infrastructure by replacing the currently running code. However, what this approach gains in simplicity it loses in its ability to deal with failure.

Firstly, this means your infrastructure is not in a controlled state. By definition, with this approach your servers are pets: long-serving machines that are the product of multiple deployments. This makes it very difficult to recreate that exact infrastructure after a catastrophic failure, and it is also likely to make it harder to keep your lower environments in a state that is representative of your production environment.

Secondly, this approach makes it impossible to truly roll back your changes should the need arise. Yes, you can re-deploy your old code, but this is simply another deployment to an environment that you have already lost faith in; it does not give you the certainty that taking this action will definitely resolve your problems.
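
To make both drawbacks concrete, here is a deliberately simplified Python model. It assumes a "server" is nothing more than a dictionary of file paths to contents and that an in-place deployment copies a release's files over whatever is already there; `deploy_in_place`, `v1` and `v2` are hypothetical names used only for illustration.

```python
from typing import Dict

# Hypothetical model: a server's state is a mapping of file path -> contents.
Files = Dict[str, str]


def deploy_in_place(server: Files, release: Files) -> None:
    """Copy a release over whatever is already on the server.

    Files that the new release no longer ships are never removed, so the
    server's state depends on its whole deployment history.
    """
    server.update(release)


v1: Files = {"app.py": "v1 code", "legacy.cfg": "only v1 uses this"}
v2: Files = {"app.py": "v2 code", "feature.cfg": "only v2 uses this"}

# A long-serving "pet" that has seen both deployments.
pet: Files = {}
deploy_in_place(pet, v1)
state_after_v1 = dict(pet)
deploy_in_place(pet, v2)

# A freshly built server that has only ever seen v2.
fresh: Files = {}
deploy_in_place(fresh, v2)
print(pet == fresh)  # False: legacy.cfg is still lying around on the pet

# "Rolling back" is just another deployment on top of the current state.
deploy_in_place(pet, v1)
print(pet == state_after_v1)  # False: feature.cfg survives the roll back
```

Real servers drift in the same way through packages, configuration and runtime state, only far less visibly than in this toy model.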

In light of these drawbacks, we need an approach that provides more certainty and gives us an effective escape plan should the worst happen.

Blue/Green

One such approach is Blue/Green deployment.

With this approach you maintain two identical production environments, Blue and Green. At any moment in time, one is your live production environment and the other provides a staging post for your next release.

When the time comes, you deploy into the idle environment (for example Blue) and complete the necessary commission testing to give you the confidence to release. At this point you switch your production traffic from your Green environment to your new and shiny Blue environment.

Because you completed your commission testing against these exact servers running this exact deployment of code, you can be confident that this final switch of traffic is relatively risk-free. Should the worst happen and you need to roll back your release, you simply switch the traffic back to the Green environment. Because this is the exact environment that was previously handling your production traffic, this is a clean and precise roll back to a previous state.
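
The mechanism can be sketched in a few lines of Python, assuming a hypothetical router whose only job is to remember which environment is live. `BlueGreenRouter`, `release` and the `commission_test` callback are illustrative names rather than any real load balancer API; in practice the switch is usually made at a load balancer or in DNS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Hypothetical router: every production request goes to `live`."""

    # environment name -> version currently deployed there
    environments: Dict[str, str] = field(
        default_factory=lambda: {"blue": "", "green": ""}
    )
    live: str = "green"

    @property
    def idle(self) -> str:
        """The environment not serving traffic: our staging post."""
        return "blue" if self.live == "green" else "green"

    def deploy_to_idle(self, version: str) -> None:
        self.environments[self.idle] = version

    def switch(self) -> None:
        """Point all production traffic at the other environment."""
        self.live = self.idle

    def serve(self) -> str:
        return self.environments[self.live]


def release(router: BlueGreenRouter, version: str,
            commission_test: Callable[[str], bool]) -> bool:
    """Deploy to the idle environment and switch only if testing passes."""
    router.deploy_to_idle(version)
    if not commission_test(router.environments[router.idle]):
        return False  # production traffic was never touched
    router.switch()
    return True


router = BlueGreenRouter(environments={"blue": "", "green": "v1"}, live="green")
release(router, "v2", commission_test=lambda v: v == "v2")
print(router.live, router.serve())  # blue v2

# Roll back: Green still runs v1 exactly as it did before the release.
router.switch()
print(router.live, router.serve())  # green v1
```

Note that the roll back is only this clean until the next `deploy_to_idle`, which overwrites the environment you would otherwise fall back to.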

Having two production environments, combined with an effective roll back strategy, also gives you the freedom to destroy, rebuild and reconfigure your infrastructure.

One note of caution: whilst traffic switching and duplicated environments provide deployment benefits for your code, the same cannot be said of your databases. Duplicating your production databases isn't likely to be practical, and rolling back your code cannot simply throw away any data written between deployments.

The only real tactic to combat this is to avoid database changes that are destructive or not backwards compatible, so that the previous release can keep working against the current schema if you need to roll back.
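
As an illustration, here is a small Python sketch using the standard library's `sqlite3` module and an in-memory database. The `users` table and the `v1_insert`/`v1_read` functions standing in for the previous release are hypothetical. Adding a nullable column is additive, so the old code keeps working after a roll back; renaming a column the old code depends on is not.

```python
import sqlite3

# The schema as the previous release (v1) knows it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def v1_insert(db: sqlite3.Connection, name: str) -> None:
    """Previous release: only knows about the original columns."""
    db.execute("INSERT INTO users (name) VALUES (?)", (name,))


def v1_read(db: sqlite3.Connection) -> list:
    return db.execute("SELECT id, name FROM users ORDER BY id").fetchall()


# Backwards-compatible change for v2: add a nullable column.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
# v2 writes the new column...
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)",
             ("bob", "bob@example.com"))
# ...and after rolling back to v1 the old code still works, and bob's row survives.
v1_insert(conn, "alice")
print(v1_read(conn))  # [(1, 'bob'), (2, 'alice')]

# A destructive change: v2 renames `name` to `full_name` (modelled here as a
# fresh table so it runs on any SQLite version). v1 can no longer write.
renamed = sqlite3.connect(":memory:")
renamed.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
try:
    v1_insert(renamed, "alice")
except sqlite3.OperationalError as exc:
    print("v1 breaks after the rename:", exc)
```

A rename can still be made safely by splitting it into additive steps (add the new column, write to both, and only drop the old one once no release still reads it), at the cost of more deployments.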

Canary Deployments

A further refinement can be made to the Blue/Green approach. Even with Blue/Green, you switch all of your production traffic and put all of your users on the new code base in one fell swoop.

It is also possible to phase the switch, gradually moving users across to the new release. This limits exposure if problems are found and allows them to be fixed before all of your users are affected.

The groups of users directed towards the new code could be chosen at random, could be users who meet certain criteria based on the make-up of the release, or could be users likely to take a more forgiving view of any problems. The latter might be internal/staff users or users who have signed up for beta releases in order to be the first to try out your new features.

As your confidence in the release grows, you can dial up the percentage of users exposed to the new code.
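
One common way to pick the canary group, sketched here in Python, is to hash a stable user identifier into a bucket so that each user gets a consistent experience between requests, and to always include an explicit allow-list of staff or beta users. The functions `bucket_for` and `in_canary`, the bucket count and the `staff` set are assumptions for illustration, not part of any particular tool. Because a user's bucket never changes, dialing the percentage up only ever adds users to the new release.

```python
import hashlib
from typing import AbstractSet

BUCKETS = 10_000  # hypothetical granularity of 0.01%


def bucket_for(user_id: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_canary(user_id: str, percentage: float,
              early_adopters: AbstractSet[str] = frozenset()) -> bool:
    """Should this user be routed to the new release?

    `percentage` is 0-100. Staff and beta users are always included.
    """
    if user_id in early_adopters:
        return True
    return bucket_for(user_id) < percentage / 100 * BUCKETS


staff = {"alice@internal"}
users = [f"user-{i}" for i in range(10_000)]

# Dial up gradually as confidence grows.
for pct in (1, 5, 25, 100):
    exposed = sum(in_canary(u, pct, staff) for u in users)
    print(f"{pct:>3}% -> {exposed} of {len(users)} users on the new release")

print(in_canary("alice@internal", 0, staff))  # True: staff see it first
```

This only shows the selection logic; the routing decision itself would live wherever you already split traffic.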

The same warning about databases applies equally here, because users on the old and new code may be writing and reading data in different ways at the same time.

In a DevOps environment, deployment should never be nerve-racking because you don't trust the process. Errors and bugs are a fact of life for a software engineer, but the mechanism you use to ship your software should be well understood and trusted.

The final hurdle should be no more daunting or arduous than any other barrier you faced whilst developing your release. Fear of your deployment process will encourage you to release less frequently, and this benefits neither you nor your users.


Sunday 4 November 2018

The Final Step
