Saturday 14 September 2024

Terraforming Your World

 


Software engineers are very good at managing source code. We have developed effective strategies and tools that allow us to branch, version, review, cherry-pick and revert changes. It is for this reason that we've been keen to bring the same level of control to other aspects of our engineering environment.

One such area is the infrastructure on which our code runs.

Infrastructure as Code (IaC) is the process of managing and deploying compute, network and storage resources via files that can be part of the source control process.

Traditionally these resources may have been managed via manual interactions with a portal or front-end application from your cloud provider of choice. But manual processes are prone to error and inconsistency, making it difficult and time-consuming to manage and operate infrastructure in this way.

Environments might not always be created in exactly the same way, and in the event of a problem there is no effective change log to enable changes to be reverted to a known good state.

One of the tools that attempts to allow infrastructure to be developed in the same way as software is Terraform by HashiCorp.

Terraform

Terraform allows the required infrastructure to be defined in configuration files; the resources declared in those files are then created within the cloud provider via interaction with its APIs.

These interactions are encapsulated in a Provider, which defines the resources that can be created and managed within that cloud. Providers are declared within the configuration files and can be pulled in from the Terraform Registry, which acts like a package manager for Terraform.
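For illustration, a provider declaration might look something like the sketch below. The choice of the AWS provider, the version constraint and the region are just example values rather than recommendations:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "eu-west-2"
}

Running terraform init downloads the declared providers from the registry before any plans can be made.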

Working with Terraform follows a three-stage process.

Firstly, in the Write phase, the configuration files which describe the required infrastructure are created. These files can span multiple cloud providers and can include anything from a VPC to compute resources to networking infrastructure.

Next comes the Plan phase. Terraform is a state-driven tool: it records the current state of your infrastructure and applies the changes necessary to reach the desired state described in the configuration files. As part of the Plan phase Terraform creates a plan of the actions that must be taken to move the current state of the infrastructure to match the desired state, whether that be creating, changing or deleting elements. This plan can then be reviewed to ensure it matches the intention behind the configuration changes.

Finally, in the Apply phase, Terraform uses the cloud provider APIs, via the associated provider, to bring the deployed infrastructure in line with the new state of the configuration files.
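In day-to-day use these phases map onto a handful of CLI commands. A typical sequence, with backends and workspaces left out for brevity, looks something like:

terraform init    # download providers and modules, initialise state
terraform plan    # preview the actions needed to reach the desired state
terraform apply   # execute those actions via the cloud provider APIs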

Objects

Terraform supports a number of different objects that can be described in the configuration files. What follows is not an exhaustive list, but it covers some of the more fundamental elements used to manage common infrastructure requirements.

Firstly we have Resources, which describe any object that should be created in the cloud provider; this could be anything from a compute instance to a DNS record.

A Data Source provides Terraform with access to data defined outside of its own configuration. This might be data from pre-existing infrastructure, configuration held elsewhere, or a database.
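As a rough sketch, a data source block reads much like a resource block. The AWS VPC example and variable name below are purely illustrative:

data "aws_vpc" "existing" {
  id = var.existing_vpc_id
}

Other blocks can then reference attributes of that data, for example data.aws_vpc.existing.cidr_block.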

Input Variables allow configuration files to be customised to a specific use case, increasing their reusability. Output Variables allow the Terraform process to return certain data about the infrastructure that has been created; this might be to further document the infrastructure that is now in place, or to act as an input to another Terraform process managing separate but connected infrastructure.
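A minimal sketch of both, using invented names and assuming a resource called aws_instance.web exists elsewhere in the configuration:

variable "instance_type" {
  type        = string
  description = "Size of the compute instance to create"
  default     = "t3.micro"
}

output "instance_public_ip" {
  description = "Public IP of the created instance"
  value       = aws_instance.web.public_ip
}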

Modules act like libraries in a software development context, allowing infrastructure configuration to be packaged and re-used across many different applications.
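Calling a module is again just a block. The source can be a local path, a registry address or a Git URL; the module name and inputs here are hypothetical:

module "network" {
  source = "./modules/network"

  vpc_cidr    = "10.0.0.0/16"
  environment = var.environment
}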

Syntax

We have made many references in this article to configuration files, but what do they actually consist of?

They are defined using the HashiCorp Configuration Language (HCL) and follow a format similar to JSON; in fact, it is also possible for Terraform to work with JSON directly.

All content is defined with structures called blocks:

resource "my_provider_object_type" "my_object" {
  some_argument = "abc123"
}

Presented above is an extremely simple example of a block.

Firstly we define the type of the block, in this case a resource. Then comes a series of labels, the number of which depends on the block type. For a resource block two labels are required: the first describes the type of the resource as defined by the provider, the second is a name that can be used to refer to the resource in other areas of the configuration files.

Inside the block, the resource may require a series of arguments in order to configure and control the underlying cloud resource that will be created.
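To make that concrete, a resource using the AWS provider might look like the sketch below, where the AMI ID and tags are placeholders. The second label, web, is how other parts of the configuration refer to it, for example aws_instance.web.id:

resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0"   # placeholder image ID
  instance_type = var.instance_type

  tags = {
    Name = "web-server"
  }
}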

This post isn't intended to be a deep dive into Terraform; instead I've tried to stoke your interest in the ways an IaC approach can help you apply the same rigour and process to your infrastructure management as you do to your source code.

Many of the concepts within Terraform align closely with those in software engineering. Using an IaC approach alongside traditional source code management can help foster a DevOps mentality, where the team responsible for writing the software is also responsible for managing the infrastructure it runs on. Not only does this allow their knowledge of the software to shape the creation of the infrastructure; in reverse, knowing where and on what infrastructure their code will run may well allow them to write better software.


Tuesday 3 September 2024

Being at the Helm

 


The majority of containerised applications being deployed at any reasonable scale will likely be using some flavour of Kubernetes.

As a container orchestration platform, Kubernetes allows the deployment of multiple applications to be organised around concepts such as pods, services, ingresses and deployments, defined in YAML configuration files.

In this post we won't go into detail around these concepts and will assume a familiarity with their purpose and operation.

Whilst Kubernetes makes this process simpler, when it is used for multiple applications managing the large number of YAML files can come with its own challenges.

This is where Helm comes into the picture. Described as a package manager for Kubernetes, Helm provides a way to manage updates to the YAML configuration files and version them to ensure consistency and allow for re-use.

I initially didn't quite understand the notion of Helm being a package manager, but as I've used it more I've come to realise why this is how it's described.

Charts and Releases

The Helm architecture consists of two main elements, the client and the library.

The Helm client provides a command line interface (CLI) for indicating what needs to be updated in a cluster via a collection of standard Kubernetes YAML files; the library then contains the functionality to interact with the cluster to make this happen.

The collection of YAML files passed to the client is referred to as a Helm Chart; these files define Kubernetes objects such as deployments, ingresses and services.

The act of the library using these YAML files to update the cluster is referred to as a Release.
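As a rough illustration, with a made-up chart path and release name, the release workflow looks something like:

helm install my-app ./charts/my-app    # first release into the cluster
helm upgrade my-app ./charts/my-app    # roll out changes as a new revision
helm rollback my-app 1                 # revert to a previous revision
helm list                              # show the releases in the current namespace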

So far you may be thinking that you could achieve the same outcome by applying the same YAML files to Kubernetes directly using the kubectl CLI. Whilst this is true, where Helm adds value is when you need to deploy the same application into multiple environments with certain configuration or set-up differences.

Values, Parameterisation and Repositories

It is common to need to deploy an application to multiple environments with differing numbers of instances, servicing requests on different domains, or with any other differences between testing and production environments.

Using Kubernetes directly means either maintaining multiple copies of YAML files or having some process to alter them prior to them being applied to the cluster. Both of these approaches have the potential to cause inconsistency and errors.

To avoid this, Helm provides a templating engine that allows a single set of YAML files to be parameterised. The syntax of this templating can look quite daunting when you first encounter it, and while we won't go into detail about it here, as with any language it will eventually click the more you use it.

Alongside these parameterised YAML files you specify a Values YAML file that defines the environment-specific values to be applied to the parameterised YAML describing the Kubernetes objects.

This allows the YAML files to be consistent across all environments in terms of overall structure whilst varying where they need to. This combination of a Values YAML file and the parameterised YAML defining the Kubernetes objects is what we refer to as a Helm Chart.
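As a heavily simplified sketch, with the chart layout trimmed down and the names invented, a templated deployment and its values file might look like:

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

# values.yaml
replicaCount: 2
image:
  repository: registry.example.com/my-app
  tag: "1.2.3"

A production install can then override replicaCount or the image tag with its own values file via -f, or with individual --set flags.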

It may be that your application needs to be deployed in multiple clusters; for these situations your Helm charts can be provided via a repository, in a similar way to how we might make re-usable Docker images available.
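For example, pulling a chart from a public repository and installing it with your own values might look like the following, where the Bitnami repository is a widely used public example and the release name and values file are placeholders:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install my-ingress bitnami/nginx -f my-values.yaml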

I think it's at this point that describing Helm as a package manager starts to make sense.

When we think about code package managers we think of re-usable libraries whose functionality can be shared and customised across multiple use cases. Helm allows us to achieve the same thing with Kubernetes. Without necessarily needing to understand everything required to deploy the application, we can pull down Helm charts, specify our custom values and start using the functionality in our cluster.

When to Use

The benefit you will achieve by using Helm is largely tied to the scale of the Kubernetes cluster you are managing and the number of applications you are deploying.

If you have a simple cluster with only a few applications deployed, the overhead of dealing with Kubernetes directly may be manageable. If you have a large cluster with multiple applications, or many clusters each with different applications deployed, you will benefit more from the consistency Helm helps to ensure.

You will also benefit from using Helm if you need to deploy a lot of 3rd party applications into your cluster, whether that's to manage databases, ingress controllers, certificate stores, monitoring or any other cross-cutting concern you need to have available in the cluster.

The package manager nature of Helm will reduce the overhead in managing all these dependencies in the same way that you manage dependencies at a code level.

As with many tools, your need for Helm may grow over time as your estate increases in complexity. If, like me, you didn't immediately comprehend the nature and purpose of Helm, then hopefully this post has helped you recognise what it can offer and how it could benefit your use case.