A Two Coffee Problem: October 2024

Saturday, 19 October 2024

Underpinning Kubernetes

Kubernetes is the de facto choice for deploying containerized applications at scale. As a result, most of us are now familiar with the building blocks it gives us for building applications, such as deployments, ingresses, services and pods.

But what underpins these entities, and how does Kubernetes manage this infrastructure? The answer lies in the Kubernetes control plane and the nodes it deploys our applications to.

The control plane makes the decisions related to managing the cluster; in this sense it acts as the cluster's brain. It also provides an interface that allows us to interact with the cluster for monitoring and management.

The nodes are the workhorses of the cluster, where infrastructure and applications are deployed and run.

Both the control plane and the nodes comprise a number of elements, each with its own role in providing us with an environment in which to run our applications.

Control Plane Components

The control plane is made up of a number of components responsible for managing the cluster. In general these components run on dedicated infrastructure, away from the pods running applications.

The kube-apiserver is the front end of the control plane, exposed as a suite of REST APIs. These APIs are resource-based, allowing interactions with the various elements of Kubernetes such as deployments, services and pods.
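To make "resource-based" concrete, here is a small Python sketch that builds the REST paths the API server exposes for a few resources. It assumes the standard layout where the core group (pods, services, nodes) is served under /api/v1 and named groups such as apps are served under /apis/<group>/<version>; the resource_path helper is hypothetical and not part of any Kubernetes client library.

```python
def resource_path(group, version, resource, namespace=None, name=None):
    """Build a kube-apiserver REST path (resource_path is a hypothetical helper).

    group: "" for the core API group (pods, services, nodes), or a named group
    such as "apps" (deployments) or "networking.k8s.io" (ingresses).
    namespace: omit for cluster-scoped resources such as nodes.
    name: omit to address the whole collection rather than a single object.
    """
    # The core group is served under /api; every named group under /apis/<group>.
    base = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    parts = [base]
    if namespace is not None:
        parts.append(f"namespaces/{namespace}")
    parts.append(resource)
    if name is not None:
        parts.append(name)
    return "/".join(parts)


print(resource_path("", "v1", "pods", namespace="default"))
# /api/v1/namespaces/default/pods
print(resource_path("apps", "v1", "deployments", namespace="default", name="web"))
# /apis/apps/v1/namespaces/default/deployments/web
print(resource_path("", "v1", "nodes"))
# /api/v1/nodes
```

Listing the pods in the default namespace with kubectl ultimately becomes an HTTP GET against the first of these paths, and kubectl get --raw lets you issue such a request directly.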

In order to manage the cluster, the control plane needs to store data related to its state. This is provided by etcd, a consistent and highly available key-value store.
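As a rough illustration of how cluster state maps onto a key-value store, the Python sketch below uses an in-memory dictionary as a stand-in for etcd. It assumes the /registry/<resource>/<namespace>/<name> key layout the API server typically uses for resources such as pods (some resources use other layouts). In a real cluster the values are encoded Kubernetes objects rather than the plain dictionaries used here, and ToyKeyValueStore and object_key are hypothetical names.

```python
class ToyKeyValueStore:
    """An in-memory stand-in for etcd: a flat map from string keys to values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def get_prefix(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, sorted by key,
        similar in spirit to an etcd range read over a key prefix."""
        matches = [(k, v) for k, v in self._data.items() if k.startswith(prefix)]
        return sorted(matches, key=lambda kv: kv[0])


def object_key(resource, namespace, name, prefix="/registry"):
    # Mirrors the /registry/<resource>/<namespace>/<name> layout used for pods.
    return f"{prefix}/{resource}/{namespace}/{name}"


store = ToyKeyValueStore()
store.put(object_key("pods", "default", "web-0"), {"phase": "Running"})
store.put(object_key("pods", "default", "web-1"), {"phase": "Pending"})
store.put(object_key("pods", "kube-system", "coredns-0"), {"phase": "Running"})

# Listing every pod in one namespace is a prefix query; the trailing "/" stops
# "default" from also matching a namespace such as "default-staging".
for key, value in store.get_prefix("/registry/pods/default/"):
    print(key, value["phase"])
```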

The kube-scheduler watches for newly created pods that have not yet been assigned to a node and selects a node for each of them to run on. Many factors are taken into account when choosing a node, including resource requirements, affinity and anti-affinity rules, hardware, software or policy constraints, and data locality.
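The scheduler's decision is commonly described as two steps: filtering out nodes that cannot run the pod, then scoring the nodes that remain. The Python sketch below shows that shape using just two factors, free CPU/memory and a node selector label match. The real scheduler uses a much richer, pluggable set of filters and scores, and the Node and PodRequest types and the scoring rule here are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_millis: int   # unreserved CPU, in millicores
    free_memory_mib: int   # unreserved memory, in MiB
    labels: dict = field(default_factory=dict)


@dataclass
class PodRequest:
    cpu_millis: int
    memory_mib: int
    node_selector: dict = field(default_factory=dict)  # labels the node must carry


def schedule(pod, nodes):
    """Return the name of the chosen node, or None if no node is feasible."""
    # Filtering: keep only nodes with enough resources and matching labels.
    feasible = [
        n for n in nodes
        if n.free_cpu_millis >= pod.cpu_millis
        and n.free_memory_mib >= pod.memory_mib
        and all(n.labels.get(k) == v for k, v in pod.node_selector.items())
    ]
    if not feasible:
        return None  # the pod stays Pending until circumstances change

    # Scoring: prefer the node left with the most spare CPU, then memory
    # (a simple "spread the load" heuristic chosen for this sketch).
    def score(n):
        return (n.free_cpu_millis - pod.cpu_millis, n.free_memory_mib - pod.memory_mib)

    return max(feasible, key=score).name


nodes = [
    Node("node-a", 500, 1024, {"disk": "hdd"}),
    Node("node-b", 2000, 4096, {"disk": "ssd"}),
    Node("node-c", 4000, 8192, {"disk": "hdd"}),
]
print(schedule(PodRequest(1000, 2048, {"disk": "ssd"}), nodes))  # node-b
print(schedule(PodRequest(1000, 2048), nodes))                   # node-c
```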

The control plane contains a number of controllers, each responsible for a different aspect of managing the cluster, and these are all run by the kube-controller-manager. In general each controller is responsible for monitoring and managing one or more resources within the cluster; for example, the Node Controller notices and responds when nodes fail.
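Controllers are usually described as control loops: observe the current state, compare it with what is expected, and act on the difference. Here is a minimal Python sketch of such a loop in the spirit of the Node Controller, which marks a node's Ready condition as Unknown when the node stops sending heartbeats. The NodeStatus type, the reconcile_nodes function and the grace period value are assumptions for this example, not the controller's actual code or defaults.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    last_heartbeat: float  # timestamp, in seconds, of the node's last report
    ready: str = "True"    # echoes the Ready condition: "True", "False" or "Unknown"


def reconcile_nodes(nodes, now, grace_period_s):
    """One pass of a toy node-monitoring control loop.

    Marks nodes whose last heartbeat is older than the grace period as
    "Unknown" and returns the names of the nodes changed in this pass.
    """
    changed = []
    for node in nodes:
        stale = now - node.last_heartbeat > grace_period_s
        if stale and node.ready != "Unknown":
            node.ready = "Unknown"
            changed.append(node.name)
    return changed


nodes = [NodeStatus("node-a", last_heartbeat=100.0), NodeStatus("node-b", last_heartbeat=10.0)]
print(reconcile_nodes(nodes, now=110.0, grace_period_s=40.0))  # ['node-b']
print(reconcile_nodes(nodes, now=110.0, grace_period_s=40.0))  # [] (already handled)
```

Running the same pass twice changes nothing the second time. That idempotence is what lets a controller simply run its loop over and over. In a real cluster it is the kubelet's own status reports that bring a recovered node back to Ready.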

By far the most common way of standing up a Kubernetes cluster is via a cloud provider. The cloud-controller-manager provides a bridge between the internal concepts of Kubernetes and the API of the cloud provider that implements them. An example of this is the Route Controller, which is responsible for configuring routes in the underlying cloud infrastructure.
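One way to picture the cloud-controller-manager's bridging role is as a provider-neutral interface that each cloud implements. The Python sketch below assumes a hypothetical CloudRoutes interface with a fake in-memory implementation, plus a toy version of the Route Controller's job: making sure each node's pod address range (CIDR) has a route pointing at that node. None of these names come from the real cloud-provider code.

```python
from abc import ABC, abstractmethod


class CloudRoutes(ABC):
    """Hypothetical provider-neutral interface that a cloud integration implements."""

    @abstractmethod
    def list_routes(self):
        """Return a dict mapping destination CIDR to target node name."""

    @abstractmethod
    def create_route(self, cidr, node_name):
        """Create or update the route for cidr so that it targets node_name."""


class FakeCloud(CloudRoutes):
    """Stands in for a real provider SDK by recording routes in memory."""

    def __init__(self):
        self.routes = {}

    def list_routes(self):
        return dict(self.routes)

    def create_route(self, cidr, node_name):
        self.routes[cidr] = node_name


def reconcile_routes(cloud, node_pod_cidrs):
    """Ensure every node's pod CIDR has a route pointing at that node.

    node_pod_cidrs: dict mapping node name to the pod CIDR assigned to it.
    Returns the CIDRs whose routes were created or corrected.
    """
    existing = cloud.list_routes()
    changed = []
    for node, cidr in sorted(node_pod_cidrs.items()):
        if existing.get(cidr) != node:
            cloud.create_route(cidr, node)
            changed.append(cidr)
    return changed


cloud = FakeCloud()
print(reconcile_routes(cloud, {"node-a": "10.244.0.0/24", "node-b": "10.244.1.0/24"}))
```

The controller only ever talks to the abstract interface, so supporting a new cloud means writing a new implementation rather than changing the controller.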

Node Components

The node components run on every node in the cluster, maintaining the running pods and providing the runtime environment for applications.

The kubelet is responsible for receiving PodSpecs and ensuring that the pods and containers they describe are running and healthy on the node.
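At its heart the kubelet's job is another reconciliation: compare the containers a PodSpec asks for with what is actually running on the node, then start, restart or stop containers to close the gap. This Python sketch models only that comparison. The dictionaries standing in for the PodSpec and the runtime's view, and the sync_pod name, are assumptions, and a real kubelet delegates the actual work to the container runtime.

```python
def sync_pod(desired_containers, running):
    """Work out what a kubelet-like agent must do to make a pod match its spec.

    desired_containers: dict of container name -> image, taken from the PodSpec.
    running: dict of container name -> (image, healthy) as observed on the node.
    Returns a sorted list of (action, container name) pairs.
    """
    actions = []
    for name, image in desired_containers.items():
        if name not in running:
            actions.append(("start", name))
        else:
            running_image, healthy = running[name]
            # A changed image or a failed health check both mean "replace it".
            if running_image != image or not healthy:
                actions.append(("restart", name))
    for name in running:
        if name not in desired_containers:
            actions.append(("kill", name))  # no longer part of the spec
    return sorted(actions)


desired = {"web": "nginx:1.27", "sidecar": "busybox:1.36"}
running = {"web": ("nginx:1.25", True), "old": ("redis:7", True)}
print(sync_pod(desired, running))
# [('kill', 'old'), ('restart', 'web'), ('start', 'sidecar')]
```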

An important element in being able to run containers is the container runtime. The runtime provides the mechanism that allows the node to act as a host for containers, which includes pulling images from a container registry as well as managing the containers' lifecycle. Kubernetes supports a number of different runtimes, such as containerd and CRI-O, and choosing one is part of constructing your cluster.

An optional component is kube-proxy, which maintains network rules on each node. These rules play an important role in implementing the Service concept within the cluster.
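Conceptually, the rules kube-proxy maintains map a Service's stable virtual IP and port to the changing set of pod endpoints behind it. The Python sketch below models that mapping as a dictionary and picks a backend at random for each connection, loosely mirroring kube-proxy's iptables mode. Real kube-proxy programs the kernel (for example with iptables or IPVS rules) rather than handling traffic itself, and ToyServiceProxy is a hypothetical name.

```python
import random


class ToyServiceProxy:
    """Maps a Service's virtual IP and port to the pod endpoints behind it."""

    def __init__(self, seed=None):
        self._rules = {}  # (cluster_ip, port) -> [(pod_ip, target_port), ...]
        self._rng = random.Random(seed)

    def sync_rules(self, cluster_ip, port, endpoints):
        # Called whenever the Service's endpoints change (kube-proxy learns of
        # such changes by watching the API server).
        self._rules[(cluster_ip, port)] = list(endpoints)

    def route(self, cluster_ip, port):
        """Choose a backend endpoint for a new connection to cluster_ip:port."""
        backends = self._rules.get((cluster_ip, port))
        if not backends:
            raise LookupError(f"no endpoints for {cluster_ip}:{port}")
        return self._rng.choice(backends)


proxy = ToyServiceProxy(seed=1)
proxy.sync_rules("10.96.0.20", 80, [("10.244.1.5", 8080), ("10.244.2.7", 8080)])
print(proxy.route("10.96.0.20", 80))  # one of the two pod endpoints
```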

Add-ons

To allow the functionality of a cluster to be extended, Kubernetes provides the ability to define add-ons.

Add-ons cover many different pieces of functionality.

Some relate to networking, either by providing internal DNS for the cluster to allow for service discovery, or by providing load balancers to distribute traffic among the cluster's nodes. Others relate to provisioning storage for use by the applications running within the cluster. Another important aspect is security, with some add-ons allowing security policies to be applied to the cluster's resources and applications.
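For the DNS add-on, service discovery works because every Service gets a predictable name. Here is a minimal Python sketch of that naming scheme, assuming the common default cluster domain of cluster.local (it is configurable) and a simplified version of the search list a pod's resolver is typically given. The function names are hypothetical.

```python
def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Fully qualified name a cluster DNS add-on serves for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def candidate_names(name, pod_namespace, cluster_domain="cluster.local"):
    """Names tried, in order, when a pod looks up a short name.

    Simplified: real resolvers also apply the ndots option and any search
    domains inherited from the node.
    """
    search = [f"{pod_namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    return [f"{name}.{suffix}" for suffix in search] + [name]


print(service_fqdn("orders", "shop"))  # orders.shop.svc.cluster.local
print(candidate_names("orders", "shop"))
```

This is why a pod in the shop namespace can usually reach that Service simply as orders, while pods in other namespaces can use orders.shop.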

Any add-ons you choose to use are installed within the cluster, and the examples above are by no means an exhaustive list.

As an application developer deploying code into a cluster, you don't necessarily need a working knowledge of how this infrastructure is underpinned. But I'm a believer that understanding the environment where your code will run helps you write better code.

That isn't to say that you need to become an expert, but a working knowledge of the building blocks and the roles they play will help you become a more well-rounded engineer and enable you to make better decisions.

Sunday, 13 October 2024

Compiling Knowledge

Any software engineer who works with a compiled language will know the almost religious concept of the build. Whether you've broken the build, declared that "it builds on my machine", or felt like you were in a life or death struggle with the compiler, the build is a process that must happen to turn your toil into something useful for users to interact with.

But what is actually happening when your code is being compiled? In this post we are certainly not going to take a deep dive into compiler theory (it takes a special breed of engineer to work in that realm), but an understanding of the various processes involved can be helpful on the road to becoming a well-rounded developer.

From One to Another

To start at the beginning, why is a compiler needed? By the time software runs on a CPU, it is a sequence of fairly simple operations drawn from the CPU's instruction set. These instructions involve simple mathematical and logical operations alongside moving data between registers and areas of memory.

Whilst it is possible to program at this level using assembly language, it would be an impossibly difficult task to write software at scale. As engineers we want to be able to code at a higher level.

Compilers give us the ability to do that by acting as translators: they take the software we've written in a high level language such as C++, C# or Java and turn it into a program that can run on the CPU (for C# and Java, usually via an intermediate bytecode that a virtual machine then executes).

That simple description of what a compiler does belies the complexity of achieving that outcome, so it shouldn't be too much of a surprise that implementing it takes several different processes and phases.

Phases

The first phase of compilation is Lexical Analysis. This involves reading the input source code and breaking it up into its constituent parts, such as keywords, identifiers, operators and literals, usually referred to as tokens.
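To make tokenizing concrete, here is a minimal lexer in Python for a hypothetical toy language of integer arithmetic and let bindings, such as let x = 2 + 3 * y;. It uses a single regular expression with one named group per token kind. Real compilers often hand-write or generate their lexers, but the job is the same.

```python
import re

# Token kinds for the toy language. Order matters: the alternation tries each
# pattern in turn, so the keyword "let" is checked before general names.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("LET", r"let\b"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break source text into a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not itself a token
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens


print(tokenize("let x = 2 + 3 * y;"))
```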

The next phase is Syntax Analysis, also known as parsing. This is where the compiler ensures that the tokens representing the input source code conform to the grammar of the programming language. The output of this stage is usually an Abstract Syntax Tree (AST), which represents the structure of the code as a tree of interconnected nodes; for example, an addition becomes a node whose two children are its operands.
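The Python sketch below is a recursive descent parser for a toy arithmetic language. It takes a list of (kind, text) tokens as input (written out by hand here so the example stands alone) and returns an AST built from nested tuples such as ("binop", "+", left, right). The grammar, in which * and / bind more tightly than + and -, is an assumption made for this example.

```python
def parse_expression(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple AST.

    Grammar:  expr   := term (("+" | "-") term)*
              term   := factor (("*" | "/") factor)*
              factor := NUMBER | NAME | "(" expr ")"
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = advance()[1]
            node = ("binop", op, node, term())  # left-associative
        return node

    def term():
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            op = advance()[1]
            node = ("binop", op, node, factor())
        return node

    def factor():
        kind, text = advance()
        if kind == "NUMBER":
            return ("num", int(text))
        if kind == "NAME":
            return ("var", text)
        if kind == "LPAREN":
            node = expr()
            if advance()[0] != "RPAREN":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected trailing token {peek()[1]!r}")
    return tree


tokens = [("NUMBER", "2"), ("OP", "+"), ("NUMBER", "3"), ("OP", "*"), ("NAME", "y")]
print(parse_expression(tokens))
# ('binop', '+', ('num', 2), ('binop', '*', ('num', 3), ('var', 'y')))
```

Each grammar rule becomes one function, which is why recursive descent is a popular choice for hand-written parsers.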

Next comes Semantic Analysis. It is at this stage that the compiler checks that the code actually makes sense and obeys the rules of the programming language, including its type system. The compiler checks that variables are declared correctly, that functions are called correctly, and that no other semantic errors exist in the source code.
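Semantic checks typically walk the AST while maintaining a symbol table. This Python sketch checks a hypothetical program, given as a list of ("let", name, expression) statements whose expressions are nested tuples ("num", n), ("bool", b), ("var", name) or ("binop", op, left, right). It looks for use before declaration, duplicate declarations, and a simple type rule requiring integer operands for arithmetic. Like many real compilers, it collects every error rather than stopping at the first.

```python
def check_program(statements):
    """Return a list of semantic error messages (empty if the program is valid)."""
    symbols = {}  # symbol table: variable name -> inferred type
    errors = []

    def type_of(node):
        kind = node[0]
        if kind == "num":
            return "int"
        if kind == "bool":
            return "bool"
        if kind == "var":
            name = node[1]
            if name not in symbols:
                errors.append(f"'{name}' is used before it is declared")
                return None  # unknown type; avoids a cascade of follow-on errors
            return symbols[name]
        if kind == "binop":
            _, op, left, right = node
            left_type, right_type = type_of(left), type_of(right)
            if None not in (left_type, right_type) and (left_type, right_type) != ("int", "int"):
                errors.append(f"operator '{op}' expects int operands, got {left_type} and {right_type}")
            return "int"  # assume an int result so checking can continue
        raise ValueError(f"unknown node kind {kind!r}")

    for _, name, expr in statements:
        if name in symbols:
            errors.append(f"'{name}' is declared more than once")
        symbols[name] = type_of(expr)
    return errors


program = [
    ("let", "x", ("num", 1)),
    ("let", "y", ("binop", "+", ("var", "x"), ("var", "z"))),
    ("let", "b", ("bool", True)),
    ("let", "c", ("binop", "*", ("var", "b"), ("num", 2))),
    ("let", "x", ("num", 3)),
]
for message in check_program(program):
    print(message)
```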

Once these analysis phases are complete, the compiler can move on to Intermediate Code Generation. At this stage the compiler generates an intermediate representation (IR) of what will become the final program, a form that is easier to translate into the machine code the CPU can run.
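A common style of intermediate representation is three-address code, where each instruction performs at most one operation and stores its result in a named temporary. The Python sketch below flattens an expression AST made of nested tuples (("num", n), ("var", name), ("binop", op, left, right)) into that form. The (dest, op, arg1, arg2) instruction layout and the t0, t1, ... naming are choices made for this example.

```python
def to_three_address(tree):
    """Flatten an expression AST into three-address code.

    Returns (code, result) where code is a list of (dest, op, arg1, arg2)
    instructions and result is the operand holding the expression's value.
    """
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        name = f"t{counter}"
        counter += 1
        return name

    def emit(node):
        kind = node[0]
        if kind in ("num", "var"):
            return node[1]  # constants and variable names are used directly as operands
        if kind == "binop":
            _, op, left, right = node
            a, b = emit(left), emit(right)  # operands are computed first
            dest = new_temp()
            code.append((dest, op, a, b))
            return dest
        raise ValueError(f"unknown node kind {kind!r}")

    result = emit(tree)
    return code, result


ast = ("binop", "+", ("num", 2), ("binop", "*", ("num", 3), ("var", "y")))  # 2 + 3 * y
print(to_three_address(ast))
# ([('t0', '*', 3, 'y'), ('t1', '+', 2, 't0')], 't1')
```

The nested structure of the tree has become a flat, ordered list, which is much closer to the shape of real machine instructions.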

The compiler will then run an Optimisation stage, transforming the intermediate code to improve the overall performance of the program.
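Constant folding is one of the simplest optimisations: any operation whose operands are known at compile time is evaluated once by the compiler, instead of every time the program runs. Here is a minimal Python sketch over three-address instructions of the form (dest, op, arg1, arg2), where ints are constants and strings are names. It folds only +, - and *, deliberately leaving division alone to sidestep rounding and divide-by-zero questions. A real optimiser would also track where the final result lives; here folded values are simply returned in a separate dictionary.

```python
import operator

# Operators this sketch is willing to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def constant_fold(code):
    """Evaluate instructions whose operands are known constants.

    Returns (optimised_code, constants), where constants maps each folded
    temporary to its computed value.
    """
    constants = {}
    optimised = []
    for dest, op, a, b in code:
        # Substitute operands that refer to temporaries already folded.
        a = constants.get(a, a)
        b = constants.get(b, b)
        if isinstance(a, int) and isinstance(b, int) and op in FOLDABLE:
            constants[dest] = FOLDABLE[op](a, b)  # computed now, not at run time
        else:
            optimised.append((dest, op, a, b))
    return optimised, constants


print(constant_fold([("t0", "+", 2, 3), ("t1", "*", "t0", "y")]))  # (2 + 3) * y
# ([('t1', '*', 5, 'y')], {'t0': 5})
```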

Finally the compiler moves on to Code Generation to produce the final binary. At this stage the high level language of the input source code has been converted into an executable that can be run on the target CPU.
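Code generation maps each IR instruction onto instructions for the target machine. Real instruction sets are too large for a short example, so the Python sketch below targets a made-up stack machine (PUSH, LOAD, ADD, SUB, MUL and STORE are hypothetical opcodes) and includes a tiny interpreter standing in for the CPU, so that the generated code can actually be run.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(ir):
    """Translate (dest, op, arg1, arg2) instructions into toy stack-machine code."""
    program = []
    for dest, op, a, b in ir:
        for operand in (a, b):
            # Constants are pushed directly; names are loaded from memory.
            program.append(("PUSH", operand) if isinstance(operand, int) else ("LOAD", operand))
        program.append((OPCODES[op],))
        program.append(("STORE", dest))
    return program


def run(program, memory):
    """Execute toy stack-machine code, updating and returning the memory dict."""
    stack = []
    for instr in program:
        opcode = instr[0]
        if opcode == "PUSH":
            stack.append(instr[1])
        elif opcode == "LOAD":
            stack.append(memory[instr[1]])
        elif opcode == "STORE":
            memory[instr[1]] = stack.pop()
        else:
            right = stack.pop()  # operands come off the stack in reverse order
            left = stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            elif opcode == "MUL":
                stack.append(left * right)
            else:
                raise ValueError(f"unknown opcode {opcode!r}")
    return memory


ir = [("t0", "*", 3, "y"), ("t1", "+", 2, "t0")]  # 2 + 3 * y
program = generate(ir)
print(program)
print(run(program, {"y": 4})["t1"])  # 14
```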

Front End, Middle End and Backend

The phases described above are often grouped into a front end, middle end and backend. This gives the compiler a layered architecture and allows a certain degree of independence between the layers. As a result, different teams can work on these areas of the compiler, and different parts of compilers can be re-used and shared.

The front end usually refers to the initial analysis phases and is specific to a particular programming language. Should any of the code fail this analysis, errors and warnings will be generated to indicate to the developer which lines of source code are incorrect. In this sense the front end is the most visible part of the compiler, and the one developers interact with most.

The middle end is generally responsible for optimisation. Many compilers have settings for how aggressive this optimisation is and, depending on the target environment, may also distinguish between optimising for speed and for memory footprint.

The backend represents the final stage where code that can actually run on the CPU is generated.

This layering allows, for example, front ends for different programming languages to be combined with different backends that produce code for particular families of CPUs, with the intermediate representation acting as the glue that binds them together.
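The value of a shared intermediate representation shows up even at toy scale. In this Python sketch, two hypothetical front ends accept different surface syntaxes for the same single operation, and two hypothetical backends emit code for two made-up targets. Any front end can be paired with any backend because they only meet at the IR, here a list of (dest, op, arg1, arg2) tuples. The one-operation input format is deliberately tiny.

```python
# Two toy front ends for different surface syntaxes, both emitting the same IR.
def infix_frontend(source):
    """Accepts exactly 'a OP b', e.g. '2 + 3'."""
    a, op, b = source.split()
    return [("t0", op, int(a), int(b))]


def prefix_frontend(source):
    """Accepts exactly '(OP a b)', e.g. '(+ 2 3)'."""
    op, a, b = source.strip("()").split()
    return [("t0", op, int(a), int(b))]


# Two toy backends for different made-up targets, both consuming that IR.
def stack_backend(ir):
    names = {"+": "ADD", "-": "SUB", "*": "MUL"}
    lines = []
    for dest, op, a, b in ir:
        lines += [f"PUSH {a}", f"PUSH {b}", names[op], f"STORE {dest}"]
    return lines


def register_backend(ir):
    names = {"+": "add", "-": "sub", "*": "mul"}
    lines = []
    for dest, op, a, b in ir:
        lines += [f"mov r1, {a}", f"mov r2, {b}", f"{names[op]} r1, r2", f"mov [{dest}], r1"]
    return lines


def compile_with(frontend, backend, source):
    # The IR is the only thing the two halves share.
    return backend(frontend(source))


for frontend, source in ((infix_frontend, "2 + 3"), (prefix_frontend, "(+ 2 3)")):
    for backend in (stack_backend, register_backend):
        print(frontend.__name__, backend.__name__, compile_with(frontend, backend, source))
```

With two front ends and two backends there are four possible compilers, but only four components to write; with n languages and m targets the saving grows to n + m components instead of n × m.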

As we said at the start of this post, understanding exactly how compilers work is a large undertaking. But even a basic appreciation of their architecture and phases will help you in those battles you may have when trying to build your software. Compiler messages can sometimes seem unclear or frustrating, so this knowledge may save valuable time in figuring out what you need to do to keep the compiler happy.