Saturday, 9 November 2024

Network of Networks

 


When we're searching for analogies to describe the operation of the internet we often fall back on that of posting a letter. Each packet of data we need to send and receive can be compared to a parcel we need to get from A to B.

In order to achieve that you need to identify the best route to take between those two points. For parcels these routes are relatively static; for the internet that isn't the case, with routes being much more dynamic.

Border Gateway Protocol (BGP) is a path vector routing protocol that makes routing decisions on how to get data packets to their intended destination. As internet users we may not directly interact with it but without it the internet wouldn't be functional across the globe as it is today.

Autonomous Systems

The internet, as the name suggests, is a network of networks. It isn't one cohesive system, it is a large number of individual networks. In the context of routing traffic these individual networks are referred to as Autonomous Systems (AS).

If we continue our postal analogy then each AS is like an individual postal region covering a town or county.

Each AS consists of routers that know how to route traffic internally but need to rely on connections to neighbouring AS to route traffic outside of their region. Because the internet is a dynamic system, with routes appearing and disappearing, each AS needs to be kept up to date with the routes other AS can help with.

This is achieved via peering sessions where AS connect to each other in order to build up the full picture of how to route traffic to destinations outside of each other's boundary. BGP is the mechanism that allows this routing information to be exchanged.

Not Just The Shortest

In order to become an AS you must be assigned an AS number, a 16 or 32 bit identifier managed by the Internet Assigned Numbers Authority (IANA) and allocated through the Regional Internet Registries (RIRs). As of the late 2010s there were around 64,000 AS registered worldwide, a number which will have continued to grow.

AS tend to be managed and run by large organisations, typically Internet Service Providers (ISP) but also large tech companies, governments or large institutions such as universities.

Often there are multiple possible routes for a packet to reach its destination. In order to allow the best route to be chosen BGP allows AS to apply attributes to each route that can be factored into the decision on which route to take.

These attributes may indicate hop count (how many steps are involved in getting to the destination) as well as weight, where an AS can indicate which route it would prefer traffic to take.
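To make that decision process a little more concrete, here is a simplified sketch in Python of how a router might compare candidate routes. Real BGP best-path selection has many more tie-breakers (origin, MED, eBGP versus iBGP and so on) and the route data here is entirely made up, so treat it as an illustration of the idea rather than the actual algorithm.

from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    as_path: list          # chain of AS numbers the announcement has crossed
    local_pref: int = 100  # higher means "we would prefer to send traffic this way"

def best_route(candidates):
    # Prefer the highest local preference (a policy/commercial decision),
    # then the shortest AS path (the fewest networks to cross).
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))

routes = [
    Route("203.0.113.0/24", as_path=[64500, 64510, 64520]),
    Route("203.0.113.0/24", as_path=[64501, 64520], local_pref=200),
]

print(best_route(routes).as_path)  # [64501, 64520] wins on local preference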

Because some AS are managed by commercial businesses this creates an interesting quirk in how traffic is routed. It would be assumed that the shortest or fastest route would be chosen. But because some companies will charge for their AS to handle traffic, or may not want to help competitors, commercial relationships are sometimes factored into which route to take.

When BGP Goes Wrong

As described earlier the management of routing traffic through the internet is autonomous; it is far too complicated and dynamic to be managed by hand. An AS uses BGP to announce which routes it can provide access to, and other AS then use this information to route traffic from within their own networks.

There have been several examples of AS accidentally announcing they can provide access to routes which they can't in fact handle. In 2004 a Turkish ISP accidentally announced it could handle any route on the internet, and as this misinformation spread to more and more AS internet access ground to a halt across a large part of the globe. In 2008 a Pakistani ISP which was blocking traffic internally to YouTube accidentally started announcing externally that it could handle connections to YouTube, causing an outage of that service that reached well beyond the region.

These are both examples of BGP hijacking. In these cases the announcement of false routes was accidental, but this isn't always the case. It can sometimes be done maliciously in order to lure traffic meant for a legitimate site to an imposter; in 2018 hackers announced bad BGP routes for traffic destined for services hosted in Amazon AWS and were able to steal a large amount of cryptocurrency.

In an effort to combat this kind of activity Resource Public Key Infrastructure (RPKI) allows BGP data to be cryptographically signed in order to validate that an AS is authorised to announce a route for a particular resource.
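Conceptually the validation looks something like the sketch below, where an announcement is only accepted if a signed Route Origin Authorisation (ROA) covers the announced prefix and names the announcing AS. The ROA data here is invented for illustration; a real validator fetches and cryptographically verifies ROAs from the RPKI repositories and treats "no covering ROA" differently from "invalid".

import ipaddress

ROAS = [
    # (authorised prefix, maximum prefix length, authorised origin AS)
    (ipaddress.ip_network("203.0.113.0/24"), 24, 64501),
]

def origin_valid(prefix, origin_as):
    announced = ipaddress.ip_network(prefix)
    for roa_prefix, max_len, roa_as in ROAS:
        # The announcement must sit inside the ROA's prefix, be no more
        # specific than the ROA allows, and come from the authorised AS.
        if announced.subnet_of(roa_prefix) and announced.prefixlen <= max_len:
            return origin_as == roa_as
    return False  # no covering ROA found for this prefix

print(origin_valid("203.0.113.0/24", 64501))  # True
print(origin_valid("203.0.113.0/24", 64666))  # False: wrong origin AS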

The internet is such an important part of our day to day lives it's easy to consider it a slick homogeneous system that always works, and it is true to say that it's a miracle of engineering that it's been able to scale to the network of networks that we use today. However it is actually a very large collection of individual elements, and sometimes its fragility is exposed and we see it is all too easy for it to falter.

Saturday, 19 October 2024

Underpinning Kubernetes

 


Kubernetes is the de facto choice for deploying containerized applications at scale. Because of that we are all now familiar with the building blocks that allow us to build our applications, such as deployments, ingresses, services and pods.

But what is it that underpins these entities, and how does Kubernetes manage this infrastructure? The answer lies in the Kubernetes control plane and the nodes it deploys our applications to.

The control plane manages and makes decisions related to the management of the cluster; in this sense it acts as the cluster's brain. It also provides an interface to allow us to interact with the cluster for monitoring and management.

The nodes are the workhorses of the cluster where infrastructure and applications are deployed and run.

Both the control plane and the nodes comprise a number of elements each with their own role in providing us with an environment in which to run our applications.

Control Plane Components

The control plane is made up of a number of components responsible for the management of the cluster. In general these components run on dedicated infrastructure away from the pods running applications.

The kube-apiserver provides the control plane with a front end via a suite of REST APIs. These APIs are resource based allowing for interactions with the various elements of Kubernetes such as deployments, services and pods.
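For example, listing the pods in a namespace is ultimately just a REST call such as GET /api/v1/namespaces/default/pods against the kube-apiserver. The sketch below uses the official Python client to do this and assumes the kubernetes package is installed and a valid kubeconfig is available.

from kubernetes import client, config

config.load_kube_config()   # use the credentials in the local kubeconfig
v1 = client.CoreV1Api()     # client for the core/v1 resource group

# Each call here is translated into a REST request to the kube-apiserver.
for pod in v1.list_namespaced_pod(namespace="default").items:
    print(pod.metadata.name, pod.status.phase)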

In order to manage the cluster the control plane needs to be able to store data related to its state; this is provided by etcd in the form of a highly available key-value store.

The kube-scheduler is responsible for detecting when new pods are required and allocating a node for them to run on. Many factors are taken into account when allocating a node including resource and affinity requirements, software or hardware restrictions and data locality.

The control plane contains a number of controllers responsible for different aspects of the management of the cluster, and these controllers are all managed by the kube-controller-manager. In general each controller is responsible for monitoring and managing one or more resources within the cluster; as an example the Node Controller monitors for and responds to nodes failing.

By far the most common way of standing up a Kubernetes cluster is via a cloud provider. The cloud-controller-manager provides a bridge between the internal concepts of Kubernetes and the cloud provider specific API that is helping to implement them. An example of this would be the Route Controller responsible for configuring routes in the underlying cloud infrastructure.

Node Components

The node components run on every node in the cluster that runs pods, providing the runtime environment for applications.

The kubelet is responsible for receiving PodSpecs and ensuring that the pods and containers it describes are running and healthy on the node.

An important element in being able to run containers is the Container Runtime. This runtime provides the mechanism for the node to act as a host for containers. This includes pulling the images from a container registry as well as managing their lifecycle. Kubernetes supports a number of different runtimes with this being a choice to be made when you are constructing your cluster.

An optional component is kube-proxy, which maintains network rules on the node and plays an important role in implementing the services concept within the cluster.

Add Ons

In order to allow the functionality of a cluster to be extended Kubernetes provides the ability to define Add ons.

Add ons cover many different pieces of functionality.

Some relate to networking, by providing internal DNS for the cluster allowing for service discovery, or by providing load balancers to distribute traffic among the cluster's nodes. Others relate to the provisioning of storage for use by the applications running within the cluster. Another important aspect is security, with some add ons allowing for security policies to be applied to the cluster's resources and applications.

Any add ons you choose to use are installed within the cluster with the above examples by no means being an exhaustive list.

As an application developer deploying code into a cluster you don't necessarily need a working knowledge of the infrastructure underpinning it. But I'm a believer that having an understanding of the environment where your code will run will help you write better code.

That isn't to say that you need to become an expert, but a working knowledge of the building blocks and the roles they play will help you become a more well rounded engineer and enable you to make better decisions. 

Sunday, 13 October 2024

Compiling Knowledge

 


Any software engineer who works with a compiled language will know the almost religious concept of the build. Whether you've broken the build, whether you've declared that it builds on my machine, or whether you've ever felt like you are in a life or death struggle with the compiler. The build is a process that must happen to turn your toil into something useful for users to interact with.

But what is actually happening when your code is being compiled? In this post we are certainly not going to take a deep dive into compiler theory, it takes a special breed of engineer to work in that realm, but an understanding of the various processes involved can be helpful on the road to becoming a well rounded developer.

From One to Another

To start at the beginning, why is a compiler needed? Software, by the time it runs on a CPU, is a set of pretty simple operations known as the instruction set. These instructions involve simple mathematical and logical operations alongside moving data between registers and areas of memory.

Whilst it is possible to program at this level using assembly language, it would be an impossibly difficult task to write software at scale. As engineers we want to be able to code at a higher level.

Compilers give us the ability to do that by acting as translators, they take the software we've written in a high level language such as C++, C#, Java etc and turn this into a program the CPU can run.

That simple description of what a compiler does belies the complexity of what it takes to achieve that outcome, so it shouldn't be too much of a surprise that implementing it takes several different processes and phases.

Phases

The first phase of compilation is Lexical Analysis; this involves reading the input source code and breaking it up into its constituent parts, usually referred to as tokens.
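As a rough illustration, a toy lexer might turn the characters of a statement into a stream of named tokens like this (real lexers are of course far more involved and specific to their language):

import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("SKIP",   r"\s+"),
]

def tokenize(source):
    # Build one regex of named groups and walk the source, skipping whitespace.
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
    for match in re.finditer(pattern, source):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

print(list(tokenize("total = count + 42")))
# [('IDENT', 'total'), ('ASSIGN', '='), ('IDENT', 'count'), ('PLUS', '+'), ('NUMBER', '42')]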

The next phase is Syntax Analysis, also known as parsing. This is where the compiler ensures that the tokens representing the input source code conform to the grammar of the programming language. The output of this stage is the Abstract Syntax Tree (AST), which represents the structure of the code as a series of interconnected nodes in a tree describing paths through the code.
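Continuing the toy example above, the parser's job is to turn that flat token stream into a tree that captures the structure of the code. The node names below are invented purely for illustration; every compiler defines its own set of node types:

from dataclasses import dataclass

@dataclass
class Name:
    ident: str

@dataclass
class Number:
    value: int

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Assign:
    target: Name
    value: object

# A hand-built AST for: total = count + 42
tree = Assign(target=Name("total"), value=Add(Name("count"), Number(42)))
print(tree)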

Next comes Semantic Analysis. It is at this stage that the compiler checks that the code actually makes sense and obeys the rules of the programming language, including its type system. The compiler is checking that variables are declared correctly, that functions are called correctly and that no other semantic errors exist in the source code.

Once these analysis phases are complete the compiler can move onto Intermediate Code Generation. At this stage the compiler generates an intermediate representation of what will become the final program that is easier to translate into the machine code the CPU can run.

The compiler will then run an Optimisation stage to apply certain optimisations to the intermediate code to improve overall performance.

Finally the compiler moves onto Code Generation in order to produce the final binary, at this stage the high level language of the input source code has been converted into an executable that can be run on the target CPU.    

Front End, Middle End and Backend

The phases described above are often segregated into front end, middle end and backend. This enables a layered approach to be taken to the architecture of the compiler and allows for a certain degree of independence. This means different teams can work on these areas of the compiler as well as making it possible for different parts of compilers to be re-used and shared.

Front end usually refers to the initial analysis phases and is specific to a particular programming language. Should any of the code fail this analysis, errors and warnings will be generated to indicate to the developer which lines of source code are incorrect. In this sense the front end is the most visible part of the compiler that developers will interact with.

The middle end is generally responsible for optimisation. Many compilers will have settings for how aggressive this optimisation is and depending on the target environments may also distinguish between optimising for speed or memory footprint.

The backend represents the final stage where code that can actually run on the CPU is generated.

This layering allows, for example, front ends for different programming languages to be combined with different backends that produce code for particular families of CPUs, with the intermediate representation acting as the glue that binds them together.

As we said at the start of this post, understanding exactly how compilers work is a large undertaking. But having an appreciation of the basic architecture and phases will help you deal with those battles you may have when trying to build your software. Compiler messages may sometimes seem unclear or frustrating, so this knowledge may save valuable time in figuring out what you need to do to keep the compiler happy.

Saturday, 14 September 2024

Terraforming Your World

 


Software Engineers are very good at managing source code. We have developed effective strategies and tools to allow us to branch, version, review, cherry pick and revert changes. It is for this reason we've been keen to try and control all aspects of our engineering environment in the same way.

One such area is the infrastructure on which our code runs.

Infrastructure as Code (IaC) is the process of managing and deploying compute, network and storage resources via files that can be part of the source control process.

Traditionally these resources may have been managed via manual interactions with a portal or front end application from your cloud provider of choice. But manual processes are prone to error and inconsistency making it difficult and time consuming to manage and operate infrastructure in this way.

Environments might not always be created in exactly the same way, and in the event of a problem there is a lack of an effective change log to enable changes to be reverted back to a known good state.

One of the tools that attempts to allow infrastructure to be developed in the same way as software is Terraform by HashiCorp.

Terraform

Terraform allows required infrastructure to be defined in configuration files where the indicated resources are created within the cloud provider via interaction with their APIs.

These interactions are encapsulated via a Provider which defines the resources that can be created and managed within that cloud. These providers are declared within the configuration files and can be pulled in from the Terraform Registry which acts like a package manager for Terraform.

Working with Terraform follows a three stage process.

Firstly in the Write phase the configuration files which describe the required infrastructure are created. These files can span multiple cloud providers and can include anything from a VPC to compute resource to networking infrastructure.

Next comes the Plan phase. Terraform is a state driven tool, it records the current state of your infrastructure and applies the necessary changes based on the updated desired state in the configuration files. As part of the plan phase Terraform creates a plan of the actions that must be taken to move the current state of the infrastructure to match the desired state, whether this be creating, changing or deleting elements. This plan can then be reviewed to ensure it matches with the intention behind the configuration changes.

Finally in the Apply phase Terraform uses the cloud provider APIs, via the associated provider, to ensure the deployed infrastructure aligns with the new state of the configuration files.
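The sketch below is not how Terraform works internally, but it illustrates the idea behind a plan: compare the recorded current state with the desired state from the configuration and work out what must be created, changed or destroyed. The resource names and attributes are made up for the example.

current_state = {
    "aws_instance.web": {"instance_type": "t3.small"},
    "aws_s3_bucket.logs": {"bucket": "my-logs"},
}

desired_state = {
    "aws_instance.web": {"instance_type": "t3.medium"},  # changed attribute
    "aws_vpc.main": {"cidr_block": "10.0.0.0/16"},       # new resource
}

def plan(current, desired):
    # Work out the set of actions needed to move from current to desired.
    actions = []
    for name, attrs in desired.items():
        if name not in current:
            actions.append(f"create {name}")
        elif current[name] != attrs:
            actions.append(f"update {name}")
    for name in current:
        if name not in desired:
            actions.append(f"destroy {name}")
    return actions

print(plan(current_state, desired_state))
# ['update aws_instance.web', 'create aws_vpc.main', 'destroy aws_s3_bucket.logs']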

Objects

Terraform supports a number of different objects that can be described in the configuration files, presented below is not an exhaustive list but describes some of the more fundamental elements that are used to manage common infrastructure requirements.

Firstly we have Resources which are used to describe any object that should be created in the cloud provider, this could be anything from a compute instance to a DNS record.

A Data Source provides Terraform with access to data defined outside of Terraform. This might be data from pre-existing infrastructure, configuration held outside of Terraform or a database.

Input Variables allow configuration files to be customised to a specific use case, increasing their reusability. Output Variables allow the Terraform process to return certain data about the infrastructure that has been created; this might be to further document the infrastructure that is now in place or to act as an input to another Terraform process managing separate but connected infrastructure.

Modules act like libraries in a software development context to allow infrastructure configuration to be packaged and re-used across many different applications.

Syntax

We have made many references in this article to configuration files but what do these actually consist of?

They are defined using the HashiCorp Configuration Language (HCL) and follow a format similar to JSON; in fact it is also possible for Terraform to work with JSON directly.

All content is defined with structures called blocks:

resource "my_provider_object_type" "my_object" {
  some_argument = "abc123"
}

Presented above is an extremely simple example of a block.

Firstly we define the type of the block, in this case a resource. Then comes a series of labels with the number of labels being dependent on the block type. For a resource block two labels are required, the first describing the type of the resource as defined by the provider, the second being a name that can be used to refer to the resource in other areas of the configuration files.

Inside the block the resource may require a series of arguments to be provided to it in order to configure and control the underlying cloud resource that will be created.

This post hasn't been intended to be a deep dive into Terraform; instead I've been trying to stoke your interest in the ways an IaC approach can help you apply the same rigour and process to your infrastructure management as you do to your source code.

Many of the concepts within Terraform have a close alignment to those in software engineering. Using an IaC approach alongside traditional source code management can help foster a DevOps mentality where the team responsible for writing the software can also be responsible for managing the infrastructure it runs on. Not only will this allow their knowledge of the software to shape the creation of the infrastructure but also in reverse knowing where and on what infrastructure their code will run may well allow them to write better software.


Tuesday, 3 September 2024

Being at the Helm

 


The majority of containerized applications that are being deployed at any reasonable scale will likely be using some flavour of Kubernetes.

As a container orchestration platform Kubernetes allows the deployment of multiple applications to be organised around the concepts of pods, services, ingress, deployments etc defined in YAML configuration files. 

In this post we won't go into detail around these concepts and will assume a familiarity with their purpose and operation.

Whilst Kubernetes makes this process simpler, when it's being used for multiple applications managing the large number of YAML files can come with its own challenges.

This is where Helm comes into the picture. Described as a package manager for Kubernetes, Helm provides a way to manage updates to the YAML configuration files and version them to ensure consistency and allow for re-use.

I initially didn't quite understand the notion of Helm being a package manager but as I've used it more I've come to realise why this is how it's described.

Charts and Releases

The Helm architecture consists of two main elements, the client and the library.

The Helm client provides a command line interface (CLI) to indicate what needs to be updated in a cluster via a collection of standard Kubernetes YAML files, the library then contains the functionality to interact with the cluster to make this happen.

The collection of YAML files passed to the client are referred to as Helm Charts, they define the Kubernetes objects such as deployments, ingress, services etc.

The act of the library using these YAML files to update the cluster is referred to as a Release.

So far you may be thinking that you can achieve the same outcome by applying the same YAML files to Kubernetes directly using the kubectl CLI. Whilst this is true, where Helm adds value is when you need to deploy the same application into multiple environments with certain configuration or set-up differences.

Values, Parametrisation and Repositories

It is common practice to need to deploy an application to multiple environments with differing numbers of instances, servicing requests on different domains, or with any other differences between testing and production environments.

Using Kubernetes directly means either maintaining multiple copies of YAML files or having some process to alter them prior to them being applied to the cluster. Both of these approaches have the potential to cause inconsistency and errors.

To avoid this Helm provides a templating engine to allow a single set of YAML files to become parameterised. The syntax of this templating can be quite daunting when you first view it, and while we won't go into detail about it here, like any language it will eventually click as you use it more.

Alongside these parameterised YAML files you specify a Values YAML that defines the environment specific values that should be applied to the parameterised YAML defining the Kubernetes objects.

This allows the YAML files to be consistent between all environments in terms of overall structure whilst varying where they need to. This combination of a Values YAML file and the parameterised YAML defining the Kubernetes objects is what we refer to as a Helm Chart.
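Helm's real templating uses Go templates (placeholders such as {{ .Values.replicaCount }}), so the sketch below is only a conceptual illustration in Python of the underlying idea: one template for the Kubernetes object plus a small set of values per environment.

from string import Template

# A stand-in for a parameterised Kubernetes manifest.
deployment_template = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $app_name
spec:
  replicas: $replicas
""")

# A stand-in for per-environment values files.
values = {
    "test": {"app_name": "my-app", "replicas": 1},
    "production": {"app_name": "my-app", "replicas": 5},
}

for environment, env_values in values.items():
    print(f"--- {environment} ---")
    print(deployment_template.substitute(env_values))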

It may be that your application is something that needs to be deployed in multiple clusters; for these situations your Helm charts can be provided via a repository in a similar way that we might make re-usable Docker images available.

I think it's at this point that describing Helm as a package manager starts to make sense.

When we think about code package managers we think of re-usable libraries where functionality can be shared and customised in multiple use cases. Helm is allowing us to achieve the same thing with Kubernetes. Without needing to necessarily understand all that is required to deploy the application we can pull down Helm charts, specify our custom values and start using the functionality in our cluster.   

When to Use

The benefit you will achieve by using Helm is largely tied to the scale of the Kubernetes cluster you are managing and the number of applications you are deploying.

If you have a simple cluster with a minimal number of applications deployed, maybe the overhead of dealing with Kubernetes directly is manageable. If you have a large cluster with multiple applications, or you have many clusters each with different applications deployed, you will benefit more due to the ability it offers to ensure consistency.

You will also benefit from using Helm if you need to deploy a lot of 3rd party applications into your cluster, whether this might be to manage databases, ingress controllers, certificate stores, monitoring or any other cross-cutting concern you need to have available in the cluster.

The package manager nature of Helm will reduce the overhead in managing all these dependencies in the same way that you manage dependencies at a code level.

As with many tools your need for Helm may grow over time as your estate increases in complexity. If like me you didn't immediately comprehend the nature and purpose of Helm then hopefully this post has helped you recognise what it can offer and how it could benefit your use case.

Monday, 26 August 2024

Avoiding Toiling

 


Site Reliability Engineering (SRE) is the practice of applying software engineering principles to the management of infrastructure and operations.

Originating at Google in the early 2000s the sorts of things an SRE team might work on include system availability, latency and performance, efficiency, monitoring and the ability to deliver change.

Optimising these kinds of system aspects covers many different topics and areas, one of which is the management of toil.

Toil in this context is not simply work we don't particularly enjoy doing or don't find stimulating; it has a specific meaning defined by aspects other than our enjoyment of the tasks it involves.

What is Toil?

Toil is work that exhibits some or all of the following attributes.

It is Manual in nature, even if a human isn't necessarily doing the work it requires human initiation, monitoring or any other aspect that means a team member has to oversee its operation.

Toil is Repetitive, the times work has to be done may vary and may not necessarily be at regular intervals, but the task needs to be performed multiple times and will never necessarily be deemed finished.

It is Tactical, meaning it is generally reactive; it has to be undertaken in response to something happening within the system, for example when monitoring highlights that something is failing or is sub-optimal.

It has No Enduring Value, this means it leaves the system in the same state as before the work happened. It hasn't improved any aspect of the system or eliminated the need for the work to happen again in the future.

It Scales with Service Growth. Some work items need to happen regardless of how much a system is used. This tends to be viewed as overhead and is simply the cost of having the system in the first place. Toil scales with system use meaning the more users you attract the greater the impact of the toil on your team.

Finally toil can be Automated, some tasks will always require human involvement, but for a task to be toil it must be possible for it to be automated.

What is Toil's Impact?

It would be wrong to suggest that toil can be totally eliminated; having a production system being used by large numbers of people is always going to incur a certain amount of toil, and it is unlikely that the whole engineering effort of your organisation can be dedicated to removing it.

Also, much like technical debt, even if you do reach a point where you feel it's eliminated the chances are a future change in the system will likely re-introduce it.

But also like technical debt the first step is to acknowledge toil exists, develop ways to be able to detect it and have a strategy for managing it and trying to keep it to a reasonable minimum.

Toil's impact is that it engages your engineering resource on tasks that don't add to or improve your system. It may keep the system up and running, but that is a low ambition to have for any system.

It's also important to recognise that large amounts of toil are likely to impact a team's morale; very few engineers will embark on their career looking to spend large amounts of time on repetitive tasks that lead to no overall value.

The Alternative to Toil

The alternative to spending time on toil is to spend time on engineering. Engineering is a broad concept but in this context it means work that improves the system itself or enables it to be managed in a more efficient way.

As we said previously completely eliminating toil is probably an unrealistic aim. But it is possible to measure how much time your team is spending on toil related tasks. Once you are able to estimate this then it is possible both to set a sensible limit on how much time is spent on these tasks but also measure the effectiveness of any engineering activities designed to reduce it.
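As a back-of-the-envelope sketch, that measurement and budgeting can be as simple as the following, where the 50% budget is just an example target rather than a universal rule.

TOIL_BUDGET = 0.50  # example cap on the fraction of team time spent on toil

def toil_report(toil_hours, total_hours):
    # Compare the measured toil against the agreed budget.
    fraction = toil_hours / total_hours
    status = "within budget" if fraction <= TOIL_BUDGET else "over budget"
    return f"toil is {fraction:.0%} of team time ({status})"

print(toil_report(toil_hours=120, total_hours=400))
# toil is 30% of team time (within budget)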

This engineering activity might relate to software engineering, refactoring code for performance or reliability, automating testing or certain aspects of the build and deployment pipeline. It might also be more aimed at system engineering, analysing the correctness of the infrastructure the system is running on, analysing the nature of system failures or automating the management of infrastructure.

As previously stated we can view toil as a form of technical debt. In the early days of a system we may take certain shortcuts that at the time are manageable but as the system grows come with a bigger and bigger impact. Time spent trying to fix this debt will set you on a path for gradual system improvement, both for your users and the teams that work on the system.

Saturday, 13 July 2024

The Language of Love

 


Software engineers are often polyglots who will learn or be exposed to multiple programming languages over the course of their career. But I think most will always hold a special affection for the first language they learn, most likely because it's the first time they realise they have the ability to write code and achieve an outcome. Once that bug bites it steers you towards a path where you continue to hone that craft.

For me that language is C and its successor C++.

Potentially my view is biased because of the things I've outlined above but I believe C is a very good language for all potential developers to start with. If you learn how to code close to the metal it will develop skills and a way of thinking that will be of benefit to you as you progress onto high level languages with greater levels of abstraction from how your code is actually running.

In The Beginning

In the late 1960s and early 1970s, as the Unix operating system was being developed, engineers realised that they needed a programming language that could be used to write utilities and programs to run on the newly forming platform.

One of the initial protagonists in this field was Ken Thompson.

After dismissing existing programming languages such as Fortran he started to develop a variant of an existing language called BCPL. He concentrated on simplifying the language structures and making it less verbose. He called this new language B, with the first version being released around 1969.

In 1971 Dennis Ritchie continued to develop B to utilise features of more modern computers as well as adding new data types. This culminated in the release of New B. Throughout 1972 the development continued adding more data types, arrays and pointers and the language was renamed C.

In 1973 Unix was re-written in C, with even more data types being added as C continued to be developed through the 1970s. This eventually resulted in the release of what many consider to be the definitive book on the C programming language. Written by Brian Kernighan and Dennis Ritchie, The C Programming Language became known as K&R C and became the unofficial specification for the language.

C has continued to be under active development right up until the present with C23 expected to be released in 2024.

C with Classes

In 1979 Bjarne Stroustrup began work on what he deemed "C with Classes".

Adding classes to C turned it into an object oriented language. Where C had found a home in embedded programming running close to the metal, adding classes made it more suitable for large scale software development.

In 1982 Stroustrup began work on C++, adding new features such as inheritance, polymorphism, virtual functions and operator overloading. In 1985 he released the book The C++ Programming Language, which became the unofficial specification for the language, with the first commercial version being released later that year.

Much like C, C++ has continued to be developed with new versions being released up until the present day.

Usage Today

Software Engineering is often considered to be a fast moving enterprise, and while many other programming languages have been developed over the lifetime of C and C++ both are still very widely used.

Often being used when performance is critical, the fact they run close to the metal allows for highly optimised code for use cases such as gaming, network appliances and operating systems.

Usage of C and C++ can often strike fear into the heart of developers who aren't experienced in their use. However the skills that usage of C and C++ can develop will prove invaluable even when working with higher level languages, so I would encourage all software engineers to spend some time exposing themselves to the languages.

Good engineers can apply their skills using any programming language, the principles and practices of good software development don't vary that much between languages or paradigms. But often there are better choices of language for certain situations, and C and C++ are still the correct choice for many applications.

Friday, 28 June 2024

Vulnerable From All Sides

 


Bugs in software engineering are a fact of life; no engineer, no matter what they perceive their skill level to be, has ever written a non-trivial piece of software that didn't on some level have some bugs.

These may be logical flaws, undesirable performance characteristics or unintended consequences when given bad input. As the security of software products has grown ever more important the presence of a particular kind of bug has become something we have to be very vigilant about.

A vulnerability is a bug that has a detrimental impact on the security of software. Whereas any other bug might cause annoyance or frustration a vulnerability may have more meaningful consequences such as data loss, loss of system access or handing over control to untrusted third parties.

If bugs, and therefore vulnerabilities, are a fact of life then what can be done about them? Well as with anything being aware of them, and making others aware, is a step in the right direction.

Common Vulnerabilities and Exposures 

Run by the Mitre Corporation with funding from the US Department of Homeland Security, Common Vulnerabilities and Exposures (CVE) is a glossary that catalogues software vulnerabilities and, via the Common Vulnerability Scoring System (CVSS), provides them with a score to indicate their seriousness.

 In order for a vulnerability to be added to the CVE it needs to meet the following criteria.

It must be independent of any other issues, meaning it must be possible to fix or patch the vulnerability without needing to fix issues elsewhere.

The software vendor must be aware of the issue and acknowledge that it represents a security risk. 

It must be a proven risk, when a vulnerability is submitted it must be alongside evidence of the nature of the exploit and the impact it has.

It must exist in a single code base; if multiple vendors are affected by the same or similar issues then each will receive its own CVE identifier.

Common Vulnerability Scoring System (CVSS)

Once a vulnerability is identified it is given a score via the Common Vulnerability Scoring System (CVSS). This score ranges from 0 to 10, with 10 representing the most severe.

CVSS itself is based on a reasonably complicated mathematical formula. I won't present all the elements that go into this score here, but the factors outlined below give a flavour of the aspects of a vulnerability that are taken into account; they are sometimes referred to as the base factors.

Firstly Access Vector, this relates to the access that an attacker needs to be able to exploit the vulnerability. Do they, for example, need physical access to a device, or can they exploit it remotely from inside or outside a network? Related to this is Access Complexity: does the attack need certain conditions to exist at the time of the attack, or for the system to be in a certain configuration?

Authentication takes into account the level of authentication an attacker needs. This might range from none to admin level.

Confidentiality assesses the impact in terms of data loss of an attacker exploiting the vulnerability. This could range from trivial data loss to the mass export of large amounts of data.

In a similar vein Integrity assesses an attacker's ability to change or modify data held within the system, and Availability looks at the ability to affect the availability of the system to legitimate users.

Two other important factors are Exploitability and Remediation Level. The first relating to whether code is known to exist that enables the vulnerability to be exploited, and the latter referring to whether the software vendor has a fix or workaround to provide protection.

These and other factors are weighted within the calculation to provide the overall score.
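To give a flavour of how such a weighting might work, here is a purely illustrative sketch. It is emphatically not the real CVSS formula, which is published by FIRST and rather more involved; the weights below are invented to show how qualitative factors can be mapped to numbers and combined into a 0 to 10 score.

# Invented weights for illustration only; not the official CVSS metrics.
ILLUSTRATIVE_WEIGHTS = {
    "access_vector":     {"physical": 0.2, "local": 0.4, "network": 1.0},
    "access_complexity": {"high": 0.4, "low": 1.0},
    "authentication":    {"admin": 0.3, "user": 0.6, "none": 1.0},
    "confidentiality":   {"none": 0.0, "partial": 0.5, "complete": 1.0},
    "integrity":         {"none": 0.0, "partial": 0.5, "complete": 1.0},
    "availability":      {"none": 0.0, "partial": 0.5, "complete": 1.0},
}

def illustrative_score(**factors):
    # How easy is the vulnerability to reach and exploit?
    exploitability = (ILLUSTRATIVE_WEIGHTS["access_vector"][factors["access_vector"]]
                      * ILLUSTRATIVE_WEIGHTS["access_complexity"][factors["access_complexity"]]
                      * ILLUSTRATIVE_WEIGHTS["authentication"][factors["authentication"]])
    # How bad is the worst of the confidentiality/integrity/availability impacts?
    impact = max(ILLUSTRATIVE_WEIGHTS[m][factors[m]]
                 for m in ("confidentiality", "integrity", "availability"))
    return round(10 * exploitability * impact, 1)

# A remotely exploitable, unauthenticated flaw allowing complete data loss.
print(illustrative_score(access_vector="network", access_complexity="low",
                         authentication="none", confidentiality="complete",
                         integrity="none", availability="none"))  # 10.0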

Patching, Zero Days and Day Ones

The main defence against the exploitation of vulnerabilities is via installing patches to the affected software. Provided by the vendor these are software changes that address the underlying flaws causing the vulnerability in order to stop it being exploited.

This leads to important aspects about the lifetime of a vulnerability.

The life of the vulnerability starts when it is first inadvertently entered into the codebase. Eventually it is discovered by the vendor, or by a so-called white hat hacker who notifies the vendor. The vendor will disclose the issue via CVE while working on a patch to address it.

This leads to a period of time where the vulnerability is known about, by the vendor, white hat hackers and possibly the more nefarious black hat hackers, but where a patch isn't yet available. At this point vulnerabilities are referred to as zero days; they may or may not be being exploited, and no patch exists to make the software safe again.

It may seem like once the patch is available the danger has passed. However once a patch is released the nature of the patch often provides evidence of the original vulnerability and provides ideas on how it can be exploited. At this point the vulnerability is referred to as a Day One; the circle of those who may have the ability to exploit it has increased, and systems remain vulnerable until the patch has been installed.

CVE provides an invaluable resource in documenting vulnerabilities, forewarned is forearmed. Knowing a vulnerability exists means the defensive action can start, and making sure you stay up to date with all available patches means you are as protected as you can be.


Saturday, 15 June 2024

The World Wide Internet

 


Surfing the web, getting online and hitting the net are terms that are ubiquitous among the verbs that describe how we live our lives. They stopped being technical terms a long time ago and now simply describe daily activities that all generations understand and take part in.

In becoming such commonly used terms they have lost some of their meaning, with the web and the internet being interchangeable in most people's minds. However this isn't the case, they do represent two different if complementary technologies.

Internet vs Web

The Internet is the set of protocols that has allowed computers all over the world to be connected and to exchange data, the so called "network of networks". It is concerned with how data is sent and received not what the data actually represents.

Protocols such as the Internet Protocol and the Transmission Control Protocol allow each computer to be addressable and routable to allow the free flowing transmission of data.

The World Wide Web (WWW or W3) is an information system that builds on top of the interconnection between devices that the Internet provides to define a system for how information should be represented, displayed and linked together.

Defined by the concepts of hypermedia and hypertext it provides the means by which we are able to view data online, how we are provided links to that data and that data is visually depicted.

The History of the Internet

As computer science emerged as an academic discipline in the 1950s access to computing resource was scarce. For this reason scientists wanted to develop a way for access to be time shared such that many different teams could take advantage of the emerging technology.

This effort culminated in the development of the first wide area network, the Advanced Research Projects Agency Network (ARPANET), built by the US Department of Defense in 1969.

Initially this interconnected network connected a number of universities including the University of California, Stanford Research Institute and the University of Utah.

In 1973 ARPANET expanded to become international, with the Norwegian Seismic Array and University College London being brought onboard. Into the 1980s ARPANET continued to grow and started to be referred to as the Internet, as shorthand for internetwork.

In 1982 the TCP/IP protocols were standardised and the foundations of what we now know as the Internet were starting to be put in place.

The History of the Web

The Web was the invention of Sir Tim Berners-Lee as part of his work at CERN in Switzerland. The problems he was trying to solve related to the storing, updating and discoverability of documents in large datasets being worked on by large numbers of distributed teams.

In 1989 he submitted his proposal for a system that could solve these problems and in 1990 a working prototype was completed, including an HTTP server and the very first browser, named after the project and called WorldWideWeb.

Building on top of the network provided by the Internet the project defined the HTTP protocol, the structure of URLs and HTML as the way that the data in documents could be represented.

In 1993 CERN made the decision to make these protocols and the code behind them available royalty free, a decision that would change the world forever and enabled the number of web sites in the world to steadily grow from tens, to hundreds, to thousands, to the vast numbers that we now take for granted.

Many technologies end up having a profound impact on our lives without their terminology becoming understood by those outside technological circles. But the Internet and the web are different, they are so embedded in our lives that URLs, hyperlinks, web addresses etc are normal everyday words.

In the early days of ARPANET, and probably also for the WWW project, although those teams may have realised they were working on what could be important technologies I think they wouldn't have anticipated quite where their work would lead. But when an idea is a good one, it can go in many unexpected directions.

Sunday, 9 June 2024

What's In a Name

 


Surfing the web seems like a straightforward undertaking, you type the website you want to go to into your browser's address bar, or you click on a result from a search engine, and within no time the website you wanted to visit is in front of you.

But how did this happen simply by typing a web address into a browser? How is the connection made between you and a website out there somewhere in the world?

The answer lies in the Domain Name System (DNS).

All servers on the internet that are hosting the websites you want to access are addressable via a unique Internet Protocol (IP) address. For example as I write this article linkedin.com is currently addressable via 13.107.42.14. These addresses aren't practical for human use which is why we give websites names such as linkedin.com.

DNS is the process by which these human readable names are translated into the IP addresses that can be used to actually access the websites content.

DNS Elements 

Four main elements are involved in a DNS lookup.

A DNS Recursor is a server that receives queries from clients to resolve a website's host name into an IP address. The recursor will usually not be able to provide the answer itself but knows how to recursively navigate the phone directory of the internet in order to give the answer back to the client.

A Root Nameserver is usually the first port of call for the recursor. It can be thought of as a directory of phone directories; based on the area of the internet the website's domain points at, it directs the recursor to the correct directory that can be used to continue the DNS query.

A Top Level Domain (TLD) Nameserver acts as the phone directory for a specific part of the internet based on the TLD portion of the web address. For example a TLD nameserver will exist for .com addresses, .co.uk addresses and so on.

An Authoritative Nameserver is the final link in the chain, it is the part of the phone directory that can provide the IP addresses for the website you are looking for.

DNS Resolution

To bring this process to life let's look at the path of a DNS query if you were trying to get to mywebsite.com.

The user types mywebsite.com into their browser and hits enter, the browser then asks a DNS recursor to provide the IP address for mywebsite.com.

The recursor first queries a root nameserver to find the TLD nameserver that's appropriate for this request.

In this example the root nameserver will respond with the TLD nameserver for .com addresses.

The TLD nameserver will then respond with the authoritative nameserver for the website's domain, in this example mywebsite.com; the location of this server will be related to where the website is being hosted. The authoritative nameserver then responds with the IP address for the website, the recursor returns this to the user's browser and the website can be loaded.
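From a program's point of view all of that recursion is hidden behind a single library call; the stub resolver on the machine hands the query to a recursor and gets back the final IP addresses. A minimal sketch, using mywebsite.com as the placeholder name from the example above:

import socket

def resolve(hostname):
    # getaddrinfo asks the system's configured DNS recursor on our behalf.
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each result carries a sockaddr tuple whose first element is the IP.
    return sorted({result[4][0] for result in results})

print(resolve("mywebsite.com"))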

DNS Security

DNS is one of the fundamental technologies that has its origins in the foundation of the web. At this time when the blueprint of the web was being created security was less of a concern to those solving these engineering problems, it was assumed that the authenticity of the links in the chain could be taken on trust.

Unfortunately in the modern web this level of trust in other actors can be misplaced. When a server claims to be the authoritative nameserver for a particular website, how can you trust that this is the case and that you aren't going to be directed to a rogue impersonation of the website you are trying to reach?

Domain Name System Security Extensions (DNSSEC) is attempting to replace the trust based system with one that is based on provable security. It introduces the signing and validation of the DNS records being returned from the various elements involved in a DNS query so that their authenticity can be determined.

DNS is one of the technologies that is now taken for granted but solves a problem without which the web as we know it wouldn't be able to exist. On the surface it sounds like a simple problem to solve, but the scale of the web means even the simplest of solutions has to work at a worldwide scale.