A Two Coffee Problem

Saturday, 14 September 2024

Terraforming Your World

Software Engineers are very good at managing source code. We have developed effective strategies and tools to allow us to branch, version, review, cherry pick and revert changes. It is for this reason we've been keen to try and control all aspects of our engineering environment in the same way.

One such area is the infrastructure on which our code runs.

Infrastructure as Code (IaC) is the process of managing and deploying compute, network and storage resources via files that can be part of the source control process.

Traditionally these resources may have been managed via manual interactions with a portal or front end application from your cloud provider of choice. But manual processes are prone to error and inconsistency making it difficult and time consuming to manage and operate infrastructure in this way.

Environments might not always be created in exactly the same way, and in the event of a problem there is a lack of an effective change log to enable changes to be reverted back to a known good state.

One of the tools that attempts to allow infrastructure to be developed in the same way as software is Terraform by Hashicorp.

Terraform

Terraform allows required infrastructure to be defined in configuration files where the indicated resources are created within the cloud provider via interaction with their APIs.

These interactions are encapsulated via a Provider which defines the resources that can be created and managed within that cloud. These providers are declared within the configuration files and can be pulled in from the Terraform Registry which acts like a package manager for Terraform.

Working with Terraform follows a three stage process.

Firstly in the Write phase the configuration files which describe the required infrastructure are created. These files can span multiple cloud providers and can include anything from a VPC to compute resource to networking infrastructure.

Next comes the Plan phase. Terraform is a state driven tool, it records the current state of your infrastructure and applies the necessary changes based on the updated desired state in the configuration files. As part of the plan phase Terraform creates a plan of the actions that must be taken to move the current state of the infrastructure to match the desired state, whether this be creating, changing or deleting elements. This plan can then be reviewed to ensure it matches with the intention behind the configuration changes.

Finally in the Apply phase Terraform uses the cloud provider APIs, via the associated provider, to ensure the deployed infrastructure aligns with the new state of the configuration files.

Objects

Terraform supports a number of different objects that can be described in the configuration files, presented below is not an exhaustive list but describes some of the more fundamental elements that are used to manage common infrastructure requirements.

Firstly we have Resources which are used to describe any object that should be created in the cloud provider, this could be anything from a compute instance to a DNS record.

A Data Source provides Terraform with access to data defined outside of Terraform. This might be data from pre-existing infrastructure, configuration held outside of Terraform or a database.

Input Variables allow configuration files to be customised to a specific use case increasing their reusability. Output Variables allow the Terraform process to return certain data about the infrastructure that has been created, this might be to further document the infrastructure that is now in place or act as an input to another Terraform process managing separate but connected infrastructure.

Modules act like libraries in a software development context to allow infrastructure configuration to be packaged and re-used across many different applications.

Syntax

We have made many references in this article to configuration files but what do these actually consist of?

They are defined using the Hashicrop Configuration Language (HCL) and follow a format similar to JSON, in fact it is also possible for Terraform to work with JSON directly.

All content is defined with structures called blocks:

resource "my_provider_object_type" "my_object" {

some_argument = "abc123"

}

Presented above is an extremely simple example of a block.

Firstly we define the type of the block, in this case a resource. Then comes a series of labels with the number of labels being dependent on the block type. For a resource block two labels are required, the first describing the type of the resource as defined by the provider, the second being a name that can be used to refer to the resource in other areas of the configuration files.

Inside the block the resource may require a series of arguments to be provided to it in order to configure and control the underlying cloud resource that will be created.

This post hasn't been intended to be a deep dive into Terraform, instead I've been trying to stoke your interest in the ways an IaC approach can help you apply the same rigour and process to your infrastructure management as you do to your source code.

Many of the concepts within Terraform have a close alignment to those in software engineering. Using an IaC approach alongside traditional source code management can help foster a DevOps mentality where the team responsible for writing the software can also be responsible for managing the infrastructure it runs on. Not only will this allow their knowledge of the software to shape the creation of the infrastructure but also in reverse knowing where and on what infrastructure their code will run may well allow them to write better software.

Tuesday, 3 September 2024

Being at the Helm

The majority of containerized applications that are being deployed at any reasonable scale will likely be using some flavour of Kubernetes.

As a container orchestration platform Kubernetes allows the deployment of multiple applications to be organised around the concepts of pods, services, ingress, deployments etc defined in YAML configuration files.

In this post we won't go into detail around these concepts and will assume a familiarity with their purpose and operation.

Whilst Kubernetes makes this process simpler, when it's being used for multiple applications managing the large number of YAML files can come with its own challenges.

This is where Helm comes into the picture. Described as a package manager for Kubernetes Helm provides a way to manage updates to the YAML configuration files and version them to ensure consistency and allow for re-use.

I initially didn't quite understand the notation of Helm being a package manager but as I've used it more I've come to realise why this is how it's described.

Charts and Releases

The Helm architecture consists of two main elements, the client and the library.

The Helm client provides a command line interface (CLI) to indicate what needs to be updated in a cluster via a collection of standard Kubernetes YAML files, the library then contains the functionality to interact with the cluster to make this happen.

The collection of YAML files passed to the client are referred to as Helm Charts, they define the Kubernetes objects such as deployments, ingress, services etc.

The act of the library using these YAML files to update the cluster is referred to as a Release.

So far you maybe thinking that you can achieve the same outcome by applying the same YAML files to Kubernetes directly using the kubectl CLI. Whilst this is true where Helm adds value is where you need to deploy the same application into multiple environments with certain configuration or set-up differences.

Values, Parametrisation and Repositories

It will be a common practice to need to deploy an application to multiple environments with a differing numbers of instances, servicing requests on different domains or any other differences between testing and production environments.

Using Kubernetes directly means either maintaining multiple copies of YAML files or having some process to alter them prior to them being applied to the cluster. Both of these approaches have the potential to cause inconsistency and errors.

To avoid this Helm provides a templating engine to allow a single set of YAML files to become parameterised. The syntax of this templating when you first view it can be quite daunting, while we won't go into detail about it here, like with any language over time as you use it more it will eventually click.

Alongside these parameterised YAML files you specify a Values YAML that defines the environment specific values that should be applied to the parameterised YAML defining the Kubernetes objects.

This allows the YAML files to be consistent between all environments in terms of overall structure whilst varying where they need too. This combination of a Values YAML file and the parameterised YAML defining the Kubernetes objects are what we refer to as Helm Charts.

It maybe that your application is something that needs to be deployed in multiple clusters, for these situations your Helm charts can be provided via a repository in a similar way that we might make re-usable Docker images available.

I think it's at this point that describing Helm as a package manager starts to make sense.

When we think about code package managers we think of re-usable libraries where functionality can be shared and customised in multiple use cases. Helm is allowing us to achieve the same thing with Kubernetes. Without needing to necessarily understand all that is required to deploy the application we can pull down Helm charts, specify our custom values and start using the functionality in our cluster.

When to Use

The benefit you will achieve by using Helm is largely tied to the scale of the Kubernetes cluster you are managing and the number of applications you are deploying.

If you have a simple cluster with a minimal applications deployed maybe the overhead of dealing with Kubernetes directly is manageable. If you have a large cluster with multiple applications or you have many clusters each with different applications deployed you will benefit more due to the ability it offers to ensure consistency.

You will also benefit from using Helm if you need to deploy a lot of 3rd party applications into your cluster, whether this might be to manage databases, ingress controllers, certificate stores, monitoring or any other cross cutting concern you need to have available in he cluster.

The package manager nature of Helm will reduce the overhead in managing all these dependencies in the same way that you manage dependencies at a code level.

As with many tools your need for Helm may grow over time as you estate increases in complexity. If like me you didn't immediately comprehend the nature and purpose of Helm then hopefully this post has helped you recognise what it can offer and how it could benefit your use case.

Monday, 26 August 2024

Avoiding Toiling

Site Reliability Engineering (SRE) is the practice of applying software engineering principles to the management of infrastructure and operations.

Originating at Google in the early 2000s the sorts of things an SRE team might work on include system availability, latency and performance, efficiency, monitoring and the ability to deliver change.

Optimising these kinds of system aspects covers many different topics and areas, one of which is the management of toil.

Toil in this context is not work we don't particularly enjoy doing or don't find stimulating, it has a specific meaning defined by aspects other than our enjoyment of the tasks it involves.

What is Toil?

Toil is work that exhibits some or all of the following attributes.

It is Manual in nature, even if a human isn't necessarily doing the work it requires human initiation, monitoring or any other aspect that means a team member has to oversee its operation.

Toil is Repetitive, the times work has to be done may vary and may not necessarily be at regular intervals, but the task needs to be performed multiple times and will never necessarily be deemed finished.

It is Tactical meaning it is generally reactive, it has to be undertaken either in relation to something happening within the system for example when monitoring highlights something is failing or is sub-optimal.

It has No Enduring Value, this means it leaves the system in the same state as before the work happened. It hasn't improved any aspect of the system or eliminated the need for the work to happen again in the future.

It Scales with Service Growth. Some work items need to happen regardless of how much a system is used. This tends to be viewed as overhead and is simply the cost of having the system in the first place. Toil scales with system use meaning the more users you attract the greater the impact of the toil on your team.

Finally toil can be Automated, some tasks will always require human involvement, but for a task to be toil it must be possible for it to be automated.

What is Toils Impact?

It would be wrong to suggest that toil can be totally eliminated, having a production system being used by large numbers of people is always going to incur a certain amount of toil, and it is unlikely that the whole engineering effort of your organisation can be dedicated to removing it.

Also, much like technical debt, even if you do reach a point where you feel its eliminated the chances are a future change in the system will likely re-introduce it.

But also like technical debt the first step is to acknowledge toil exists, develop ways to be able to detect it and have a strategy for managing it and trying to keep it to a reasonable minimum.

Toils impact is that it engages your engineering resource on tasks that don't add to or improve your system. It may keep it up and running but that is a low ambition to have for any system

It's also important to recognise that large amounts of toil is likely to impact a teams morale, very few engineers will embark on their career looking to spend large amounts of time on repetitive tasks that lead to no overall value.

The Alternative to Toil

The alternative to spending time on toil is to spend time on engineering. Engineering is a broad concept but in this context it means work that improves the system itself or enables to to be managed in a more efficient way.

As we said previously completely eliminating toil is probably an unrealistic aim. But it is possible to measure how much time your team is spending on toil related tasks. Once you are able to estimate this then it is possible both to set a sensible limit on how much time is spent on these tasks but also measure the effectiveness of any engineering activities designed to reduce it.

This engineering activity might relate to software engineering, refactoring code for performance or reliability, automating testing or certain aspects of the build and deployment pipeline. It might also be more aimed at system engineering, analysing the correctness of the infrastructure the system is running on, analysing the nature of system failures or automating the management of infrastructure.

As previously stated we can view toil as a form of technical debt. In the early days of a system we may take certain shortcuts that at the time are manageable but as the system grows come with a bigger and bigger impact. Time spent trying to fix this debt will set you on a path for gradual system improvement, both for your users and the teams that work on the system.

Saturday, 13 July 2024

The Language of Love

Software engineers are often polyglots who will learn or be exposed to multiple programming languages over the course of their career. But I think most will always hold a special affection for the first language they learn, most likely because it's the first time they realise they have the ability to write code and achieve an outcome. Once that bug bites it steers you towards a path where you continue to hone that craft.

For me that language is C and its successor C++.

Potentially my view is biased because of the things I've outlined above but I believe C is a very good language for all potential developers to start with. If you learn how to code close to the metal it will develop skills and a way of thinking that will be of benefit to you as you progress onto high level languages with greater levels of abstraction from how your code is actually running.

In The Beginning

In the late 1960s and early 1970s as the Unix operating system was being developed engineers realised that they needed a program language that could be used to write utilities and programs to run on the newly forming platform.

One of the initial protagonists in this field was Ken Thompson.

After dismissing existing programming languages such as Fortran he started to develop a variant of an existing language called BCLP. He concentrated on simplifying the language structures and making it less verbose. He called this new language B with the first version being released around 1969.

In 1971 Dennis Ritchie continued to develop B to utilise features of more modern computers as well as adding new data types. This culminated in the release of New B. Throughout 1972 the development continued adding more data types, arrays and pointers and the language was renamed C.

In 1973 Unix was re-written in C with even more data types being added as C continued to be developed through the 1970s. This eventually resulted in the release of what many consider the be the definitive book on the C programming language, written by Brian Kernighan and Dennis Ritchie The C Programming Language became known as K&R C and became the unofficial specification for the language.

C has continued to be under active development right up until the present with C23 expected to be released in 2024.

C with Classes

In 1979 Bjarne Stroustrup began work on what he deemed "C with Classes".

Adding classes to C turned into an object oriented language, where C had found a home in embedded programming running close to the metal, adding classes made it more suitable for large scale software development.

In 1982 Stroustrup began work on C++ adding new features such as inheritance, polymorphism, virtual functions and operator overloading. In 1985 he released the book The C++ Programming Language which become the unofficial specification for the language with the first commercial version being released later that year.

Much like C, C++ has continued to be developed with new versions being released up until the present day.

Usage Today

Software Engineering is often considered to be a fast moving enterprise, and while many other programming languages have been developed over the lifetime of C and C++ both are still very widely used.

Often being used when performance is critical, the fact they run close to the metal allows for highly optimised code for use cases such as gaming, network appliances and operating systems.

Usage of C and C++ can often strike fear into the heart of developers who aren't experienced in their use. However the skills that usage of C and C++ can develop will prove invaluable even when working with higher level languages so I would encourage all software engineers to spend some time expose themselves to the languages.

Good engineers can apply their skills using any programming language, the principles and practices of good software development don't vary that much between languages or paradigms. But often there are better choices of language for certain situations, and C and C++ are still the correct choice for many applications.

Friday, 28 June 2024

Vulnerable From All Sides

Bugs in software engineering are a fact of life, no engineer no matter what he perceives his or her skill level to be has never written a non-trivial piece of software that didn't on some level have some bugs.

These maybe logical flaws, undesirable performance characteristics or unintended consequences when given bad input. As the security of software products has grown ever more important the presence of a particular kind of bug has become something we have to be very vigilant about.

A vulnerability is a bug that has a detrimental impact on the security of software. Whereas any other bug might cause annoyance or frustration a vulnerability may have more meaningful consequences such as data loss, loss of system access or handing over control to untrusted third parties.

If bugs, and therefore vulnerabilities, are a fact of life then what can be done about them? Well as with anything being aware of them, and making others aware, is a step in the right direction.

Common Vulnerabilities and Exposures

Run by the Mitre Corporation with funding from the US Division of Homeland Security Common Vulnerabilities and Exposures (CVE) is a glossary that catalogues software vulnerabilities and via the Common Vulnerability Scoring System (CVSS) provides them with a score to indicate their seriousness.

In order for a vulnerability to be added to the CVE it needs to meet the following criteria.

It must be independent of any other issues, meaning it must be possible to fix or patch the vulnerability without needing to fix issues elsewhere.

The software vendor must be aware of the issue and acknowledge that it represents a security risk.

It must be a proven risk, when a vulnerability is submitted it must be alongside evidence of the nature of the exploit and the impact it has.

It must exist in a single code base, if multiple vendors are effected by the same or similar issues then each will receive its own CVE identifier.

Common Vulnerability Scoring System (CVSS)

Once a vulnerability is identified it is given a score via the Common Vulnerability Scoring System (CVSS). This score ranges from 0 to 10, with 10 representing the most severe.

CVSS itself is based on a reasonably complicated mathematical formula, I won't here present all the elements that go into this score but the factors outlined below give a flavour of the aspects of a vulnerability that are taken into account, they are sometime referred to as the base factors.

Firstly Access Vector, this relates to the access that an attacker needs to be able to exploit the vulnerability. Do they for example need physical access to a device, or can they exploit it remotely from inside or outside a network. Related to this is Access Complexity, does an attack need certain conditions to exist at the time of the attack or for the system to be in a certain configuration.

Authentication takes into account the level of authentication an attacker needs. This might range from none to admin level.

Confidentially assesses the impact in terms of data loss of an attacker exploiting the vulnerability. This could range from trivial data loss to the mass export of large amounts of data.

In a similar vein Integrity assesses an attackers ability to change or modify data held within the system, and Availability looks at the ability to effect the availability of the system to legitimate users.

Two other important factors are Exploitability and Remediation Level. The first relating to whether code is known to exist that enables the vulnerability to be exploited, and the latter referring to whether the software vendor has a fix or workaround to provide protection.

These and other factors are weighted within the calculation to provide the overall score.

Patching, Zero Days and Day Ones

The main defence against the exploitation of vulnerabilities is via installing patches to the affected software. Provided by the vendor these are software changes that address the underlying flaws causing the vulnerability in order to stop it being exploited.

This leads to important aspects about the lifetime of a vulnerability.

The life of the vulnerability starts when it is first inadvertently entered into the codebase. Eventually it is discovered by the vendor or a so called white hat hacker who notifies the vendor. The vendor will disclose the issue via CVE while alongside this working on a patch to address it.

This leads to a period of time where the vulnerability is known about, by the vendor, white hack hackers and possibly the more nefarious black hat hackers but where a patch isn't yet available. At this point vulnerabilities are referred to as zero days, they may or may not be being exploited and no patch exists to make the software safe again.

It may seem like once the patch is available the danger has passed. However once a patch is released the nature of the patch often provides evidence of the original vulnerability and provide ideas on how it is exploitable. At this point the vulnerability is referred to as a Day One, the circle of those who may have the ability to exploit it has increased, and vulnerable systems are not yet made safe until the patch has been installed.

CVE provides an invaluable resource in documenting vulnerabilities, forewarned is forearmed. Knowing a vulnerability exists means the defensive action can start, and making sure you stay up to date with all available patches means you are as protected as you can be.

Saturday, 15 June 2024

The World Wide Internet

Surfing the web, getting online and hitting the net are terms that ubiquitous among the verbs that describe how we live our lives. They stopped being technical terms a long time ago and now simply describe daily activities that all generations understand and take part in.

In becoming such commonly used terms they have lost some of their meaning with the web and the internet being interchangeable in most peoples minds. However this isn't the case, they do represent two different if complementary technologies.

Internet vs Web

The Internet is the set of protocols that has allowed computers all over the world to be connected and to exchange data, the so called "network of networks". It is concerned with how data is sent and received not what the data actually represents.

Protocols such the Internet Protocol and the Transmission Control Protocol allow each computer to be addressable and routable to allow the free flowing transmission of data.

The World Wide Web (WWW or W3) is an information system that builds on top of the interconnection between devices that the Internet provides to define a system for how information should be represented, displayed and linked together.

Defined by the concepts of hypermedia and hypertext it provides the means by which we are able to view data online, how we are provided links to that data and that data is visually depicted.

The History of the Internet

As computer science emerged as an academic discipline in the 1950s access to computing resource was scarce. For this reason scientist wanted to develop a way for access to be time shared such that many different teams could take advantage of the emerging technology.

This effort culminated in the development of the first wide are network Advanced Research Projects Agency Network (ARPANET) built by US Department of Defence in 1969.

Initially this interconnected network connected a number of universities including the University of California, Stanford Research Institute and the University of Utah.

In 1973 ARPANET expanded to become international with Norwegian Seismic Array and University College London being brought onboard. Into the 1980s ARPANET continued to grow and started to be referred to as the Internet as short hand for Internetwork.

In 1982 the TCP/IP protocols were standardised and the foundations of what we now know as the Internet were starting to be put in place.

The History of the Web

The Web was the invention of Sir Tim Berners-Lee as part of his work at CERN in Switzerland. The problems he was trying to solve related to the storing, updating and discoverability of documents in large datasets being worked on by large numbers of distributed teams.

In 1989 he submitted his proposal for a system that could solve these problems and in 1990 a working prototype was completed including and HTTP server and the very first browser named after the project and called WorldWideWeb.

Building on top of the network provided by the Internet the project defined the HTTP protocol, the structure of URLs and HTML as the way that the data in documents could be represented.

In 1993 CERN made the decision to make these protocols and the code behind them available royalty free, a decision that would change the world forever and enabled the number of web sites in the world to steadily grow from tens, to hundreds, to thousands, to the vast numbers that we now take for granted.

Many technologies end up having a profound impact on our lives without its terminology becoming understood by those outside technological circles. But the Internet and the web are different, they are so embedded in our lives that URLs, hyperlinks, web address etc are normal everyday words.

In the early days of ARPANET, and probably also for the WWW project, although those teams may have realised they were working on what could be important technologies I think they wouldn't have anticipated quite where their work would lead. But when an idea is a good one, it can go in many unexpected directions.

Sunday, 9 June 2024

What's In a Name

Surfing the web seems like straightforward undertaking, you type the website you want to go to into your browsers address bar, or you click on a result from a search engine and within no time the website you wanted to visit is in front of you.

But how did this happen simply by typing a web address into a browser? How is the connection made between you and a website out there somewhere in the world?

The answer lies in the Domain Name System (DNS).

All servers on the internet that are hosting the websites you want to access are addressable via a unique Internet Protocol (IP) address. For example as I write this article linkedin.com is currently addressable via 13.107.42.14. These addresses aren't practical for human use which is why we give websites names such as linkedin.com.

DNS is the process by which these human readable names are translated into the IP addresses that can be used to actually access the websites content.

DNS Elements

Four main elements are involved in a DNS lookup.

A DNS Recursor is a server that receives queries from clients to resolve a websites host name into an IP address. The recursor will usually not be able to provide the answer itself but knows how to recursively navigate the phone directory of the internet in order to give the answer back to the client.

A Root Nameserver is usually the first port of call for the recursor, it can be thought as like a directory of phone directories, based on the area of the internet the websites domain points at it directs the recursor at the correct directory that can be used to continue the DNS query.

A Top Level Domain (TLD) Nameserver acts as the phone directory for a specific part of the internet based on the TLD portion of the web address. For example a TLD nameserver will exists for .com addresses, .co.uk addresses and so on.

An Authoritative Nameserver is the final link in the chain, it is the part of the phone directory that can provide the IP addresses for the website you are looking for.

DNS Resolution

To bring this process to life let's look at the path of a DNS query if you were trying to get to mywebsite.com.

The user types mywebsite.com into their browser and hits enter, the browser then asks a DNS recursor to provide the IP address for mywebsite.com.

The recursor first queries a root nameserver to find the TLD nameserver thats appropriate for this request.

In this example the root nameserver will respond with the TLD nameserver for .com addresses.

The TLD nameserver will then respond with the authoritative nameserver for the websites domain, in this example mywebsite.com, the location of this server will be related to where the website is being hosted. The authoritative nameserver then responds with the IP address for the website, the recursor returns this to the users browser and the website can be loaded.

DNS Security

DNS is one of the fundamental technologies that has its origins in the foundation of the web. At this time when the blueprint of the web was being created security was less of a concern to those solving these engineering problems, it was assumed that the authenticity of the links in the chain could be taken on trust.

Unfortunately in the modern web this level of trust in other actors can be misplaced. When a server claims to be the authoritative nameserver for a particular website how can you trust that this is the case and you aren't going to be directed to a rogue impersonation of the website you are trying to reach.

Domain Name System Security Extensions (DNSSEC) is attempting to replace the trust based system with one that is based on provable security. It introduces the signing and validation of the DNS records being returned from the various elements involved in a DNS query so that their authenticity can be determined.

DNS is one of the technologies that is now taken for granted but solves a problem without which the web as we know it wouldn't be able to exist. On the surface it sounds like a simple problem to solve but the scale of the web means even the simplest of solutions has to be able to scale to a world wide scale.

Saturday, 13 April 2024

The Web of Trust

Everyday all of us type a web address into a browser or click on a link provided by a search engine and interact with the web sites that are presented.

Whilst we should always be vigilant, on many occasions we will simply trust that the site we are interacting with is genuine and the data flowing between us and it is secure.

Our browsers do a good job of making sure our surfing is safe, but how exactly is that being achieved. How do we create trust between a website and its users?

SSL/TLS

Netscape first attempted to solve this trust problem by introducing the Socket Secure Layer (SSL) protocol in the early 90s. Initial versions of the protocol still had many flaws but by the release of SSLv3.0 in 1996 it had matured into a technology that was able to provide a mechanism for trust on the web.

As SSL became a foundational part of the web, and because security related protocols always have to be under constant evolution to maintain safety, the Internet Engineering Task Force (IETF) developed Transport Layer Security (TLS) in 1999 as an enhancement to SSLv3.0.

TLS has continued to be developed with TLSv1.3 being released in 2018.

Its primary purpose is to ensure data being exchanged by a server and a client is secured, but also to establish a level of trust such that the two parties can be sure who they are exchanging the data with.

Creating this functionality relies on a few different elements.

Public Key Encryption

Public key encryption is a form of asymmetric encryption that uses a pair of related keys deemed public and private.

The mathematics behind this relationship between the keys is too complex to go into in this post, but the functionality it provides is based on the fact that the public key can be used to encrypt data that only the private key can decrypt.

This means the public key can be freely distributed and used to encrypt data that only the holder of the private key can decrypt.

The keys can also be used to produce and verify digital signatures. This involves the holder of some data using a mathematical process to "sign" this data using their private key.

The receiver of the data can use the public key to verify the signature and therefore prove that the data came from someone who has the corresponding private key.

Public Key Infrastructure (PKI)

Public Key Infrastructure (PKI) builds on top of the functionality provided by public key encryption to provide a system for establishing trust between client and server.

This is achieved via the issuance of digital certificates from a Certificate Authority (CA).

The CA is at the heart of the trust relationship of the web. When two parties, the client and server, are trying to form a trust relationship they must delegate to a 3rd party that they both already trust, this is the CA.

The CA establishes the identity of the organisation the client will interact with via off line means and issues a digital certificate. This certificate establishes the identity of the organisation, its public key and is signed by the CA to prove it was the one that issued the certificate.

A client when it receives the certificate from the server can use the CA's public key to verify the signature and therefore trust the data in the certificate.

It's possible to have various levels of CA's that may delegate trust to other CA's, deemed intermediary CA's. But all certificates should ultimately be able to be traced back to a so called Root CA that all parties on the web have agreed to trust and whose public keys are available to all participants.

Certificates and Handshakes

All of the systems previously described are combined whenever we visit a web site to establish trust and security.

A user types a web address into the browser or clicks a link provided by a search engine.
The user's browser issues a request to the web site to establish a secure connection.
The server in response sends the browser it's certificate.
The browser validates the certificate authenticity by verifying the signature of the Root CA that the certificate is issued from using the public key of the CA that has been pre-installed on the users machine.
Once the certificate is validated, the browser creates a symmetric encryption key that will be used to secure future communication between the browser and the web site. It encrypts the symmetric key using the servers public key and sends it to the server.
The users browser has now established the identity of the web site, based on the data contained in its validated certificate, and both parties now have a shared symmetric key that can be used to secure the rest of their communication in the session.

There are certain pieces of functionality that are fundamental to allowing the web to operate in the way it does.

Without the functionality provided by SSL/TLS it wouldn't be possible to use the web as freely as we do whilst also trusting that we can do so in a safe and secure manner.

Monday, 1 April 2024

Imagining the Worst

In the modern technological landscape the list of possible security threats can seem endless. The breadth of potential attackers and potential vectors for their attacks has never been so large, does this mean we are all just helpless waiting for an attack and the terrible consequences to befall us?

One way to be proactive in the face of these dangers is to try and anticipate what form these treats might take, what damage they could do and what countermeasures it might be possible to take.

Threat modelling is a technique for enumerating the threats a system might face, identifying whether or not safeguards might exist and analysing the consequences of these attacks succeeding.

To help developers and engineers with the threat modelling process Microsoft developed the STRIDE mnemonic in 1999 to serve as a checklist of things for teams to consider when analysing the potential impact of threats to their system.

STRIDE

The STRIDE mnemonic attempts to categorise potential threats in terms of the impact they may have, this allows teams to analyse if any part of a system may be susceptible, and if so how this might be mitigated.

Spoofing is the process of falsely identifying yourself within a system. This might be by using stolen user credentials, leaked access tokens or cookies and any other form of session hijacking.

Tampering involves the malicious manipulation of data either at rest, for example altering data within a database, or while in transit, for example by acting as a main in the middle.

Repudiation relates to an attacker being able to cover their tracks by exploiting any lack of logging or ability to trace actions within a system, this might also include an attacker having the ability to falsify an audit trail to hide malicious activity.

Information Disclosure occurs when information is available to users who shouldn't be able to view it. This might cover a system returning database records a user has no entitlement to view, or the ability of an attacker to intercept data in transit, again for example by acting as a man in the middle.

Denial of Service is any attack that denies users the ability to legitimately use a system, of which the most common form of attack is to overwhelm a system with requests or otherwise cause the system to become unresponsive or unusable.

Elevation of Privilege occurs when an attacker is able to elevate their permissions within a system under attack, normally this would mean obtaining administrator privileges or otherwise penetrating a network sufficiently to be trusted more than a normal external user.

Threat Analysis

Many tools and processes exist for implementing threat modelling, but most will revolve around a team of system experts brainstorming potential threats that a system or sub-system might be susceptible too.

This involves using analysis helpers such as STRIDE to put yourself in the mindset of an attacker. For example you may asses if an authentication system could be exploited via spoofing. The answer might be no because of certain mitigations, or yes because of certain flaws.

When applying this style of analysis to all the aspects of STRIDE it is unlikely that you will find the system is completely protected against all possible attacks. Instead you're a looking to demonstrate that it is adequately protected given the likelihood of an attack being successful and the benefit that would be gained by an attacker if they were successful.

Security is not a design activity that is ever truly complete and instead will be something that evolves over time. You can either choose to learn by mistakes when attacker are successful or you can attempt to pro-actively preempt this by performing some self critical internal analysis to ensure security levels are the highest they can be.

Sunday, 24 March 2024

Having a Distrustful Mindset

I've talked previously about the concept of zero trust as it relates to security. When trying to apply an over arching philosophy like this it can feel quite daunting as to where to start.

Rather than having to create a comprehensive design in how to re-engineer your existing applications and network it can help to start with a shift in mindset in how you view the security properties of your system.

Security is an ever evolving discipline with many different facets so I wouldn't want to give the impression that the items I mention below are the only things you need to think about, but I think they do help foster the kind view that you need to have a healthy distrust of the world around you.

Authenticate

Authentication is the process of identifying the actors within a system. Traditionally this means authenticating users via them supplying a username/password or other such shared secret.

But this can also be extended to cover many other links in the chain.

This might include identifying the client or application the user is using to make the requests, the network the requests are coming from or even the physical device that is being used to communicate.

It's possible for all of these aspects of the flow of data to be used to indicate that something malicious maybe happening, and therefore properly identifying all these elements will enable you to asses if you can trust them.

Properly identifying all elements also enables comprehensive logging of any actions performed.

Authorise

Once you've identified these elements you can move onto to authorisation. This is the process of deciding if the requested operation should be allowed. Normally this would mean is this user allowed to view this data or perform this action.

But again the concept can be extended to cover more aspects of the system. Should this client or application be allowed to perform this action? Should these kinds of requests be allowed from this area of the network? Should this physical device be used to perform this action?

Going beyond authorising just the user again increases your ability to both detect malicious actions and prevent them.

Authorisation can also be linked to the method being used for authentication. Some actions a user may want to perform might be linked to needing a higher level of authentication. As an example some admin actions might require multi-factor authentication onto top of a username and password.

Encrypt

Pretty much all applications involve the movement and manipulation of data, and a large number of threats will relate to trying to expose that data to parties that shouldn't be able to view it.

One defence against this is to make sure that at the points where the data doesn't need to be viewed or worked with it is encrypted.

This will broadly come down to encrypting the data at rest and in transit. When the data is being stored and when it is being moved it should always be encrypted, the only time it should be in plain text is when it is being shown to the user. A user that has been properly authentication and authorised to view that data, via that application, from that network using that device.

This mindset should also be applied to other aspects of data sensitivity, identifying which data items are sensitive and how they are being transported. An example of this would be considering what data items are being included in the query string of requests, because this might be logged in various places data within it should be assessed for sensitivity.

Authentication, authorisation and encryption aren't the only security related factors you need to be thinking about. But embracing them and thinking about them in a greater depth will help you dive deeper into the security of your software and system and be aware of a greater breadth of possible threats.