Sunday 28 January 2024

The Problem with Castles and Moats

 


Pretty much any system accessible via the internet can expect to come under regular attack of varying sophistication, ranging from the simply curious to those who mean to cause harm and damage.

Protecting yourself from these intrusions is therefore a key activity in the day-to-day operation of any team.

But is it realistic to expect to always be able to keep attackers at bay on the edge of your infrastructure? Are external threats the only thing you should be concerned about?

Zero Trust Security takes an approach that answers no to both of those questions. It aims to instil defence in depth, ensuring you protect yourself against many different attack vectors and actors.

Castles and Moats

A traditional approach to security, often termed castle and moat, makes access to a network hard to obtain, but once access is granted there is an implicit trust of anyone and anything inside the network perimeter.

The source of this implicit trust is probably a desire for convenience, but also a belief that attackers can be kept outside the network at all times.

Of course keeping attackers outside should be the goal, but the problem with castle and moat is that if an attacker does gain access, which is unfortunately likely given the abundance and skill of some attackers, they then have free rein within the network to do as they like.

Principles of Zero Trust

Zero trust security is based on a set of principles designed to remove the implicit trust that comes with a castle and moat approach. These principles assume that attackers are both inside and outside the network, so no user or device should be trusted unless they are verified and their access validated.

The fact that both users and machines are part of the trust evaluation is key. Rather than a network being open with access permitted from any part to any other part, the network is segmented into different areas with rules enforced over which parts of a network can connect to which other parts.

Another important consideration is that of least privilege: even after a user or device has been authenticated, they are only authorised for the lowest level of access required to fulfil their role.

Zero trust will often also employ mechanisms to limit the risk of credentials being exposed. This might mean regular rotation of passwords, multi-factor authentication, and a requirement for regular re-authentication rather than long-running sessions.
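
As a rough illustration of how these checks combine, the sketch below shows a single authorisation decision in Python. It is a minimal sketch only, assuming hypothetical segment names, roles and session rules rather than any particular product's API.

from datetime import datetime, timedelta

# Hypothetical segmentation rules: which network segments may talk to which.
ALLOWED_ROUTES = {
    ("web-tier", "api-tier"),
    ("api-tier", "db-tier"),
}

# Hypothetical least-privilege roles mapped to the actions they may perform.
ROLE_PERMISSIONS = {
    "reporting-user": {"read"},
    "service-account": {"read", "write"},
}

MAX_SESSION_AGE = timedelta(minutes=30)  # force regular re-authentication


def authorise(source_segment, target_segment, role, action, session_started, mfa_passed):
    """Return True only if every zero trust check passes; deny by default."""
    if (source_segment, target_segment) not in ALLOWED_ROUTES:
        return False  # segment-to-segment route not explicitly allowed
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False  # least privilege: role lacks this action
    if not mfa_passed:
        return False  # credentials alone are not enough
    if datetime.utcnow() - session_started > MAX_SESSION_AGE:
        return False  # stale session: require re-authentication
    return True


# Example: a reporting user in the web tier reading from the API tier.
print(authorise("web-tier", "api-tier", "reporting-user", "read",
                datetime.utcnow() - timedelta(minutes=5), mfa_passed=True))

In a real environment these rules would come from policy engines and identity providers rather than hard-coded dictionaries, but the deny-by-default shape of the decision is the important part.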

Advantages and Benefits  

All of these measures are designed to limit what an attacker on the inside of the network can achieve, and crucially to prevent them from being able to roam the network at will.

Rather than fighting a single high-stakes battle with attackers at the perimeter, we assume that at some point we will lose and instead defend our assets and resources on multiple levels.

Zero trust also acknowledges that threats don't just come from the outside world. Someone who has legitimate access to the network might also have malicious intent. These so-called malicious insiders can cause as much damage as any external attacker, and have the added advantage of understanding the network topology and operation.

It's an unfortunate reality of the modern technology landscape that no system, or part of a system, can be deemed completely safe. The battle with would-be attackers often becomes an arms race, and placing your faith in your ability to always win that race can leave you open to large amounts of damage from any momentary slip in your ability to repel attackers.

Assume it's possible they might get in, and protect your network and your data from all possible angles. In this instance it isn't paranoia: they really are out to get you.
      

Sunday 14 January 2024

Backend for Frontend

 


Any software engineer who has worked within the client-server model will be familiar with the generalised categorisation of applications as frontend or backend.

Backend applications provide access to data and apply the necessary business logic around this access and the updating of these data sources. Frontend applications provide an interface for users to view and interact with this data.

In most situations these applications run in separate environments, such as a user's browser or mobile phone in the case of the frontend, and a remote server for the backend. Interaction between the two is then generally achieved via some form of API.

Having this separation allows both applications to be developed and deployed independently, but the design of the API that binds the two together is key to realising this efficiency. One approach is of course a traditional generic API designed to serve many possible uses, but a Backend for Frontend (BFF) takes a different road, providing an interface specific to the needs of the frontend it is serving.

The Problem of Many Frontends

Let's imagine we start working on a web frontend application to provide functionality to users via the browser. We develop a backend API to provide access to the necessary data and functions, with the development of both apps proceeding in tandem.

Our product is a success, so we are asked to produce a mobile app providing access to the same functionality, and we drive this mobile app from the same backend API. Clearly it will be possible to build the app using this API, but is it the optimal approach? A mobile device comes with very different limitations to a desktop browser, in terms of performance, network access, screen real estate and the general way in which users tend to interact with it.

We are then asked to provide access via a voice based app for a digital virtual assistant where we have to deal with the problem of having a very different medium to communicate with our users.

These competing needs put a lot of pressure on the team developing the backend application, creating a bottleneck in development and making it difficult to maintain a consistent and coherent API interface.

But what about if we took a different approach?

Backend for Frontend

That different approach is the concept of a Backend for Frontend (BFF).

A BFF is a backend application where the API interface is tailored specifically for the needs of the frontend it is designed to serve. This tailoring includes the data it exposes, both in terms of depth and shape, as well as the orchestration of business processes the user may wish to trigger.

In our above example we would build separate BFFs to serve web, mobile and voice.

The web BFF exposes larger amounts of data and provides access to more complicated business flows requiring multiple interactions. The mobile BFF provides access to a more compact dataset to reduce the amount of data passing between frontend and backend, as well as an increased level of orchestration to reduce the number of API calls involved in achieving an outcome. The voice BFF returns a very different data schema to serve the unique user interface that's required.

Most likely all three BFFs are built on top of an internal enterprise API layer, meaning their sole responsibility is to provide the optimised interface for the frontends they are aligned to.
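
As a rough sketch of the idea, the Python below shows a web BFF and a mobile BFF shaping the same response from a shared internal API in different ways. The internal API call, field names and data are all hypothetical, purely to illustrate the pattern.

# Hypothetical response from a shared internal enterprise API.
def fetch_account_from_internal_api(account_id):
    return {
        "id": account_id,
        "name": "Jane Doe",
        "balance": 1250.75,
        "transactions": [
            {"date": "2024-01-02", "description": "Grocery store", "amount": -32.50},
            {"date": "2024-01-03", "description": "Salary", "amount": 2100.00},
        ],
        "marketing_preferences": {"email": True, "sms": False},
    }


def web_bff_account(account_id):
    """Web BFF: richer payload for a desktop browser with more screen space."""
    account = fetch_account_from_internal_api(account_id)
    return {
        "name": account["name"],
        "balance": account["balance"],
        "transactions": account["transactions"],
        "marketingPreferences": account["marketing_preferences"],
    }


def mobile_bff_account(account_id):
    """Mobile BFF: compact payload, only the fields the mobile screen shows."""
    account = fetch_account_from_internal_api(account_id)
    return {
        "name": account["name"],
        "balance": account["balance"],
        "recentTransactions": account["transactions"][:1],  # trimmed list
    }

Both BFFs delegate the actual business logic to the shared internal layer and concern themselves only with shaping the response for their channel.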

Development of the BFF can sit alongside the team developing the paired frontend application, leading to fewer bottlenecks in development and allowing each channel to operate on its own release cadence.

Pitfalls and Considerations

So does this mean a BFF is an obvious choice when developing within the client-server model? There are very few absolutes in engineering, so like any pattern it's never the case that it should be applied in every situation, and careful consideration still needs to be given to its implementation.

Firstly, you need to confirm that your frontends really do have different needs. Above I've made the generalisation that mobile and web frontends can't efficiently be driven by the same API. In general that might be a reasonable assumption, but you should take the time to assess whether the optimal API surfaces would actually be different for your use case.

Secondly, you should consider how to implement multiple BFFs whilst maintaining a strict separation of concerns to avoid duplication. A BFF's implementation should be solely concerned with shaping data and functionality for the frontend it serves. Internal business rules should be implemented in a shared layer rather than duplicated across the BFFs; failure to do this will lead to inconsistencies in experience as well as inefficiencies in development.

Lastly, consideration should be given to the fact that a BFF approach leads to more applications being developed and deployed. Failing to structure teams to develop within this model, and to have a good DevOps story to manage the increased number of deployments, will stop you achieving the promised increase in efficiency.

So many systems demonstrate the frontend/backend split that the devil is in the detail of how these applications interact: it is one of the most important factors in how successful your efforts will turn out to be. The performance, usability and development efficiency of your applications will in large part be determined by how you implement this interaction. BFFs should be one of the tools at your disposal to ensure frontend and backend can work in harmony to deliver great outcomes.

Sunday 7 January 2024

Teaching the Machine



In my previous post on Large Language Models (LLMs) I referred several times to the models being trained, but what does this actually mean?


The process I was referencing is known as Machine Learning (ML). It comes in many different forms and is probably the most important and time-consuming aspect of building Artificial Intelligence (AI).


Much like our own learning, the training of an AI model is an iterative process that must be fine-tuned and repeated in order to ensure a workable and useful model is delivered.


Logic vs Maths


Traditionally engineered software is a collection of logical statements, such as if-else, do-while and for-each, that form fixed predetermined paths from input to output. The problem the software deals with must essentially be solved in the minds of the engineers producing it so they can determine these logical paths.


However there are many classes of problems where it is impractical to solve them by writing software in this way. These may be complex medical problems, problems of processing complex data such as natural language, or problems where the actual solution isn't known in advance.


The process of Machine Learning exposes statistical algorithms to large amounts of data in order to find relationships. This allows the trained model to generalise and make predictions or observations about unseen data without the need for explicit instructions on how the data should be processed.


Supervised Learning


Broadly speaking, Machine Learning falls into two categories, supervised and unsupervised, with the difference between the two being whether the required output is known in advance.


Supervised learning uses datasets that consist of many input data items, referred to as features, and a desired output known as a label. As an example we might have a medical dataset covering many different aspects of a person's health and a marker of whether or not they have a disease such as diabetes. The goal of the training is to develop a model that, given someone's health data, can predict whether they are likely to develop diabetes.


The model learns by processing the training data and identifying the relationships between the various data items, with its accuracy being assessed by how well it can produce the required output. When, via a process of iteration and tweaks to the mathematical algorithms being used, the model is deemed to be trained, it is used to process previously unseen data.
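
As a rough illustration of what this looks like in practice, here is a minimal sketch using Python and scikit-learn, assuming a hypothetical patients.csv file with health measurements as features and a diabetes marker as the label.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: health measurements (features) plus a diabetes marker (label).
data = pd.read_csv("patients.csv")
features = data[["age", "bmi", "blood_pressure", "glucose"]]
label = data["has_diabetes"]

# Hold back some data so accuracy is assessed on examples the model hasn't seen.
X_train, X_test, y_train, y_test = train_test_split(
    features, label, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # learn relationships between features and label
predictions = model.predict(X_test)  # predict on previously unseen data

print("Accuracy:", accuracy_score(y_test, predictions))

Real projects involve far more work on data quality, feature selection and iteration, but this train-then-assess-on-unseen-data loop is the core of supervised learning.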


Many types of supervised learning use some form of mathematical regression to define trend lines for datasets where this trend line forms the basis for prediction and definition of the output label.


Human experts in the problem space the model is dealing with are key in the process of identifying the data features that the model should work with and ensuring the data it is trained with is of sufficient quality and diversity to produce a workable and accurate model. 


Unsupervised Learning


Unsupervised learning, which often makes use of deep learning techniques, involves datasets that don't have a predefined label defining what the output should be. The model identifies previously unknown relationships in the data in order to cluster datapoints and find commonalities, which then enables it to look for the presence of these commonalities in new data it is exposed to.


Examples of this type of learning might be analysis of customer buying behaviour in order to predict future purchases, the ability to recognise the content of images, or in the case of an LLM like ChatGPT the ability to predict the natural language that forms an answer to a question.
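
To make the clustering idea concrete, here is a minimal sketch using Python and scikit-learn's KMeans on a tiny hypothetical dataset of customer purchasing behaviour. No labels are supplied; the algorithm finds the groupings itself.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [number of orders, average spend] per customer.
customers = np.array([
    [2, 15.0], [3, 18.5], [2, 12.0],      # occasional, low spend
    [25, 30.0], [30, 28.0], [27, 35.0],   # frequent, mid spend
    [5, 250.0], [4, 300.0], [6, 275.0],   # rare, high-value purchases
])

# No labels are provided; the algorithm discovers the clusters itself.
model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(customers)
print(model.labels_)                # cluster assigned to each existing customer
print(model.predict([[28, 32.0]]))  # which group a new customer falls into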


This type of learning is generally based on neural networks. Originally based on the organisation of the human brain, a neural network consists of a number of interconnected neurons. However rather than being biological in nature, these neurons are mathematical models that take a number of inputs and produce a single output to pass on to the next neuron in the chain.
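
As a rough sketch of what a single artificial neuron does, the Python below computes a weighted sum of its inputs plus a bias and squashes it through a sigmoid activation. The inputs and weights are purely illustrative; in training, the weights are what get adjusted.

import numpy as np

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs squashed to one output."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-weighted_sum))  # sigmoid activation

# Illustrative values only; in practice these weights are tuned iteratively.
inputs = np.array([0.5, 0.8, 0.2])
weights = np.array([0.4, -0.6, 0.9])
bias = 0.1
print(neuron(inputs, weights, bias))  # single output passed to the next layer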


Although the types of problems addressed by unsupervised learning are generally those where there is no preconceived wisdom on the relationships between the data items, human experts still play a crucial role in analysing the outputs of the model in order to drive the iterative training process that will increase the model's accuracy.


The portrayal of AI in science fiction would have us believe that a switch is flipped and a fully formed intelligence springs into life holding infinite knowledge. The reality is that it's a painstaking, costly and time-consuming process that requires many cycles to perfect. Machine Learning is essentially the nuts and bolts of AI; services such as ChatGPT live or die based on the expertise and engineering that is applied in this phase. The fundamentals of the learning process apply equally to machines and to humans.

Monday 1 January 2024

The Power of Maths and Data

 


Artificial Intelligence (AI) is one of those relatively rare terms that has made the jump from software engineering into the lexicon of everyday life.

Many of the usages of the term are really nothing more than marketing ploys to give products a greater technological cachet. However the public's exposure to Large Language Models (LLMs), most notably in the form of ChatGPT, has given a glimpse into the potential of AI.

There is however an unfortunate anthropomorphic effect that comes with the term intelligence. When we observe LLMs like ChatGPT in action we tend to equate their operation with our own intelligence and imagine the machine thinking and reasoning for itself.

While you could have a philosophical debate about what it means to "think", I don't believe viewing the technology in this way is helpful, and it is what leads to many of the perceived doomsday scenarios we are warned about.

So what is actually happening inside an LLM?

Patterns Leading to Understanding

AI is a very broad term covering many different technologies and applications. But in general it can be viewed as using mathematics to find patterns in data and using this knowledge to predict, classify or, in the case of LLMs, generate output.

Whilst some applications of AI may look at a narrow dataset, such as credit card transactions to classify activity as fraud or health data to predict the likelihood of disease, LLMs are trained on extremely large amounts of human language.

The vast size of the datasets used means the patterns the model is able to identify give it a general purpose understanding of natural language. This enables it to understand language supplied to it as well as to generate language in response.

The key aspect to keep in mind here is that this understanding of language relates to knowing the patterns and relationships between words, based on observations of a large dataset, rather than an understanding of the actual words themselves. Given this set of input words, what is the most likely set of output words that would statistically form an answer to that query?
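
A heavily simplified way to picture this is a model that simply counts which word tends to follow which in its training text and picks the most likely continuation. The toy Python sketch below does exactly that on a tiny made-up corpus; a real LLM works over far richer context and vastly more data, but the statistical spirit is similar.

from collections import Counter, defaultdict

# A tiny toy corpus standing in for the web-scale text an LLM is trained on.
corpus = "the cat sat on the mat the cat slept on the sofa the dog sat on the mat".split()

# Count which word tends to follow which (a crude stand-in for an LLM's statistics).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def most_likely_next(word):
    """Predict the statistically most likely next word given the previous one."""
    return following[word].most_common(1)[0][0]

print(most_likely_next("cat"))  # picks whichever continuation was seen most often
print(most_likely_next("the"))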

Transformers

The most common architectural pattern used to build LLMs is called a Transformer Model.

Transformer Models are an example of something called a neural network. Originally based on the organisation of the human brain, a neural network consists of a number of interconnected neurons.

However rather than being biological in nature, these neurons are mathematical models that take a number of inputs and produce a single output to pass on to the next neuron in the chain.

A Transformer consists of an encoder and a decoder.

The encoder takes input language and divides it up into a number of tokens, which we can think of as the constituent parts of words. Mathematical equations are then applied to these tokens to understand the relationships between them. This produces a mathematical representation of the input language, allowing the model to predict the potential output language.

The decoder then runs this process in reverse to move from the mathematical representation of the output back into tokens to form the language to provide to the user.
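
If you want to see the tokenisation step for yourself, the short Python sketch below uses the GPT-2 tokenizer from the Hugging Face transformers library (assuming it is installed and can download the GPT-2 vocabulary) to split text into tokens, turn them into numbers, and decode them back again. It illustrates only the token handling, not the full Transformer.

# Requires the Hugging Face transformers library; downloads the GPT-2 vocabulary on first run.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Transformers break language into tokens"
tokens = tokenizer.tokenize(text)   # constituent pieces of words
ids = tokenizer.encode(text)        # the numeric form the model actually works with
print(tokens)
print(ids)
print(tokenizer.decode(ids))        # decoding maps the numbers back into language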

When trained on a significantly large amount of varied data this allows the model to provide answers to questions on many subjects.

Use Cases and Downsides

The generative nature of LLMs makes them ideal for use cases such as chat bots, document generation, and if trained on appropriate data sets even specialised tasks such as code generation.

Their ability to interact also enables a conversational approach to information retrieval from large datasets.

The LLM approach can also be applied to data sources other than language, allowing for audio and image generation as well as text.

However the nature of how LLMs are trained can lead to some downsides it's important to be aware of. LLMs will reflect the nature of the data they are trained on; if that data contains natural human bias then this will be reflected in the output language the model produces.

LLMs can also display a behaviour called hallucination. This is where the model produces output language that, while coherent, isn't factually accurate or doesn't relate to the input language. There are many reasons for this, but most relate to the earlier point that the model's output is based on mathematical analysis rather than an inbuilt understanding of the language it is given or returning.

The AI revolution is real, and its potential impacts are made visible to the majority of us via LLMs such as ChatGPT or Google's Bard. It is also the interactions with these models that drive a lot of people's fears about the direction the technology will take us. But it's important to appreciate how these models are doing what they do before becoming overly fearful or pessimistic.