Monday 1 January 2024

The Power of Maths and Data

Artificial Intelligence (AI) is one of those relatively rare terms that has made the jump from software engineering into the lexicon of everyday life.

Many uses of the term are really nothing more than marketing ploys to endow products with a greater technological cachet. However, the public's exposure to Large Language Models (LLMs), most notably in the form of ChatGPT, has given a glimpse into the potential of AI.

There is, however, an unfortunate anthropomorphic effect that comes with the term "intelligence". When we observe LLMs like ChatGPT in action we tend to equate their operation with our own intelligence and imagine the machine thinking and reasoning for itself.

While you could have a philosophical debate about what it means to "think", I don't believe viewing the technology in this way is helpful, and it is this framing that leads to many of the doomsday scenarios we are told the technology could bring about.

So what is actually happening inside an LLM?

Patterns Leading to Understanding

AI is a very broad term covering many different technologies and applications. But in general it can be viewed as using mathematics to find patterns in data and then using this knowledge to predict, classify or, in the case of LLMs, generate output.

Whilst some applications of AI look at a narrow dataset, such as credit card transactions to classify activity as fraud, or health data to predict the likelihood of disease, LLMs are trained on extremely large amounts of human language.
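
To make the narrow case concrete, here is a minimal sketch of that kind of fraud classifier in Python, assuming the scikit-learn library is available; the features, values and labels below are invented purely for illustration.

```python
# Classify transactions as fraudulent or legitimate from a narrow
# dataset. Each row is [amount, hour_of_day, distance_from_home_km];
# all values and labels are made up for this example.
from sklearn.linear_model import LogisticRegression

transactions = [
    [12.50, 14, 2.0],
    [9.99, 10, 1.5],
    [2500.00, 3, 800.0],
    [18.75, 19, 4.2],
    [3100.00, 4, 950.0],
]
is_fraud = [0, 0, 1, 0, 1]  # 1 = fraud, 0 = legitimate

model = LogisticRegression()
model.fit(transactions, is_fraud)  # find the pattern in the data

# Estimate the fraud probability of a new, unseen transaction.
print(model.predict_proba([[2800.00, 2, 700.0]]))
```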

The vast size of the datasets used means the patterns the model is able to identify give it a general-purpose understanding of natural language. This enables it to understand the language supplied to it as well as to generate language in response.

The key aspect to keep in mind here is that this understanding of language means knowing the patterns and relationships between words, based on observations of a large dataset, rather than understanding the actual words themselves. In effect the model is answering the question: given this set of input words, what is the set of output words that is statistically most likely to form an answer to the query?
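
This "most likely output words" idea can be illustrated with a toy sketch in Python. Here the "model" is nothing more than counts of which word follows which in a tiny invented corpus; a real LLM learns vastly richer patterns, but the statistical principle is the same.

```python
# Count which word follows which in a tiny corpus, then predict the
# most frequent successor. This is next-word prediction at its most
# primitive: pure pattern counting, no understanding of the words.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

successors = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    successors[current][following] += 1

def most_likely_next(word):
    # Return the word that most often followed `word` in the corpus.
    return successors[word].most_common(1)[0][0]

print(most_likely_next("the"))  # -> 'cat' (seen twice, vs once for the others)
```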

Transformers

The most common architectural pattern used to build LLMs is called a Transformer Model.
Transformer Models are an example of something called a neural network. Originally based on the organisation of the human brain, a neural network consists of a number of interconnected neurons.

However, rather than being biological in nature, these neurons are mathematical models that take a number of inputs and produce a single output to pass on to the neurons in the next layer.
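
As a rough sketch, a single artificial neuron amounts to a weighted sum of its inputs plus a bias, passed through an activation function. The weights and inputs below are arbitrary example values; training a network is the process of finding weights that make the outputs useful.

```python
# One artificial neuron: combine the inputs using learned weights,
# add a bias, then squash the result with a sigmoid activation so it
# can be passed on to the next layer of neurons.
import math

def neuron(inputs, weights, bias):
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-weighted_sum))  # sigmoid activation

output = neuron(inputs=[0.5, -1.2, 3.0], weights=[0.8, 0.1, -0.4], bias=0.2)
print(output)  # a single value: the neuron's output
```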

A Transformer consists of an encoder and a decoder.

The encoder takes the input language and divides it up into a number of tokens, which we can think of as the constituent parts of words. Mathematical equations are then applied to these tokens to capture the relationships between them. This produces a mathematical representation of the input language, allowing the model to predict the likely output language.
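
The central such equation in a Transformer is known as scaled dot-product attention, which scores how strongly each token relates to every other token. Below is a rough sketch using numpy, with random vectors standing in for the learned token representations a real model would use.

```python
# Scaled dot-product attention over four tokens. Q, K and V would
# normally be derived from learned token embeddings; random vectors
# stand in for them here.
import numpy as np

rng = np.random.default_rng(0)
num_tokens, dim = 4, 8
Q = rng.normal(size=(num_tokens, dim))  # queries
K = rng.normal(size=(num_tokens, dim))  # keys
V = rng.normal(size=(num_tokens, dim))  # values

scores = Q @ K.T / np.sqrt(dim)  # how strongly each token relates to each other token
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
attended = weights @ V  # each token becomes a weighted blend of the others

print(weights.round(2))  # each row sums to 1: one token's attention over the rest
```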

The decoder then runs this process in reverse, moving from the mathematical representation of the output back into tokens to form the language returned to the user.
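
A simplified way to picture this is a loop that repeatedly asks the model for the most probable next token and feeds it back in until the response is complete. In the sketch below, model_probabilities is a hypothetical stub standing in for a real trained model, which would return a probability for every token in its vocabulary.

```python
# Greedy decoding: repeatedly append the most probable next token.
def model_probabilities(tokens):
    # Stub in place of a real model: hard-codes a tiny continuation.
    canned = {"The": "cat", "cat": "sat", "sat": "."}
    return {canned.get(tokens[-1], "."): 1.0}

def generate(prompt_tokens, max_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        probs = model_probabilities(tokens)
        best = max(probs, key=probs.get)  # greedy: take the most likely token
        tokens.append(best)
        if best == ".":  # stop at the end of the sentence
            break
    return " ".join(tokens)

print(generate(["The"]))  # -> 'The cat sat .'
```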

When trained on a sufficiently large amount of varied data, this allows the model to provide answers to questions on many subjects.

Use Cases and Downsides

The generative nature of LLMs makes them ideal for use cases such as chat bots, document generation and, if trained on appropriate datasets, even specialised tasks such as code generation.

Their ability to interact also lends itself to a conversational approach to information retrieval from large datasets.

The LLM approach can also be applied to data sources other than language, allowing models to generate audio and images as well as text.

However, the nature of how LLMs are trained can lead to some downsides it's important to be aware of. LLMs will reflect the nature of the data they are trained on: if that data contains natural human bias then this will be reflected in the language the model produces.

LLMs can also display a behaviour called hallucination. This is where the model produces output language that, while coherent, isn't factually accurate or doesn't relate to the input language. There are many reasons for this, but most relate to the earlier point that the model's output is based on mathematical analysis rather than an inherent understanding of the language it is given or returning.

The AI revolution is real, and its potential impacts are made visible to the majority of us via LLMs such as ChatGPT or Google's Bard. It is also the interactions with these models that drive a lot of people's fears about the direction the technology will take us. But it's important to appreciate how these models do what they do before becoming overly fearful or pessimistic.
