Sunday 7 January 2024

Teaching the Machine



In my previous post on Large Language Models (LLMs), I referred several times to the models being trained, but what does this actually mean?


The process I was referencing is known as Machine Learning (ML). It comes in many different forms and is probably the most important and time-consuming aspect of building Artificial Intelligence (AI).


Much like our own learning, the training of an AI model is an iterative process that must be fine-tuned and repeated in order to ensure a workable and useful model is delivered.


Logic vs Maths


Traditionally engineered software is a collection of logical statements, such as if-else, do-while and for-each, that form fixed predetermined paths from input to output. The problem the software deals with must essentially be solved in the minds of the engineers producing it so they can determine these logical paths.
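
To make that concrete, here is a small, purely illustrative sketch of the kind of fixed decision path a traditionally engineered program encodes. The rule and its thresholds are invented for the example, not taken from any real system.

# A hand-written rule: the engineers decided the decision paths in advance.
def assess_risk(age, blood_sugar):
    # Fixed, predetermined path from input to output (thresholds are made up).
    if blood_sugar > 7.0 and age > 45:
        return "high risk"
    elif blood_sugar > 7.0:
        return "moderate risk"
    else:
        return "low risk"

print(assess_risk(age=52, blood_sugar=7.4))  # -> high risk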


However, there are many classes of problem that are impractical to solve by writing software in this way. These may be complex medical problems, problems of processing complex data such as natural language, or problems where the actual solution isn't known in advance.


The process of Machine Learning exposes statistical algorithms to large amounts of data in order to find relationships. This allows the trained model to generalise and make predictions or observations about unseen data without the need for explicit instructions on how the data should be processed.
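
As a rough sketch of the contrast, the snippet below learns a similar decision from example data instead of hand-written rules. The numbers are toy data invented for illustration, and it assumes the scikit-learn library is available.

from sklearn.tree import DecisionTreeClassifier

# Each row is [age, blood sugar]; each label is 1 (at risk) or 0 (not at risk).
X = [[30, 5.1], [55, 7.8], [42, 6.0], [60, 8.2], [25, 4.9], [50, 7.1]]
y = [0, 1, 0, 1, 0, 1]

# The algorithm works out its own decision boundaries from the data...
model = DecisionTreeClassifier().fit(X, y)

# ...and can then generalise to an input it has never seen.
print(model.predict([[48, 7.5]]))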


Supervised Learning


Broadly speaking, Machine Learning falls into two categories, supervised and unsupervised, with the difference between the two being whether the required output is known in advance.


Supervised learning uses datasets that consist of many input data items, referred to as features, and a desired output known as a label. As an example, we might have a medical dataset covering many different aspects of a person's health and a marker of whether or not they have a disease such as diabetes. The goal of the training is to develop a model that, given someone's health data, can predict whether they are likely to develop diabetes.
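
A hedged sketch of what that looks like in practice is below. It uses scikit-learn's small built-in diabetes dataset as a stand-in: its features are health measurements and its target is a numeric measure of disease progression, which the example simplifies into a yes/no label ("above average progression") purely for illustration.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features: ten health measurements per patient. Labels: a simplified yes/no
# marker derived from the dataset's disease-progression score.
X, y = load_diabetes(return_X_y=True)
labels = (y > y.mean()).astype(int)

# Hold some patients back so the trained model can be tested on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy on patients the model has never seen is the measure of the training.
print("accuracy on unseen data:", model.score(X_test, y_test))

Holding back a test set, as above, is one common way of assessing how well the model produces the required output before it is trusted with genuinely new data.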


The model learns by processing the training data and identifying the relationships between the various data items, with its accuracy being assessed by how well it can produce the required output. When, via a process of iteration and tweaks to the mathematical algorithms being used, the model is deemed to be trained, it is put to work on previously unseen data.


Many types of supervised learning use some form of mathematical regression to fit a trend line to the dataset, and this trend line forms the basis for predicting and defining the output label.
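
As a minimal illustration of the idea, the following fits a straight trend line to a handful of invented points and uses it to predict an unseen value; it assumes the numpy library is available.

import numpy as np

# Toy data: five observations of an input x and an output y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of a straight line, y = a*x + b.
a, b = np.polyfit(x, y, deg=1)
print(f"trend line: y = {a:.2f}x + {b:.2f}")

# The trend line is then the basis for predicting the output for a new input.
print("prediction for x = 6:", a * 6 + b)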


Human experts in the problem space the model is dealing with are key in the process of identifying the data features that the model should work with and ensuring the data it is trained with is of sufficient quality and diversity to produce a workable and accurate model. 


Unsupervised Learning


Unsupervised learning, which often makes use of deep learning techniques, involves datasets that don't have a predefined label defining what the output should be. The model identifies previously unknown relationships in the data in order to cluster datapoints and find commonalities. This then enables the model to look for the presence of these commonalities in new data it is exposed to.
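
A small sketch of this clustering idea is below, using the common k-means algorithm from scikit-learn on a handful of invented two-dimensional points; no labels are supplied at all.

from sklearn.cluster import KMeans

points = [[1.0, 1.2], [0.8, 1.1], [1.1, 0.9],   # one natural grouping
          [5.0, 5.2], [5.3, 4.9], [4.8, 5.1]]   # another

# The algorithm groups the points into clusters without being told what they mean.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)                  # which cluster each point was assigned to

# New, unseen data can then be checked against the clusters it found.
print(model.predict([[5.1, 5.0]]))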


Examples of this type of learning might be the analysis of customer buying behaviour in order to predict future purchases, the ability to recognise the content of images, or, in the case of an LLM like ChatGPT, the ability to predict the natural language that forms an answer to a question.


This type of learning is generally based on neural networks. Originally inspired by the organisation of the human brain, a neural network consists of a number of interconnected neurons. However, rather than being biological in nature, these neurons are mathematical models that take a number of inputs and produce a single output to pass on to the next neuron in the chain.
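
To give a feel for what one of these artificial neurons does, here is a toy example: weighted inputs are summed, a bias is added, and the result is squashed through an activation function into a single output. The weights and inputs are arbitrary example values, not learned ones.

import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias term.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    # A sigmoid activation squashes the result into a single output between 0 and 1.
    return 1.0 / (1.0 + math.exp(-total))

print(neuron(inputs=[0.5, 0.2, 0.9], weights=[0.4, -0.6, 1.1], bias=0.1))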


Although the types of problems addressed by unsupervised learning are generally those where there is no preconceived wisdom on the relationships between the data items, human experts still play a crucial role in analysing the outputs of the model in order to drive the iterative training process that will increase the model's accuracy.


The portrayal of AI in science fiction would have us believe that a switch is flipped and a fully formed intelligence springs into life holding infinite knowledge. The reality is that it's a painstaking, costly and time-consuming process that requires many cycles to perfect. Machine Learning is essentially the nuts and bolts of AI; services such as ChatGPT live or die based on the expertise and engineering that is applied in this phase. The fundamentals of the learning process apply as much to machines as they do to humans.
