Sunday, 7 July 2019

Data Driven Success



Over the last couple of decades the majority of businesses have realised that the data they accumulate is one of their most valuable assets. This isn't necessarily because it has inherent value as a commodity, but because of the insights it can provide to drive efficiency, innovation and effectiveness.

However, none of this is possible if you don't understand how data is accumulated, how it can be analysed and how it can be used. On the surface these aspects can sometimes appear trivial, and for the simpler applications of data maybe they are. But unlocking the higher orders of data's potential requires a more considered and scientific approach.

Very few people, if any, can describe themselves as having mastered the use of data. As tools and techniques evolve, new potential is unlocked. What is more important is an appreciation of the traps that lie in wait for those who don't understand the nature of data.

Big Data

The concept of big data is now so prevalent that it is almost taken as a given when discussing data-related topics. I think the fact that the term refers to the amount of data has always been problematic. Big data is about more than simply the amount of data that has been acquired; it is about volume, variety and velocity.

Of course volume plays a part; our capability to store and process large amounts of data has never been higher. As the amount of data you collect rises, the potential for insight grows.

Equally important is variety. This includes variety of sources, domains and context. If you only ever collect one particular dimension of data from one particular source, the insights you can gain will be limited.

Finally, velocity. Insights that only become visible after long periods of data collection are likely to be harder to act on effectively. Being able to accumulate data at a fast rate opens up the possibility of putting its learnings to use before others can gain the same understanding.

Lake vs Warehouse

At a high level, data is stored in either a structured or an unstructured way. Generally, after it is collected it is in an unstructured form; at this stage it is a raw material requiring refinement. In order to be used to drive insight it must transition into a structured form to support querying, exporting and socialisation.

We have come to refer to these two forms of storage as data lakes (unstructured) and data warehouses (structured). The issue with moving between unstructured and structured forms is that it involves bias. This is driven by the current understanding of the nature of the data, the nature of the problem being solved and second-guessing what the answers might be.

To a large extent this bias is unavoidable. What must be minimised, and ideally eliminated, is the loss of raw data during the refinement process. As your understanding of the data, the collection process and the insights grows, you will often come to recognise your initial bias. If your raw material is lost then so is your opportunity to correct it.

When data transitions to warehouses it should also remain in the lake, ready for the day when you discover better and more insightful ways to process it.
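As a rough illustration of keeping the lake intact, here is a minimal Python sketch. Everything in it is an assumption made for the example: the event fields, the `refine_v1` and `refine_v2` functions and the in-memory list standing in for a lake are all hypothetical. The point is only that refinement reads from the raw records without changing them, so a later and better-informed refinement is still possible.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (the "lake").
lake = [
    '{"user": "a", "amount": "12.50", "note": "gift"}',
    '{"user": "b", "amount": "7", "note": ""}',
]

def refine_v1(raw_records):
    """First attempt at structure: we assume only user and amount matter."""
    rows = []
    for raw in raw_records:
        event = json.loads(raw)
        rows.append({"user": event["user"], "amount": float(event["amount"])})
    return rows

def refine_v2(raw_records):
    """Later attempt: we now realise the note field carries meaning too."""
    rows = []
    for raw in raw_records:
        event = json.loads(raw)
        rows.append({
            "user": event["user"],
            "amount": float(event["amount"]),
            "is_gift": event["note"] == "gift",
        })
    return rows

warehouse_v1 = refine_v1(lake)  # the lake is read, never modified
warehouse_v2 = refine_v2(lake)  # only possible because the raw data survived
```

Had the first refinement overwritten the raw events, the note field would have been gone by the time its value was understood.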

Machine Learning

A practical example of what is possible with data is machine learning. Using this technique, artificial intelligence can be created that exploits otherwise hidden patterns in data.

Machine learning involves a model being trained to make decisions based on the analysis of a large and potentially varied data set. Traditional rules-based analysis, with its inherent bias towards what it expects the data to show, doesn't offer the flexibility required to simply go where the data points. Once this training has taken place, the system can be given new data as input and, based on the learnings from previous data, make judgements on the meaning behind this new data.
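The train-then-judge cycle can be sketched in a few lines of Python. This is a deliberately tiny example under stated assumptions, not how a production system would be built: it uses a nearest-centroid classifier (one of the simplest possible models), and the feature meanings, labels and data points are all invented for illustration.

```python
from collections import defaultdict

def train(examples):
    """Learn one centroid (mean point) per label from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the new data point."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical training data: [hours_active, purchases] -> customer type
training_data = [
    ([1.0, 0.0], "casual"),
    ([2.0, 1.0], "casual"),
    ([8.0, 5.0], "loyal"),
    ([9.0, 7.0], "loyal"),
]

model = train(training_data)             # training on previous data
prediction = predict(model, [7.5, 6.0])  # judging data the model has not seen
```

Nothing in the code states a rule such as "more than five purchases means loyal"; the boundary between the two groups comes entirely from the examples the model was given.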

Machine learning can look miraculous as it appears to be based on qualities we don't traditionally associate with software, such as reason, judgement and intuition. But actually it is simply a demonstration of the power of data and what can happen when its analysis is approached with an open mind.

Technology can sometimes be prone to fads or fashions, but the appreciation of data and its uses isn't one of them. The amount and variety of data we can collect and process is only going to increase, as will our ability to derive value and insight from it. Technologies like machine learning are a demonstration of this trend; now that we've discovered the genie, it isn't going back in the bottle.

