Sunday 21 October 2018

The Value of Working


In software development we spend a large amount of time defining, debating and discovering how things work. It sounds like a simple enough question: does this software work or not? To a first approximation there can be a simple yes or no answer, but the reality is usually more complicated.

Software engineering is often full of debate about the rights and wrongs of a solution; sometimes these debates are critical, sometimes they are superficial.

Given that, how can we define when software is working?

But Why?

The fact that software is working should come with an explanation as to how. Any time this explanation delves into magic or an unexplainable "just because", the code being fit for purpose can be called into question.

This is because the working state of software isn't binary; there can be degrees of working. Something may work but be insecure, something may work but be inefficient, something may work but only if you don't stray from the golden path.

No software is perfect, or potentially even truly finished, so some of these shortcomings may be acceptable, but without an effective explanation of how something works they will remain undiscovered and that determination can never be made.

Obviously limits need to be placed on this explanation; it isn't necessary to understand how everything works from your code all the way down to the metal, at least not in most circumstances. But every line of code that came from your keystrokes should have a clear explanation to justify its existence.

Show Me

Every line of code that ever produced a defect was at one point looked upon by a developer and declared to be working. But working shouldn't be an opinion; it should be demonstrable by the presence of a passing test.

Legacy code is often defined as code without tests, not necessarily code of a certain age. The reason for this is that the fear legacy code generates comes from the inability to prove it is working. This limits the ability to refactor or evolve the code because of the risk that it will no longer fulfil its duties.

In order to maintain our faith in tests, they should evolve as our understanding of the code being tested evolves. Defects happen in production despite the fact that tests declared the code to be working; when this is proven not to be the case, the tests should evolve to catch the imperfection next time.
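
As a minimal sketch of what that evolution might look like, assuming pytest as the test runner and a hypothetical calculate_discount function (neither is named in this post), a production defect can be pinned down with a new regression test alongside the original happy-path test:

    # test_discount.py - hypothetical example of tests evolving after a defect
    import pytest

    def calculate_discount(price, percentage):
        # Hypothetical code under test, inlined here to keep the sketch complete
        if percentage < 0:
            raise ValueError("percentage must not be negative")
        return price * (100 - percentage) / 100

    def test_discount_applies_percentage():
        # The original test: the golden path that always passed
        assert calculate_discount(price=100.0, percentage=10) == 90.0

    def test_discount_rejects_negative_percentage():
        # Regression test added after a production defect showed that a
        # negative percentage silently increased the price
        with pytest.raises(ValueError):
            calculate_discount(price=100.0, percentage=-10)

The second test exists because reality disagreed with our opinion that the code was working; it is the explanation of the defect, captured in executable form.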

Enough is Enough

Code being declared to be working is not necessarily the end of the story; working is a minimum requirement. Is it secure? Will it scale? Can it be tested? These are all questions that may prolong the amount of attention that needs to be paid to the code being worked on.

While these things are important they can also lead to procrastination; the ability to recognise when something is good enough only comes with experience. Inexperience tends to push towards either stopping at the first solution or prematurely optimising for situations that may never arise.

A more rounded attitude to this situation is born from the pragmatism that software needs to ship, combined with the scars of releasing before something was truly working. To this point, whether or not software continues to work post-release also shouldn't be taken for granted; continuing to monitor your software once it's in the hands of users is what will enable you to make a better judgement next time around.

Software engineering is a discipline where the state of readiness of the end product is not a universal truth and can be in the eye of the beholder. Often something is deemed working because it has shipped; reality eventually proves this fallacy wrong. To a greater or lesser extent no-one has ever shipped bug-free software. We shouldn't torment ourselves about that fact, but instead realise that working is a state in time; our job is to optimise our code so that it works for the longest possible period of time.

Sunday 14 October 2018

Feature Proof


A software development team is in a constant cycle of delivering features. Features are often the currency on which a team is judged, measured by the velocity at which they can be delivered into production.

But not all features are created equal, and they don't all turn out to be successful. So how can we tell a potentially effective feature from one that will turn out to be a waste of effort, or maybe even worse, degrade the overall experience?

These are difficult questions, and this post won't provide a foolproof way to deliver great new features, but it does present some criteria on which you may want to judge a potential feature when it is first proposed as the next big thing.

Two Faced

Any new feature must provide a benefit both to users and to the business; if either of these groups ends up dissatisfied then the feature will ultimately die.

A feature that only benefits the user, while potentially cool and fun, will struggle to justify the effort involved in delivering and maintaining it; there should always be a defined benefit for the business in delivering it.

This benefit can be indirect or subtle, and not every feature needs to deliver sales, but the benefit must be understood and, as we will move on to discuss, should be measurable. If the business benefit becomes too intangible then it can become difficult to determine success; if this happens too frequently it's easy to suddenly find yourself with a system that delivers no positive business outcomes.

A feature that only delivers business value will struggle to gain traction and will do harm by persuading your users that there is nothing for them to gain from engaging with your software. Eventually a critical mass of users will reach this conclusion and your user base will collapse.

A good feature could be controversially described as a bribe, or at least a situation where you and your users come to an informal, unspoken agreement: they will do what you want them to do in exchange for what you're prepared to offer them.

Verifiable Success

The success of a feature shouldn't be intangible or abstract; the success of a business is neither of these things, so the features you develop to achieve that success shouldn't be either.

Before a new feature enters development there should be a hypothesis on why it will be successful and how that success is going to be measured. As we've already discussed, success doesn't have to be just about the bottom line, but any success worth achieving should be measurable.

Basing success on measurable analytics gives you the freedom to explore less obvious ideas; combine this with A/B testing and an effective mechanism to turn features on and off and you give yourself a platform to take more risks with the features you decide to try.
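
As a rough sketch of what such a mechanism might look like (the FeatureFlags class, the rollout percentages and the checkout flows below are all hypothetical rather than a specific library), a feature can be bucketed per user and its success metric recorded only for the variant:

    # Hypothetical feature flag with simple percentage-based A/B bucketing
    import hashlib

    class FeatureFlags:
        def __init__(self, rollout_percentages):
            # e.g. {"new_checkout": 50} exposes the feature to 50% of users
            self.rollout_percentages = rollout_percentages

        def is_enabled(self, feature, user_id):
            percentage = self.rollout_percentages.get(feature, 0)
            # Hash the user id so each user consistently lands in the same bucket
            digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
            return int(digest, 16) % 100 < percentage

    flags = FeatureFlags({"new_checkout": 50})

    def checkout(user_id):
        if flags.is_enabled("new_checkout", user_id):
            return "new checkout flow"       # variant: record its success metric here
        return "existing checkout flow"      # control group for comparison

    print(checkout("user-123"))

Because the flag is just configuration, an underperforming feature can be switched off without a redeploy, which is what makes the riskier experiments affordable.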

This also presents the opportunity to learn; the behaviour of large numbers of users is often unpredictable and counter-intuitive. In this environment, deploying a feature that has no defined measure of success is akin to gambling that your knowledge of your user base is complete and comprehensive. How confident are you that this is the case?

Do No Harm

Each new feature comes on the heels of those that have come before it; if you've been effective these existing features will be delivering value both for you and your users. If this is the case then the ultimate failure would be for a new feature to degrade performance or otherwise compromise this value chain.

No new feature should put at risk the effective operation of what has come before. This isn't just about shortcomings in the feature itself; its development also shouldn't distract effort from fixing defects or inefficiencies that currently exist in production.

Users can become frustrated by a lack of features, but they become angry when features they do want to use fail them. Too often a release is seen as underwhelming if it only contains defect fixes, but these releases can deliver the most value because they fix and improve features that you know users are already engaged with.

Feature development is an inexact science; if there were a guaranteed formula for delivering features then no enterprise would ever fail. This also means that the advice given in this post comes with no guarantee, but hopefully it reinforces the fact that new features need thought, and that a feature delivered for the sake of delivering a feature is unlikely to benefit anybody. Once again in software engineering we may have found an instance where less is more.

Sunday 7 October 2018

Testing Outside the Box


Automated testing of some description is now commonplace within the majority of development teams. This can take many different forms: unit testing, integration testing or BDD-based testing.

These different approaches are designed to test individual units of code, complete sub-systems or an entire feature. Many teams will also be automating non-functional aspects of their code via non-functional testing (NFT) and penetration testing.

But does this represent the entire toolbox of automated testing techniques that can be utilised to employ a shift-left strategy? Whilst they certainly represent the core of any effective automation strategy, if we think outside the box we can come up with more inventive ways to verify the correct operation of our code.

Randomised UI Testing

Using BDD-inspired tests to verify the implementation of a feature usually involves automating interaction with the application's UI, simulating clicking, scrolling, tapping and data entry. This interaction will be based on how we intend users to navigate and interact with our UI; however, users are strange beasts and will often do things we didn't expect or cater for.

Randomised UI testing attempts to highlight potential issues if users do stray off script, by interacting with the UI in a random, unstructured way. The tests do not start out with a scenario or outcome they are trying to provoke or test for; instead they keep bashing on your UI for a prescribed period of time, hoping to break your application.

Sometimes these tests will uncover failures in how your application deals with non-golden-path journeys. On occasion the issues they find may be very unlikely to ever be triggered by real users, but they nonetheless highlight areas where your code could be more defensive or less wasteful with resources.
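
As a minimal sketch of the idea for a web application, assuming Selenium WebDriver, Chrome and a locally running app at a hypothetical URL (this post doesn't prescribe a particular tool), a random walk over the UI might look like this:

    # Randomised UI testing sketch: click whatever is clickable, at random,
    # for a fixed period of time. Selenium, Chrome and the URL are assumptions.
    import random
    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    APP_URL = "http://localhost:8080"   # hypothetical application under test
    DURATION_SECONDS = 60               # how long to keep bashing on the UI

    driver = webdriver.Chrome()
    driver.get(APP_URL)

    end_time = time.time() + DURATION_SECONDS
    while time.time() < end_time:
        # Gather whatever is currently interactable and pick one at random
        candidates = driver.find_elements(By.CSS_SELECTOR, "a, button, input")
        if not candidates:
            driver.get(APP_URL)         # dead end: return to the start page
            continue
        try:
            random.choice(candidates).click()
        except Exception:
            pass                        # a failed click is noise, not a verdict

    driver.quit()

On mobile the same idea is available off the shelf, for example Android's UI/Application Exerciser Monkey, which fires pseudo-random streams of taps and gestures at an app.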

Mutation Testing

Unit tests are the first line of defence to prove that code is still doing the things it was originally intended to do, but how often do we verify that the tests can be relied upon to fail if the code they are testing does develop bugs?

This is the goal of mutation testing: supporting tooling will deliberately alter the code being tested and then run your unit tests, in the hope that at least one test will fail and successfully highlight the fact that the code under test is no longer valid. If all your tests pass, then similar mutations could be introduced by developers and potentially not be caught.

The majority of tooling in this area makes subtle changes to the compiled or intermediate output of your code rather than the source itself. This may involve swapping logical operators, mathematical operators, postfix and prefix operators, or assignment operations.
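
To make that concrete, here is a hand-written illustration of a single operator-swap mutant and the boundary test that "kills" it; real tools (PIT on the JVM or mutmut for Python, for example) generate and apply these mutations automatically:

    # Hypothetical example: an operator-swap mutant and the test that kills it
    def can_withdraw(balance, amount):
        return balance >= amount       # original condition

    def can_withdraw_mutant(balance, amount):
        return balance > amount        # mutant: '>=' swapped for '>'

    def test_can_withdraw_exact_balance():
        # This boundary test fails against the mutant, so the mutant is killed.
        # Without it, both versions behave identically for every other input
        # the suite exercises and the mutation would survive undetected.
        assert can_withdraw(balance=100, amount=100)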

Issues highlighted by mutation testing enable you to improve your unit tests to ensure that they cover all aspects of the code they are testing. It's also possible that they will highlight redundant code that has no material impact and can therefore be removed.

Constrained Environment

Up until now we've concentrated on testing the functional aspects of code, but failures in non-functional aspects can have an equal impact on users. One approach to this is non-functional testing (NFT), which pours more and more load on your system in order to test it to breaking point.

Another approach can be to run your system under normal load conditions but within an environment that is constrained in some way.

This might mean running with less memory than you have in your standard environment, or less CPU. It might mean running your software alongside other applications that will compete for resources, or deliberately limiting bandwidth and adding latency to network calls.

In a mobile or IoT context this could take the form of running in patchy network conditions or with low battery power.
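
As a rough sketch of the latency side of this, the snippet below delays every outbound socket connection made by a Python test run; the half-second figure and the monkey-patching approach are purely illustrative, and operating-system level tools such as Linux's tc/netem, or container CPU and memory limits, do the same job more faithfully:

    # Constrained environment sketch: inject latency into all outbound
    # network connections before running the normal-load test suite.
    import socket
    import time

    ADDED_LATENCY_SECONDS = 0.5            # hypothetical latency per connection

    _original_connect = socket.socket.connect

    def delayed_connect(self, address):
        time.sleep(ADDED_LATENCY_SECONDS)  # simulate a congested network
        return _original_connect(self, address)

    socket.socket.connect = delayed_connect

    # From here, run your usual scenarios and observe how timeouts, retries
    # and user-facing behaviour hold up when the network is no longer perfect.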

Although this style of testing can be automated, it doesn't necessarily produce pass/fail output; instead it allows you to learn about how your system reacts under adversity. This learning may highlight aspects of your system that aren't apparent when resources are plentiful or conditions are perfect.

It's also possible that this style of testing will show that you can reduce the resources of the environment you deploy your code into and benefit from cost savings.

The first goal of any automated testing approach should be to verify the correct operation of code, but with a little imagination it's possible to test other aspects of your code base and create feedback loops that drive continual improvement.

Effective development also involves treating all the code that you write equally; tests are as much a part of your production code as the binaries that are deployed into production.

Defects in these tests will allow defects into your production environment that will potentially impact users. Not only should care be taken in writing them, but they should be under the same focus of improvement as every other line of code that you write.