Sunday 23 August 2020

Effective Logging

Every developer reading this will have, on many occasions, been knee-deep in an application's logs while trying to investigate an issue in the code base.

Being able to diagnose and resolve issues by reviewing logs is an important skill for any developer to acquire. However, applying this skill is not just about being able to review logs but also about knowing what to log.

Presented below are a few ideas on best practices that can increase the effectiveness of your logs. Exactly what to log will vary from application to application, but regardless of the nature of your code base there are certain steps that can be taken to increase the value of your logs.

Context

At the point of creating a log entry, the most important piece of data is clearly the application state, request or response that you are logging. However, the context that surrounds that entry can be equally important in diagnosing issues where causation is less apparent.

Sometimes you will look at logs and the problem will be obvious: the request is wrong or the response clearly indicates an error. More often, though, what is causing the problem is less clear and will depend on the context around the application at that time.

To have a chance of getting to the bottom of these sorts of issues, as much additional context as possible should be added to each log entry. Certain contextual information is obvious, like the username associated with a request, but any additional information you can collect alongside the main log entry could prove invaluable in spotting patterns.

Many logging frameworks have support for collecting this kind of contextual data. Provided it can be done in a structured manner that doesn't create noise in the logs, a good piece of general advice is to log any additional context that it is possible to gather.
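To make this concrete, here is a rough sketch of the idea using Python's standard logging module; the logger name, field names and values are purely illustrative:

    import logging

    # The format string pulls the contextual fields into every log line; a
    # structured formatter (covered later) is an alternative to plain text.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s "
                               "user=%(username)s order=%(order_id)s")

    # Bind the context once via a LoggerAdapter; every subsequent entry from
    # this adapter carries it without repeating it in each message.
    log = logging.LoggerAdapter(logging.getLogger("orders"),
                                {"username": "jsmith", "order_id": "1234"})
    log.warning("payment declined")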

Correlation

Most applications of any complexity will have several links in the chain when processing a request. Information will flow upstream and downstream between different applications and systems, each one a potential cause of failure.

If you are only able to look at each application's logs in isolation, without being able to tie a thread through them all to determine the path the request took, then the time taken to resolve issues is going to be extended.

To address this it's important to add an element of correlation between all log entries. This can be as simple as a GUID that is passed between each application and added as contextual information to all log statements. This GUID can then act as a key when searching the logs, returning all the log entries that are associated with an individual request.

This will enable you to replay the journey the request took through your infrastructure and determine at what point things started to go wrong.
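A minimal sketch of this in Python might look like the following; the downstream URL is invented and the X-Correlation-ID header name is a common convention rather than a standard:

    import logging
    import uuid

    import requests  # assumed to be available for calling the downstream system

    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(correlation_id)s %(message)s")

    def handle_request(payload, correlation_id=None):
        # Reuse the id supplied by the caller, or mint a new one at the edge.
        correlation_id = correlation_id or str(uuid.uuid4())
        log = logging.LoggerAdapter(logging.getLogger("api"),
                                    {"correlation_id": correlation_id})
        log.info("processing request")

        # Pass the same id downstream so its log entries can be tied to ours.
        requests.post("https://downstream.example/orders",
                      json=payload,
                      headers={"X-Correlation-ID": correlation_id})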

Structure

On a few occasions in this post we have mentioned having to search through an application's logs. When all you are presented with when reviewing logs is a wall of text, it is very easy to induce a kind of snow blindness that stops you from being able to garner any useful information.

Logging in a structured manner enables logs to be searched in more sophisticated ways than simply looking for certain snippets of text.

Many technologies exist for providing this log structure, depending upon your technology platform. In general these frameworks, rather than simply logging text, will log data using a structured format such as JSON. Tools can then load these JSON entries and present a mechanism for searching the logs based on the kind of contextual information we have previously discussed.
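Dedicated libraries exist for exactly this (structlog in Python or Serilog in .NET, for example); as a rough illustration of the underlying idea, a hand-rolled JSON formatter using only the Python standard library might look like this:

    import json
    import logging

    class JsonFormatter(logging.Formatter):
        # Render each log record as a single JSON object per line so that
        # tools can index and query the individual fields.
        def format(self, record):
            entry = {
                "timestamp": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            return json.dumps(entry)

    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.getLogger().addHandler(handler)

    logging.getLogger("orders").warning("payment declined")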

Like so many aspects of coding that don't directly relate to the functionality users are consuming, logging can very often be an afterthought. This is ultimately self-defeating, since no application is perfect and you are certain to rely on logging on many occasions to resolve the issues caused by this imperfection.

The techniques described in this post are an attempt to increase the value provided by your logs, to ensure that when you inevitably need to review them the process is as painless and effective as possible.

Sunday 16 August 2020

Is TDD Dead?

Test Driven Development (TDD) has long been viewed by the majority of engineers as one of the universal tenets of software engineering. The received wisdom is that applying it to a software development lifecycle ensures quality, via inherent testability and via an increased focus on both the interface to the code under test and the required functionality.

In recent years some have started to challenge the ideas behind TDD and question whether or not it actually leads to higher quality code. In May 2014 Kent Beck, David Heinemeier Hansson and Martin Fowler debated TDD and challenged its dominance as part of software engineering best practices.

The full transcript of their conversation can be found here: Is TDD Dead?.

Presented below are some of my thoughts on the topics they discussed. To be clear, I am not arguing that TDD should be abandoned. My aim is to provoke debate and try to understand whether total adherence to TDD should be relaxed or approaches modified.

Flavours of TDD

Before debating the merits or otherwise of TDD it's important to acknowledge that different approaches to it exist. Strict adherence to TDD implies writing tests first and using the so-called red-green-refactor technique to move from failing tests to working code.

I think large numbers of teams who would purport to follow TDD regularly do not write tests first. Engineers will often find themselves needing to investigate how to implement certain functionality, with this investigation inevitably leading to writing some or all of an implementation prior to considering tests.

TDD as discussed here applies to both scenarios; a looser definition would simply define TDD as an emphasis on code being testable. Many of the pros and possible cons being discussed apply equally whether tests are written prior to implementation or afterwards.

Test Induced Design Damage

Perhaps the most significant of the cons presented about TDD is that of test-induced design damage.

Because discussions around TDD tend to centre on unit testing, adopting a TDD approach and focusing on testability tends to mean enabling a class under test to be isolated from its dependencies. The tool used to achieve this is indirection: placing dependencies behind interfaces that can be mocked within tests.
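To make the shape of that indirection concrete, a hypothetical example in Python might look like the sketch below, with the dependency hidden behind an interface and replaced by a mock in the test; all of the names are invented for illustration:

    from typing import Protocol
    from unittest.mock import Mock

    class PaymentGateway(Protocol):
        # The interface the class under test depends on.
        def charge(self, amount: int) -> bool: ...

    class OrderService:
        def __init__(self, gateway: PaymentGateway) -> None:
            self._gateway = gateway

        def place_order(self, amount: int) -> str:
            return "confirmed" if self._gateway.charge(amount) else "rejected"

    # In the unit test the real gateway is swapped for a mock so the class
    # can be exercised in isolation, with the interaction verified explicitly.
    def test_order_is_confirmed_when_charge_succeeds():
        gateway = Mock(spec=["charge"])
        gateway.charge.return_value = True
        assert OrderService(gateway).place_order(100) == "confirmed"
        gateway.charge.assert_called_once_with(100)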

One of the principal causes of test-induced design damage is the confusion and complication that come from excessive indirection. I would say this potential design damage is not inherent in the use of indirection, but it is very easy to introduce accidentally if the interface employed to decouple a class from a dependency is badly formed.

A badly formed interface, where the abstraction being presented isn't clear or is inconsistent, can have a large detrimental effect on the readability of code. This damage is very often compounded by the setup and verification of these interactions on mock dependencies.

Aside from testability, another perceived advantage of indirection is the ability at some later point to change the implementation of a dependency without the need for widespread changes in dependent code. Whilst these situations certainly exist, perhaps they don't occur as often as we might think.

Test Confidence

The main reason for having tests is as a source of confidence that the code being tested is in working condition. As soon as that confidence is eroded, the value of the tests is significantly reduced.

One source of this erosion of confidence can be a lack of understanding of what the tests are validating. When tests employ a large number of mocks, each with their own setup and verification steps, it is easy for tests to become unwieldy and difficult to follow.

As the class under test is refactored and the interaction with mocks is modified the complexity can easily be compounded as engineers who don't fully understand how the tests work need to modify them to get them back to a passing state.

This can easily lead to a "just get them to pass" attitude. If this means there is no longer confidence that the tests are valid and verifying the correct functionality, then any assurance that passing tests mean we are in a working state is lost.

None of this should be viewed as saying that unit tests or the use of indirection are inherently bad. Instead I think it hints at the fact that maybe the testability of code needs to be viewed in terms of the application of multiple types of tests.

Certain classes will lend themselves well to unit testing; the tests will be clear and confidence will be derived from them passing. Other, more complex areas of code may be better suited to integration testing, where multiple classes are tested as a complete functional block. Provided these integration tests are able to exercise and prove functionality, they should still provide the needed confidence of a working state following refactoring.

So many aspects of software engineering are imperfect, with no answer being correct 100% of the time. Maybe this is also true of TDD: in general it provides many benefits, but if it can on occasion have a negative impact then maybe we need to employ more of a test mix, so that our overall test suite gives us the confidence we need to release working software.

Sunday 9 August 2020

API Toolbox

The increasing application of the Software as a Service (SaaS) delivery model means that APIs are the regular means by which we interact with the services that help us write software. Even if APIs aren't the primary mechanism by which we consume the service, it is increasingly commonplace, and potentially even expected, that an API surface will be available to analyse metrics and data related to consumption.

API is a broad term with many different approaches to implementation available to us. Sometimes changes in technology are related to trends or views on implementation correctness, such as the migration from SOAP to REST being driven by the desire for a more lightweight representation of data.

However sometimes technology choice is driven by the nature of the API being implemented, the data it is expected to convey and the make-up of the likely consumers. Presented below are some of the technologies available along with the circumstances that might drive you to choose them to implement your API.

REST

REST is still by far the most common approach to implementing an API. Characterised by its use of HTTP verbs and status codes to provide a stateless interface along with the use of JSON to represent data in a human readable format, it still represents a good technology choice for the majority of APIs.
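As a minimal sketch of that style, the endpoint below (written here with Python and Flask, with an invented resource shape) uses the GET verb, a status code and a JSON body to expose a single resource:

    from flask import Flask, jsonify  # assumes Flask is installed

    app = Flask(__name__)

    # A resource-oriented endpoint: the verb and status code carry the
    # semantics and the body is human-readable JSON.
    @app.route("/orders/<int:order_id>", methods=["GET"])
    def get_order(order_id):
        order = {"id": order_id, "status": "shipped", "total": 42.50}
        return jsonify(order), 200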

Difficulties in consuming REST are often first and foremost caused by an unintuitive surface making APIs difficult to discover or by an incoherent data model making the data returned from the API hard to work with and derive value from.

In recent years other technologies have gained traction as alternatives to REST. I believe these do not represent alternative implementations in all circumstances; instead they look to improve upon REST in certain situations.

GraphQL

The majority of APIs relate to the exposure of data from underlying data sources. The richer the data sources, the greater the risk that users will be flooded with data.

REST APIs are built around the concept of a resource: you make a request for a particular resource and the entire resource is returned to you. With a large data model this can create noise, as an entire resource is returned when only a small subset of the data was required. This not only wastes bandwidth by returning unnecessary data items, but performance may be further impacted by potentially making unnecessary additional calls to downstream systems.

A complex data model can also make it harder to intuitively discover what data items are available.

To address some of these issues Facebook developed GraphQL, which provides a language for querying data from an API. Having a query language means only the data items that are required can be requested, reducing the amount of data returned along with potentially reducing the work the API must do to acquire the entire resource. A query-based approach also provides an element of discoverability and schema identification.
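As a rough illustration, a client that only cares about an order's status and total might send a query along these lines; the endpoint and schema are entirely hypothetical:

    import requests  # assumed to be available

    # Only the fields the client actually needs are listed, so the server
    # returns (and fetches) nothing more than status and total.
    query = """
    query {
      order(id: "1234") {
        status
        total
      }
    }
    """

    response = requests.post("https://api.example.com/graphql",
                             json={"query": query})
    print(response.json()["data"]["order"])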

To a certain extent it is possible to achieve a queryable interface using REST, by use of path parameters and the query string, but GraphQL increases the flexibility that can be provided.

gRPC

Because REST will in most circumstances represent data using JSON, it can sometimes consume more bandwidth than is strictly necessary. In the majority of cases the impact of this will be minimal, but for certain consumers, most notably IoT devices, it can be significant.

To address this, technologies such as gRPC represent data in a binary format, meaning only the minimum amount of bandwidth is consumed. gRPC also moves away from API endpoints in favour of remote method calls on both server and client.

Both these aspects make it ideal for scenarios where data must be transferred using minimal resources, both in terms of bandwidth but also in terms of power consumption of transceivers etc.
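A client call might look something like the sketch below; the orders_pb2 modules, service and message names stand in for code that would be generated from a .proto definition and are purely illustrative:

    import grpc  # assumes the grpcio package is installed

    # These modules would be generated from a .proto file by the protobuf
    # compiler; the names here are invented for the sake of the example.
    import orders_pb2
    import orders_pb2_grpc

    channel = grpc.insecure_channel("orders.example.com:50051")
    stub = orders_pb2_grpc.OrderServiceStub(channel)

    # The call reads like a local method invocation, while the payload is
    # sent as compact binary protobuf rather than JSON text.
    reply = stub.GetOrder(orders_pb2.GetOrderRequest(id="1234"))
    print(reply.status)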

Personally I believe REST is never going to be a bad choice for implementing an API, but under certain circumstances it may not be optimal. Shown here are a couple of scenarios where this may be the case.

Recognising when these situations arise will enable you to consider possible alternatives. Using them when they aren't necessary is likely to make your API harder to understand and consume, but using them in the scenarios they are designed to address will optimise your API implementation and open it up to a larger and more diverse set of consumers.

Saturday 1 August 2020

Twelve Factor App - Admin Processes


The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the opinionated Heroku platform, and their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server-side software.

The twelfth principle relates to the management of administrative tasks:

"Any needed admin tasks should be kept in source control and packaged with the application."

Administrative Tasks

It is quite common for engineers managing an application to need to perform one-off admin processes within the environment the application is deployed into. The most common of these will be data migrations to accommodate schema changes or other updates to how data is stored.

Other examples might be needing to extract data from the environment for debugging or investigating issues or needing to inspect aspects of the application as it runs.

These administrative tasks are a natural part of managing an evolving application as requirements and needs change over time.

Process Formation

Within a twelve factor app the process formation is the mechanism that allows an application to effectively scale as demand grows. Each aspect of the application runs in its own process that can therefore be independently scaled to meet the changing scale and shape of the demand being placed on the application.

Administrative tasks should be treated no differently: they should be executed within the same process formation, and the code or scripts associated with them should be part of the application's repository.

The changes being made by these administrative tasks need to be kept alongside the application's source code, not only so that the changes are recorded but also to allow every other environment, including a developer's local development environment, to be kept in sync with the production environment.

Changes such as database migrations are often iterative in nature, and so the history of the changes that have been applied is vital to understanding the structure of the data stores the application is running against.
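As a very rough sketch of what keeping such a task in the repository can look like, the script below records which migrations have already been applied before running new ones; it uses the standard library's sqlite3 module and an invented users table purely for illustration:

    import sqlite3

    def migrate(db_path="app.db"):
        conn = sqlite3.connect(db_path)
        try:
            # Record each migration by name so it is only ever applied once,
            # keeping every environment's schema history in sync.
            conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations "
                         "(name TEXT PRIMARY KEY)")
            applied = {row[0] for row in
                       conn.execute("SELECT name FROM schema_migrations")}
            if "001_add_email_column" not in applied:
                conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
                conn.execute("INSERT INTO schema_migrations VALUES (?)",
                             ("001_add_email_column",))
            conn.commit()
        finally:
            conn.close()

    if __name__ == "__main__":
        migrate()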

REPL Shells

A twelve factor app strongly favours technologies that provide a REPL shell environment. This allows administrative tasks to be consistently applied across all environments. Locally, developers can simply invoke scripts via the shell within their development environment. Within a deployed environment a shell can be opened on the machine to achieve the same outcome, either manually or automated via the application's deployment process.

Issues will often arise in production due to information about previous changes not being available to all team members. A previous deployment may have been tweaked or fixed by applying changes to a database or some other update of the environment. When the application is re-deployed to a fresh environment these unrecorded changes are not re-applied and the same issue raises its head again.

An application's repository should contain the history of all changes made to the application, along with all the resources necessary to get it up and running. Secret knowledge of tweaks and changes needed to get the application working needs to be avoided at all costs.

Modern deployment techniques allow for the automation of all sorts of processes, and there are ever fewer reasons for manual changes to an environment to be required during an application's deployment. That is not to say it will never be necessary to make manual changes when an issue arises, but as soon as this happens the next question needs to be: how do we automate this for the next deployment?