A Two Coffee Problem: July 2020

Sunday, 26 July 2020

Twelve Factor App - Logs

The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

The principles were first published in 2011, and the Heroku platform is unashamedly opinionated in enforcing them. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server-side software.

The eleventh principle relates to the handling of logs:

"Applications should produce logs as event streams and leave the execution environment to aggregate"

The Importance of Logging

One of the primary mechanisms for monitoring the behaviour and operation of a deployed application is logging. Exactly what is logged will vary by application, but log entries will relate to events and operations taking place within the application. This time-ordered list presents a history of what has taken place, both good and bad.

Often logging is an afterthought, with less attention being paid to it than it deserves given its importance in maintaining application health and performance. Whenever a developer needs to investigate an issue in the application, or to validate correct operation, they are likely to use the application logs as their primary source of information.

Aggregated Streams

For logs to be useful they need to be aggregated and stored. An app following twelve factor principles does not play a part in this aggregation and storage process. It simply ensures the flow of logging data through the standard output mechanism of the tech stack being employed (such as stdout).

This simplifies the approach to logging in the code base and allows for different logging strategies to be employed in different environments.

In a development environment a developer may simply review the logs in the terminal, while in production or staging environments the logs will be collected and managed by the infrastructure.
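
As a minimal sketch of this idea in Python, assuming only the standard library logging module, the application can configure a single handler that writes to stdout and leave everything else to the environment. The build_logger helper and the "orders" logger name are hypothetical and chosen purely for illustration.

```python
import logging
import sys


def build_logger(name="orders", stream=None):
    """Return a logger that writes one event per line to a stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    logger.handlers.clear()   # keep repeated calls from stacking handlers

    # The only destination is the stream: no files, no rotation, no shipping.
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("order accepted id=%s", 42)
```

Run in a terminal, the events simply scroll past the developer. Run under a process manager or container runtime, the same stream can be captured and routed without any change to the code.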

Log Management

This approach also opens up the possibility of going beyond simple file logging to use log management services that can increase the value that can be derived from the underlying data.

Tools such as Splunk, or any other big data solution, make it possible to collect volumes of data that would be impractical to handle if every aspect of logging had to be implemented within the application code base. Frameworks such as Serilog give the log data structure, enabling it to be queried and mined for information that might be hard to glean by simply reading the entries as text.
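
Serilog is a .NET library, but the idea of structured log events can be sketched in Python with the standard library alone. The JsonFormatter class, the "props" convention and the field names below are assumptions made for this example, not part of Serilog or of any log management service.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured properties are passed via extra={"props": {...}}.
        event.update(getattr(record, "props", {}))
        return json.dumps(event, sort_keys=True)


def build_structured_logger(name="payments", stream=None):
    """Return a logger that emits JSON events to a stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_structured_logger()
    log.info(
        "payment failed",
        extra={"props": {"order_id": 42, "reason": "card_declined"}},
    )
```

Because each event carries named fields rather than free text, whatever collects the stream can filter on something like order_id instead of pattern matching against message strings.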

Effective logging, along with the ability to review the data it provides, is an important skill for any developer to learn. When an issue has struck and stakeholders are looking for answers, this skill will help you steer a course back towards a working system.

The simpler the approach to logging, the more the data it produces can be relied upon. Many tools exist to help in this regard, and ensuring your application's only role is to feed data into the system will help to make this as simple as possible.

Saturday, 18 July 2020

Twelve Factor App - Dev/Prod Parity

The tenth principle relates to the parity that should exist between all the environments where the application will run:

"All environments should be as similar as possible"

Environments

The exact number of environments that an application will run in will vary from team to team. Incidentally, one of the benefits of following the twelve factor principles is that standing up new environments should be a straightforward and repeatable process.

Despite variation in the exact number, broadly speaking environments will fall into three categories: Development, Staging and Production.

In Development, code is hot off the press and will more than likely be unstable. It may be running on the developer's machine or in some shared environment where everyone's latest and greatest is deployed immediately after it's merged.

In Staging we have refined our code to a point where we think it is ready for showtime. Before we make this final step we need to verify that our new code works in conjunction with everyone else's, both new and existing.

Finally when everything is tested we make the final step of putting our new bits into production for real users to make use of.

Gaps

Often there are undesirable gaps between these environments that manifest themselves in three areas.

Time gaps exist when code is delayed from reaching production. It may be days, weeks or months from a developer's keystrokes to the code being deployed into production. This build-up of change can lead to big bang deployments where large numbers of changes are dropped into production alongside each other. Clearly this increases the possibility that unforeseen problems will manifest themselves, and the volume of change is also likely to hinder any investigation into what has gone wrong.

Personnel gaps exist when different individuals are responsible for the administration or management of the environments. Clearly not every developer is going to be granted access to production, but when administration is undertaken by those not close to the code important context is lost. Conversely, if developers have no appreciation for the nature of the environment their code will be deployed into, mistakes may be made that could have been avoided.

Tools gaps exist whenever different versions or types of tooling are used in each environment. The likelihood of this will vary greatly depending upon your technology stack, but it can often be tempting to use a more lightweight version of a particular backing service in Development compared to Production. Any time there is a difference between Production and other environments, a small doubt is created over whether testing really proves the code will work once deployed.

Closing the Gaps

A twelve factor app looks to eliminate or at least reduce the scope of these gaps.

The time gap is reduced by ensuring the lag between a developer writing code and it ending up in Production, where it can provide benefit, is as small as possible. Often implementing this speed of deployment is not a technical challenge; instead it is a battle against fear. The only way to overcome this fear is to evolve an automated testing strategy such that your build and deployment process is self-evaluating and self-proving.

Personnel gaps are overcome by ensuring developers are closely aligned to the deployment of code and its management once in the environment. This gives developers important context and insight into what becomes of their code once it's written and merged. This principle should also be extended to include responsibility for issues and problems that may arise once the code is in use. This accountability isn't about assigning blame; it is about developing a sense of when the seeds of an issue may be being sown in the code base.

Tooling gaps can be harder to close since there may be cost and practicability arguments to be made as to why the same technology isn't being used everywhere. However, a twelve factor mindset instills a wish for these gaps between production and other environments to be made as small as possible. It's important to realise that this isn't about scale: clearly having a Development environment scaled to deal with production loads is inefficient and costly. Instead this is about behaviour and functionality. These aspects of the backing services your application uses should be predictable and uniform in all environments.
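
One way to keep tooling gaps visible is to compare a description of the backing services used in each environment. The sketch below is illustrative only: find_tool_gaps is a hypothetical helper, and the environment and service names in the usage note are made up. It deliberately compares only the type of each service, not how it is scaled.

```python
def find_tool_gaps(environments, reference="production"):
    """Return, per environment, the backing services that differ from the reference.

    `environments` maps an environment name to a mapping of
    service role -> technology in use (for example "database" -> "postgresql").
    """
    expected = environments[reference]
    gaps = {}
    for env_name, services in environments.items():
        if env_name == reference:
            continue
        # Record (actual, expected) for every role that does not match.
        differences = {
            role: (services.get(role), expected.get(role))
            for role in sorted(set(services) | set(expected))
            if services.get(role) != expected.get(role)
        }
        if differences:
            gaps[env_name] = differences
    return gaps
```

Given a development environment that uses sqlite where production uses postgresql, the function reports that single difference and nothing for an identical staging environment. A check of this kind could run as part of a build so that any drift is noticed early.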

Modern techniques such as so-called Infrastructure as Code and containerisation mean it is now much simpler to maintain alignment between environments. Bugs grow in the gaps between environments. Discounting issues caused by insufficient testing, all bugs that have ever been shipped to production, which if we're being honest is a large number, have passed through test environments without being noticed. Ensuring close alignment between environments gives less space for these bugs to develop and grow without being spotted.

Sunday, 12 July 2020

Twelve Factor App - Disposability

The ninth principle relates to the disposability of processes and the relation to start-up and shutdown:

"Fast startup and shutdown are advocated for a more robust and resilient system."

Disposable Processes

An application's source code and the binary it produces represent the processor instructions that produce the intended functionality. A process is an instance, provided by the device's operating system, of these instructions running against the device's processor.

At a high level the process is made up of the application's instructions, allocated memory and threads of execution, handles to file system resources, and security attributes such as the user who owns the process.

Within a twelve factor app processes are first class citizens and a fundamental part of the architectural approach. They should also strive to be completely disposable. This means they should be able to be started, stopped and killed whenever necessary and complete all of these stages in a timely and efficient manner.

Startup Time

A process should not be slow to start. That is to say, the time between a launch command being issued and the process being available to service requests should be as small as possible.

The reason this is desirable is because a twelve factor app scales via the application of a process formation, creating more processes to deal with demand as it comes. If these processes are slow to start up then it limits the ability of an application to scale effectively.

Fast start-up time also improves the robustness of the application. Sometimes processes may need to be migrated to new hardware or a new execution environment, either as part of a release or because of an underlying failure. Again, a fast start-up process allows this to happen quickly with minimum impact on the ability of the application to deal with demand.

Graceful Shutdown

A process should also be able to be gracefully shut down or terminated. This means it should be able to finish dealing with a request quickly, or to offload the necessary work to a queuing system in an effective manner.

This is mainly achieved by the process in general trying to avoid long running tasks or tasks that are non-transactional in nature and therefore can't be easily stopped or transferred to another process.
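
A minimal sketch of this behaviour in Python, assuming a worker that receives SIGTERM when the environment wants it to stop. The names run_worker, request_shutdown and shutdown_requested are hypothetical. The handler only records the request, and the loop checks for it between short jobs so that nothing is abandoned half way through.

```python
import signal
import threading

# Set when the environment asks the process to stop.
shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    """Signal handler: record the request; the loop exits between jobs."""
    shutdown_requested.set()


def run_worker(jobs, handle):
    """Process jobs until they run out or shutdown is requested.

    Returns the jobs that were never started so the caller can re-queue them.
    """
    pending = list(jobs)
    while pending and not shutdown_requested.is_set():
        job = pending.pop(0)
        handle(job)  # each job is short, so shutdown is never far away
    return pending


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)
    leftover = run_worker(range(5), lambda job: print("handled", job))
    print("to re-queue:", leftover)
```

The job in progress always completes, and anything not yet started is handed back rather than lost, which is only practical because each unit of work is small.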

Alongside this graceful shutdown behaviour, the proper implementation of a queuing system is essential to ensure the application isn't vulnerable to sudden failure causing irreparable data loss or damage. The work the application does must be easily represented as distinct items that can be queued and re-queued in an idempotent manner, ensuring the application can be relied upon to operate in a clean, consistent way.
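
To illustrate what idempotent re-queuing might look like, the sketch below uses an in-memory deque as a stand-in for a real queuing system and a set of completed job ids as a stand-in for durable state. Both are assumptions made for the example, as is the process_queue name. In practice the record of completed work would need to survive a process being killed.

```python
from collections import deque


def process_queue(queue, completed, apply):
    """Drain the queue, skipping any job id that has already been applied.

    `queue` holds (job_id, payload) pairs, `completed` is the set of job ids
    already handled, and `apply` performs the actual work for a payload.
    """
    while queue:
        job_id, payload = queue.popleft()
        if job_id in completed:
            continue  # delivered again after a crash or re-queue: safe to ignore
        apply(payload)
        completed.add(job_id)
```

If the same job is delivered twice, for instance because a process was killed before it could acknowledge the work, the second delivery has no further effect.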

When looking at the application in isolation the code executed as part of start-up and shutdown can sometimes be overlooked. Normally we are more concerned with what the code is doing while it is running.

However, in a cloud-based deployment environment applications don't just start and then run for an indefinitely long period of time. They are regularly required to start and stop as the application is scaled or re-deployed. Putting some thought into these areas of the code will lead to benefits in your ability to provide a robust and scalable application that can flex to meet demand and cope with failure.

Monday, 6 July 2020

Twelve Factor App - Concurrency

The eighth principle relates to how applications should scale:

"Concurrency is advocated by scaling individual processes."

The Process Model

A complex application may be made up of multiple processes each with a particular role providing functionality to the whole. The process model also represents a method of scaling an application by enabling the same functionality to be available multiple times by replica processes running on the same machine.

First Class Citizen

Within a twelve factor app a process is a first class citizen, with the various processes that make up the app modelled in a similar manner to the Unix process model.

Modelling an application in this way involves identifying the distinct, and non-overlapping, areas of functionality that it is comprised of and architecting them such that they can each be assigned their own process.

This divide and conquer technique should be recognisable to most engineers. Isolate and contain common functionality and ensure that the bonds between each distinct block are loose.

Process Formation

Aside from laying a path towards a SOLID architecture this process based approach provides a robust scaling mechanism to allow an application to respond to peaks in demand.

While a process can scale internally to handle an increased workload, via multithreading etc, at a certain point this will no longer be practical. When this point is reached then more processes can be spawned in order to deal with the increased demands being placed on the application.

Because the application's internal functional blocks have been modelled as individual processes, this scaling can be focussed on the areas of functionality that are in highest demand. The ability to focus the scaling of the application like this ensures we make the best possible use of all available resources without wasting them on scaling parts of the application that don't need to scale out.

The resultant stacks of processes, each varying in height according to how many processes of that type have been spawned, are referred to as the process formation.

The ability to effectively scale an application and make the best use of available resources has become ever more important over time. Not only is it required to meet the demands of your users, but inefficient scaling by simply throwing more compute at the problem is a costly exercise.

An application's architecture plays a role in deciding how successfully it can be scaled. The scaling of a monolithic code base with inadequate seams between its parts will always be an inefficient and costly business. Like so many things, this needs to be considered early on in the application lifecycle and not left until it's too late.
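
A process formation can be pictured as a simple mapping from process type to the number of running processes of that type. The sketch below assumes that representation. The "web" and "worker" process types and the scale and render helpers are hypothetical names used for illustration, not the interface of any particular platform.

```python
def scale(formation, process_type, count):
    """Return a new formation with one process type scaled to `count` processes."""
    if process_type not in formation:
        raise KeyError(f"unknown process type: {process_type}")
    if count < 0:
        raise ValueError("count must not be negative")
    updated = dict(formation)  # leave the original formation untouched
    updated[process_type] = count
    return updated


def render(formation):
    """Draw each process type as a row of blocks, one block per process."""
    width = max(len(name) for name in formation)
    return "\n".join(
        f"{name.ljust(width)} {'#' * count}" for name, count in formation.items()
    )


if __name__ == "__main__":
    formation = {"web": 2, "worker": 1}
    # Only the background workers are under pressure, so only they scale out.
    print(render(scale(formation, "worker", 4)))
```

Only the process type that is under load grows, which is the focussed scaling described above.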