Sunday 20 September 2020

Transactions and Concurrency

 


The majority of applications make use of some kind of data storage. To maintain the integrity of that storage, many solutions involve the concept of transactions to manage updates and modifications.

Intertwined with the concept of transactions is that of concurrency. Executing transactions sequentially may protect the data store, but the application would be severely hampered by the lack of throughput.

To strike a balance between performance and consistency, several strategies have evolved for dealing with concurrently executing transactions.

ACID Transactions

Before we deal with concurrency we should first define the properties of a transaction. A transaction represents a set of instructions to run against the data store. Generally this will involve modifications to the data being stored and may result in data being returned to the caller. 

In order for transactions to be executed whilst maintaining the integrity of the data store their implementation should follow certain rules. These rules are often characterised by the acronym ACID:

Atomicity: The instructions being executed within the transaction may have side effects on the data being stored. The execution of a transaction should be atomic in nature, meaning that either all of its side effects persist or none of them do. This leads to the concept of transactions being "rolled back" should an error occur during execution, as sketched in the example after this list.

Consistency: All transactions should leave the data store in a consistent state. The definition of consistency will vary between data stores but transactions should not leave the data in an unknown or inconsistent state according to whatever rules may be in place for the data set in question.

Isolation: Transactions should not be aware of each other or otherwise interact. Any errors that may occur in a transaction should not be visible to or affect other transactions. 

Durability: Once a transaction has been successfully executed and "committed" to the data store then its effects should be persistent from that moment on regardless of any subsequent errors that may happen within the data store.
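
To make atomicity and rollback concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table, column names and simulated failure are purely illustrative.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
    conn.commit()

    try:
        # Both updates belong to one transaction: either both persist or neither does.
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
        raise RuntimeError("simulated failure before commit")
    except RuntimeError:
        conn.rollback()  # discard all side effects of the failed transaction

    # Balances are unchanged because the transaction was rolled back.
    print(conn.execute("SELECT name, balance FROM accounts").fetchall())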

Concurrency Problems

So if we ensure that transactions follow the ACID rules, why do we need further strategies for dealing with concurrency? Despite these rules, it is still possible for transactions to inadvertently cause errors when running concurrently against the same data set.

These problems can be intricate depending on the data being stored but some examples include:

Lost Update: Two transactions operate on the same data item, setting it to different values. The update made by the first transaction is lost once the second transaction executes (see the sketch after this list).

Dirty Read: A transaction reads a value that is later rolled back because the transaction that originally set it is aborted.

Incorrect Summary: A transaction presents an incorrect summary of a data set because a second transaction alters values while the first transaction is creating the summary.
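
As an illustration of the lost update problem, the hypothetical read-modify-write sequence below shows how one transaction's change silently disappears when two transactions interleave without any concurrency control.

    # A shared data item, e.g. a stock level read and written by two transactions.
    stock = {"widgets": 10}

    # Transaction A and transaction B both read the current value...
    read_by_a = stock["widgets"]
    read_by_b = stock["widgets"]

    # ...each computes a new value from its own stale copy...
    new_value_a = read_by_a - 3   # A sells 3 widgets
    new_value_b = read_by_b - 5   # B sells 5 widgets

    # ...and both write back. B's write overwrites A's, so A's update is lost.
    stock["widgets"] = new_value_a
    stock["widgets"] = new_value_b

    print(stock["widgets"])  # prints 5, although 2 would reflect both updates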

Concurrency Control

The impact of the problems described in the previous section will vary between applications. This has led to a variety of strategies for lessening or eliminating their effect on the performance or correctness of the application. We won't go into the detail of their implementation here, but they broadly fall into three categories:

Optimistic: An optimistic strategy assumes the chance of transactions clashing is low. Checks on consistency and isolation are therefore delayed until just before a transaction is committed, allowing a high level of concurrency. If it turns out a problem has occurred then the transaction must be rolled back and executed again (see the sketch after this list).

Pessimistic: A pessimistic strategy assumes the chance of errors is high and that transactions should be blocked from executing concurrently whenever there is a possibility they could cause an error.

Semi-Optimistic: A semi-optimistic strategy attempts to find a middle ground where transactions that appear safe are allowed to execute concurrently but transactions that appear to carry a risk are blocked from doing so.
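
As a minimal sketch of an optimistic strategy, the version-number check below is one common way to implement it; the record structure and retry logic are hypothetical rather than any particular database's API.

    class ConflictError(Exception):
        pass

    # A record carries a version that increments on every successful write.
    record = {"value": 100, "version": 1}

    def commit(expected_version, new_value):
        # The check is deferred until commit time: if another transaction has
        # already bumped the version, this transaction must be retried.
        if record["version"] != expected_version:
            raise ConflictError("record was modified by a concurrent transaction")
        record["value"] = new_value
        record["version"] += 1

    # A transaction reads the record, works optimistically, then tries to commit.
    snapshot = dict(record)
    try:
        commit(snapshot["version"], snapshot["value"] + 10)
    except ConflictError:
        # Re-read the fresh state and re-execute the transaction.
        snapshot = dict(record)
        commit(snapshot["version"], snapshot["value"] + 10)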

Which strategy you choose is a balance between performance and consistency of data. An optimistic approach will provide higher performance by allowing a higher level of concurrency but with the possible overhead of transactions needing to be re-executed if a problem does occur. A pessimistic strategy will offer higher protection against errors but the blocking of concurrency, for example by table locking in a database, will reduce throughput.

Choosing the correct strategy will vary depending on the nature of your data set and the transactions you need to perform on it. Understanding the impacts and benefits of each type of strategy may help you develop your schema or approach to operating on the data to make the most of each approach where appropriate.

Sometimes it will be obvious that you are using the wrong strategy: you may see a high number of transactions being aborted and re-run, or you may be suffering from a lack of throughput because of an overly pessimistic approach. As with most things in software engineering it can be a grey area to decide which strategy is best for your needs, but being armed with the costs and benefits of the possible strategies will be invaluable in enabling you to make a choice.

Sunday 13 September 2020

Anaemic Models and Domain Driven Design

 


Domain Driven Design (DDD) is an approach to software development where classes, including their methods and properties, are designed to reflect the business domain in which they operate.

Some would argue this was the whole motivation behind Object Oriented Programming (OOP) in the first place; however, many code bases don't actually take this approach. Either classes are written to reflect the internal structure of the code rather than the domain, or they tend to be operated upon rather than performing the operations themselves.

This has led to advocates of DDD defining anaemic models as a prevalent anti-pattern.

Anaemic Models

An anaemic model can be thought of as a class that acts purely as a container for data. Typically it will consist solely of getters and setters, with no methods that perform operations on the underlying data.

The reason this is often seen as an anti-pattern is that the lack of inherent domain logic allows the model to be in an invalid state. Whether or not this invalid state is allowed to affect the system as a whole depends on other classes recognising it and either fixing it or raising appropriate errors.

The fact this logic exists in other classes also raises the possibility that the all-important business logic of the domain is scattered across the code base rather than being in a central place. This acts as a barrier to fully understanding the logic of the domain and can easily lead to bugs when the code base is refactored on the basis of these misunderstandings.

The driver behind these anaemic models is often strict adherence to principles such as the Single Responsibility Principle (SRP), driving a desire to separate the representation of data from the logic that acts upon it.

DDD's answer to these issues is to centralise both the storage of data and the logic that acts upon it in the same object.

Aggregate Root

An aggregate root acts as a collection of objects all bound by a common context within a domain. As an example, an aggregate root representing a customer might contain objects representing that customer's address, contact details, order history, marketing preferences and so on.

The job of the aggregate root is to ensure the consistency and validity of the domain by not allowing external objects to hold a reference to or operate on the data in its collection. External code that wishes to operate on the domain must call methods on the aggregate root where these methods will enforce the logic of the domain and not allow operations that would put the domain in an invalid state.
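
As a brief sketch of this idea, the hypothetical Customer aggregate below exposes behaviour rather than raw setters and refuses operations that would leave the domain in an invalid state.

    from dataclasses import dataclass, field

    @dataclass
    class Order:
        order_id: str
        total: float

    @dataclass
    class Customer:
        """Aggregate root: external code goes through these methods, never the internals."""
        customer_id: str
        email: str
        _orders: list = field(default_factory=list)

        def change_email(self, new_email: str) -> None:
            # The domain rule lives with the data it protects.
            if "@" not in new_email:
                raise ValueError("a customer email must be a valid address")
            self.email = new_email

        def place_order(self, order: Order) -> None:
            if order.total <= 0:
                raise ValueError("an order must have a positive total")
            self._orders.append(order)

        def order_history(self) -> tuple:
            # Expose a read-only view so callers cannot mutate the collection directly.
            return tuple(self._orders)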

As well as ensuring validity the aggregate root also acts as central documentation for the domain and the associated business logic, aiding understanding of the domain and allowing safer refactoring.

No Anaemic Models?

So should a code base never contain anaemic models? There is a practicality argument indicating that they cannot always be avoided.

Strict adherence to domain rules does come at a price: hydrating these models from API responses or data storage is often complicated because they are not friendly to standard deserialisation or to instantiation from the results of a query. This often leads to the need for anaemic models to act as Data Transfer Objects (DTOs), bringing data into the application before it is applied to the domain.
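
As an illustrative sketch (the names are invented, and the domain class is cut down just to keep the example self-contained), a DTO simply mirrors the shape of a response and is translated into the richer domain object at the boundary.

    from dataclasses import dataclass

    @dataclass
    class CustomerDto:
        """Anaemic by design: just fields, no behaviour, friendly to deserialisation."""
        customer_id: str
        email: str

    @dataclass
    class Customer:
        """Domain object that enforces its own rules."""
        customer_id: str
        email: str

        def __post_init__(self):
            if "@" not in self.email:
                raise ValueError("a customer email must be a valid address")

    def to_domain(dto: CustomerDto) -> Customer:
        # Business rules are applied at the boundary where the DTO enters the domain.
        return Customer(customer_id=dto.customer_id, email=dto.email)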

The important point here is that not all models are designed to represent a domain or have business rules associated with them; the role of some models is simply to act as a bucket of information that will be processed or transformed into a domain object at some later point. In these situations taking a DDD approach would add extra complexity with no benefit.

Recognising the difference between these types of models is key to choosing the right approach. This will come from a strong understanding of the domain in which your software operates, recognising the domain contexts this leads to and ensuring these are implemented in the proper way. Other models in your code that exist purely to ease the flow of data through the system can be implemented in a more relaxed manner.

It is very easy for developers to become distant from the domain their code operates in. This doesn't make them bad engineers, but it does hinder their ability to properly model their business in the code base. Make an attempt to understand your business domain and you'll be surprised how it improves your code.

Sunday 6 September 2020

Event Sourcing

 

Traditionally data storage systems store the current state of a domain at a particular moment in time. Whilst this is a natural approach, in some complex systems with a potentially rapidly changing data set the loss of the history of changes to the domain can lead to a loss in integrity, or an inability to recover from errors.

Event sourcing is an approach to help solve this problem by storing the history of changes to the domain rather than just the current state.

This won't be suitable or advantageous in every situation, but it is a technique that can offer a lot of benefits when the ability to cope with large numbers of concurrent changes is of the utmost importance.

The Problem

Most applications will use a Create, Retrieve, Update, Delete (CRUD) approach to storing the current state of the domain.

This leads to a typical workflow of retrieving data, performing some form of modification on that data and then committing it back to the data store. This will often be achieved by the use of transactions to try to ensure consistency in the data when it may be modified by multiple concurrent processes.

This approach can have downsides: the locking mechanisms used to protect data integrity can impair performance, transactions can fail if multiple processes are trying to update the same data set, and without any kind of auditing mechanism this can lead to data loss.

Event Sourcing Pattern

Event sourcing attempts to address these issues by taking a different approach to what data is stored. Rather than storing the current state of a domain and having consumers perform operations to modify it, an event sourcing approach creates append-only storage that records the events describing the changes made to the domain model they relate to.

The current state of the domain is still available by aggregating all events on the domain model, but equally the state of the domain at any point during the accumulation of those events can also be determined. This allows the history of the domain to be walked back and forward, allowing the cause of any inconsistency to be determined and addressed.

The events will usually have a meaning within the context of the domain, describing an action that took place along with its associated data items and implications for the underlying domain. Importantly, they should be immutable, meaning they can be emitted by the associated area of the application and processed by the data store in an asynchronous manner. This leads to performance improvements and reduces the risk of contention in the data store.
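
As a minimal sketch of the pattern (the event names and the shopping cart domain are invented for illustration), events are appended to an immutable log and state is derived by replaying them:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ItemAdded:
        sku: str
        quantity: int

    @dataclass(frozen=True)
    class ItemRemoved:
        sku: str
        quantity: int

    # The event store is append-only: events are recorded, never updated in place.
    event_log = []

    def append(event) -> None:
        event_log.append(event)

    def current_state(events) -> dict:
        """Replay a history of events to derive the state of the cart."""
        cart = {}
        for event in events:
            if isinstance(event, ItemAdded):
                cart[event.sku] = cart.get(event.sku, 0) + event.quantity
            elif isinstance(event, ItemRemoved):
                cart[event.sku] = cart.get(event.sku, 0) - event.quantity
        return cart

    append(ItemAdded("book-42", 2))
    append(ItemAdded("pen-7", 1))
    append(ItemRemoved("book-42", 1))

    print(current_state(event_log))       # state now
    print(current_state(event_log[:2]))   # state at an earlier point in the history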

When to Use It

Event sourcing is most appropriate when your system would naturally describe its domain in terms of a series of actions or state changes, for example a series of updates to a shopping cart followed by a purchase. It is also an advantageous approach when your application has a low tolerance for conflicting updates in the domain causing inconsistency in the data set.

An event sourcing approach can itself come with downsides, so it is not appropriate for all applications. Most importantly, whenever the state of the domain is visualised the system will only be eventually consistent; the asynchronous nature of event processing means any view of the data might not yet reflect events that are still being dealt with. If your application needs to present an always accurate, real-time view of the data then this form of data store may not be appropriate.

It is also likely to add unwarranted overhead for an application with a simple domain model and a low chance of conflicting modifications causing a loss of integrity.

As with most patterns in software engineering, one size does not fit all. If you recognise the problems described here in the operation of your application then event sourcing may be an alternative approach you could benefit from. However, if your application's domain model is relatively simple and you aren't regularly dealing with problems caused by inconsistency in your data store then there is probably no reason to deviate from your current approach.


Sunday 23 August 2020

Effective Logging

 


Every developer reading this will have on many occasions been knee-deep in an application's logs while trying to investigate an issue in the code base.

Being able to diagnose and resolve issues by reviewing logs is an important skill for any developer to acquire. However the application of this skill comes not just in being able to review logs but also in knowing what to log.

Presented below are a few ideas on best practices that can increase the effectiveness of your logs. Knowing exactly what to log will vary from application to application, but regardless of the nature of your code base there are certain steps that can be taken to increase the value of your logs.

Context

At the point of creating a log entry, clearly the most important piece of data is the application state, request or similar that you are logging. However, the context that surrounds that entry can be equally important in diagnosing issues where causation is less apparent.

Sometimes you will look at logs and the problem will be obvious: the request is wrong or the response clearly indicates an error. More often, however, what is causing the problem is less clear and will depend on the context around the application at that time.

To have a chance of getting to the bottom of these sorts of issues, as much additional context as possible should be added to each log entry. Certain contextual information is obvious, like the username associated with a request, but any additional information you can collect alongside the main log entry could prove invaluable in spotting patterns.

Many logging frameworks have support for collecting this kind of contextual data. Provided this can be done in a structured manner that doesn't create noise in the logs, a good piece of general advice is to log any additional context that it is possible to gather.

Correlation

Most applications of any complexity will have several links in the chain when processing a request. Information will flow upstream and downstream between different applications and systems, each one a potential cause of failure.

If you are only able to look at each application's logs in isolation, without being able to tie a thread through each to determine the path the request took, then the amount of time taken to resolve issues is going to be extended.

To address this it's important to add an element of correlation between all log entries. This can be as simple as a GUID that is passed between each application and added as contextual information to all log statements. This GUID can then act as a key when searching the logs to provide all the log entries associated with an individual request.
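
A minimal sketch of the idea, using Python's standard logging module and a hypothetical header name, might look like this:

    import logging
    import uuid

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s [%(correlation_id)s] %(message)s",
    )
    logger = logging.getLogger("orders")

    def handle_request(headers: dict) -> None:
        # Reuse the caller's correlation ID if one was passed, otherwise start a new one.
        correlation_id = headers.get("X-Correlation-Id", str(uuid.uuid4()))
        extra = {"correlation_id": correlation_id}

        logger.info("processing order request", extra=extra)
        # ...call downstream systems, forwarding the same X-Correlation-Id header...
        logger.info("order request complete", extra=extra)

    handle_request({"X-Correlation-Id": "3f2b9c6e-1d7a-4a9f-8f22-0c5d7a1e9b10"})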

This will enable you to replay the journey the request took through your infrastructure and determine at what point things started to go wrong.

Structure

On a few occasions in this post we have mentioned having to search through an application's logs. When all you are presented with when reviewing logs is a wall of text, it is very easy to induce a snow blindness that stops you from being able to garner any useful information.

If you log in a structured manner then logs can be searched in more sophisticated ways than simply looking for certain snippets of text.

Many technologies exist for providing this log structure, depending upon your technology platform. In general these frameworks, rather than simply logging text, will log data using a structured format such as JSON. Tools can then load these JSON entries and present a mechanism for searching the logs based on the kind of contextual information we have previously discussed.
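
A simple sketch of structured logging, emitting each entry as a JSON document without relying on any particular framework, could look like the following; in practice most platforms have libraries that do this for you.

    import json
    import logging
    import sys

    class JsonFormatter(logging.Formatter):
        """Render each log record as a single JSON object, one per line."""
        def format(self, record: logging.LogRecord) -> str:
            entry = {
                "timestamp": self.formatTime(record),
                "level": record.levelname,
                "message": record.getMessage(),
                # Contextual fields attached via `extra` end up on the record.
                "user": getattr(record, "user", None),
            }
            return json.dumps(entry)

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("payments")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.info("payment accepted", extra={"user": "alice"})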

Like so many aspects of coding that don't directly relate to the functionality users are consuming, logging can very often be an afterthought. This is ultimately self-defeating since no application is perfect and you are certain to rely on logging on many occasions to resolve the issues caused by this imperfection.

The techniques described in this post are an attempt to increase the value provided by your logs to ensure when you inevitably need to review them the process is less stressful and as effective as possible.

Sunday 16 August 2020

Is TDD Dead?

 


Test Driven Development (TDD) has long been viewed as one of the universal tenets of software engineering by the majority of engineers. The perceived wisdom being that applying it to a software development lifecycle ensures quality via inherent testability and via an increased focus on the interface to the code under test and the required functionality.

In recent years some have started to challenge the ideas behind TDD and question whether or not it actually leads to higher quality code. In May 2014 Kent Beck, David Heinemeier Hansson and Martin Fowler debated TDD and challenged its dominance as part of software engineering best practices.

The full transcript of their conversation can be found here: Is TDD Dead?.

Presented below are some of my thoughts on the topics they discussed. To be clear, I am not arguing that TDD should be abandoned. My aim is to provoke debate and try to understand whether total adherence to TDD should be relaxed or approaches modified.

Flavours of TDD

Before debating the merits or otherwise of TDD it's important to acknowledge that different approaches to it exist. Strict adherence to TDD implies writing tests first and using the so-called red-green-refactor technique to move from failing tests to working code.

I think large numbers of teams who would purport to follow TDD will regularly not write tests first. Engineers will often find themselves needing to investigate how to implement certain functionality, with this investigation inevitably leading to writing some or all of an implementation prior to considering tests.

TDD as discussed here applies to both scenarios; a looser definition would simply define TDD as an emphasis on code being testable. Many of the pros and possible cons being discussed would apply equally whether tests are written prior to implementation or afterwards.

Test Induced Design Damage

Perhaps the most significant of the cons presented about TDD is that of test induced design damage.

Because discussions around TDD tend to centre on unit testing, adopting a TDD approach and focusing on testability tends to mean enabling a class under test to be isolated from its dependencies. The tool used to achieve this is indirection: placing dependencies behind interfaces that can be mocked within tests.
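
As a small sketch of that indirection (the class and dependency names are invented for illustration), the dependency sits behind an abstraction so a test can substitute a mock for the real implementation:

    from unittest.mock import Mock

    class PriceCalculator:
        """Class under test; the exchange-rate dependency is injected rather than created."""
        def __init__(self, rate_provider):
            self.rate_provider = rate_provider

        def price_in_gbp(self, amount_usd: float) -> float:
            return round(amount_usd * self.rate_provider.usd_to_gbp(), 2)

    def test_price_in_gbp():
        # The real provider (an HTTP client, say) is replaced by a mock so the
        # class can be tested in isolation from its dependency.
        rates = Mock()
        rates.usd_to_gbp.return_value = 0.8
        assert PriceCalculator(rates).price_in_gbp(10.0) == 8.0
        rates.usd_to_gbp.assert_called_once()

    test_price_in_gbp()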

One of the principal causes of test induced design damage is the confusion and complication that comes from excessive indirection. I would say this potential design damage is not inherent in the use of indirection but is very easy to achieve accidentally if the interface employed to de-couple a class from a dependency is badly formed.

A badly formed interface, where the abstraction being presented isn't clear or is inconsistent, can have a large detrimental effect on the readability of code. This damage is very often amplified when looking at the setup and verification of these interactions on mock dependencies.

Aside from testability, another perceived advantage of indirection is the ability at some later point to change the implementation of a dependency without the need for widespread changes in dependent code. Whilst these situations certainly exist, perhaps they don't occur as often as we might think.

Test Confidence

The main reason for having tests is as a source of confidence that the code being tested is in working condition. As soon as that confidence is eroded, the value of the tests is significantly reduced.

One source of this erosion of confidence can be a lack of understanding of what the tests are validating. When tests employ a large number of mocks, each with their own setup and verification steps, it is easy for tests to become unwieldy and difficult to follow.

As the class under test is refactored and the interaction with mocks is modified the complexity can easily be compounded as engineers who don't fully understand how the tests work need to modify them to get them back to a passing state.

This can easily lead to a "just get them to pass" attitude. If this means there is no longer confidence that the tests are valid and verifying the correct functionality, then any confidence that the tests passing means we are in a working state is lost.

None of this should be viewed as saying that unit tests or the use of indirection are inherently bad. Instead I think it is hinting at the fact that maybe the testability of code needs to be viewed based on the application of multiple types of tests.

Certain classes will lend themselves well to unit testing; the tests will be clear and confidence will be derived from them passing. Other, more complex areas of code may be better suited to integration testing, where multiple classes are tested as a complete functional block. Providing these integration tests are able to test and prove functionality, this should still provide the needed confidence of a working state following refactoring.

So many aspects of software engineering are imperfect, with no answer being correct 100% of the time. Maybe this is also true of TDD: in general it provides many benefits, but if it can on occasion have a negative impact maybe we need to employ more of a test mix so that our overall test suite gives us the confidence we need to release working software.

Sunday 9 August 2020

API Toolbox

 

The increasing application of the Software as a Service (SaaS) delivery model means that APIs are the regular means by which we interact with the services that help us write software. Even if APIs aren't the primary mechanism by which we consume the service it is increasingly commonplace, and potentially even expected, that an API surface will be available to analyse metrics and data related to consumption.

API is a broad term with many different approaches to implementation available to us. Sometimes changes in technology are related to trends or views on implementation correctness, such as migration from SOAP to REST being driven by the desire for a more lightweight representation of data.

However sometimes technology choice is driven by the nature of the API being implemented, the data it is expected to convey and the make-up of the likely consumers. Presented below are some of the technologies available along with the circumstances that might drive you to choose them to implement your API.

REST

REST is still by far the most common approach to implementing an API. Characterised by its use of HTTP verbs and status codes to provide a stateless interface along with the use of JSON to represent data in a human readable format, it still represents a good technology choice for the majority of APIs.

Difficulties in consuming REST are often first and foremost caused by an unintuitive surface making APIs difficult to discover or by an incoherent data model making the data returned from the API hard to work with and derive value from.

In recent years other technologies have gained traction as alternatives to REST. I believe these do not represent alternative implementations in all circumstances; instead they look to improve upon REST in certain situations.

GraphQL

The majority of APIs relate to the exposure of data from underlying data sources. The richer the data sources, the greater the risk that a user will be flooded with data.

REST APIs are built around the concept of a resource: you make a request for a particular resource and the entire resource is returned to you. With a large data model this can add noise when only a small sub-set of the data was actually required. This not only means bandwidth is wasted by returning unnecessary data items, but performance may be further impacted by potentially making unnecessary additional calls to downstream systems.

A complex data model can also make it harder to intuitively discover what data items are available.

To address some of these issues Facebook developed GraphQL providing a language for querying data from an API. Having a query language means only the data items that are required can be requested, reducing the amount of data returned along with potentially reducing the work the API must do to acquire the entire resource. A query based approach also provides an element of discoverability and schema identification.
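
As an illustrative sketch (the endpoint, schema and field names are hypothetical, and the requests library is assumed), a client can ask for exactly the fields it needs by posting a query to a GraphQL endpoint:

    import requests

    # Only the customer's name and the totals of their recent orders are requested;
    # a comparable REST call might return the entire customer resource.
    query = """
    query {
      customer(id: "42") {
        name
        orders(last: 5) {
          total
        }
      }
    }
    """

    response = requests.post("https://api.example.com/graphql", json={"query": query})
    print(response.json()["data"]["customer"])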

To a certain extent it is possible to achieve a queryable interface using REST, by use of path parameters and the query string, but GraphQL increases the flexibility that can be provided.

gRPC

Because REST will in most circumstances represent data using JSON, it can sometimes consume more bandwidth than is strictly necessary. In the majority of cases the impact of this will be minimal, but for certain consumers, most notably IoT devices, it can be significant.

To address this, technologies such as gRPC represent data in a binary format, meaning only the minimum amount of bandwidth is consumed. gRPC also moves away from API endpoints in favour of remote method calls on both server and client.

Both these aspects make it ideal for scenarios where data must be transferred using minimal resources, both in terms of bandwidth but also in terms of the power consumption of transceivers and the like.

Personally I believe REST is never going to be a bad choice for implementing an API, but under certain circumstances it may not be optimal. Shown here are a couple of scenarios where this may be the case.

Recognising when these situations arise will enable you to consider possible alternatives. Using them when they aren't necessary is likely to make your API harder to understand and consume, but using them in the scenarios they are designed to address will optimise your API implementation and open it up to a larger and more diverse set of consumers.

Saturday 1 August 2020

Twelve Factor App - Admin Processes


The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the Heroku platform. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The twelfth principle relates to the management of administrative tasks:

"Any needed admin tasks should be kept in source control and packaged with the application."

Administrative Tasks

It is quite common for engineers managing an application to need to perform one-off admin processes within the environment the application is deployed into. The most common of these will be data migrations to accommodate schema changes or other updates to how data is stored.

Other examples might be needing to extract data from the environment for debugging or investigating issues or needing to inspect aspects of the application as it runs.

These administrative tasks are a natural part of managing an evolving application as requirements and needs change over time.

Process Formation

Within a twelve factor app the process formation is the mechanism that allows an application to effectively scale as demand grows. Each aspect of the application runs in its own process that can therefore be independently scaled to meet the changing scale and shape of the demand being placed on the application.

Administrative tasks should be treated no differently: they should be executed within the same process formation, and the code or scripts associated with them should be part of the application's repository.

The changes being made by these administrative tasks need to be recorded alongside the application's source code, not only so that the changes themselves are recorded but also to allow every other environment, including a developer's local development environment, to be kept in sync with the production environment.

Changes such as database migrations are often iterative in nature, so the history of the changes that have been applied is vital to understanding the structure of the data stores the application is running against.
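
A minimal sketch of this idea, keeping migrations as ordered scripts in the repository and recording which ones have already been applied (all names here are invented), might look like:

    import sqlite3

    # Each migration lives in source control alongside the application code.
    MIGRATIONS = [
        ("001_create_customers", "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)"),
        ("002_add_customer_name", "ALTER TABLE customers ADD COLUMN name TEXT"),
    ]

    def migrate(conn: sqlite3.Connection) -> None:
        """Apply any migrations not yet recorded in this environment."""
        conn.execute("CREATE TABLE IF NOT EXISTS applied_migrations (name TEXT PRIMARY KEY)")
        applied = {row[0] for row in conn.execute("SELECT name FROM applied_migrations")}
        for name, statement in MIGRATIONS:
            if name not in applied:
                conn.execute(statement)
                conn.execute("INSERT INTO applied_migrations VALUES (?)", (name,))
        conn.commit()

    migrate(sqlite3.connect(":memory:"))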

REPL Shells

A twelve factor app strongly favours technologies that provide a REPL shell environment. This allows administrative tasks to be consistently applied across all environments. Locally, developers can simply invoke scripts via the shell within their development environment. Within a deployed environment a shell can be opened on the machine to achieve the same outcome, either manually or automated via the application's deployment process.

Issues will often arise in production due to information about previous changes not being available to all team members. A previous deployment may have been tweaked or fixed by applying changes to a database or some other update of the environment. When the application is re-deployed to a fresh environment these un-recorded changes are not re-applied and the same issue raises its head again.

An application's repository should contain the history of all changes made to the application along with all the resources necessary to get it up and running. Secret knowledge of tweaks and changes needed to get the application working must be avoided at all costs.

Modern deployment techniques allow for the automation of all sorts of processes, so there are ever fewer reasons for manual changes to an environment to be required during an application's deployment. That is not to say it will never be necessary to make manual changes when an issue arises, but as soon as this happens the next question needs to be: how do we automate this for the next deployment?


Sunday 26 July 2020

Twelve Factor App - Logs


The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the Heroku platform. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The eleventh principle relates to the handling of logs:

"Applications should produce logs as event streams and leave the execution environment to aggregate"

The Importance of Logging

One of the primary mechanisms for monitoring the behaviour and operation of a deployed application is via logging. Exactly what is being logged will vary by application, but log entries will relate to events and operations taking place within the application; this time-ordered list presents a history of what has taken place, both good and bad.

Often logging is an afterthought, with not as much attention paid to it as probably should be given its importance in maintaining application health and performance. Whenever a developer needs to investigate an issue in the application, or to validate correct operation, they are likely to use the application logs as their primary source of information.

Aggregated Streams

For logs to be useful they need to be aggregated and stored. An app following twelve factor principles does not play a part in this aggregation and storage process; it simply ensures the flow of logging data through the standard output mechanism of the tech stack being employed (such as stdout).
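
A minimal sketch of this in Python: the application writes each event to stdout and leaves routing and storage entirely to the execution environment.

    import logging
    import sys

    # Write log events to stdout; the platform, not the app, aggregates and stores them.
    logging.basicConfig(
        stream=sys.stdout,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )

    logging.info("order 42 accepted")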

This simplifies the approach to logging in the code base and allows for different logging strategies to be employed in different environments.

In a development environment a developer may simply review the logs in the terminal, while in production or staging environments the logs will be collected and managed by the infrastructure.

Log Management

This approach also opens up the possibility of going beyond simple file logging to use log management services that can increase the value that can be derived from the underlying data.

Tools such as Splunk, or any other big data solution, make it possible to collect large amounts of data that would be impractical if every aspect of logging had to be implemented within the application code base. Frameworks such as Serilog give the log data structure enabling it to be queried and mined for information that might be hard to glean by simply reading the entries as text.

Effective logging, along with the ability to review the data it provides, are important skills for any developer to learn. When an issue has struck and stakeholders are looking for answers, this skill will help you steer a course back towards a working system.

The simpler the approach to logging is, the more the data it produces can be relied upon. Many tools exist to help in this regard, and ensuring your application's only role is to feed data into the system will help to keep this as simple as possible.


Saturday 18 July 2020

Twelve Factor App - Dev/Prod Parity



The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the Heroku platform. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The tenth principle relates to the parity that should exist between all the environments where the application will run:

"All environments should be as similar as possible"

Environments

The exact number of environments that an application will run in will vary from team to team. Incidentally, one of the benefits of following the twelve factor principles is that standing up new environments should be a straightforward and repeatable process.

Despite variation in the exact number, broadly speaking environments will fall into three categories: Development, Staging and Production.

In Development code is hot off the press and will more than likely be unstable. It may be running on the developer's machine or in some shared environment where everyone's latest and greatest is deployed immediately after it's merged.

In Staging we have refined our code to a point where we think it is ready for showtime. Before we make this final step we need to verify that our new code works in conjunction with everyone else's both new and existing.

Finally when everything is tested we make the final step of putting our new bits into production for real users to make use of.

Gaps

Often there are undesirable gaps between these environments that manifest themselves in three areas.

Time gaps exist when code is delayed from reaching production; it may be days, weeks or months from a developer's keystrokes to the code being deployed into production. This build up of change can lead to big bang deployments where large numbers of changes are dropped into production alongside each other. Clearly this increases the possibility that unforeseen problems will manifest themselves, and the volume of change is also likely to hinder any investigation into what has gone wrong in order to find a resolution.

Personal gaps exist when different individuals are responsible for the administration or management of the environments. Clearly not every developer is going to be granted access to production, but when administration is undertaken by those not close to the code important context is lost. Conversely, if developers have no appreciation for the nature of the environment their code will be deployed into, mistakes may be made that could have been avoided.

Tools gaps exist whenever different versions or types of tooling are used in each environment. The likelihood of this will vary greatly depending upon your technology stack, but it can often be tempting to use a more lightweight version of a particular backing service in Development compared to Production. Any time there is a difference between Production and other environments, a small doubt is created as to whether testing proves the code will work once deployed.

Closing the Gaps

A twelve factor app looks to eliminate or at least reduce the scope of these gaps. 

Reducing the time gap means ensuring the lag between a developer writing code and it ending up in Production, where it can provide benefit, is as small as possible. Often implementing this speed of deployment is not a technical challenge; instead it is a battle against fear. The only way to overcome this fear is to evolve an automated testing strategy such that your build and deployment process is self evaluating and self proving.

Personal gaps are overcome by ensuring developers are closely aligned to the deployment of code and its management once in the environment. This gives developers important context and insight into what becomes of their code once it's written and merged. This principle should also be extended to include responsibility for issues and problems that may arise once the code is in use; this accountability isn't about assigning blame but about developing a sense of where the seeds of an issue may be sown in the code base.

Tooling gaps can be harder to close, since there may be cost and practicability arguments as to why the same technology isn't being used everywhere. However, a twelve factor mindset instills a wish for these gaps between production and other environments to be made as small as possible. It's important to realise that this isn't about scale; clearly having a Development environment scaled to deal with production loads is inefficient and costly. Instead this is about behaviour and functionality: these aspects of the backing services your application uses should be predictable and uniform in all environments.

Modern techniques such as so-called Infrastructure as Code and containerisation mean it is now much simpler to ensure alignment between environments is maintained. Bugs grow in the gaps between environments. Discounting issues caused by insufficient testing, all bugs that have ever been shipped to production, which if we're being honest is a large number, have passed through test environments without being noticed. Ensuring close alignment between environments gives less space for these bugs to develop and grow without being spotted.


Sunday 12 July 2020

Twelve Factor App - Disposability


The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the Heroku platform. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The ninth principle relates to the disposability of processes and the relation to start-up and shutdown:

"Fast startup and shutdown are advocated for a more robust and resilient system."

Disposable Processes

An application's source code and the binary it produces represent the processor instructions that produce the intended functionality. A process is an instance, provided by the device's operating system, of these instructions running against the device's processor.

At a high level the process is made up of the application's instructions, allocated memory and threads of execution, handles to file system resources, and security attributes such as the user who owns the process.

Within a twelve factor app processes are first class citizens and a fundamental part of the architectural approach. They should also strive to be completely disposable. This means they should be able to be started, stopped and killed whenever necessary and complete all of these stages in a timely and efficient manner.

Startup Time

A process should not be slow to start, that is to say the time between a launch command being issued and the process being available to service requests should be as small as possible.

The reason this is desirable is because a twelve factor app scales via the application of a process formation, creating more processes to deal with demand as it comes. If these processes are slow to start up then it limits the ability of the application to scale effectively.

Fast start-up time also improves the robustness of the application. Sometimes processes may need to be migrated to new hardware or a new execution environment, either as part of a release or because of an underlying failure. Again, a fast-starting process allows this to happen quickly with minimum impact on the ability of the application to deal with demand.

Graceful Shutdown

A process should also be able to be gracefully shut down or terminated. This means it should be able to finish dealing with its current requests quickly and to offload any remaining work to a queuing system in an effective manner.

This is mainly achieved by the process in general trying to avoid long running tasks or tasks that are non-transactional in nature and therefore can't be easily stopped or transferred to another process.

Alongside this graceful shutdown behaviour, the proper implementation of a queuing system is essential to ensure the application isn't vulnerable to sudden failure causing irreparable data loss or damage. The work the application does must be able to be represented as distinct items that can be queued and re-queued in an idempotent manner, ensuring the application can be relied upon to operate in a clean, consistent way.
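
A small sketch of graceful shutdown in Python, where a SIGTERM handler lets in-flight work finish and anything remaining is handed back to a queue:

    import signal
    import time

    shutting_down = False

    def handle_sigterm(signum, frame):
        # The platform has asked the process to stop: stop accepting new work.
        global shutting_down
        shutting_down = True

    signal.signal(signal.SIGTERM, handle_sigterm)

    work_queue = ["job-1", "job-2", "job-3"]

    while work_queue and not shutting_down:
        job = work_queue.pop(0)
        print(f"processing {job}")
        time.sleep(0.1)

    # Anything left is returned to a durable queue to be picked up by another process.
    print(f"shutting down, re-queueing {work_queue}")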

When looking at the application in isolation the code executed as part of start-up and shutdown can sometimes be overlooked. Normally we are more concerned with what the code is doing while it is running.

However, in a cloud based deployment environment applications don't just start and then run for an indefinitely long period of time. They are regularly required to start and stop as the application is scaled or re-deployed. Putting some thought into these areas of the code will lead to benefits in your ability to provide a robust and scalable application that can flex to meet demand and cope with failure.

Monday 6 July 2020

Twelve Factor App - Concurrency


The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the Heroku platform. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The eighth principle relates to how applications should scale:

"Concurrency is advocated by scaling individual processes."

The Process Model

An application's source code and the binary it produces represent the processor instructions that produce the intended functionality. A process is an instance, provided by the device's operating system, of these instructions running against the device's processor.

At a high level the process is made up of the application's instructions, allocated memory and threads of execution, handles to file system resources, and security attributes such as the user who owns the process.

A complex application may be made up of multiple processes each with a particular role providing functionality to the whole. The process model also represents a method of scaling an application by enabling the same functionality to be available multiple times by replica processes running on the same machine.

First Class Citizen

Within a twelve factor app a process is a first class citizen, with the various processes that make up the app modelled in a similar manner to the Unix process model.

Modelling an application in this way involves identifying the distinct, and non-overlapping, areas of functionality that it is comprised of and architecting them such that they can each be assigned their own process.

This divide and conquer technique should be recognisable to most engineers. Isolate and contain common functionality and ensure that the bonds between each distinct block are loose.

Process Formation

Aside from laying a path towards a SOLID architecture this process based approach provides a robust scaling mechanism to allow an application to respond to peaks in demand.

While a process can scale internally to handle an increased workload, via multithreading etc, at a certain point this will no longer be practical. When this point is reached then more processes can be spawned in order to deal with the increased demands being placed on the application.

Because the application's internal functional blocks have been modelled as individual processes, this scaling can be focussed on the areas of functionality that are in highest demand. The ability to focus the scaling of the application like this ensures we make the best possible use of all available resources without wasting them on scaling parts of the application that don't need to scale out.

The resultant stacks of processes, each varying in height due to the scaling out we've mentioned, are referred to as the process formation.

The ability to effectively scale an application and make the best use of available resources has become ever more important over time. Not only is it required to meet the demands of your users, but inefficient scaling by simply throwing more compute at the problem is a costly exercise.

An application's architecture plays a role in deciding how successfully it can be scaled. The scaling of a monolithic code base with inadequate seams between its parts will always be an inefficient and costly business. Like so many things, this needs to be considered early on in the application lifecycle and not left until it's too late.


Sunday 28 June 2020

Twelve Factor App - Port Binding


The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the Heroku platform. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The seventh principle relates to how applications expose their functionality to the wider world or estate:

"Self-contained services should make themselves available to other services by specified ports."

Running Inside a Web Server

It is a common pattern of application development and deployment for an application to be run from within a web server.

In this model the application build process often produces a binary that is itself not executable. During the release phase the application is placed within a web server that provides an environment in which the application can be invoked and the underlying functionality consumed.

This approach means the application isn't able to function independently and is reliant on the web server to turn it into a useful application.

Self Contained Web Server

An important goal of a twelve factor app is to be self contained and not dependent on other software to be useful. Running inside a web server doesn't allow an app to meet these aspirations.

Instead a twelve factor app contains a web server implementation within it. This means when the application is built it is a fully fledged executable without the need for additional software to be used to bring it to life.

An example of this is the Kestrel web server used within .NET Core applications. This lightweight web server is built into a .NET Core application and enables it to provide its functionality as a direct result of the build process with no additional web server being required to host the application.

Using this technique a twelve factor app can expose its functionality by binding a process to a given port. As an example when debugging the application locally it might be accessed by http://localhost:5001/my/cool/api. 
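
As a sketch of the same idea outside .NET, using only Python's standard library, the application embeds its own HTTP server and exposes its functionality by binding to a port:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class ApiHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # The app itself answers HTTP requests; no external web server hosts it.
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(b'{"status": "ok"}')

    if __name__ == "__main__":
        # Expose the service by binding to a port, e.g. http://localhost:5001/
        HTTPServer(("0.0.0.0", 5001), ApiHandler).serve_forever()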

This same approach of a process binding to a port can be re-used in any environment, with different aspects of the application, running as different processes, exposing their functionality on different ports.

Walking on the Edge

Quite often the web server embedded into the application, such as Kestrel, can act as an edge server exposing the application directly to the outside world.

In reality this is unlikely to be the case. There may be security or networking requirements that necessitate a bulkier, more feature-rich web server acting as a reverse proxy in front of the application.

The important aspect of this approach is that the application can be developed independently of these requirements, since it is not responsible for their implementation. Because it is a self contained application listening on a port, the technology and set-up of the reverse proxying infrastructure can be changed and optimised with no risk that the application will fail to function or will need to be changed to reflect these changes.

Independence is an important quality for any piece of software. The more bonds we have between applications and areas of the code, the more the impact of change will ripple through a code base. The twelve factor principles, by promoting self containment, try to avoid this situation and create an environment where each area of a wider estate can be worked on in isolation.

This means each area can be optimised and taken in the best direction for the functionality it provides. Many technology platforms recognise this and are giving you the tools you need to make this a reality for your suite of applications.   


Sunday 21 June 2020

Twelve Factor App - Processes



The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the Heroku platform. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The sixth principle relates to the process model used within the application:

"Applications should be deployed as one or more stateless processes with persisted data stored on a backing service."

The Process Model

An application's source code and the binary it produces represent the processor instructions that produce the intended functionality. A process is an instance, provided by the device's operating system, of these instructions running against the device's processor.

At a high level the process is made up of the application's instructions, allocated memory and threads of execution, handles to file system resources, and security attributes such as the user who owns the process.

A complex application may be made up of multiple processes each with a particular role providing functionality to the whole. The process model also represents a method of scaling an application by enabling the same functionality to be available multiple times by replica processes running on the same machine.

Stateless Processes

A twelve factor app views application processes as stateless, sharing nothing between themselves.

In practice this means that the data a process works with and stores directly should be short lived and should not be assumed to be available in the future. With many application processes dealing with many incoming requests, it is extremely unlikely that the same process will deal with any future request from the same user.

Even if your application was only a single process it is still dangerous to assume that data will always be available when the process or the machine could be restarted or otherwise lose access to the memory space.

This isn't to say that an application cannot store and use data in a process but that it should expect the data to only be available while dealing with the current request and not available for any future requests.

State in Backing Services

A user's interaction with an application is usually made up of multiple requests over a period of time. In this situation it is unrealistic to think that some data won't have to persist between these requests in order to give the user a seamless experience.

Twelve factor principles don't forbid this, but they do state that the process is not a good place to store this data.

Instead a backing service, available to all application processes and independent of any process lifecycle, should be used to hold the necessary user session state. Because this data often needs to be available and updated quickly, caching services such as Redis are good candidates to provide this data storage.
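
A brief sketch of this, using the common redis-py client (the connection details and key names are illustrative), keeps session state in the backing service rather than in the process's memory:

    import json
    import redis

    # Any process in the formation can reach the same backing service.
    store = redis.Redis(host="localhost", port=6379)

    def save_session(session_id: str, data: dict) -> None:
        # Session state lives in Redis with a time-to-live, not in the process.
        store.setex(f"session:{session_id}", 3600, json.dumps(data))

    def load_session(session_id: str) -> dict:
        raw = store.get(f"session:{session_id}")
        return json.loads(raw) if raw else {}

    save_session("abc123", {"user": "alice", "basket": ["book-42"]})
    print(load_session("abc123"))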

Many of the twelve factor principles promote behaviours and approaches that benefit the scaling of an application. Relying on session stickiness reduces the ability of an application to scale, as resources need to be reserved to deal with possible future requests, or requests fail or take longer to complete because of missing data. Writing your application not to rely on stickiness means you can make more efficient use of your available resources and deal with the maximum number of requests possible.

Integrating with backing services that can provide the necessary data storage without being tied to an application process is a relatively easy way to follow this principle and produce an efficient productive application.

Sunday 14 June 2020

Twelve Factor App - Build, Release and Run


The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the Heroku platform. Their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The fifth principle relates to the strict separation of the build, release and run phases of application development:

"The delivery pipeline should strictly consist of build, release, run."

Development to Deployment

A codebase is transformed from source code to a working application in three stages.

The Build stage converts source code into an executable binary. The resultant binary is a combination of source code produced by the development team, 3rd party dependencies and the underlying framework being used to produce the application.

The Release stage takes the binary and combines this with configuration suitable for the environment that it will be deployed into. 

Finally the Run stage starts the application in the environment, bringing it to life and making the functionality it provides available to the rest of the estate.

The exact nature of all of these stages will vary greatly depending on the technologies being used during development but the essence of the three distinct stages remains. Build your code, combine it with configuration and deploy, then finally run your code in the environment.
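
As a hedged illustration of that separation (the file layout and field names below are invented for this sketch, not part of any standard), a release can be thought of as nothing more than an immutable record pairing a build with the configuration for one environment:

    import json
    import os
    import time

    def create_release(build_id, environment, config):
        # The build stage produced build_id; the release stage only combines
        # it with environment-specific configuration. Nothing is rebuilt here.
        release = {
            "build": build_id,
            "environment": environment,
            "config": config,
            "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        }
        os.makedirs("releases", exist_ok=True)
        path = f"releases/{environment}-{int(time.time())}.json"
        with open(path, "w") as handle:
            json.dump(release, handle, indent=2)
        # The run stage starts the application from this record, unchanged.
        return path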

An app following the twelve factor principles strives to maintain this strict separation.

Isolation of Change

The first important aspect of maintaining this separation is isolating where functional changes can be made. For example, functional changes to the application should not be made as part of the run phase; any such changes cannot be pushed upstream to the build phase and therefore create inconsistency during development and testing.

It may be that configuration changes made during the release phase can also affect functionality, but these changes can be easily applied during development in a consistent and controlled manner.

By properly isolating change we increase the repeatability of the entire process to produce consistent results, not only in a production environment but for local and shared development environments as well.

Always Releasing

The release and run phases should be as simple and quick as possible. Not only is this a desirable quality for any piece of engineering, but in a modern cloud based architecture deployment is a frequently occurring operation.

As an environment scales out with more copies of your application being required a reliable and fast deployment mechanism is essential in being able to respond to demand.

A slick and fast deployment process also reduces the amount of time it takes to go from developer commit to running in production. The compound effect of these marginal gains enables consistent improvements in your application's performance and effectiveness.

Finally an effective release process also enables strong rollback strategies when a change into production needs to be reversed. The more complicated the release process the more any rollback is a leap into the unknown and the less likely that the environment can be returned to a working state quickly.

The need to keep these three phases separate may on the face of it seem obvious, but inefficiencies and sub-optimal changes can easily cause the line between them to blur over time. Recognising the impact this has, and being on the lookout for anything that works against this goal, is key to maintaining a healthy development environment. As with most aspects of software engineering this is more of a goal than an absolute, but being driven by the correct principles will rarely steer you wrong.


Sunday 7 June 2020

Twelve Factor App - Backing Services



The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the opinionated Heroku platform; their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The fourth principle relates to the management of backing services:

"All backing services are treated as attached resources and attached and detached by the execution environment."

Backing Services

Applications will routinely need to consume resources that are not part of the code being deployed. These might be databases, queuing systems or communication services such as the sending of email or push messages.

The management and deployment of these services will likely be a mixture of those supplied by the same team that is deploying the application and those sourced from third parties.

These services are often also drivers of change, either because of a change in supplier or because the team wants to move to a new technology. Generally these services also need to be highly available with outages not being tolerable.

Attached Resources

If an app is written to follow twelve factor principles then it will treat all backing services as attached resources accessible via a URL defined by the configuration within the environment.

The goal of this approach is to limit the impact of needing to change the source of the functionality. If we have written custom code to interact with a resource, or we are using a proprietary SDK, then any change in the technology being used will mean code changes and a new deployment of the application.

By accessing the resource via a URL provided by the environment we can, in theory, change the location and technology of these services without the need for any code changes.
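
A small sketch of what this can look like in practice, assuming a relational database and the common convention of a DATABASE_URL variable (the query itself is just an example):

    import os

    from sqlalchemy import create_engine, text

    # The environment, not the code, decides which database is attached.
    engine = create_engine(os.environ["DATABASE_URL"])

    def fetch_customer_count():
        # Pointing the application at a different database, or a different
        # host, is a change to the URL in the environment, not to this code.
        with engine.connect() as connection:
            return connection.execute(text("SELECT count(*) FROM customers")).scalar()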

Minimal Changes

In practice this will often be difficult to achieve; if, for example, we change to a different database technology this might require at least tweaks to our code where the feature sets vary.

However by making efforts to push the functionality provided by the services to the edge of our application we have done everything we can to avoid unnecessary change.

It is often a marker of good software architecture for there to be identifiable seams in the code that enable these kinds of changes to be achieved with minimal change to the overall structure of the code. I think this is a more realistic aim than hoping to never need to make code changes when a service changes.
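
One way such a seam can look, sketched here with a hypothetical email-sending service (the class and variable names are illustrative, using only the standard library):

    import os
    import smtplib
    from email.message import EmailMessage
    from typing import Protocol

    class MailSender(Protocol):
        # The seam: the rest of the application only ever sees this interface.
        def send(self, to: str, subject: str, body: str) -> None: ...

    class SmtpSender:
        def __init__(self, host: str) -> None:
            self.host = host

        def send(self, to: str, subject: str, body: str) -> None:
            message = EmailMessage()
            message["To"] = to
            message["Subject"] = subject
            message.set_content(body)
            with smtplib.SMTP(self.host) as smtp:
                smtp.send_message(message)

    class ConsoleSender:
        # A stand-in for local development; swapping providers means adding
        # another implementation here, not touching the calling code.
        def send(self, to: str, subject: str, body: str) -> None:
            print(f"to={to} subject={subject}\n{body}")

    def make_sender() -> MailSender:
        host = os.environ.get("SMTP_HOST")
        return SmtpSender(host) if host else ConsoleSender()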

An application is hardly ever able to contain all the functionality required to achieve the overall aim; it will always be the case that the code needs to interact with external services. Ensuring the nature of these services is not hard baked into the code should be a top priority, not just to allow for future changes but also to ensure a clean overall architecture.

Sunday 31 May 2020

Twelve Factor App - Configuration


The concept of the Twelve Factor app was developed by engineers at Heroku to describe the core principles and qualities that they believe are key to the adoption of the Software as a Service (SaaS) methodology.

First published in 2011, these principles are unashamedly enforced by the opinionated Heroku platform; their relevance to effective software development has only intensified as the adoption of cloud computing has increased the number of us deploying and managing server side software.

The third principle relates to the management of configuration within an environment:

"Configuration that varies between deployments should be stored in the environment."

Separate Configuration From Code

All applications will have configuration to control their operation. This will range from database connection strings and credentials for accessing a backing service to smaller items that might influence the look and feel of the UI.

Sometimes it can be tempting to simply define these values in the source code as constants. Whilst this may be the easiest approach it comes with several disadvantages: sensitive configuration items are exposed to anyone that has access to the code base, and when items need to vary with the environment being deployed into we have to deploy different source code in order to change the configuration.

Some of these issues can be addressed by separating configuration into separate files within the repository. While this may mean we no longer need to modify source code to change the configuration of an environment, we still have to maintain a different deployment process per environment based on files in the repository, and secrets are still available to anyone with access.

Using Environment Variables

The next iteration of separating configuration from code is to make configuration part of the environment the code is being deployed into, completely removing it from the files associated with the deployment.

This can be achieved by using environment variables: once deployed, the code reaches out to the environment to get its required configuration values when they are needed.

This approach creates a very clean separation between the code and the configuration and isolates the code from any changes between environments. We deploy exactly the same code into each environment from a developer debugging the application locally to our final deployment into our production environment.

Also, because we can separate the process of setting the environment variables from the deployment of the code we can have tighter controls on who can see and set secret configuration values.
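
In code this tends to be very little work. A minimal sketch, assuming variable names of our own choosing (DATABASE_URL required, FEATURE_BANNER optional with a default):

    import os

    # Fail fast at start-up if a required value is missing from the environment.
    DATABASE_URL = os.environ["DATABASE_URL"]

    # Optional values can fall back to a sensible default.
    FEATURE_BANNER = os.environ.get("FEATURE_BANNER", "off")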

Moving Away From Bundling

The use of environment variables has an additional benefit in that it gives us the chance to move away from the bundling or batching of configuration per environment.

While bundling works in most situations it can present a scaling problem if you need many environments or need to be able to create new environments quickly. With the traditional file based configuration approach an appropriately named file is required per environment.

A more scalable approach is to group configuration values based on the role they perform in the environment; these more granular sets of values can then be combined in the environment to produce an overall configuration.
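
A rough sketch of what that grouping might look like (the group names and values here are entirely made up):

    # Values are grouped by the role they play, not by the environment
    # they belong to...
    database_settings = {"DATABASE_URL": "postgres://db.internal/app"}
    email_settings = {"SMTP_HOST": "smtp.internal"}
    feature_flags = {"FEATURE_BANNER": "on"}

    def compose_environment(*groups):
        # ...and a new environment is just a new combination of the groups.
        merged = {}
        for group in groups:
            merged.update(group)
        return merged

    environment = compose_environment(database_settings, email_settings, feature_flags)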

All developers will have encountered problems and bugs caused by a configuration mishap. These can be unavoidable due to the often complex nature of a reasonably large application, and even by moving configuration into the environment you are unlikely to completely remove the possibility of a mistake creeping in. However, simplifying the approach and completely removing configuration from the application's source code is certainly a step in the right direction.