Sunday 17 November 2019

As a Service Principles



Software as a Service (SaaS) is a delivery model where consumers pay to access functionality provided by a system when they need it, without having to purchase, manage or otherwise be responsible for the software providing that functionality.

The large-scale adoption of SaaS as a delivery model by providers has been fuelled by the adoption of cloud hosting. The emergence of the cloud made it possible for providers to easily stand up and scale compute resources in order to provide functionality to large numbers of consumers. This in turn meant consumers no longer had to own their own infrastructure and install software in order to take advantage of the service.

As the use of SaaS has grown, the suitability of the underlying architecture for cloud hosting has become ever more important; this has led to the concept of the Twelve-Factor App. This manifesto defines twelve factors that should be adhered to in order to build software with a scalable architecture. I'm not going to go through all twelve factors in this post; instead I'm going to concentrate on the principles that underpin the definition of these factors and which they are designed to promote.

Declarative Infrastructure

In the past, knowledge about the architecture and make-up of a system's infrastructure was either held inside engineers' heads or within cumbersome documentation that had a strong propensity to become outdated or inaccurate. This meant that whenever changes were required or disaster struck, the number of people who could effectively deal with the situation was limited. It also made reversing changes more challenging and provided a barrier to entry for new engineers joining the team.

To deal with this situation we can use declarative formats to define our underlying infrastructure. Sometimes also referred to as Infrastructure as Code, this approach uses technologies like Terraform, Chef or Puppet to drive the generation and maintenance of infrastructure in a machine- and human-readable format that shares many of the properties of code.

This approach allows these files to be managed under source control, providing more robustness around the history of changes in the infrastructure and the ability to roll back. Since these files drive the shape of the infrastructure, they also cannot become outdated or out of sync with reality. The repeatability of applying these files also means scaling or recreating your infrastructure, in both development and production environments, becomes a far less error-prone process.
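As a hedged illustration, a Terraform definition of a single server might look something like this (the image ID, instance size and tags are hypothetical, not taken from any real environment):

```hcl
# Declares a single virtual machine in a machine- and human-readable
# format that can be committed to source control and applied repeatably.
resource "aws_instance" "web_server" {
  ami           = "ami-0123456789abcdef0" # hypothetical image ID
  instance_type = "t3.micro"

  tags = {
    Name = "web-server"
  }
}
```

Running the same definition against an empty environment recreates the same infrastructure every time, which is what makes scaling and disaster recovery repeatable.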

Finally, new engineers joining the team, provided they are familiar with the technology being used, can easily and quickly get to grips with the nature of the infrastructure and become effective.

Clean Contract

One of the benefits of the cloud is portability: the ability to quickly move between hosting environments, operating systems or hardware configurations. This ability can be easily undermined if your software does not have a clean and well-defined contract between itself and its dependencies.

This can be achieved by proper adherence to dependency declaration and dependency isolation. Software should never rely on a dependency implicitly being part of the environment it's operating in. All dependencies should be explicitly declared via some form of manifest that forms part of the application's source code. Many technologies exist for achieving this depending upon the stack you are using to develop your software. Gemfiles, NuGet, Chocolatey and many others all provide a way for software to declare that it relies on certain other packages being available. This not only reduces the risk of unmet dependencies once software is deployed but also makes it easier for developers to simply clone the code and build when getting started in the team.
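As a small sketch of the idea in Python, the packages a service needs can be read from an explicit manifest and verified rather than implicitly assumed to exist in the environment (the package names here are hypothetical):

```python
# Report any declared dependencies that are missing from the environment,
# instead of discovering them via failures after deployment.
from importlib import metadata

def missing_dependencies(declared):
    """Return the declared package names that are not installed."""
    missing = []
    for name in declared:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

# In practice `declared` would be parsed from a manifest, such as a
# Gemfile or requirements file, committed alongside the source code.
missing = missing_dependencies(["a-hypothetical-package"])
```

A check like this run at startup or in CI surfaces unmet dependencies long before they can bite in production.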

This applies equally to the availability of system tools that may be dependent on the underlying operating system; every effort should be made to isolate these dependencies and ensure that the functionality is provided to the application in an explicit and declarative way.

Minimal Divergence

Many issues and problems that present themselves in production will be met with incredulity by developers since everything worked fine on their machine. These situations are born from the divergence of development and production environments. Many of the principles and techniques we've already discussed work towards reducing this possibility and ensuring, as far as possible, that all environments are the same.

This is achieved by the proper application of continuous integration and deployment. This reduces the amount of time between code being developed and being available in the environment, and ensures that the processes of developing and deploying code aren't separate responsibilities.

An important aspect of achieving this is also to ensure that technology stacks don't vary between environments. The same database, web server and operating system are in use everywhere the code runs. Clearly the resources available in these environments are likely to differ, but the fundamental fabric of the technology is kept in line.

The adoption of a cloud-based SaaS delivery model is about more than simply where your code is hosted. Rather, it is about divorcing your code from having any meaningful relationship with its hosting environment; the freedom this gives allows for agility, efficiency and productivity.

As with most things, whether or not you are embracing a SaaS model is not a binary question of on or off. It's about placing yourself on a scale, but trying to adhere to the principles described here will help you reach the tipping point where the benefits of the approach start to be realised.


Sunday 20 October 2019

Responsibility Segregation



A consistent property of bad code is a lack of segregation between responsibilities. Relatively large classes will implement multiple facets of functionality and therefore be responsible for more than one aspect of a system.

This leads to code that is difficult to follow, difficult to maintain and difficult to extend. Those large classes will frequently be modified because they are responsible for many things; if some of these changes are sub-optimal then technical debt gradually accumulates and grows. To complete the vicious circle, this can compound the original problem, leading to more technical debt, and the downward spiral continues.

Command Query Responsibility Segregation (CQRS) is a design pattern focused on addressing this situation by defining clear responsibility boundaries and encouraging proponents to ensure these boundaries aren't breached.

Commands and Queries

Within the CQRS pattern functionality is either a command or a query.

A query is a piece of functionality that, given a context, will interrogate a data source to return the requested data to the caller. Critically, a query should be free of side effects, never changing the state of the underlying data in any way.

A command is more task-driven: it is a piece of functionality that, given a context, will change the state of an underlying data source or system. Because of its inherent side effects it should not be used to return data to the user, as that is the role of a query; instead it should return just the result of the downstream operation. This requirement around what a command should return can in practice be difficult to achieve; more often than not some data is required to come back from the command, but the important aspect is that callers are aware that commands perform operations on data and therefore have side effects.

Although not explicitly part of the pattern, an effective CQRS implementation will also not chain queries or commands together. The layers of abstraction this builds can make the code difficult to follow and understand, and can lead to unintended consequences when a caller doesn't realise the chain of events that will unravel. Instead, queries and commands should be composed by callers making individual and separate calls to each element in turn, potentially passing data between them and building an aggregated response to return upstream.
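A minimal sketch of this split in Python, assuming a simple in-memory store (class and field names are purely illustrative), shows a caller composing a query and a command with separate calls rather than chaining one into the other:

```python
# GetUserQuery only reads state; RenameUserCommand only changes it and
# reports the outcome. The caller composes them explicitly.

class GetUserQuery:
    def __init__(self, store):
        self.store = store

    def execute(self, user_id):
        # Read-only: returns data, never mutates the store.
        return self.store.get(user_id)

class RenameUserCommand:
    def __init__(self, store):
        self.store = store

    def execute(self, user_id, new_name):
        # Mutates state; returns only the result of the operation.
        if user_id not in self.store:
            return False
        self.store[user_id]["name"] = new_name
        return True

store = {1: {"name": "Alice"}}
query = GetUserQuery(store)
command = RenameUserCommand(store)

before = query.execute(1)["name"]
succeeded = command.execute(1, "Bob")
after = query.execute(1)["name"]
```

Because the command never hands data back beyond its outcome, any caller that needs the new state must say so explicitly by issuing the query again, which keeps the side effects visible.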

Database Origins

Although in the last section the pattern is presented in abstract terms, and CQRS can be applied to many different areas, the origins of the approach come from the application of CRUD when dealing with databases.

Within this world there can be many advantages to treating reads and writes differently. Firstly, there can be advantages to using different models depending on whether data is being queried or modified. Also, the load presented by reads and writes is often not symmetrical, so being able to separate the workloads can bring performance and efficiency advantages.

Having strong abstractions over the top of data access also enables more flexibility in the approach to underlying storage with users being protected from the nuances this may involve via the interface presented by commands and queries.

Advantages

First and foremost, the advantage of CQRS is the re-use that can be achieved by the promotion of separated concerns. When code does one thing and does it well, the opportunity for re-use is increased. Quite often, when classes are bigger and do more, the functionality they offer will be almost what you need but not quite. This either leads to a new class being created with a slightly modified interface, leading to duplication, or the existing class being tweaked, leading to its integrity being further degraded.

Effectively segregated code is also likely to be easier to test since the interface to the code will be simpler and it is likely to have fewer dependencies.

Finally, the code base as a whole will be more understandable, with a clearer structure. New members of your team will quickly be able to assess what the code base is capable of by looking at the queries and commands that can be executed.

No one pattern can be a solution for all problems, but certain qualities of well-constructed code should be promoted above all others. Segregation of responsibility is one of those qualities; it is almost the very definition of good architecture to promote this quality and ensure its adoption and adherence.

As a code base grows and evolves you will likely have to introduce additional concepts alongside commands and queries. Provided these new elements have a strong identity and a clearly defined role within your system, this will enable your code to grow whilst still maintaining the well-defined structure that is a recipe for success.


Sunday 13 October 2019

Striding To Be Secure



Whenever software is deployed it is a virtual certainty that at some point it will come under some form of attack. This might be via a bot evaluating your infrastructure for the possibility of exploiting known vulnerabilities, or a concerted effort from hackers to make your code expose data and functionality it shouldn't.

Resources like the OWASP Top 10 can help you recognise common security mistakes but each piece of software, and the use cases it implements, will present varying and sometimes specific security flaws. This means it can be a valuable exercise to take a step back and try and analyse your software from the point of view of an attacker.

STRIDE is a mnemonic that can help with this kind of threat analysis by identifying six categories of attack that hackers may try to perpetrate against your code.

Spoofing

An authenticated system will rely on some mechanism for a user to identify themselves. Spoofing is when an attacker is successfully able to identify themselves as another user. This doesn't necessarily mean breaking passwords, but rather attacking the mechanism your system uses to prove on subsequent requests that authentication has taken place.

As a rather trivial example, if your system relied on users including an HTTP header indicating their user ID as a means of authentication, this could easily be spoofed by an attacker simply including the ID of the user they are trying to impersonate.
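One common counter to this, sketched below in Python with a hypothetical secret and user names, is for the server to issue a token signed with a key only it holds; an attacker can still claim any user ID, but cannot produce a valid signature for it:

```python
# The server signs the user ID with a secret it never shares, so a
# forged identity fails verification on subsequent requests.
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical; held only by the server

def sign(user_id):
    return hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()

def issue_token(user_id):
    # Returned to the client once real authentication has taken place.
    return f"{user_id}:{sign(user_id)}"

def verify_token(token):
    # Returns the user ID only if the signature matches, otherwise None.
    user_id, _, signature = token.partition(":")
    if hmac.compare_digest(signature, sign(user_id)):
        return user_id
    return None

genuine = verify_token(issue_token("alice"))
spoofed = verify_token("bob:" + "0" * 64)  # forged signature
```

Note the use of a constant-time comparison (`hmac.compare_digest`) so the check itself doesn't leak information to an attacker probing it.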

Spoofing can actually occur before users even get to your code, via attacks such as DNS or TCP/IP spoofing where attackers imitate your site in order to lure unsuspecting users into entering their information.

These attacks will generally be countered by careful analysis of authentication systems to ensure that identity cannot be falsified.

Tampering

Tampering occurs when an attacker is successfully able to modify data in transit or at rest for a malicious purpose.

The various forms of injection attack represent the classic examples of tampering. This may be SQL injection, cross site scripting or any attack that allows an attacker to inject their own code into the application.

Tampering attacks can be addressed by adopting a healthy distrust of all input from the outside world and sanitising it before it gets anywhere near forming part of the execution path.

Repudiation

Repudiation is the act of being able to deny that an act or operation took place. This will generally occur if your system does not have sufficient logging to be able to track all user operations, or if it allows attackers to change or destroy logs in order to cover their tracks.

The defence against repudiation is the robust implementation of audit logging. This should cover all user interactions, but also the behaviour of your infrastructure and any other data source that can be used to forensically analyse what was happening in your system at any given point in time.
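A minimal sketch of such an audit trail, using Python's standard logging module (the field names and the in-memory stream are illustrative; a real system would ship these entries to tamper-resistant storage):

```python
# Every user operation is recorded as a structured entry capturing who
# did what, to what, and when, so actions cannot later be denied.
import io
import json
import logging
from datetime import datetime, timezone

stream = io.StringIO()  # stands in for durable, append-only storage
audit = logging.getLogger("audit")
audit.setLevel(logging.INFO)
audit.propagate = False
audit.addHandler(logging.StreamHandler(stream))

def record(user, action, target):
    # One JSON line per operation keeps the trail easy to analyse forensically.
    audit.info(json.dumps({
        "user": user,
        "action": action,
        "target": target,
        "at": datetime.now(timezone.utc).isoformat(),
    }))

record("alice", "export", "customer-report")
entry = json.loads(stream.getvalue())
```

The structured format matters as much as the logging itself: uniform who/what/when fields are what make it possible to reconstruct a timeline after the fact.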

Information Disclosure

Information disclosure is perhaps the worst nightmare of any business whose systems come under attack; it occurs any time an attacker is able to view data that they shouldn't be allowed to see.

This can be caused by improper application of authorisation, insecure transport mechanisms, a lack of encryption, or a lack of segregation between elements of a system allowing hackers to jump from a non-critical element to a more critical part of the system.

Your system's production data needs to be treated with the utmost care and attention; access controls and authorisation must be robustly implemented to ensure that only entitled users are ever allowed to view or export data.

Denial of Service

Denial of Service (DoS) attacks are unique in the sense that they are not necessarily aimed at extracting data from your system or causing it to execute specific functionality for an attacker; instead they are simply designed to stop your software from being able to offer its intended functionality to your user base.

They can take many different forms but generally involve presenting your code and your infrastructure with more work than it is capable of handling. This means your site becomes unavailable to legitimate users, or becomes so slow as to be useless to them.

The exact method of protection against these kinds of attacks will vary depending on your functionality and infrastructure but will usually depend on being able to effectively measure and categorise the traffic entering your system alongside the ability to deny and block suspicious traffic at the edge of your network.
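One widely used building block for denying excess traffic at the edge is a token bucket: each client gets a budget of requests that refills over time, and requests beyond the budget are rejected. A sketch in Python (the capacity and refill rate are illustrative, not recommendations):

```python
# A token bucket: requests spend tokens, tokens refill at a fixed rate,
# and a burst beyond the budget is denied rather than overwhelming the system.
import time

class TokenBucket:
    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_per_second = refill_per_second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_second,
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_second=1)
results = [bucket.allow() for _ in range(5)]  # a burst of 5 rapid requests
```

In a burst, the first three requests pass and the remainder are denied until the bucket refills; in production one bucket would typically be kept per client IP or API key.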

Elevation of Privilege

Privilege elevation occurs whenever a user is able to perform operations that they shouldn't be able to perform based on their role within the system; these are generally higher-level functions usually reserved for administrators.

These attacks will generally rely on an insecure authorisation mechanism. As an example, if a user's role is controlled via a query string element, then an attacker will be able to elevate their system privilege by simply inserting this element into their requests.
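The fix is for the role to come from server-held state, never from anything the client sends. A sketch in Python (the session store, session ID and messages are hypothetical):

```python
# The user's role is looked up in server-side session state; any role the
# client claims in its request is deliberately ignored.
sessions = {"session-123": {"user": "alice", "role": "member"}}

def delete_account(session_id, requested_role=None):
    session = sessions.get(session_id)
    if session is None:
        return "denied: unknown session"
    # requested_role (attacker-controllable) plays no part in the decision.
    if session["role"] != "admin":
        return "denied: admin role required"
    return "account deleted"

# An attacker claiming admin via a request parameter gains nothing.
result = delete_account("session-123", requested_role="admin")
```

Here the attacker's claimed role never reaches the authorisation decision, so inserting it into a request cannot elevate their privilege.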

We deploy our software into a dangerous world, and at some point it will come under attack. There is no silver bullet that means security can be deemed finished. You are involved in a constant battle with attackers, but you can often gain great insight into your system and identify areas for improvement by trying to think the way they think.


Sunday 6 October 2019

Getting to Production


If you're a server-side developer, the fruition of your efforts is achieved once your code gets to production. For a team to be effective and efficient it's important that this final stage in releasing is not a frightening proposition. Delivering value into production is the whole reason for your team to exist, so it should be a natural and unhindered consequence of your efforts.

Achieving this productive release chain requires a deployment strategy that inspires confidence by being slick and repeatable, with a get-out-of-jail solution for when things don't work out quite as expected.

There is no one correct strategy; the right choice will depend on the makeup of your team, the code you are writing and the nature of your production environment. Each possibility has its advantages and disadvantages, and it will be up to you to decide what works best for your situation.

Redeploy

Perhaps the simplest strategy involves simply redeploying code to your existing servers. This will usually involve momentarily blocking incoming traffic whilst the new code is deployed and verified.

The advantage of this strategy is that it doesn't involve standing up any additional infrastructure to support the release and is a straightforward and simple process to follow. However, it does have disadvantages.

Firstly, it involves service downtime: you are not able to service requests whilst the release is in progress. Additionally, since the release must be tested after deployment, whilst traffic is being blocked, this testing is often conducted under pressure that is not necessarily conducive to thoroughness.

The other major disadvantage is the inability to quickly roll back the release should a problem develop. Either you must fix forward by deploying updated code, or you must redeploy the previous release. Both of these mean repeating the original release process by blocking traffic, deploying and testing.

Blue Green

Using this strategy you maintain two identical production stacks, one blue and one green. At any point in time one is your active stack serving production traffic and one is inactive and only serving internal traffic, or stood down entirely. The release process involves deploying to the inactive stack, conducting appropriate testing and then switching production traffic to be served from the newly deployed code.

The advantage of this strategy is it involves virtually no down time and allows for thorough testing to be completed, on the infrastructure that will serve the traffic, prior to the release without causing impact to users. It also has the substantial benefit that should the worst happen and you need to roll back the release this is simply achieved by switching traffic back to the previously active stack, which has not been modified since it was last serving production traffic.

The main disadvantage of this approach is the cost of maintaining two production stacks. However, taking advantage of modern cloud techniques means your inactive stack only needs to be up and running in the build-up to a release, reducing the additional cost.

A second disadvantage can be seen if your application is required to maintain a large amount of state; the transition of this data between the two stacks needs to be managed in order to avoid disruption to users.

Canary

Canary deployments can be viewed as a variant of the Blue Green approach. Whereas a Blue Green deployment moves all production traffic onto the new code at once, a Canary deployment does this in a more gradual, phased manner.

The exact nature of the switch will depend on your technology stack. When deploying directly to servers it will likely take the form of applying a gradually increasing traffic weighting between the old and new stacks. As traffic is moved onto the new code, error rates and diagnostics can be monitored for issues and the weighting reversed if necessary.
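The weighting itself can be as simple as a probabilistic routing decision at the load balancer. A sketch in Python (stack names and the 10% weight are illustrative):

```python
# Routes a fraction of requests, given by canary_weight, to the new stack
# while the remainder continue to hit the old one.
import random

def route(canary_weight):
    # canary_weight is the fraction of traffic sent to the new code (0.0-1.0).
    return "new-stack" if random.random() < canary_weight else "old-stack"

random.seed(42)  # fixed seed so the example is repeatable
sample = [route(0.1) for _ in range(1000)]
canary_share = sample.count("new-stack") / len(sample)
```

Over many requests the observed share sits close to the configured weight; raising the weight step by step completes the migration, while lowering it back to zero is the roll back.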

This approach is particularly well suited to a containerised deployment where the population of containers can gradually be migrated to the new code via the normal mechanism of adding and removing containers from the environment.

The advantage of this approach is that it makes it far easier to trial new approaches and implementations without having to commit to routing all production traffic to the new code. Sophisticated techniques can be applied to the change of traffic weighting to route particular use cases or users to the new code whilst allowing the majority of traffic to be served from the tried and tested deployment.

However, there can be disadvantages: the deployment of new code, and any subsequent roll back, are naturally slower, although this can depend on your technology stack. Depending on the nature of your codebase and functionality, having multiple active versions in production can also come with its own problems in terms of backwards compatibility and state management.

As with most decisions related to software engineering whether or not any particular solution is right or wrong can be a grey area. The nature of your environment and code will influence the choice of approach, the important thing to consider is the properties that any solution should offer. These are things like the ability to quickly roll back to a working state, the ability to effectively monitor the impact of a release along with factors such as speed and repeatability.

Keep these factors in mind when choosing your solution and you will enjoy many happy years of deploying to production.


Monday 2 September 2019

What They Don't Teach You



Although it's possible to enter the world of software engineering from many different backgrounds it is still the case that many engineers will have studied some related technological discipline.

Armed with this hard-earned knowledge they enter the world of professional development assured that they can hit the ground running. Whilst this may be true on a purely technical level, there are many aspects of being a professional developer that unfortunately aren't taught at universities or colleges.

I should at this juncture admit that these views are mainly based on my experience of university compared to the world of work, which is admittedly now some time ago. However I think it is still the case that sometimes the curriculum can focus on the theoretical over the practical which isn't always to the benefit of the industry.

Source Control

Aside from their IDE of choice and the ticket management system their team chooses to employ the other item developers will interact with on a daily basis is source control. In this regard the proper use of source control is crucial to the effectiveness of a team.

Understanding the different approaches to branching strategies, having an appreciation of more advanced features, along with the basic etiquette of source control are all skills that will help new team members integrate into a team quickly and smoothly.

Although some courses may attempt to cover certain aspects of source control and may explain some of the available tooling options, a lack of practical experience in working on a code base within a team can hinder students in gaining an appreciation for why source control is so important. 

Continuous Integration

In a similar vein a lack of experience in working on a code base with a group of collaborators can also hinder an appreciation for the importance of Continuous Integration (CI).

The reason for the existence of CI is to solve the problem of integration hell that arises when code from multiple developers needs to be combined into a single build for release. CI has been around for long enough for even many experienced developers to not remember the troublesome days before "the build box". But what still remains prevalent is the need for stability in main line code and the shift left mentality designed to protect it.

A lack of understanding of the importance of CI can also be caused by a lack of exposure to the need to release. Obviously students are well aware of the importance of deadlines but nothing quite compares to the pressure to release combined with the scrutiny of a real user base to sharpen the mind and develop strategies to avoid mistakes.     

Legacy Code

The opportunities for developers to work on truly greenfield projects are very often few and far between. In the majority of codebases or systems there are areas of legacy code that may be sub-optimal in certain aspects but that are so crucial to the correct operation of the software that everyone must tread carefully when making changes.

This is not to say that legacy code should never be dealt with, but the strategies employed for doing so are often very different to the rip-it-up-and-start-again philosophy that may be employed in other areas of code.

Learning to work effectively with code you didn't write is a fact of life for all developers. Interpreting code written by others along with writing your own code to be understandable by some future reader is something all developers have to learn.

The points made here shouldn't be interpreted as criticism of those who lack experience. The majority of valuable lessons all developers need to learn are born from the experience of making mistakes; the scar tissue that these mistakes develop is almost a rite of passage for many engineers. You're nobody until you've broken the build or deployed a bug to production.

In this sense it may seem unfair to blame academic institutions for not being able to produce experienced engineers. Whilst it's true you can't teach experience, you can foster an appreciation for the wider world that awaits. Few people study technological subjects purely as an academic exercise; the majority are doing so with an eye to becoming a professional in the industry. In this regard, including some of the more vocational skills would benefit all concerned.


Monday 12 August 2019

Concurrency Conundrums



There are certain topics that all software engineers regardless of the technology stack they work in or the programming language they use will bump into. One of those topics is asynchronous programming.

Asynchronous programming is where a unit of work is completed independently from the code that invoked it. A large number of languages have support for writing asynchronous code; mostly this is in a concurrent style where multiple units of work are allowed to progress simultaneously without actually executing in parallel.

Unfortunately, just because an engineering topic is commonplace doesn't mean it is well understood, and this is true of asynchronous programming. This is less to do with an understanding of its implementation and more to do with an appreciation of why and when to use asynchronous programming in your code base.

Asynchronous Programming vs Multi-threading

The most common misconception about asynchronous programming is that it makes your code faster or more performant; introducing asynchronous techniques will not inherently improve the performance of your code. Another common misconception is that asynchronous programming is all about introducing multiple threads of execution.

Both of these misconceptions are understandable, given that well-implemented asynchronous code can appear to be running faster and on occasion can employ multiple threads, but neither of these things is intrinsic to the purpose of asynchronous programming.

Asynchronous programming is actually about making the most of available resources by increasing throughput. It attempts to ensure that the maximum number of work items are allowed to progress alongside each other. Each individual work item might not complete any quicker from start to finish but the total amount of work completed in a given time frame is increased.

For a real-world example, imagine a takeaway kitchen on a busy Saturday night. If each member of staff takes an order, prepares it, waits for it to cook and then delivers it, then the takeaway is operating in a synchronous manner; the number of orders it can cope with at any one time is dictated purely by the number of workers.

An asynchronous approach looks to reduce the amount of idle time each worker has while dealing with an individual order. While a worker is waiting for an order to cook they start preparing the next; this makes better use of the available resources and enables the takeaway to deal with more orders simultaneously without introducing any more resources.

To complete the analogy, multi-threading would be the equivalent of introducing more workers into the kitchen. To begin with this may increase the number of orders leaving the kitchen, but eventually workers will start to get in each other's way and overall efficiency will fall.
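The takeaway analogy maps directly onto asyncio in Python: while one order "cooks" (an IO-style wait), the single worker moves on to the next, so total throughput rises without adding workers (threads). The order handling below is a hypothetical stand-in for real IO.

```python
# Five orders whose waits overlap on a single thread: the batch takes
# roughly one cook time rather than five back to back.
import asyncio

async def handle_order(order_id):
    await asyncio.sleep(0.05)  # waiting for the order to cook (IO bound)
    return f"order-{order_id} delivered"

async def kitchen():
    # gather() lets every in-flight order progress while others wait.
    return await asyncio.gather(*(handle_order(i) for i in range(5)))

results = asyncio.run(kitchen())
```

No individual order completes any quicker, exactly as described above; it is the total number of orders completed per unit time that improves.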

Bounding

In general there are two main reasons for the completion of a task to be held up. A task is CPU bound if it would complete faster with more compute resource; a task is IO bound if its completion relies on the transfer of data via IO operations.

When dealing with CPU bound tasks, the previously explained misconceptions about asynchronous programming and multi-threading can cause these techniques to be applied in an inappropriate manner. We have a thread that is blocked by a CPU bound task; we therefore create a new thread to handle the CPU bound task, allowing the first thread to be unblocked.

However, there are two issues with this approach. Firstly, the creation of threads is not free: it involves the allocation of memory and other resources, with only a finite number of threads being available. Secondly, we have still ended up in a position where a thread is blocked by the CPU bound task. The now unblocked thread is presumably still waiting for the result of the CPU bound task, so this is the antithesis of the asynchronous approach, using an increased level of resource in an inefficient way.

Asynchronous programming is tailor-made for dealing with IO bound tasks, allowing valuable resources not to be held waiting for a relatively long task, compared to CPU execution speed, to complete. Again, creating more threads to deal with this won't get away from the fact that the task itself is bound by IO constraints.

There is, however, an exception to this rule: in GUI applications that have the concept of a UI thread, moving any bound task onto another thread is a practical step to take. In this situation the UI thread is a special resource that must be protected, since it is the only resource that can update the user interface.

Deadlocks

In the opening paragraph of this post we said that the majority of software engineers will have encountered asynchronous programming. Alongside this, I am also willing to bet that most of these engineers will have accidentally implemented a deadlock when they were learning these techniques.

A deadlock occurs when multiple processes are all waiting on each other to complete in a circular manner meaning no task can progress and all are permanently blocked. For a deadlock to occur a few conditions are necessary.

There must be at least one resource that operates under mutual exclusion, meaning only one process can interact with it and it is non-shareable. A process holds the lock on this resource whilst waiting for another process to complete, and that second process needs the locked resource; this circular wait is the essence of a deadlock.
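The standard defence is to break the circular wait by always acquiring locks in one consistent global order. A sketch in Python (worker names are illustrative): if each worker grabbed a different lock first they could wait on each other forever, but with a fixed ordering no cycle can form.

```python
# Two workers both need lock_a and lock_b. Because both always take
# lock_a before lock_b, a circular wait, and therefore a deadlock,
# cannot occur.
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []

def transfer(name):
    with lock_a:      # every worker acquires in the same order: a, then b
        with lock_b:
            results.append(name)

workers = [threading.Thread(target=transfer, args=(n,)) for n in ("t1", "t2")]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The hazardous variant, one worker taking a-then-b while the other takes b-then-a, satisfies every deadlock condition listed above, which is exactly why a single agreed ordering is such an effective rule.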

There are many problems in software engineering where the first line of defence against them is to be aware that they exist. The second defence lies in using our experience of encountering them to develop a smell for when they might be about to present themselves. While it may be possible to come up with large numbers of code examples for these kinds of problems ultimately it is these slightly more intangible defences that will protect us.

Similarly, there are certain intrinsic programming concepts, such as asynchronous programming, that are difficult to master. The issue isn't that engineers fail to become experts; it's that they fail to appreciate the danger these topics can represent. A good engineer is very often a humble one who understands the limits of his or her knowledge whilst having an appreciation for what lies beyond.

These engineers will tread carefully around thorny topics and take care to look for the potential problems they know can exist. Treating the complex with disdain is usually the first step towards making the most common of mistakes.


Sunday 28 July 2019

Startup Strategies



Although the concept of a startup is not unique to the technology sector it does hold a special place in its hierarchy of organisations. Many IT professionals will have worked for or with a startup at some point during their career.

The journey for a startup to become a successful organisation, or unfortunately the potentially more likely outcome of failure, can take different paths depending on the startup itself and the market it is trying to operate in.

If I were able to offer foolproof strategies to deliver success for startups then I probably wouldn't be writing this blog. What is presented below are simply thoughts on the patterns that tend to manifest themselves as startups try to grow.

Get Big Quick

Commonly startups will attempt to execute a land grab, trying to gain substantial market share in a short time period. The need for this approach is generally indicative of a startup operating in a new market with a low barrier to entry.

In this situation the startup needs to gain market share quickly before competitors enter the market. Generally this will require large amounts of capital as the startup is attempting to grow faster than profitability levels would normally support.

Returns are based on the future potential of the market as the products or services being offered become commonly consumed. At this stage the startup can convert its dominant market position into profitability. Selling this future vision of where the market will eventually end up is often the key to securing funding.

The risk of this strategy comes from the large initial capital outlay being ultimately for nothing if the market doesn't develop as predicted or competitors become more effective at gaining initial market share.

Slow Burn

The opposite approach, slow and steady growth, is usually a result of a startup operating within a well established market that already has many competitors. In this situation, although the startup will clearly still require funding, simply trying to buy growth via large capital investment is unlikely to succeed since existing competitors already have established market share.

The fact that the market already has many established competitors means the success of the startup is dependent on exploiting a competitive advantage. This might be based on some intellectual property that the startup has developed, or on attempting to implement a disruptive strategy to change the direction of the market.

If this competitive advantage can be exploited then success will come from the steady accumulation of market share or by selling access to the startup's intellectual property to competitors. The risk obviously comes from the inability to find the competitive advantage, or from it being based on technology that is easily copied or replicated by competitors.

Whereas startups operating a Get Big Quick strategy that fails may tend to go out in a blaze of glory, startups operating a Slow Burn strategy will often be subject to a slow and painful death.

Make Noise

Sometimes a startup will be based around the development of a significant new technology that has potential within a given marketplace. However, to be able to exploit this potential advantage the startup would require large amounts of capital or existing influence within the marketplace.

In these situations it is often unrealistic to formulate a strategy for the startup to grow to a point where this is a realistic possibility. Therefore the exit strategy of these startups is not to exploit the new technology themselves; it is instead to make a lot of noise in the marketplace in the hope that a larger, more well established organisation decides to acquire the technology.

Organisations can decide to take this action either because the technology is truly transformational or because it would require a large effort to reproduce it that would risk others gaining the advantage first.

As previously stated this is not a guide to startup success, it is simply a description of the common exit strategies that can be seen when observing startups. The choice of strategy is usually influenced by the size and maturity of the market the startup is operating in, along with the nature of the startup's perceived technological advantage. Choosing which strategy to employ and successfully executing it is essentially the key to success, and is a non-trivial problem.


Sunday 21 July 2019

Messaging Pitfalls



A natural consequence of the adoption of a micro-services architecture was the rise in the use of messaging infrastructure, sometimes also referred to as an event bus. The principal reason for this is to avoid the temporal coupling that can occur when micro-services are chained together. This coupling can reduce the availability of functionality due to direct and potentially brittle coupling between each service.

A messaging approach allows for services to be de-coupled and more independent. This can lead to an increase in the functionality that can be attached to certain events happening within your system, without causing more and more complication in your codebase.

Unfortunately, as with any architectural pattern, it isn't all good news; no matter what approach you take to solving a problem it's possible for anti-patterns to develop. This article is by no means an argument against messaging patterns, it is simply a guide to the potential anti-patterns that can weaken, or even eradicate, the positive impact this pattern can have on your software.

Dependencies and Leaky Events

Events should not have dependencies between each other, it should be possible to process any individual event in its entirety without having to wait for any other event to be dispatched. This will often require careful consideration to strike a balance between a large event containing a lot of data and one that is too small to be of any practical use.

Events should also encapsulate a clearly defined business process and be easily understandable simply from their names. This also means they should not leak the implementation detail of the system that produced them.

An example of this kind of leaky event can be seen when they represent CRUD operations being performed on a database or some other storage mechanism. For example an Order Updated event lacks clarity and is simply exposing an operation performed on an underlying data structure. Instead events like Order Placed, Order Dispatched or Order Cancelled have a much clearer business context and are not tied to a data storage mechanism.
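To make the contrast concrete, here are hypothetical event payloads sketched in Python; the field names and values are invented for illustration.

```python
# A leaky event exposes a storage operation; consumers must inspect
# field-level diffs to guess which business process actually happened.
leaky = {"type": "OrderUpdated", "order_id": 42, "changed_fields": ["status"]}

# Business events name the process that occurred; each can be handled
# on its own terms without knowledge of the underlying data store.
placed = {"type": "OrderPlaced", "order_id": 42,
          "items": [{"sku": "ABC-1", "quantity": 2}]}
dispatched = {"type": "OrderDispatched", "order_id": 42,
              "carrier": "ExampleCourier"}

def describe(event):
    # A well named event tells us the business process that took place.
    return f"order {event['order_id']}: {event['type']}"

print(describe(placed))      # order 42: OrderPlaced
print(describe(dispatched))  # order 42: OrderDispatched
```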

Implementing Sequences

Most messaging systems are asynchronous in nature, and therefore guaranteeing the order of delivery and processing can be an expensive and complex process.

Many processes do represent a series of events, so it isn't always easy to avoid these sequencing problems. In these situations the consequences of event processing happening out of order need to be considered, and every effort taken to avoid these situations.

Returning to our previous example, tying payment and dispatch to an Order Updated event opens up the possibility of a problem if an order is dispatched before payment has been successfully taken. Triggering the shipping of an order from a Payment Received event instead avoids this situation.

As with our previous point accurately modelling our events on the business domain, and not its implementation, leads to an easier to understand and more robust system.

Commands in Disguise

In a micro-services architecture services will often be required to interact with each other. Traditionally this will be via a REST interface but once a messaging infrastructure is introduced it can be tempting to implement similar mechanisms using events.

When a service publishes an event it should be to impart the information of a business process having taken place to the rest of the system. The service should have no knowledge or requirements around who will consume the event and what action they will take.

An indicator that this approach is being broken is when you can identify services that publish an event and subsequently wait to receive an event in return. When this happens the services involved have lost their independence and have become coupled. This isn't to say that the dependency is wrong or should be avoided, but it should be made more explicit and clear via the use of REST.

This also prevents another source of leaky events where a service is forced to publish many events relating to error scenarios because other services are waiting on the outcome.
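A minimal sketch of the smell (the bus, the services and the event names are all hypothetical): Service A publishes an event and then waits for a reply event, which is really a request/response interaction that would be clearer as a direct REST-style call.

```python
# Anti-pattern: a command dressed up as an event. Service A publishes
# "PriceRequested" and waits for "PriceCalculated" from Service B.

class Bus:
    def __init__(self):
        self.handlers = {}
    def subscribe(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)
    def publish(self, event_type, payload):
        for handler in self.handlers.get(event_type, []):
            handler(payload)

bus = Bus()
replies = []
bus.subscribe("PriceCalculated", replies.append)                   # A waits...
bus.subscribe("PriceRequested",
              lambda base: bus.publish("PriceCalculated", base * 1.2))  # ...on B

bus.publish("PriceRequested", 10.0)   # A is now coupled to B replying at all
price_via_events = replies[0]

# Clearer: make the dependency explicit with a direct (REST-like) call.
def get_price(base):                  # stands in for an endpoint on Service B
    return base * 1.2

price_via_rest = get_price(10.0)
print(price_via_events, price_via_rest)  # 12.0 12.0
```

Both routes compute the same answer, but only the second makes the coupling between the services visible.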

Messaging patterns can enrich micro-services in a way that makes them undeniably part of an effective micro-services implementation. But they aren't a suitable way to interconnect services in all situations, and when used in those situations they can actively degrade the architecture of the system. When working in a micro-services architecture, develop a play book for when to use REST and when to use events, and take time to carefully construct events around business processes, making every effort to avoid complex scenarios around their processing.


Sunday 7 July 2019

Data Driven Success



Over the last couple of decades the majority of businesses have realised that the data they accumulate is one of their most valuable assets. This isn't necessarily because it has inherent value as a commodity, but because of the insights it can provide to drive efficiency, innovation and effectiveness.

However none of this is possible if you don't understand how data is accumulated, how it can be analysed and how it can be used. On the surface sometimes these aspects can appear trivial, and for the more simple applications of data maybe they are. But unlocking the higher orders of the potential of data requires a more thought through and scientific approach.

Very few, if any, can describe themselves as having mastered the use of data. As tools and techniques evolve new potential is unlocked. What is more important is an appreciation of the traps that lie in wait for those that don't understand the nature of data.

Big Data

The concept of big data is now so prevalent that it is almost taken as a given when discussing data related topics. I think the fact that the term makes reference to the amount of data has always been problematic. Big data is about more than simply the amount of data that has been acquired, it is about volume, variety and velocity.

Of course volume plays a part; never has our capability to store and process large amounts of data been greater. As the amount of data you collect rises, the potential for insight grows.

Equally important is variety. This includes variety of sources, domains and context; if you only ever collect one particular dimension of data from one particular source, the insights you can gain will be limited.

Finally, velocity: insights that only become visible after long periods of data collection are likely to be harder to action effectively. Being able to accumulate data at a fast rate opens up the possibility of putting its learnings to use before others can gain the same understanding.

Lake vs Warehouse

At a high level data is stored in either a structured or unstructured way. Generally, after it is collected it is in an unstructured form; at this stage it is a raw material requiring refinement. In order to be used to drive insight it must transition into a structured form to support querying, exporting and socialisation.

We have come to refer to these two forms of storage as data lakes (unstructured) and data warehouses (structured). The issue with moving between unstructured and structured forms is that it involves bias. This is driven by the current understanding of the nature of the data, the nature of the problem that is trying to be solved, and second guessing what the answers might be.

To a large extent this bias is unavoidable. What must be minimised, and ideally eliminated, is the loss of raw data during the refinement process. As your understanding of the data, the collection process and the insights grows, you will often recognise your initial bias. If your raw material is lost then so is your opportunity to correct it.

When data transitions to warehouses it should also remain in the lake, ready for the day when you discover better and more insightful ways to process it.

Machine Learning

A practical example of what is possible with data is machine learning. By using this technique artificial intelligence can be created that can utilise otherwise hidden patterns in data.

Machine learning involves a model being trained to make decisions based on the analysis of a large and potentially varied data set. Traditional rules based analysis, with its inherent bias towards what it expects the data to show, doesn't offer the flexibility required to simply go where the data points. Once this training has taken place the system can be given new data as input, and based on the learnings from previous data, make judgements on the meaning behind this new data.

Machine learning can look miraculous as it appears to be based on values we don't traditionally associate with software such as reason, judgement and intuition. But actually it is simply a demonstration of the power of data and what can happen when its analysis is approached with an open mind.

Technology can sometimes be prone to fads or fashions but the appreciation of data and its uses isn't one of them. The amount and variety of data we can collect and process is only going to increase, as will our ability to derive value and insight from it. Technologies like machine learning are a demonstration of this trend, now that we've discovered the genie it isn't going back in the bottle.      


Monday 1 July 2019

Micro Anti Patterns



The premise behind micro-services is a tried and tested one: divide and conquer via the strong application of single responsibility. Joining together elements that do one thing and do it well leads to a robust and flexible architecture.

Something that can complicate the adoption of a micro-services architecture, however, is the slightly abstract definition of when a service is a micro-service. One of the reasons for this is that it can depend on many factors: the context of your problem domain, the nature of your existing codebase and your required performance.

Rather than trying to address this thorny problem this article instead takes the opposite approach by describing some of the pitfalls that you can encounter when implementing micro-services.

Temporal Coupling

The glue most commonly used to bind micro-services is REST. In many circumstances this is the correct choice but not always. Equally common when this pattern is applied is the formation of chains of API calls. Service A calls Service B that then needs to call Service C and D in order to fulfil the original request.

This creates temporal coupling between all the services. Service A must wait for all services to return, a failure from any service results in a failed user operation from Service A. This leaves the system intolerant to even minor interruptions in the availability of downstream services.

The answer to this can be to decouple services via the introduction of async messaging or eventing. This potentially not only allows Service A to move on with the user journey whilst backend processes continue to run, but also leaves it unaffected by temporary changes in the availability of any of the downstream services.

Implementing eventing can come with its own pitfalls and this shouldn't be used to replace all REST calls, but equally it should be a weapon in the armoury that is deployed when temporal coupling begins to present itself.
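The difference can be sketched as follows (the services and their behaviour are hypothetical): in a synchronous chain a transient failure in a downstream service fails the user's operation, whereas publishing an event lets the user journey continue while processing happens later.

```python
# Hypothetical services: A depends on B, which depends on D.

def service_d():
    raise ConnectionError("Service D briefly unavailable")

def service_b():
    return service_d()              # B calls D synchronously...

def place_order_sync():
    try:
        service_b()                 # ...so A's user operation fails with D
        return "order failed" if False else "order placed"
    except ConnectionError:
        return "order failed"

queue = []                          # stands in for durable messaging infrastructure

def place_order_async():
    queue.append({"type": "OrderPlaced"})  # event stored for later delivery
    return "order placed"           # user journey continues immediately

print(place_order_sync())   # order failed
print(place_order_async())  # order placed
```

The asynchronous version trades immediate consistency for availability; the order is accepted now and downstream work is retried once Service D recovers.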

Unbounded Context

Data is the lifeblood of any system and therefore many micro-services will be concerned with the querying and manipulation of data. Given this fact it is important to consider how your domain will map to your micro-services and how you will ensure clear ownership over different aspects of it.

The goal of this is to ensure that services don't become coupled by the domain such that the complexity of the model is kept to a minimum and the implementation of new features doesn't involve widespread change.

Imagine a retail based system in which many micro-services work with the notion of a product. The view of the product model is likely to be very different between the micro-service that supports the searching of a product catalogue and the one that is responsible for shipping a product once an order has been placed.

By using bounded contexts, a core tenet of domain driven design, it is possible to support these different views whilst allowing the micro-services involved to remain decoupled. It simply requires the domain to be thought through and divided up along these lines.
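A small sketch of the idea (the model fields are hypothetical): each context defines its own product model, and the two share only the identifier that links them.

```python
from dataclasses import dataclass

@dataclass
class CatalogueProduct:        # the search context cares about discoverability
    sku: str
    title: str
    description: str

@dataclass
class ShippingProduct:         # the shipping context cares about physical handling
    sku: str
    weight_kg: float
    dimensions_cm: tuple

catalogue_view = CatalogueProduct("SKU-1", "Dune", "A science fiction novel")
shipping_view = ShippingProduct("SKU-1", 0.4, (20, 13, 4))

# Only the sku crosses the boundary; each context can evolve its own
# model without forcing change on the other.
print(catalogue_view.sku == shipping_view.sku)  # True
```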

Distributed Monolith

A common starting point for wishing to implement a micro-services pattern is a monolithic application. There is actually a school of thought that this is a good starting point, because you have a good working knowledge of your problem domain that can aid the segregation of your existing codebase.

However when you do this care must be taken that you don't end up with the same monolithic application simply distributed across a network of micro-services. This can happen for many reasons, but can be detected by changes and enhancements continually requiring many micro-services to all be changed and re-deployed.

This may be because of attempts at over sharing code, not having clear functional boundaries between micro-services, or an ineffective deployment strategy. A strategy to deal with this situation is the Common Closure Principle (CCP): simply put, things that tend to change together should be packaged together.

Looking at how a codebase reacts to and deals with change is usually a very good indicator as to its health. A well defined but flexible architecture will smooth the wheels of change by narrowing its impact. This increases the likelihood that automation can be relied upon to both test and deploy change and also makes it much easier to roll back the change should the unexpected arise.

As previously stated, defining exactly what constitutes a micro-service is not as easy as one might think. However an easier prospect is to define what it isn't. This can aid iterations of trying to successfully segregate a system into micro-services by providing reference points that indicate when the architecture is heading in the wrong direction. Recognising these signs will allow you to analyse where the false step occurred and re-evaluate the decision to try and find a better way.

It is likely there are many right answers alongside the equally lengthy list of wrong ones. 


Sunday 16 June 2019

UI Do Declare


The applicability of software development as an engineering discipline is often debated. Certain areas of coding do exhibit the necessary scientific and structured approach but other areas veer more towards an art form.

User Interface (UI) development definitely falls into the second camp. Not just because it is concerned with the production of attractive and pleasing UI for users to interact with, but because, as any developer will tell you, UI development often consists of more of a hit and miss or trial and error approach than other more precise areas of coding.

But is this inherent to UI development, or have we simply not found the correct tools to move this area of coding closer to being true engineering? The attempt to find an answer to that question is behind recent trends towards a more declarative approach to UI coding. In this post we will look at what underpins this approach and its possible advantages.

Declarative vs Imperative

Most software platforms have traditionally chosen an imperative programming model to implement UI. This often involves each screen or page that the user sees being defined using some kind of layout construct formed of a mark-up language.

This means the programmer, before the code executes, fully defines the UI that should be rendered, including styling and positioning. This is then mutated at run time to include the necessary data and accept input from the user.

A declarative approach attempts to defer many of these decisions to runtime rather than having all aspects of the UI being defined ahead of time. Although not exclusively part of the declarative approach, moving to more of a runtime model means that UI can be defined using the same coding primitives and technologies. Rather than having a mark-up language specifically for UI it is defined in the same language as the rest of the application.
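As a hypothetical sketch of the idea, a UI can be described as plain data structures built by ordinary functions at runtime, with business state deciding what to render. The element names here are invented for illustration.

```python
# Tiny declarative building blocks: each returns a plain description of
# what should exist, not instructions for how to draw it.
def text_input(label): return {"kind": "text_input", "label": label}
def button(label):     return {"kind": "button", "label": label}
def column(*children): return {"kind": "column", "children": list(children)}

def login_screen(user_logged_in):
    # Business state decides *what* to render; styling lives elsewhere.
    if user_logged_in:
        return column(button("Log out"))
    return column(text_input("Username"),
                  text_input("Password"),
                  button("Log in"))

print(len(login_screen(False)["children"]))  # 3
```

A separate rendering layer would walk this description and apply visual style, keeping the two concerns apart.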

Separation of Concerns

Many aspects of good software engineering practices are related to the separation of concerns, as relayed by the Single Responsibility Principle (SRP), a piece of code should do one thing and do it well.

What underlies the adoption of a declarative approach for UI is the desire to separate the two concerns of UI presentation. That is, one area of code should be in charge of what needs to be rendered, i.e. we need to accept text from the user here and a click here, whilst a separate area of code decides on visual style, positioning and layout.

This allows business logic and data context to influence the structure of the UI without it having to become melded with code concerned with colours, fonts or padding. It also allows a different visual presentation to be applied to similar UI constructs whilst reducing duplication.

Once again, isolating and separating different areas of code concerned with different aspects of its execution has led to ongoing benefits when it comes to dealing with future change.

Just Another Area of Code

As already stated, defining UI in code is not an intrinsic requirement of a declarative approach but it does open up the possibility. Aside from the architectural benefits we've already stated, this means the code generating the UI can be crafted using the same tools as the logic that drives it.

This opens up the possibility to write effective unit tests that verify the construction of an application's UI in a way that just isn't possible when defining UI using mark-up. The simple nature of these tests also allows them to be part of a shift left strategy, ensuring potential visual bugs are blocked before they become a problem.
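As a hedged illustration, assuming a UI built as plain data by ordinary functions (the dialog structure is invented), a standard unit test can assert on the UI's structure before anything is rendered.

```python
def confirmation_dialog(message):
    # Hypothetical UI builder returning a plain description of the dialog.
    return {"kind": "dialog",
            "children": [{"kind": "label", "text": message},
                         {"kind": "button", "label": "OK"},
                         {"kind": "button", "label": "Cancel"}]}

def test_dialog_offers_a_cancel_action():
    # An ordinary unit test: no rendering, no screenshots, just structure.
    labels = [child.get("label")
              for child in confirmation_dialog("Delete this order?")["children"]]
    assert "Cancel" in labels

test_dialog_offers_a_cancel_action()
print("UI structure verified")
```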

Even when bugs do make it into an application, having the ability to debug code concerned with UI construction using breakpoints or any other standard mechanism is enormously valuable and will reduce the amount of time needed to find and fix defects.

Software engineering has always been a mixture of different paradigms and approaches, some come and go and are at different times fashionable. A declarative approach to UI won't fit all situations and applications but its potential benefits are undeniable. For this reason it should form part of the tool box of any engineer. Gain an understanding of its principles and practices, experiment with its implementation and recognise opportunities for it to be used in your application.                        


Sunday 9 June 2019

RESTful Maturity



Whenever we need to allow different deployments of software to talk to each other the glue that we fall back on is APIs. Many options exist for their form and structure but REST is now so prevalent that it is the only approach many software engineers know or have been exposed to.

The flexibility offered by a RESTful approach means that whether or not an API is RESTful is more of a spectrum than a binary option. To deal with this Leonard Richardson developed the Richardson Maturity Model, which attempts to categorise APIs based on four levels of maturity.

These levels range from an API being RESTful in name only to the full realisation of what a RESTful approach can offer. If you are a developer or a consumer of REST APIs where do you think you sit on the scale?

Level 0

APIs at this level are categorised by using a single URI and a single HTTP Verb (usually POST). These APIs are simply using HTTP as a transport mechanism for interacting with code remotely.

This level is sometimes referred to as The Swamp of POX (Plain Old XML) since SOAP based Remote Procedure Calls (RPC) using XML occupy Level 0.

There is a lack of structure at this level with APIs simply being a collection of operations.

Level 1

The first step along the road to becoming RESTful is to introduce the concept of Resources to model the business domain. This has the effect of splitting functionality across multiple URIs although generally still using a single HTTP Verb (again usually POST).

If we imagine a system of APIs for interacting with a library a Level 0 API would use the same URI for interacting with authors or books, a Level 1 API would have a URI for interacting with authors and a separate URI for interacting with books.

In this way authors and books become resources in the system; when you receive data about an author you can also retrieve that author's books.

Level 2

Now that we have the concept of Resources, Level 2 APIs introduce the concept that they can be manipulated. They do this by using HTTP Verbs to form a CRUD interface for interaction:

Create: POST, Retrieve: GET, Update: PUT, Delete: DELETE

Generally these APIs will also start to make use of HTTP Response codes to provide feedback on the manipulation:

201: Created, 202: Accepted, 404: Not Found

At this level interactions with the API become expressive. Simply by looking at a client's interaction with the API, even if you weren't previously aware of its structure, inferences about the nature of the interaction can start to be made.
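A minimal in-memory sketch of the Level 2 style, using the library example (the resource and storage here are invented for illustration; a real service would sit behind an HTTP server):

```python
# In-memory "books" resource: HTTP verbs map onto CRUD operations and
# responses carry meaningful status codes.
books = {}
next_id = 1

def handle(method, book_id=None, body=None):
    global next_id
    if method == "POST":                        # Create
        books[next_id] = body
        next_id += 1
        return 201, books[next_id - 1]          # 201: Created
    if book_id not in books:
        return 404, None                        # 404: Not Found
    if method == "GET":                         # Retrieve
        return 200, books[book_id]
    if method == "PUT":                         # Update
        books[book_id] = body
        return 200, body
    if method == "DELETE":                      # Delete
        del books[book_id]
        return 204, None

print(handle("POST", body={"title": "Dune"}))   # (201, {'title': 'Dune'})
print(handle("GET", 1))                         # (200, {'title': 'Dune'})
print(handle("GET", 99))                        # (404, None)
```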

Level 3

Level 3 is often categorised by the term Hypertext As The Engine Of Application State (HATEOAS). Up until this point, in order to make full use of the API you had to be aware of all the available APIs and the operations that could be performed.

Level 3 APIs speed up this discovery process by allowing resources to describe the operations that can be performed on them by providing information on the APIs that should be invoked.

To return to the library example, when a book is queried the data returned would also express that a book can be checked out, along with the API that should be called to do so. It would also refer to the API to retrieve more books by that author, or possibly even more books on that subject.
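A sketch of what such a Level 3 response might contain (the URIs and link names are illustrative, loosely following common hypermedia conventions):

```python
def book_resource(book_id, author_id, checked_out=False):
    # The resource describes the operations available on it via links,
    # so clients discover the API as they interact with it.
    links = {"self": f"/books/{book_id}",
             "author": f"/authors/{author_id}/books"}
    if not checked_out:
        links["checkout"] = f"/books/{book_id}/checkout"
    return {"id": book_id, "_links": links}

resource = book_resource(7, author_id=3)
print(sorted(resource["_links"]))  # ['author', 'checkout', 'self']
```

Note that the checkout link only appears when the operation is actually available; the hypermedia itself drives the application state.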

Developing REST APIs is a journey; as a domain and its resources grow and develop, the APIs used to interact with them will move along this maturity model. It is by no means the case that an API which isn't at Level 3 is somehow bad, it simply means that its use up until now hasn't matured to a point where the opportunity to move up the levels has presented itself.

The main lesson to draw from this model is that for an API to evolve it needs to become descriptive and expressive. As APIs mature consumers can start making assumptions about how to perform operations because the expressiveness of the APIs allows them to learn more about the resources and operations that make up the domain being modelled.

This helps increase adoption and allows consumers to quickly develop more innovative ways to utilise them. There are various guides and pointers that can be used to ensure APIs are RESTful, these guides will help you move up the maturity model towards the nirvana of Level 3. Don't beat yourself up if you aren't there yet, instead relish the journey and understand the nature of the destination.


Monday 27 May 2019

Effective Test Automation



As a DevOps mentality has taken hold in the industry teams have increasingly focused on test automation. This is mainly because an effective test automation strategy can be a driver for an increased release cadence by reducing the amount of manual effort involved in declaring code ready for production.

On the face of it this may seem like a straightforward endeavour. Write the tests, run the tests and release the software. However, like many aspects of software engineering, there is a subtlety to its proper application, and it is definitely the case that when badly implemented it can lend no value, and at worst can actively degrade quality.

Effective test automation cannot be defined or explained in a single blog post but presented below are a few things to look out for when judging the effectiveness of test automation in your code base.

Red Means Stop

The number one mistake that teams make when implementing automated testing is for there to be no consequences when tests fail. There are often various reasons given for this: there is a bug in that test, it's a timing issue, that wouldn't happen when the application runs for real.

In all those situations, if the underlying issue can't be addressed then those tests shouldn't be run, because they offer little value. The whole purpose of an automated test suite is to act as a traffic light for the code base: if we're all green we are good to go, if anything is red we need to stop and fix it. Having a situation where software is still released despite tests failing will create a culture of ignoring tests. This will either increase the amount of manual testing you need to do to get comfortable with the codebase, or you will knowingly ship with defects.

Much more value can be derived from a smaller set of automated tests that are robust and meaningful than trying to create a larger set of tests that are fragile and where the results require significant interpretation.

Fix Early or Release Broken

Once you have a set of reliable tests you are prepared to put faith in, the next most important factor is when you run them. The further right in the development timeline the tests are run, the more pressure there will be to ship code regardless of the results.

Depending on the timespan of your project it is also likely the cost of fixing defects found further right in the process will be higher. This will likely be in spite of the fact that any fix may be more like a patch than a well engineered solution.

The further left the tests run, by which we mean the nearer to the time the developer first writes the code, the more time is available to find a fix and the cheaper the fix is likely to be. The closer the tests are run to the proposed release date the more it will become a box ticking exercise due to the significant pressure of continuing regardless of the results.

Variable Scope and Focus

The release of any piece of software is often focussed on particular areas of functionality, which naturally means the potential for bugs or issues is higher in these areas. Whilst there are likely to be core areas of functionality that always need to be verified, we can maximise the effectiveness of an automated test suite by allowing it to adapt to the current focus of developers' attention.

This shift in focus may be automatic based on analysis of developer commits or it may be via configuration or any such mechanism that allows manual changes in emphasis. Knowing that the available automation resources have been focused on the areas of code most likely to have regressed will go a long way to increasing confidence that these new or adapted features are ready to ship.

The building of an automated test suite is always done with the best of intentions, but implementations often end up being sidelined and not given particular relevance in the release process. This usually comes from a viewpoint that simply writing the tests is enough; this isn't the case. Tests that don't relay unequivocal information about the state of the code base, or that are run too late for any information to be effectively acted upon, represent wasted effort.

To avoid this, decide on your areas of nervousness when releasing and try to develop strategies for these concerns to be addressed via automation. Also treat this automation like any other code; expect it to need refactoring and developing as the code base moves on. Treat it like a living, breathing area of code that is your ally in making the important decision over when something is ready to ship.


Sunday 19 May 2019

Next Level Over Optimisation



Over optimisation is a common pitfall for software engineers. It is frequently characterised by a disproportionate amount of effort being spent striving for performance levels that have yet to be proven necessary.

However a similar mindset, causing similarly diminishing returns, can be seen in other areas of software engineering when too much emphasis is placed on one particular aspect of the code. Software engineers are by nature problem solvers and can often demonstrate compulsive tendencies; this combination can sometimes cause an obsession with providing the ultimate solution to a problem.

Which aspects of code, other than performance, can be over optimised? And what are the downsides to the emphasis becoming obsessive?

Code Size

Engineers can fall into the trap of trying to devise ways to provide the same solution with an ever decreasing amount of code. Code is not a business asset; our goal isn't to increase the amount of it we have, it is to devise efficient solutions to problems using code as a tool. However, measures of efficiency should also include maintainability and extensibility.

To this end, there is a tipping point where decreasing the amount of code also decreases its readability. When this first manifests it can seem a minor concern, because you still understand how the code works, but maintainability and extensibility are long term goals; they can't be judged in the moment.

The over emphasis on reducing code size can be driven by an assumption that it will improve performance, and by a desire to prove engineering skill by doing the same with less. A more pragmatic approach recognises that the performance benefits, if they exist at all, are likely to have a negligible impact on users. An engineer's skill shouldn't be judged on any one piece of code; we must also factor in whether that code continues to offer value over a prolonged period of time, both in the form it was originally written and by being adapted and extended without being re-written.
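As a hypothetical example of that tipping point, consider two equivalent ways of selecting orders eligible for a discount. Both behave identically, but in the condensed version the individual business rules have no names, so the next engineer to amend a rule has to reverse-engineer the intent first:

```python
# Condensed: one expression, minimal code, but the rules are anonymous.
def eligible_condensed(orders):
    return [o for o in orders
            if o["qty"] >= 10 and not o.get("promo") and o["total"] > 100]


# Same behaviour, slightly more code, but each business rule is named
# and can be changed or extended without deciphering the whole condition.
def eligible_readable(orders):
    eligible = []
    for order in orders:
        bulk_purchase = order["qty"] >= 10
        already_discounted = bool(order.get("promo"))
        above_threshold = order["total"] > 100
        if bulk_purchase and not already_discounted and above_threshold:
            eligible.append(order)
    return eligible
```

Neither version is wrong; the point is that the shorter one isn't automatically the better one once maintenance is factored in.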

Re-Use

You will often see an engineer's face light up when they realise they can re-use some previously written code in a new situation. Re-usability is a fine trait for code to have: it increases the speed at which solutions can be delivered, and as previously stated we aren't in the code generation business, so opportunities to deliver using less of it should be embraced.

But sometimes this drives code to be declared re-usable prematurely. When this turns out not to be the case it can lead to the code attempting the re-use being unnaturally bent to fit in with the underlying code, or to the interface of the re-usable code degenerating over time as it is mangled to suit slightly different use cases. The fear generated by these potential outcomes can also breed a reluctance to make changes to shared code because of the potential knock-on effects.

This is not an argument against code re-use; it is something that should always be strived for. But it's important to develop a sense of when code is truly re-usable and when sharing it would create a rigidity in the code base that ultimately degrades the intended benefits of re-use. One approach is to not be afraid of refactoring once re-usability has become apparent; conversely, when re-use is starting to cause problems, don't be afraid to recognise this and allow responsibility for the functionality to be handed back to the consumers.
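To make the failure mode concrete, here is a hypothetical sketch of a shared helper whose interface has degenerated as each new consumer bolted on a flag, followed by the alternative of handing the variations back to the consumers that need them:

```python
# Degenerated shared code: every consumer added a flag, and changing
# any branch now risks breaking all of them.
def format_name(first, last, for_invoice=False, for_email=False,
                uppercase_last=False):
    name = f"{last.upper() if uppercase_last else last}, {first}"
    if for_invoice:
        name = name + " (billing)"
    if for_email:
        name = f"{first} {last}"
    return name


# Handing responsibility back: the shared code keeps one small, stable core...
def full_name(first, last):
    return f"{last}, {first}"


# ...and each consumer owns its own trivial variant instead of configuring
# the shared one (this would live in the hypothetical email module).
def email_display_name(first, last):
    return f"{first} {last}"
```

The flag-laden version is shorter overall, but the second arrangement lets each variant evolve independently without fear of knock-on effects.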

Configurability

Related to an over emphasis on re-use can be a drive to make every aspect of an area of code configurable. In order to increase the flexibility of code, and therefore open up more opportunities for its re-use, we make every aspect of its functionality configurable.

As with most topics we have discussed, configurability is an admirable quality, but if it's overdone it can have a negative impact on readability and maintainability. Potential users of the code are presented with a sea of possible options that they may struggle to choose from or appreciate the subtleties of. Changing the code in question can also be hampered by having to maintain support for a large number of use cases and combinations.

This problem can be addressed by not forcing users to deal with the full array of options, instead offering sensible defaults for particular use cases. If an experienced engineer, who has a good understanding of the problem domain, chooses to dig deeper they can, but consumers who simply want an "out of the box" solution need not dig this deep.
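A minimal sketch of this idea (the option names and presets are invented for illustration): keep the knobs available for those who understand them, but give most consumers a named, pre-configured entry point:

```python
from dataclasses import dataclass


# All the knobs exist, but every one has a sensible default.
@dataclass
class ClientConfig:
    timeout_seconds: float = 30.0
    retries: int = 3
    verify_tls: bool = True
    pool_size: int = 10


def default_config():
    """The out-of-the-box choice for callers who just want it to work."""
    return ClientConfig()


def batch_job_config():
    """A named preset tuned for long-running background work."""
    return ClientConfig(timeout_seconds=300.0, retries=5, pool_size=2)
```

Most consumers call `default_config()` and never see the options; the expert who needs a non-standard combination can still construct `ClientConfig` directly.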

Pragmatism and practicality are important qualities for engineers. Certain principles, most notably SOLID, are important enough to evangelise, but good code is not defined by any one measure. Code quality is also not always possible to measure in the moment; decisions made at the time of writing can have unintended consequences further down the line.

It won't always be possible to avoid this kind of over optimisation; the approach to a piece of code is based on the understanding of the problem at the time of writing. As that knowledge grows, mistakes will surface as things head in a different direction from the one initially anticipated. Don't fear these situations: recognise that they exist, learn to spot the signs, and develop strategies for finding reverse gear.