Sunday 10 January 2021

Everything in the Repo

 


Interaction with source control is a daily task for most developers; the idea of not managing source code in this way would seem unthinkable. The advantages that effective source control brings have led many to look to include more of the material and information required to write, deploy and run software in the same standard development practices.

This idea has gone by many names; at WeaveWorks they have coined the term GitOps. Although their description of the process assumes a container-based deployment using Kubernetes, the principles they define for an effective GitOps strategy can be applied to many different deployment scenarios.

The Entire System Described In The Repository

No matter the nature of the software you are writing it will need to be built and deployed. To achieve this most teams will have defined CI/CD pipelines to build and deploy the code to various deployment environments.

A GitOps strategy ensures that these pipelines, and the infrastructure they serve, are declared alongside the source code. By cloning the repo you should have access to all the information required to understand the system.

The Canonical Desired System State Versioned in Git

Once your entire system is under source control you have a single source of truth for its current state and for any state it has previously been in. Changes to CI/CD and infrastructure are tracked alongside the application's code, allowing you to move back and forth in time and maintain a working system.

The most obvious advantage this gives is in dealing with an unintended breaking change to the application caused by CI/CD or infrastructure changes. Without these things being under source control you have to follow a painful process of trying to understand the changes that have been made and defining a plan for undoing them or fixing forward. A GitOps strategy reduces this task to something as simple as a git revert or redeploying from a previous release branch.

Approved Changes That Can Be Automatically Applied To The System

When applying changes to an application's source code developers are used to going through a review process before the changes are applied. This may involve a peer review by another developer and/or, by following a shift-left strategy, a series of automated tests to ensure correctness.

By following a GitOps strategy these processes can be applied to changes to CI/CD and infrastructure as well as code. As with any shift-left strategy this reduces the chances of the team being impacted by changes that may inadvertently break pipelines, result in a non-working application after deployment, or unintentionally increase costs due to a misconfigured infrastructure change.
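As a sketch of what shifting these checks left might look like, the snippet below validates a proposed infrastructure change before it is merged, just as unit tests validate application code. The field names and limits are illustrative assumptions, not a real schema.

```python
# Hypothetical shift-left check for an infrastructure change. The config
# keys ("environment", "instance_count") and the cost ceiling are
# assumptions made for illustration only.

def validate_infra_config(config: dict) -> list[str]:
    """Return a list of problems found in a proposed infrastructure change."""
    problems = []
    if config.get("instance_count", 0) > 10:
        problems.append("instance_count exceeds agreed cost ceiling")
    if config.get("environment") not in {"dev", "staging", "production"}:
        problems.append("unknown deployment environment")
    return problems

proposed = {"environment": "staging", "instance_count": 50}
print(validate_infra_config(proposed))  # flags the over-provisioned change
```

A check like this would run in the same pipeline as the application's tests, so a misconfigured change is rejected at review time rather than discovered after deployment.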

Software Agents to Ensure Correctness and Alert on Divergence

Your ability to follow this principle will vary based on your deployment model, but in essence, by making source control the source of truth for your system, software can automatically detect when this doesn't match the reality of your deployment and make the appropriate changes.

Not only does this mean you get to see your changes reflected in your environments at a faster pace, it also decreases the time to recover from human error once the bad change set has been reverted.
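The core of such an agent is a reconciliation check: compare the desired state declared in the repo with the observed state of the environment and report any divergence. The shape of the state below is an assumption for illustration only.

```python
# Illustrative reconciliation check in the spirit of a GitOps agent.
# "desired" would come from the repo, "actual" from querying the
# environment; both are hypothetical here.

def diff_state(desired: dict, actual: dict) -> dict:
    """Return the keys whose desired value differs from what is deployed."""
    return {
        key: {"desired": value, "actual": actual.get(key)}
        for key, value in desired.items()
        if actual.get(key) != value
    }

desired = {"replicas": 3, "image_tag": "v1.4.2"}
actual = {"replicas": 2, "image_tag": "v1.4.2"}
print(diff_state(desired, actual))  # {'replicas': {'desired': 3, 'actual': 2}}
```

A real agent would run this loop continuously, either alerting on the divergence or applying the change needed to bring the environment back in line with the repo.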

When looking to apply these principles you will have to analyse how they can best be implemented for your application and the environments you deploy into. As with most philosophies there is no one-size-fits-all approach, and the degree to which you apply these principles may be an intangible measure rather than an absolute. But as always, an appreciation of the benefits is key; use it to guide your approach and maximise your effectiveness.

Sunday 3 January 2021

Cryptographic Basics

 


Cryptography, while essential in modern software engineering, is a complicated subject. While there is no need to gain an understanding of the complex mathematics that underlie modern cryptographic techniques, a well-rounded engineer should understand the available tools and the situations in which they should be used.

What is presented below is by no means an in-depth examination of cryptography, but it is a primer on the topics that are likely to come up as you try to ensure your code base is well protected.

Encryption vs. Hashing

Encryption and hashing are probably the two primary applications of cryptography but the use case for each is different.

Encryption is a two-way, i.e. reversible, process. In order to protect data either at rest or in transit, encryption can be applied such that only those who hold the corresponding key can view the underlying data. Encryption is therefore used in situations where access to the data needs to be maintained but the data also needs to be protected from unauthorised disclosure.

Hashing is a one-way, i.e. irreversible, process. Taking data as an input, a hashing algorithm produces a digest that cannot feasibly be used to get back to the original data. Hashing is therefore used in situations where either the integrity of data needs to be verified or where the data being stored is very sensitive, so only a representation of the data should be stored rather than the data itself. A common example of the latter is the storage of passwords.
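The password example can be sketched with the standard library's PBKDF2 implementation: only a salted, slow-to-compute digest is stored, and verification repeats the derivation rather than reversing it.

```python
import hashlib
import os

# Store only a representation of the password, never the password itself.
salt = os.urandom(16)
stored = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 100_000)

# Verification repeats the derivation with the same salt and compares;
# the stored digest cannot be reversed to recover the password.
attempt = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 100_000)
print(attempt == stored)  # True: the right password reproduces the digest

wrong = hashlib.pbkdf2_hmac("sha256", b"wrong guess", salt, 100_000)
print(wrong == stored)  # False
```

Note that for password storage a deliberately slow derivation like PBKDF2 is used rather than a single fast hash, precisely to make brute-force guessing expensive.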

Stream vs Block Ciphers

Encryption is implemented by the application of ciphers, algorithms that given an input (referred to as plain text) will output the same data in an encrypted form (referred to as cipher text).

These ciphers are often categorised based on how they view the input data.

Stream ciphers view the data as a continuous stream of bits and bytes; they produce a corresponding stream of pseudo-random data (the keystream) that is combined with the input data to produce the encrypted output. A block cipher divides the data up into fixed-size blocks, using padding to ensure the overall size of the encrypted data is a whole number of these fixed-size blocks.
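The stream approach can be shown with a toy cipher that XORs the plaintext with a keyed pseudo-random keystream. This is for illustration only: real stream ciphers such as ChaCha20 use cryptographically secure generators, whereas `random.Random` here is emphatically not secure.

```python
import random

# Toy stream cipher: XOR the data with a deterministic, keyed keystream.
# NOT secure - random.Random is a plain PRNG used only to illustrate the idea.

def xor_keystream(data: bytes, key: int) -> bytes:
    stream = random.Random(key)  # same key -> same keystream
    return bytes(b ^ stream.randrange(256) for b in data)

plaintext = b"attack at dawn"
ciphertext = xor_keystream(plaintext, key=42)

# XORing with the same keystream again reverses the operation.
print(xor_keystream(ciphertext, key=42) == plaintext)  # True
```

Because encryption and decryption are the same XOR operation, the entire security of such a scheme rests on the keystream being unpredictable, which is exactly why stream ciphers are hard to get right.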

Stream ciphers have proven to be complicated to implement correctly, mainly because of their reliance on the true randomness of the generated keystream. Because of this, the most popular ciphers are mostly block ciphers, such as the Advanced Encryption Standard (AES).

While block ciphers are now the most widely used, attention also needs to be paid to the mode they are used in. The mode largely controls how the blocks are combined during the encryption process. When using Electronic Code Book (ECB) mode, each block is encrypted separately and the results are simply concatenated to form the encrypted output. While this may seem logical, it leads to weaknesses: blocks that contain the same data produce the same output, which can give a possible attacker an advantage. For this reason other modes such as Cipher Block Chaining (CBC) combine each block with the previous one as the algorithm progresses, ensuring that even blocks containing the same data produce different encrypted output.
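The ECB weakness is easy to demonstrate with a toy block "cipher". The XOR-with-key cipher below is in no way secure; it exists only to show how the block-combining behaviour differs between ECB and CBC-style chaining.

```python
# Toy illustration of why ECB leaks patterns: identical plaintext blocks
# encrypt to identical ciphertext blocks, while CBC-style chaining hides
# the repetition. The "cipher" is XOR with the key and is NOT secure.

BLOCK = 4

def toy_encrypt(block: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(block, key))

def ecb(data: bytes, key: bytes) -> list[bytes]:
    # Each block is encrypted independently and simply concatenated.
    return [toy_encrypt(data[i:i + BLOCK], key) for i in range(0, len(data), BLOCK)]

def cbc(data: bytes, key: bytes, iv: bytes) -> list[bytes]:
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        # Each block is first XORed with the previous ciphertext block.
        mixed = bytes(b ^ p for b, p in zip(data[i:i + BLOCK], prev))
        prev = toy_encrypt(mixed, key)
        out.append(prev)
    return out

data = b"AAAA" + b"AAAA"  # two identical plaintext blocks
key, iv = b"\x13\x37\xca\xfe", b"\x01\x02\x03\x04"
ecb_blocks = ecb(data, key)
cbc_blocks = cbc(data, key, iv)
print(ecb_blocks[0] == ecb_blocks[1])  # True: the repetition is visible
print(cbc_blocks[0] == cbc_blocks[1])  # False: chaining hides it
```

The same pattern-leaking behaviour occurs with real block ciphers like AES when run in ECB mode, which is why ECB is generally avoided in practice.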

Cryptographic Hashing

As we discussed earlier, a hashing function is a one-way function that produces a digest of a message. Not all hashing algorithms are explicitly designed for cryptographic purposes.

A cryptographic hashing function should have the following properties:

  • It should be deterministic, meaning the same input message will always lead to the same digest.
  • It should be a fast operation to compute the digest of a message.
  • It should be computationally infeasible to generate a message that gives a specific digest.
  • It should be computationally infeasible to find two messages that produce the same digest.
  • A small change in the input message should produce a large change in the corresponding digest.
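The determinism and avalanche properties from the list above can be observed directly with SHA-256 from the standard library:

```python
import hashlib

# Deterministic: the same message always produces the same digest.
a = hashlib.sha256(b"hello world").hexdigest()
b = hashlib.sha256(b"hello world").hexdigest()
print(a == b)  # True

# Avalanche: a one-character change produces a very different digest.
c = hashlib.sha256(b"hello worle").hexdigest()
differing = sum(x != y for x, y in zip(a, c))
print(differing)  # most of the 64 hex characters differ
```

The remaining properties (that preimages and collisions are computationally infeasible to find) cannot be demonstrated by running code; they are what distinguishes a cryptographic hash such as SHA-256 from a fast general-purpose hash.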

When an algorithm has these qualities it can be applied to provide digital signatures or Message Authentication Codes (MACs) to protect the integrity and authenticity of data either at rest or in transit.
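The MAC case can be sketched with the standard library's `hmac` module, which builds a keyed authentication code on top of a cryptographic hash:

```python
import hashlib
import hmac

# A Message Authentication Code built on SHA-256: anyone holding the
# shared key can verify the message has not been altered in transit.
key = b"shared-secret-key"
message = b"transfer 100 credits to account 42"

tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# The receiver recomputes the tag and compares in constant time.
received_tag = hmac.new(key, message, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, received_tag))  # True

# A tampered message produces a different tag and fails verification.
tampered = b"transfer 999 credits to account 42"
forged = hmac.new(key, tampered, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, forged))  # False
```

Note the use of `hmac.compare_digest` rather than `==`: the constant-time comparison avoids leaking information through timing differences.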

We said earlier that there is no need to understand the complex mathematics behind these cryptographic techniques; to take this a step further, it's important that you don't attempt to implement these techniques yourself. The complexity involved means the likelihood of making a mistake in the implementation is high, and such mistakes can lead to bugs that attackers can exploit to undermine the security you are trying to implement.

Instead you should view cryptography as a toolbox providing implements you can use to protect you and your users; the important thing is to learn which tool should be used for which job and to become an expert in its application.