A blog for musings about what good code looks like and the lives of the people who produce it.
Saturday, 14 September 2024
Terraforming Your World
Tuesday, 3 September 2024
Being at the Helm
Monday, 26 August 2024
Avoiding Toiling
Site Reliability Engineering (SRE) is the practice of applying software engineering principles to the management of infrastructure and operations.
The practice originated at Google in the early 2000s. The sorts of things an SRE team might work on include system availability, latency and performance, efficiency, monitoring and the ability to deliver change.
Optimising these kinds of system aspects covers many different topics and areas, one of which is the management of toil.
Toil in this context is not simply work we don't particularly enjoy doing or don't find stimulating. It has a specific meaning defined by aspects other than our enjoyment of the tasks it involves.
What is Toil?
Toil is work that exhibits some or all of the following attributes.
It is Manual in nature. Even if a human isn't necessarily doing the work, it requires human initiation, monitoring or some other involvement that means a team member has to oversee its operation.
Toil is Repetitive. The times the work has to be done may vary and may not be at regular intervals, but the task needs to be performed multiple times and will never be deemed finished.
It is Tactical, meaning it is generally reactive. It has to be undertaken in response to something happening within the system, for example when monitoring highlights that something is failing or is sub-optimal.
It has No Enduring Value. This means it leaves the system in the same state as before the work happened. It hasn't improved any aspect of the system or eliminated the need for the work to happen again in the future.
It Scales with Service Growth. Some work items need to happen regardless of how much a system is used. This tends to be viewed as overhead and is simply the cost of having the system in the first place. Toil scales with system use, meaning the more users you attract the greater the impact of the toil on your team.
Finally, toil can be Automated. Some tasks will always require human involvement, but for a task to be toil it must be possible for it to be automated.
What is Toil's Impact?
It would be wrong to suggest that toil can be totally eliminated. Having a production system being used by large numbers of people is always going to incur a certain amount of toil, and it is unlikely that the whole engineering effort of your organisation can be dedicated to removing it.
Also, much like technical debt, even if you do reach a point where you feel it's eliminated, the chances are a future change in the system will re-introduce it.
But also like technical debt, the first step is to acknowledge toil exists, develop ways to detect it and have a strategy for managing it and keeping it to a reasonable minimum.
Toil's impact is that it engages your engineering resource on tasks that don't add to or improve your system. It may keep it up and running, but that is a low ambition to have for any system.
It's also important to recognise that large amounts of toil are likely to impact a team's morale. Very few engineers will embark on their career looking to spend large amounts of time on repetitive tasks that lead to no overall value.
The Alternative to Toil
The alternative to spending time on toil is to spend time on engineering. Engineering is a broad concept, but in this context it means work that improves the system itself or enables it to be managed in a more efficient way.
As we said previously, completely eliminating toil is probably an unrealistic aim. But it is possible to measure how much time your team is spending on toil-related tasks. Once you are able to estimate this, it is possible both to set a sensible limit on how much time is spent on these tasks and to measure the effectiveness of any engineering activities designed to reduce it.
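As a rough illustration of that measurement, the sketch below totals the hours a team has logged and reports the share that was toil. The WorkItem type, the sample tasks and the 50% budget are all assumptions made for the example, not figures from any particular team.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    """One piece of tracked work (hypothetical structure for illustration)."""
    description: str
    hours: float
    is_toil: bool


def toil_fraction(items):
    """Return the share of tracked hours spent on toil, between 0.0 and 1.0."""
    total = sum(item.hours for item in items)
    if total == 0:
        return 0.0
    toil = sum(item.hours for item in items if item.is_toil)
    return toil / total


# Assumed limit: no more than half of tracked time should go on toil.
TOIL_BUDGET = 0.5

# Made-up sample week for a small team.
week = [
    WorkItem("Restart stuck queue worker", 3.0, True),
    WorkItem("Manually rotate log files", 5.0, True),
    WorkItem("Automate certificate renewal", 12.0, False),
    WorkItem("Refactor deployment pipeline", 20.0, False),
]

fraction = toil_fraction(week)
print(f"Toil: {fraction:.0%} of tracked time")
print("Over budget" if fraction > TOIL_BUDGET else "Within budget")
```

Running the same calculation each week gives a trend line, which is what lets you judge whether an engineering activity aimed at reducing toil actually worked.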
This engineering activity might relate to software engineering, refactoring code for performance or reliability, automating testing or certain aspects of the build and deployment pipeline. It might also be more aimed at system engineering, analysing the correctness of the infrastructure the system is running on, analysing the nature of system failures or automating the management of infrastructure.
As previously stated we can view toil as a form of technical debt. In the early days of a system we may take certain shortcuts that at the time are manageable but as the system grows come with a bigger and bigger impact. Time spent trying to fix this debt will set you on a path for gradual system improvement, both for your users and the teams that work on the system.
Saturday, 13 July 2024
The Language of Love
For me that language is C and its successor C++.
Potentially my view is biased because of the things I've outlined above, but I believe C is a very good language for all potential developers to start with. If you learn how to code close to the metal it will develop skills and a way of thinking that will be of benefit to you as you progress onto higher level languages with greater levels of abstraction from how your code is actually running.
In The Beginning
In the late 1960s and early 1970s, as the Unix operating system was being developed, engineers realised that they needed a programming language that could be used to write utilities and programs to run on the newly forming platform.
One of the initial protagonists in this field was Ken Thompson.
After dismissing existing programming languages such as Fortran he started to develop a variant of an existing language called BCPL. He concentrated on simplifying the language structures and making it less verbose. He called this new language B, with the first version being released around 1969.
In 1971 Dennis Ritchie continued to develop B to utilise features of more modern computers as well as adding new data types. This culminated in the release of New B. Throughout 1972 the development continued adding more data types, arrays and pointers and the language was renamed C.
In 1973 Unix was re-written in C, with even more data types being added as C continued to be developed through the 1970s. This eventually resulted in the release of what many consider to be the definitive book on the C programming language. Written by Brian Kernighan and Dennis Ritchie, The C Programming Language gave its name to the dialect known as K&R C and became the unofficial specification for the language.
C has continued to be under active development right up until the present with C23 expected to be released in 2024.
C with Classes
In 1979 Bjarne Stroustrup began work on what he called "C with Classes".
Adding classes to C turned it into an object oriented language. Where C had found a home in embedded programming running close to the metal, adding classes made it more suitable for large scale software development.
In 1982 Stroustrup began work on C++, adding new features such as inheritance, polymorphism, virtual functions and operator overloading. In 1985 he released the book The C++ Programming Language, which became the unofficial specification for the language, with the first commercial version being released later that year.
Much like C, C++ has continued to be developed with new versions being released up until the present day.
Usage Today
Software Engineering is often considered to be a fast moving enterprise, and while many other programming languages have been developed over the lifetime of C and C++ both are still very widely used.
They are often used when performance is critical. The fact they run close to the metal allows for highly optimised code for use cases such as gaming, network appliances and operating systems.
Usage of C and C++ can often strike fear into the heart of developers who aren't experienced in their use. However, the skills that usage of C and C++ can develop will prove invaluable even when working with higher level languages, so I would encourage all software engineers to spend some time exposing themselves to the languages.
Good engineers can apply their skills using any programming language. The principles and practices of good software development don't vary that much between languages or paradigms. But often there are better choices of language for certain situations, and C and C++ are still the correct choice for many applications.
Friday, 28 June 2024
Vulnerable From All Sides
Saturday, 15 June 2024
The World Wide Internet
Sunday, 9 June 2024
What's In a Name
Saturday, 13 April 2024
The Web of Trust
- A user types a web address into the browser or clicks a link provided by a search engine.
- The user's browser issues a request to the web site to establish a secure connection.
- The server in response sends the browser its certificate.
- The browser validates the certificate's authenticity by verifying the signature of the Root CA that issued it, using the CA's public key, which has been pre-installed on the user's machine.
- Once the certificate is validated, the browser creates a symmetric encryption key that will be used to secure future communication between the browser and the web site. It encrypts the symmetric key using the server's public key and sends it to the server.
- The user's browser has now established the identity of the web site, based on the data contained in its validated certificate, and both parties now have a shared symmetric key that can be used to secure the rest of their communication in the session.
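The client side of the steps above can be sketched with Python's standard ssl module. This is a minimal illustration, not a description of how any particular browser works: the open_verified_connection helper is a hypothetical name, the hostname is whatever the caller supplies, and the library carries out the certificate validation and key agreement internally.

```python
import socket
import ssl

# The default context loads the CA certificates pre-installed on the machine
# and insists that the server's certificate is valid for the requested host.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate must validate
print(context.check_hostname)                    # name must match the certificate


def open_verified_connection(hostname, port=443):
    """Hypothetical helper: connect, validate the certificate, report the result.

    Raises ssl.SSLCertVerificationError if the certificate cannot be trusted.
    """
    with socket.create_connection((hostname, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            # At this point the handshake has completed and the session is
            # protected by a shared key negotiated by the library.
            return tls.version(), tls.getpeercert()["subject"]
```

Calling open_verified_connection with a real host name requires network access, which is why the sketch only defines the function rather than invoking it.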
Monday, 1 April 2024
Imagining the Worst
In the modern technological landscape the list of possible security threats can seem endless. The breadth of potential attackers and potential vectors for their attacks has never been so large. Does this mean we are all just helpless, waiting for an attack and the terrible consequences to befall us?
One way to be proactive in the face of these dangers is to try and anticipate what form these threats might take, what damage they could do and what countermeasures it might be possible to take.
Threat modelling is a technique for enumerating the threats a system might face, identifying whether or not safeguards might exist and analysing the consequences of these attacks succeeding.
To help developers and engineers with the threat modelling process Microsoft developed the STRIDE mnemonic in 1999 to serve as a checklist of things for teams to consider when analysing the potential impact of threats to their system.
STRIDE
The STRIDE mnemonic attempts to categorise potential threats in terms of the impact they may have. This allows teams to analyse if any part of a system may be susceptible, and if so how this might be mitigated.
Spoofing is the process of falsely identifying yourself within a system. This might be by using stolen user credentials, leaked access tokens or cookies, or any other form of session hijacking.
Tampering involves the malicious manipulation of data either at rest, for example altering data within a database, or while in transit, for example by acting as a man in the middle.
Repudiation relates to an attacker being able to cover their tracks by exploiting any lack of logging or ability to trace actions within a system. This might also include an attacker having the ability to falsify an audit trail to hide malicious activity.
Information Disclosure occurs when information is available to users who shouldn't be able to view it. This might cover a system returning database records a user has no entitlement to view, or the ability of an attacker to intercept data in transit, again for example by acting as a man in the middle.
Denial of Service is any attack that denies users the ability to legitimately use a system, of which the most common form of attack is to overwhelm a system with requests or otherwise cause the system to become unresponsive or unusable.
Elevation of Privilege occurs when an attacker is able to elevate their permissions within a system under attack. Normally this would mean obtaining administrator privileges or otherwise penetrating a network sufficiently to be trusted more than a normal external user.
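Because STRIDE is intended as a checklist, the six categories map naturally onto an enumeration that a team can tick off per component. The sketch below is one possible representation; the unreviewed function and the login service example are assumptions for illustration only.

```python
from enum import Enum


class Stride(Enum):
    """The six STRIDE threat categories, in mnemonic order."""
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information Disclosure"
    DENIAL_OF_SERVICE = "Denial of Service"
    ELEVATION_OF_PRIVILEGE = "Elevation of Privilege"


def unreviewed(reviewed):
    """Return the categories a team has not yet considered, in mnemonic order."""
    return [category for category in Stride if category not in reviewed]


# Hypothetical progress for a login service: two categories discussed so far.
login_service_reviewed = {Stride.SPOOFING, Stride.TAMPERING}

for category in unreviewed(login_service_reviewed):
    print("Still to consider:", category.value)
```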
Threat Analysis
Many tools and processes exist for implementing threat modelling, but most will revolve around a team of system experts brainstorming potential threats that a system or sub-system might be susceptible to.
This involves using analysis helpers such as STRIDE to put yourself in the mindset of an attacker. For example you may assess if an authentication system could be exploited via spoofing. The answer might be no because of certain mitigations, or yes because of certain flaws.
When applying this style of analysis to all the aspects of STRIDE it is unlikely that you will find the system is completely protected against all possible attacks. Instead you're looking to demonstrate that it is adequately protected given the likelihood of an attack being successful and the benefit that would be gained by an attacker if they were successful.
Security is not a design activity that is ever truly complete and instead will be something that evolves over time. You can either choose to learn from mistakes when attackers are successful, or you can attempt to pre-empt this by performing some self-critical internal analysis to ensure security levels are as high as they can be.
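One simple way to reason about "adequately protected" is to score each brainstormed threat by likelihood and attacker benefit and then focus on the unmitigated ones above a threshold. The sketch below assumes a 1 to 5 scale for both factors and multiplies them; the scale, the threshold and the sample threats are illustrative assumptions rather than part of STRIDE itself.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    """A brainstormed threat (hypothetical structure for illustration)."""
    name: str
    likelihood: int        # assumed scale: 1 (rare) to 5 (expected)
    attacker_benefit: int  # assumed scale: 1 (trivial) to 5 (severe)
    mitigated: bool

    @property
    def score(self):
        # Simple product of the two factors; other weightings are possible.
        return self.likelihood * self.attacker_benefit


def needs_attention(threats, threshold):
    """Return unmitigated threats at or above the threshold, highest score first."""
    open_threats = [t for t in threats if not t.mitigated and t.score >= threshold]
    return sorted(open_threats, key=lambda t: t.score, reverse=True)


# Made-up findings from a threat modelling session.
threats = [
    Threat("Session cookie theft", 3, 5, False),
    Threat("Audit log tampering", 2, 3, False),
    Threat("Request flooding", 4, 3, True),
    Threat("Admin privilege escalation", 2, 5, False),
]

for threat in needs_attention(threats, threshold=8):
    print(threat.score, threat.name)
```

Threats that fall below the threshold are not ignored forever. They stay on the list so they can be re-scored as the system and its attackers change.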
Sunday, 24 March 2024
Having a Distrustful Mindset
I've talked previously about the concept of zero trust as it relates to security. When trying to apply an overarching philosophy like this it can feel quite daunting to know where to start.
Rather than having to create a comprehensive design for how to re-engineer your existing applications and network, it can help to start with a shift in mindset in how you view the security properties of your system.
Security is an ever evolving discipline with many different facets, so I wouldn't want to give the impression that the items I mention below are the only things you need to think about, but I think they do help foster the kind of view you need: a healthy distrust of the world around you.
Authenticate
Authentication is the process of identifying the actors within a system. Traditionally this means authenticating users via them supplying a username/password or other such shared secret.
But this can also be extended to cover many other links in the chain.
This might include identifying the client or application the user is using to make the requests, the network the requests are coming from or even the physical device that is being used to communicate.
It's possible for all of these aspects of the flow of data to be used to indicate that something malicious may be happening, and therefore properly identifying all these elements will enable you to assess if you can trust them.
Properly identifying all elements also enables comprehensive logging of any actions performed.
Authorise
Once you've identified these elements you can move on to authorisation. This is the process of deciding if the requested operation should be allowed. Normally this would mean asking whether this user is allowed to view this data or perform this action.
But again the concept can be extended to cover more aspects of the system. Should this client or application be allowed to perform this action? Should these kinds of requests be allowed from this area of the network? Should this physical device be used to perform this action?
Going beyond authorising just the user again increases your ability to both detect malicious actions and prevent them.
Authorisation can also be linked to the method being used for authentication. Some actions a user may want to perform might be linked to needing a higher level of authentication. As an example, some admin actions might require multi-factor authentication on top of a username and password.
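The idea of authorising more than just the user, and of tying some actions to stronger authentication, can be sketched as a small policy check. Everything named here (RequestContext, POLICY, the action names, roles and zones) is hypothetical and chosen only to illustrate the shape of the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    """What has been established about a request during authentication."""
    user_role: str
    client_id: str
    network_zone: str
    auth_factors: int  # 1 = password only, 2 = password plus a second factor


# Hypothetical policy: each action lists who, what and where it is allowed
# from, and how strongly the user must have authenticated.
POLICY = {
    "view_report": {
        "roles": {"analyst", "admin"},
        "clients": {"web", "mobile"},
        "zones": {"internal", "external"},
        "min_factors": 1,
    },
    "delete_user": {
        "roles": {"admin"},
        "clients": {"web"},
        "zones": {"internal"},
        "min_factors": 2,
    },
}


def is_authorised(action, ctx):
    """Allow the action only if every element of the request is acceptable."""
    rule = POLICY.get(action)
    if rule is None:
        return False  # deny by default: unknown actions are never allowed
    return (
        ctx.user_role in rule["roles"]
        and ctx.client_id in rule["clients"]
        and ctx.network_zone in rule["zones"]
        and ctx.auth_factors >= rule["min_factors"]
    )


admin_password_only = RequestContext("admin", "web", "internal", auth_factors=1)
admin_with_mfa = RequestContext("admin", "web", "internal", auth_factors=2)

print(is_authorised("delete_user", admin_password_only))  # False
print(is_authorised("delete_user", admin_with_mfa))       # True
```

A denied request is also a useful signal in its own right, since a valid admin arriving from an unexpected client or network is exactly the kind of event worth logging.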
Encrypt
Pretty much all applications involve the movement and manipulation of data, and a large number of threats will relate to trying to expose that data to parties that shouldn't be able to view it.
One defence against this is to make sure that at the points where the data doesn't need to be viewed or worked with it is encrypted.
This will broadly come down to encrypting the data at rest and in transit. When the data is being stored and when it is being moved it should always be encrypted. The only time it should be in plain text is when it is being shown to the user: a user who has been properly authenticated and authorised to view that data, via that application, from that network, using that device.
This mindset should also be applied to other aspects of data sensitivity, identifying which data items are sensitive and how they are being transported. An example of this would be considering what data items are being included in the query string of requests. Because the query string might be logged in various places, data within it should be assessed for sensitivity.
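As a small example of treating the query string with suspicion, the sketch below masks the values of parameters considered sensitive before a URL is written to a log. The SENSITIVE_PARAMS set and the redact_url function are assumptions for illustration; a real system would need its own list based on an assessment of its data.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameter names that should never appear in logs.
SENSITIVE_PARAMS = {"token", "password", "email"}


def redact_url(url):
    """Return the URL with the values of sensitive query parameters masked."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    redacted = [
        (key, "REDACTED" if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in pairs
    ]
    return urlunsplit(parts._replace(query=urlencode(redacted)))


print(redact_url("https://example.com/search?q=shoes&token=abc123"))
# prints https://example.com/search?q=shoes&token=REDACTED
```

Redaction like this is a safety net for logging. The stronger fix is usually to avoid putting sensitive values in the query string in the first place.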
Authentication, authorisation and encryption aren't the only security related factors you need to be thinking about. But embracing them and thinking about them in a greater depth will help you dive deeper into the security of your software and system and be aware of a greater breadth of possible threats.









