A Two Coffee Problem: December 2018

Sunday, 16 December 2018

Language Lessons

Developers very often define themselves by the programming language they use to apply their skills. Based on the perceived strengths and weaknesses of different languages, they will also very often judge their counterparts along similar lines.

To a large extent this viewpoint is misguided. The differences between languages in the same paradigm are often superficial, and the majority of developers will use multiple languages during their career.

Developers use many different languages because languages are an evolving tool. This evolution can be tracked through various generations as new concepts have been developed and different approaches proposed.

First Generation (1GL)

1GL languages, also known as machine languages, consist of code that can be directly stored and executed by the CPU, in other words 1s and 0s.

Programming in a 1GL language would often involve the manipulation of switches or other hardware and bears almost no resemblance to modern programming techniques.

Second Generation (2GL)

2GL languages start to introduce constructs that allow developers to read and make sense of the code. They are formed by combining instructions that are executed directly by the CPU being targeted. However, they provide little to no abstraction from the steps taken to run the program. Developers operating at this level are manipulating registers via simple mathematical and logical operations.

Coding at this level has the potential to be very fast and very efficient, but equally is extremely complex, error-prone and difficult to debug.

Code written at this level is also not portable: it will only run on the family of processors it has been written for. While most developers will not operate at this level, the code they write in higher level languages is often compiled down to a 2GL program, with the binary file that is ultimately deployed representing 1GL code.

Third Generation (3GL)

3GL languages introduced many of the basic programming constructs that all modern day developers are familiar with. These include logical constructs such as if and else-if, and looping constructs such as for, while and do-while.

Whilst a 3GL program still describes what needs to happen when it runs, it describes this behaviour in a more algorithmic fashion rather than describing how the CPU should manipulate bits and bytes.

A certain division of 3GL languages also gave rise to the concept of object-oriented (OO) languages, which attempt to represent programs as collections of data and functionality that interact in a manner designed to model the problem space being solved.

Some 3GL languages also attempted to make code portable by compiling to a 2GL-style language that runs on a virtual machine and is therefore not tied to a particular physical CPU.
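As a rough illustration of that last point, here is a small sketch in Python, which compiles source code to bytecode for its own virtual machine. The function names are made up for this example, and the exact instructions listed will vary between Python versions, so treat the output as indicative only.

```python
import dis


def add_tax(price, rate):
    # Ordinary 3GL-style code: an algorithm, with no registers in sight.
    return price + price * rate


def bytecode_ops(func):
    # List the virtual machine instruction names the function compiles to.
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    for opname in bytecode_ops(add_tax):
        print(opname)
```

The instructions printed are for the virtual machine rather than for any physical CPU, which is what allows the same compiled code to run wherever that virtual machine exists.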

Fourth Generation (4GL)

4GL languages attempt to improve upon 3GL by operating upon larger data constructs. Sometimes this distinction can be subtle, but 3GL languages often operate on relatively simple and low level data structures such as strings, booleans and value types.

Operating with higher level data constructs can make these languages less general purpose and often leads to 4GL languages being specialised around a particular purpose such as graphics or database manipulation.

Fifth Generation (5GL)

5GL languages attempt to shift the viewpoint of code from how an outcome should be achieved to describing what that outcome should be. Rather than coding an algorithm, the programmer describes the constraints the program should operate within and the desired outcome.

The boundary between 5GL and 4GL languages is often blurred. Some 4GL languages also try to operate along the lines of what as opposed to how and are sometimes miscategorised as 5GL languages.
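The "what versus how" distinction can be sketched by solving the same small problem twice. This example assumes SQL as the declarative language (it is commonly classed as a 4GL, although the post does not name it) and uses Python's built-in sqlite3 module; the order data and function names are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: (customer, amount) pairs.
ORDERS = [("alice", 30), ("bob", 15), ("alice", 20)]


def totals_imperative(rows):
    # The "how": walk every row and maintain the running totals by hand.
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def totals_declarative(rows):
    # The "what": state the desired result and let the database engine
    # decide how to compute it.
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
        connection.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = connection.execute(
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
        )
        return dict(cursor.fetchall())
    finally:
        connection.close()
```

Both functions produce the same totals, but only the first spells out the steps taken to get there.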

It would be wrong to view this evolution of languages as each subsequent generation being better than its predecessor. Mostly the changes between generations are focussed on making the programmer's life easier and allowing them to describe algorithms in higher level terms, making code more readable to someone who understands the problem being solved.

Many coders still operate at 2GL or lower 3GL level because of the potential to achieve higher performance and efficiency. A good example of this is the prevalence that languages such as C and C++ still have in fields such as graphics and gaming.

There is no one language to rule them all. Eventually you may be required to solve a problem that needs a different approach, and you need to be comfortable going to the toolbox and selecting the correct tool for the job in hand, no matter what generation that may be.

Whatever generation of language you are using, don't define yourself by it. A programming language is a tool, and you would likely be as adept at applying your coding skill in many other languages.

Monday, 10 December 2018

Threading Things Together

I would predict with a high degree of certainty that all software engineers reading this could relay war stories of issues caused by multi-threading.

Cast very much as a necessary evil, multi-threading seems to have a unique ability to cause confusion, with difficult-to-diagnose defects and perplexing behaviour. Because of this, many popular languages are evolving features to help developers deal with this complexity and find better routes around it.

So if we can't live without threads, how can we learn to live with them?

Recognising Dragons

One of the best ways to deal with potential issues is to recognise them coming over the horizon and try and steer the code base away from them.

Chief among these potential issues are race conditions: events happening out of the expected order and therefore driving unexpected outcomes. These issues are nearly always caused by the mutation of data by multiple threads. Whenever you decide to transition an area of code to be multi-threaded, particular attention needs to be paid to the reading and writing of data. Left unchecked, these areas will almost certainly cause you problems.

Once your system goes multi-threaded you will very often end up in a situation where there is some dependency between threads, and this can all too easily end in a deadlock. Thread A requires Thread B to complete its action, but Thread B cannot complete because it requires resources currently being held by Thread A. Nobody can move forward and the system is deadlocked.
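As a minimal sketch of guarding shared data, the Python example below funnels every write to a counter through a single lock. The class and function names are hypothetical, and a lock is only one of several ways to protect shared state.

```python
import threading


class SafeCounter:
    """Hypothetical shared counter; every mutation goes through one lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "value += 1" is a read, an add and a write. The lock stops another
        # thread from interleaving between those steps.
        with self._lock:
            self.value += 1


def count_in_parallel(thread_count=4, increments_per_thread=10_000):
    counter = SafeCounter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Without the lock, two threads can read the same starting value and one of the updates can be lost, which is exactly the kind of intermittent defect that is hard to reproduce and diagnose.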
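One common way to avoid this particular trap is to make every thread acquire its locks in the same order. The sketch below assumes a made-up bank transfer scenario in Python; the Account class and its fields are illustrative only.

```python
import threading


class Account:
    """Hypothetical resource that must be locked before it is changed."""

    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(source, target, amount):
    # Always lock the account with the lower id first, whichever direction
    # the money moves. Two opposing transfers then compete for the same
    # first lock instead of each holding one lock and waiting on the other.
    first, second = sorted((source, target), key=lambda acc: acc.account_id)
    with first.lock:
        with second.lock:
            source.balance -= amount
            target.balance += amount


def run_opposing_transfers(rounds=1000):
    a = Account(1, 1000)
    b = Account(2, 1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return a.balance, b.balance
```

If each transfer instead locked its source account first, the two threads could each grab one lock and wait forever for the other.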

It's also important to realise that threads are not entirely an abstract construct, they do have an underlying reliance on infrastructure and are therefore finite. It is therefore possible for your system to become starved of threads and hit a limitation in throughput.
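The finite nature of threads can be seen with a deliberately small pool. This sketch uses Python's ThreadPoolExecutor and a made-up measuring function: with only two workers, eight tasks have to queue, so no more than two ever run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def measure_peak_concurrency(max_workers, task_count):
    # Hypothetical helper: records the highest number of tasks seen
    # running at the same moment.
    active = 0
    peak = 0
    lock = threading.Lock()

    def task():
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # Stand-in for real work.
        with lock:
            active -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for _ in range(task_count)]
        for future in futures:
            future.result()
    return peak
```

Calling measure_peak_concurrency(2, 8) never reports more than 2. The remaining tasks simply wait their turn, which is where the limit on throughput comes from.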

Learning the Language

Multi-threading is definitely an area of software engineering where going your own way is not advisable. The complexity of building an effective implementation, combined with the potential pitfalls, isn't conducive to devising custom solutions.

Let others take the strain by learning the conventions for multi-threading within your chosen language. These are likely the result of a large number of hours of work by experts and will be demonstrably effective and robust.

It's likely that various approaches are possible, operating with varying degrees of complexity. As with many areas of software development, a healthy aversion to complexity will not steer you far wrong. Always strive for the simplest possible solution.

Many languages are developing features to remove the concern around thread creation and management from the shoulders of developers. Prime examples of this are the introduction of async/await within .NET and CompletableFuture in Java. Whilst it will still be possible to cause yourself problems with these kinds of features, hopefully it's harder to blow your foot off.
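To give a flavour of what letting the language take the strain looks like, here is a sketch using Python's concurrent.futures module, which plays a broadly similar role to the features mentioned above. The function names and the word counting task are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    # Stand-in for a slower piece of work, such as a network call.
    return len(text.split())


def count_all(documents):
    # The executor creates, reuses and shuts down the threads. The calling
    # code only describes the work and collects results in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(word_count, documents))
```

Notice that no thread is ever created, started or joined by hand, which removes a whole category of mistakes.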

Strategic Decisions

Even once you understand the constructs of your language it's still imperative that your codebase has a defined strategy around multi-threading.

Trying to analyse the behaviour of a code base where new threads may be put into operation at any moment is far harder than assessing code that will transition to being multi-threaded in a predictable and prescribed manner.

If this strategy is well thought through then it can look to protect developers from some of the common pitfalls.

Any sufficiently complex software will require the use of multi-threading to deal with the throughput demands being placed on it. A healthy dose of fear around this aspect of code usually comes to all developers as their experience grows.

Use this fear as a catalyst for defining a strategy, and cultivate a healthy laziness in looking for language features to take the strain. Multi-threading may on occasion cause you to have sleepless nights, but it is possible to find a way to let the magic happen without disaster being round the corner.