How to read water

Jon Jagger from less code, more software

is an excellent book by Tristan Gooley (isbn 978-1-473-61522-9). As usual I'm going to quote from a few pages:
One of the universal truths of human observation is that we see more of what we expect to see and less of what we don't expect to see.
Much of my work is not about teaching people to see things that are hard to see, but in showing them how to notice the things that hide in plain sight.
It did not take sailors long to work out that a ship that carries too much may be vulnerable in heavy seas, but sailors were rarely the ones to make the decision about how much cargo a ship could safely carry. The merchants making the profit would have had a different view to the deckhand, especially if the trader never set foot on the vessel. This led to a wrestle between greedy traders and cautious captains that lasted centuries. The first attempts to regulate how much a ship could carry go back over four thousand years to ancient Crete.
Samuel Plimsoll, a nineteenth-century English politician, realized that a low freeboard height can present a problem, but he also appreciated that it becomes the solution if we take a very keen interest in it. In other words, we can tell if there is too much cargo in the boat by looking much more carefully at how high the water rises up the side of the hull. And the easiest way to do this is by drawing a ruler on the side of the ship, calibrated according to an architect's or engineer's understanding of the boat. These lines, which became known as Plimsoll Lines, were such a simple and brilliant success that they became law and proliferated around the world.
From 1833, when the first tide tables were produced by the Admiralty, the emphasis shifted from looking, thinking and understanding, to depending on tables of others' measurements.
There is a strange truth in the profile of beaches: they have evolved in a physical sense to be a near ideal shape to defend themselves against the onslaught of the sea. This means that almost any attempt to engineer a 'solution' to what nature is trying to achieve has as much chance of backfiring as working.
Many sailors use little pieces of fabric, nicknamed 'tell-tails', that are tied to the sails and stays (the wires that give the mast stability), to offer a constant visual reminder of what the wind is doing.
Once the depth of the water is half the wavelength of the waves, it effectively cramps the motion of the waves and it is this that slows them down.
Sailors dislike precision almost as much as they like bureaucracy.
Rivers do not run straight for more than ten times their own width.
There will be an alternating combination of quick water and much slower water and this always happens in a certain way. The quick patches are known, perhaps onomatopoeically, as 'riffles' and the slower areas are known as pools. If there is no human tinkering with a river's flow, then there will be a riffle-pool sequence for every stretch of river that is five times its width.
It is typical for the water at the sides of a river to be travelling at only a quarter of the speed of the water in the centre. The river is being slowed by two things at its sides; when it comes into contact with banks it is slowed by friction and it is also slowed by the shallowing at the sides.
A stream is just a river you can step over.
Swell is the name of the waves that have enough energy to travel beyond the area of wind.


The User-Agent is not Just for Browsers

Chris Oldwood from The OldWood Thing

One of the trickiest problems when you’re building a web service is knowing who your clients are. I don’t mean your customers – that’s a much harder problem – no, I literally mean you don’t know what client software is talking to you.

Although it shouldn’t really matter who your consumers are from a technical perspective, once your service starts to field requests and you’re working out what and how to monitor it, knowing this becomes far more useful.

Proactive Monitoring

For example, on the last API I worked on we were generating 404s for a regular stream of requests because a consumer had a bug in their URL formatting and erroneously appended an extra space to one of the segments. We could see this at our end but didn’t know who to tell. We had to spam our “API Consumers” Slack channel in the hope the right person would notice [1].

We also had consumers sending us the wrong kind of authorisation token, which again we could see but didn’t know which team to contact. Although having a Slack channel for the API helped, we found that people only paid attention to it when they noticed a problem. It also appeared, from our end, that devs would prefer to fumble around rather than pair with us on getting their client end working quickly and reliably.

Client Detection

Absent any other information, a cloud-hosted service pretty much only has the client IP to go on. If you’re behind a load balancer then you’re looking at the X-Forwarded-For header instead, which might give you a clue. Of course, if many of your consumers are also services running in the cloud or behind the on-premises firewall, they all look pretty much the same.

Hence as part of our API documentation we strongly encouraged consumers to supply a User-Agent field with their service name, purpose, and version, e.g. MyMobileApp:Test/1.0.56. This meant that we would now have a better chance of talking to the right people when we spotted them doing something odd.
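
For example, a .NET consumer might bake the header into the HttpClient it uses for all of its calls. A minimal sketch, assuming the MyMobileApp:Test/1.0.56 format above (the factory class is made up for illustration):

  // Hypothetical consumer-side setup: every request made through this client now
  // carries the name:purpose/version User-Agent the API documentation asks for.
  using System.Net.Http;

  public static class ApiClientFactory
  {
      public static HttpClient Create()
      {
          var client = new HttpClient();

          // TryAddWithoutValidation is used because "MyMobileApp:Test" is not a strictly
          // valid product token, but the server still receives the header verbatim.
          client.DefaultRequestHeaders.TryAddWithoutValidation(
              "User-Agent", "MyMobileApp:Test/1.0.56");

          return client;
      }
  }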

From a monitoring perspective we can then use the User-Agent in various ways to slice-and-dice our traffic. For example we can now successfully attribute load to various consumers. We can also filter out certain behaviours from triggering alerts when we know, for example, that it’s their contract tests passing bad data on purpose.
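
As a rough illustration of the provider side (again assuming the name:purpose/version convention; none of this is the actual API’s code), a small helper makes it easy to attribute traffic per consumer and per version before it is fed into dashboards or alert filters:

  // Hypothetical helper: split "MyMobileApp:Test/1.0.56" into its parts so that
  // requests can be attributed to a consumer, purpose and version when monitoring.
  public static class UserAgentParser
  {
      public static (string Name, string Purpose, string Version) Parse(string userAgent)
      {
          var slash = userAgent.LastIndexOf('/');
          var version = slash >= 0 ? userAgent[(slash + 1)..] : "unknown";
          var product = slash >= 0 ? userAgent[..slash] : userAgent;

          var colon = product.IndexOf(':');
          var name = colon >= 0 ? product[..colon] : product;
          var purpose = colon >= 0 ? product[(colon + 1)..] : "unknown";

          return (name, purpose, version);
      }
  }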

By providing us with a version number we can also see when they release a new version and help them ensure they’ve deprecated old versions. Whilst you would expect service owners to know exactly what they’ve got running where, you’d be surprised how many don’t know they have old instances lying around. It also helps identify who the laggards are that are holding up removal of your legacy features.

Causality

A somewhat related idea is the use of “trace” or “correlation” IDs, which is something I’ve covered before in “Causality - A Mechanism for Relating Distributed Diagnostic Contexts”. These are unique IDs for diagnosing problems with requests and it’s useful to include a prefix for the originating system. However that system may not be your actual client if there are various other services between you and them. Hence the causality ID covers the end-to-end where the User-Agent can cover the local client-server hop.
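
A minimal sketch of the distinction, with the header name and system prefix assumed rather than taken from the article or the Causality mechanism itself:

  // Hypothetical sketch: a correlation ID covers the whole end-to-end journey, so an
  // incoming ID is reused where present and a new, system-prefixed one is only minted
  // at the origin; the User-Agent describes just this hop's immediate client.
  using System;
  using System.Net.Http;

  public static class Causality
  {
      private const string HeaderName = "Correlation-Id";  // assumed header name
      private const string SystemName = "orders-api";      // assumed originating system

      public static string EnsureCorrelationId(string incomingId) =>
          string.IsNullOrWhiteSpace(incomingId)
              ? $"{SystemName}-{Guid.NewGuid():N}"  // a new request chain starts here
              : incomingId;                         // preserve the existing chain

      public static void Decorate(HttpRequestMessage request, string correlationId)
      {
          request.Headers.TryAddWithoutValidation(HeaderName, correlationId);
          request.Headers.TryAddWithoutValidation("User-Agent", "orders-api:Prod/2.3.0");
      }
  }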

You would think that the benefit of passing it was fairly clear – it allows providers to proactively help consumers fix their problems. And yet, like so many non-functional requirements, it sits lower down their backlog because it’s only optional [2]. Not only that, but by masking themselves consumers actually hamper delivery of new features, because the provider has to work harder than necessary to keep the existing lights on.

 

[1] Ironically the requests were for some automated tests which they didn’t realise were failing!

[2] We wanted to make the User-Agent header mandatory on all non-production environments [3] to try and convince our consumers of the benefits but it didn’t sit well with the upper echelons.

[3] The idea being that its use in production then becomes automatic but does not exclude easy use of diagnostic tools like CURL for production issues.

Visual Lint 6.0.6.287 has been released

Products, the Universe and Everything from Products, the Universe and Everything

Visual Lint 6.0.6.287 is now available. This is a recommended maintenance update for Visual Lint 6.0, and includes the following changes:
  • Added PC-lint Plus specific compiler indirect files co-rb-vs6.lnt (Microsoft Visual Studio 6.0), co-rb-vs2002.lnt (Microsoft Visual Studio .NET 2002), co-rb-vs2003.lnt (Microsoft Visual Studio .NET 2003) and co-rb-vs2005.lnt (Microsoft Visual Studio 2005) to the installer.
  • Modified the installer to correctly recognise Microsoft Visual Studio 2017 version 15.3 installations.
  • Fixed a bug in the installer which affected the installation of the Atmel Studio plug-in into AVR Studio 5.x.
  • Added PC-lint Plus specific warning 550 suppression directives for the UNREFERENCED_PARAMETER, DBG_UNREFERENCED_PARAMETER and DBG_UNREFERENCED_LOCAL_VARIABLE macros.

    These suppressions are implemented in a new lib-rb-win32.h header file referenced by the existing lib-rb-win32.lnt indirect file supplied within the installer.
Download Visual Lint 6.0.6.287

Don’t Hide the Solution Structure

Chris Oldwood from The OldWood Thing

Whenever you join an existing team and start work on their codebase you need to orientate yourself so that you have a feel for the system’s architecture and design. If you’re lucky there is some documentation, perhaps nice diagrams to give you an overview. Hopefully you also have an extensive suite of tests to tell you how the system behaves.

More than likely there is nothing or very little to go on, and if it’s a truly legacy system any documentation could well be way out of date. At this point you pretty much only have the source code to work from. Whilst this is the source of truth, the amount of code you need to read to become au fait with all the various high-level concepts depends in part on how well it’s laid out.

Static Structure

Irrespective of whether you like to think of your layers in terms of onions or brick walls, all code essentially gets organised on disk and that means the solution structure is hierarchical in nature. In the most popular languages that support namespaces, these are also hierarchical and are commonly laid out on disk to reflect the same hierarchy [1].

Although the compiler is happy to just hoover up source code from the entire solution and largely ignore the relative position of the callers and callees, there are useful conventions which, if honoured, allow you to reason about and refactor the code more easily due to lower coupling. For example, defining an interface in the same source file as a class that implements it suggests a different use of inheritance than when the interface sits externally, further up the hierarchy. Also, seeing code higher up the hierarchy referencing types deeper down in an unrelated branch is another smell, of an abstraction potentially depending on an implementation detail.
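
As a rough C# illustration (the namespaces and types below are invented), the first interface travels with its implementation and so reads as a local detail, whereas the second sits higher up the hierarchy as a published abstraction that deeper branches implement:

  // Orders/Pricing/DiscountRules.cs – interface and implementation live together,
  // hinting that the interface is a local seam rather than a published abstraction.
  namespace Orders.Pricing
  {
      internal interface IDiscountRule { decimal Apply(decimal price); }

      internal sealed class TenPercentOff : IDiscountRule
      {
          public decimal Apply(decimal price) => price * 0.9m;
      }
  }

  // Orders/IOrderRepository.cs – the abstraction sits near the top of the hierarchy;
  // implementations live deeper down (say Orders.Persistence), so code near the top
  // never needs to reference an unrelated implementation branch.
  namespace Orders
  {
      public sealed class Order { public int Id { get; set; } }

      public interface IOrderRepository
      {
          void Save(Order order);
      }
  }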

Navigating the Structure

One of the things I’ve noticed in recent years whilst pairing is that many developers appear to navigate the source code solely through their IDE, and within the IDE by using features like “go to definition (implementation)”. Some very rarely see the solution structure because they hide it to gain more screen real estate for the source file of current interest [2].

Hence the only time the solution structure is visible is when there is a need to add a new source file. My purely anecdotal evidence suggests that this will be added without a great deal of thought, as the code can be easily located in future by its author through its class name or another reference; they never have to consider where it “logically” resides.

Sprawling Suburbs

The net result is that namespaces and packages suffer from urban sprawl as they slowly accrete more and more code. This newer code adds more dependencies and so the package as a whole acquires an ever increasing number of dependencies. Left unchecked this can lead to horrible cyclic dependencies that are a nightmare to resolve.

I recently had the opportunity to revisit the codebase for a greenfield system I had started a few years before. We initially partitioned the code into a few key assemblies to get ourselves going and so I was somewhat surprised to still see the same assemblies a few years later, albeit massively overgrown with extra responsibilities. As a consequence even their simple home-grown tools had bizarre dependencies dragged in through bloated shared libraries [3].

Take a Stroll

So in future, instead of taking the Underground (subway) through your codebase every day, stop, and take a stroll every now and then around the paths. The same rules about cohesion within the methods of a class also apply at the higher levels of design – classes in a namespace, namespaces in an assembly, assemblies in a solution, etc. Then you’ll find that as the system grows it’s easier to refactor at the package level [4].

(For more on this topic see my older post “Who’s Maintaining the 100 Foot View?”.)

 

[1] Annoyingly this is not a common practice in the C++ codebases I’ve worked on.

[2] If I was being flippant I might suggest that if you really need the space the code may be too complicated, as I once did on Twitter here.

[3] I once dragged in a project’s shared library for a few useful extension methods to use in a simple console app and found I had pulled in an IoC container and almost a dozen other NuGet dependencies!

[4] In C# the internal access modifier has zero effect if you stick all your code into one assembly.

Every Commit Needs the Rationale to Support It

Chris Oldwood from The OldWood Thing

Each and every change to a codebase should be performed for a very specific reason – we shouldn’t just change some code because we feel like it. If you follow a checklist (mental or otherwise), such as the one I described in “Commit Checklist”, then each commit should be as cohesive as possible with any unintentional edits reverted to spare our blushes.

However, whilst the code can say what behaviour has changed, we also need to say why it was changed. The old adage “use the source Luke” is great for reminding us that the only source of truth is the code itself, but changes made without any supporting documentation make software archaeology [1] incredibly difficult in the future.

The Commit Log

Take the following one line change to the JSON serialization settings used when persisting to a database:

DateTimeZoneHandling = DateTimeZoneHandling.Utc;

This single-line edit appeared in a commit all by itself. Now, any change which has the potential to affect the storage or retrieval of the system’s data is something which should not be entered into lightly. Even if the change was done to make what is currently a default setting explicit, this fact still needs to be recorded – the rationale is important.
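
For context, a setting like this is one property on Json.NET’s JsonSerializerSettings; the surrounding code sketched below is assumed, not taken from the commit – that one property assignment is all the commit touched:

  // Hypothetical surrounding code – the real commit changed only the single property line.
  using System;
  using Newtonsoft.Json;

  var settings = new JsonSerializerSettings
  {
      // Treat DateTime values as UTC when writing to and reading from the database.
      DateTimeZoneHandling = DateTimeZoneHandling.Utc,
  };

  var document = new { CreatedAt = DateTime.Now };
  var json = JsonConvert.SerializeObject(document, settings);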

The first port of call for any documentation around a change is probably the commit message. Given that it lives with the code and is (usually) immutable it stands the best chance of remaining intact over time. In the example above the commit message was simply:

“Bug Fix: added date time zone handling to UTC for database json serialization”

In the same way that poor code comments have a habit of simply stating what the code does, the same malaise can affect commit messages by merely restating what was changed. Our example largely suffers from this, but it also teases us by additionally mentioning that it was done to fix a bug. Suddenly we have so many more unanswered questions about the change.
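
Purely as an illustration – the defect and ticket below are invented, not the actual ones – a message that answered those questions might have read:

  Bug Fix: serialize DateTimes to the database as UTC (ABC-1234)

  Documents written from hosts not running in UTC were stored with a local-time
  offset, so date-range queries missed records either side of midnight. Setting
  DateTimeZoneHandling to Utc keeps stored timestamps consistent regardless of
  the host timezone; see the new persistence tests for the failing case.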

Code Change Comments

In the dim and distant past it was not unusual to use code comments to annotate changes as well as to describe the behaviour of the code. Before the advent of version control features like “blame” (aka annotate) it was non-trivial to track down the commit where any particular line of code changed. As such it seemed easier to embed the change details in the code itself rather than the VCS tool, especially if the supporting documentation lived in another system; you could just use the Change Request ID as the comment.

As you can imagine this sorta worked okay at first but as the code continued to change and refactoring became more popular these comments became as distracting and pointless as the more traditional kind. It also did nothing to help reduce the overhead of tracking the how-and-why in different places.

Feature Trackers

The situation originally used to be worse than this as new features might be tracked in one place by the business whilst bugs were tracked elsewhere by the development team. This meant that the “why” could be distributed right across time and space without the necessary links to tie them all together.

The desire to track all work in one place in an Enterprise tool like JIRA has at least reduced the number of places you need to look for “the bigger picture”, assuming you use the tool for more than just recording estimates and time spent, but of course there are lightweight alternatives [2]. Hence recording the JIRA number or Trello card number in the commit message is probably the most common approach to linking these two sides of the change.

As an aside, one of the reasons many teams haven’t historically put all their documentation in their source code repo is because it’s often been inaccessible to non-developer colleagues, either due to lack of permissions or technical ability. Fortunately tools like GitHub have started to bridge this divide.

Executable Specifications

One of the oldest problems in software development has been keeping the supporting documentation and code in sync. As features evolve it becomes harder and harder to know what the canonical reason for any change is because the current behaviour may be the sum of all previous related requirements.

An increasingly popular technique for combating this has been to express the documentation, i.e. the requirements, in code too, in the form of tests. At a high level these are acceptance tests, with more technical behaviours expressed as unit or integration tests.

This brings me back to my earlier example. It’s incredibly rare that any code change would be committed without some kind of corresponding change to the automated tests. In this instance the bug must have manifested itself in the persistence layer and I’d expect at least one new test to be added (or an existing one fixed) to illustrate what the bug is. Hence the rationale for the change is to fix a bug, and the rationale can largely be described through the use of one or more well written tests rather than in prose.
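
A minimal sketch of what such a test might look like, using xUnit and Json.NET with invented type names (the post doesn’t show the real tests):

  using System;
  using Newtonsoft.Json;
  using Xunit;

  public class StoredDocument
  {
      public DateTime OccurredAt { get; set; }
  }

  public class DocumentSerializationTests
  {
      // Hypothetical test encoding the rationale: timestamps must round-trip as UTC,
      // otherwise documents written from non-UTC hosts drift by the local offset.
      [Fact]
      public void Timestamps_round_trip_as_utc()
      {
          var settings = new JsonSerializerSettings
          {
              DateTimeZoneHandling = DateTimeZoneHandling.Utc,
          };

          var written = new StoredDocument
          {
              OccurredAt = new DateTime(2017, 9, 1, 12, 0, 0, DateTimeKind.Local),
          };

          var json = JsonConvert.SerializeObject(written, settings);
          var read = JsonConvert.DeserializeObject<StoredDocument>(json, settings);

          Assert.Equal(DateTimeKind.Utc, read.OccurredAt.Kind);
          Assert.Equal(written.OccurredAt.ToUniversalTime(), read.OccurredAt);
      }
  }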

Exceptions

There are of course no absolutes in life and fixing a spelling mistake should not require pages of notes, although spelling incorrectly on purpose probably does [3].

The point is that there is a balance to be struck if we are to trade off the short and long term maintenance of the system. It might be tempting to rely on tribal knowledge or the product owner’s notes to avoid thinking about how the rationale is best expressed, but finding a way to encode that information in executable form, such as through tests, provides both the present reviewer and the future software archaeologist with the most usable representation.

 

[1] See my “Software Archaeology” article for more about spelunking a codebase’s history.

[2] I’ve written about the various tools I’ve used in the past in “Feature Tracking”.

[3] The HTTP “referer” header being a notable exception – see Wikipedia.

Organize for Complexity

Jon Jagger from less code, more software

is an excellent book by Niels Pflaeging (isbn 978-0-9915376-0-0). As usual I'm going to quote from a few pages:
What Taylor pioneered was the idea of consistently dividing an organization between thinking people (managers) and executing people (workers).
Problem-solving in a life-less system is about instruction.
Problem-solving in a living system is about communication.
Any attempt to motivate can only lead to de-motivation.
Ultimately, organizing for complexity and self-organization are always about empowering teams ... not about empowering individuals.
Actual teams of people working for and with each other.
Nobody is in control. Everybody is in charge.
To be intensively involved in selection [recruiting] is a matter of honor.
A hallmark of great selection is that it is highly time-consuming.
Management is a mindset that will not just go away all by itself.
When employees think for themselves and make entrepreneurial decisions autonomously, you must at all times bear joint responsibility for those decisions, even if you or other members of the organization might have decided differently.
A "beta" kind of organization produces many such stories: Peculiarities, unusual practices, by which they can be instantly recognized among so many over-managed and under-led organizations.
People do not need to be forced to work. However, this deeply-seated prejudice about people and their relationship to work is what keeps management alive.


TECH(K)NOW Day workshop on “Writing a programming language”

Andy Balaam from Andy Balaam's Blog

My OpenMarket colleagues and I ran a workshop at TECH(K)NOW Day on how to write your own programming language.

A big thank you to my colleagues from OpenMarket who volunteered to help: Rowan, Jenny, Zach, James and Elliot.

An extra thank you to Zach and Elliott for their impromptu help on the information desk for attendees.

Hopefully the attendees enjoyed it and learned a bit.

You can find the workshop slides, the full code, info about another simple language called Cell, and lots more links at github.com/andybalaam/videos-write-your-own-language. You can also read my blog at artificialworlds.net/blog and follow me on twitter @andybalaam.

Thanks to OpenMarket for supporting us in running this workshop!

Installing specific major Java JDK versions on OS X via Homebrew

The Lone C++ Coder's Blog from The Lone C++ Coder's Blog

Update 2019-05-07: The java8 cask is affected by recent licensing changes by Oracle. There’s a discussion over on github about this. I’m leaving the post up partially for historic context, but the java8 cask is no longer available, at least at the time of writing.

In an earlier post, I described how to install the latest version of the Oracle Java JDK using homebrew. What hadn’t been completely obvious to me when I wrote the original blog post is that the ‘java’ cask will install the latest major version of the JDK. As a result, when I upgraded my JDK install today, I ended up with an upgrade from Java 8 to Java 9. On my personal machine that’s not a problem, but what if I wanted to stick with a specific major version of Java?