Delivery Anti-Pattern: Local Optimisations

Chris Oldwood from The OldWood Thing

The daily stand-up mostly went along as usual. I wasn’t entirely sure why there was a stand-up as it wasn’t so much a team as a bunch of people working on the same codebase but with more-or-less individual goals. Applying the microservices analogy to the team you could say we were a bunch of micro-teams – each person largely acting with autonomy rather than striving to work together to achieve a common goal [1].

Time

But I digress, only slightly. What happened at the stand-up was a brief discussion around the delivery of a feature with the suggestion that it should be hidden behind a feature toggle. The implementer explained that they weren’t going to add a feature toggle because “it was a waste of their time”.

This surprised me. Knowing what I do now about how the team operates it isn’t that surprising but at the time I was used to working in teams where every member works towards common goals. One of those common goals is to try and ensure the delivery of features is a continuous flow and is not disrupted by a bad change which then has to be backed out because rolling back has the potential to create all sorts of disruption, not least the delay of those unaffected changes.

You should note that I’m not disagreeing with their choice of whether or not to use a feature toggle – I did not know anywhere near enough about the change or the system at that time to contribute to that decision. No, what disturbed me was the reason why they chose not to take that approach – that their time is more valuable than that of anyone else in the team, or the business for that matter.

In isolation that paints an unpleasant picture of the individual in question and that simply is not the case. However their choice of words, even if done without real consideration, does appear to reinforce the culture that surrounds them. In essence, with a feeling that the focus is on them and their performance, they are naturally going to behave in a way that favours optimising delivering their own features over that of the team at large.

Quality

Another example of favouring a local optimisation over the longer term goal of sustained delivery occurred when I was assigned my first piece of work. This was not so much a story as a couple of epics funded as an entire project (over 4 months solid work in the end). My instinct, after being shown roughly where in the code I needed to dig, was to ask where the existing tests were so that I knew where to add mine. The tech lead’s immediate response was “you won’t have time to write tests”.

My usual response to this statement is a jovially phrased “how will I know if it works then?” which often has the effect of opening a line of dialogue around the testing strategy and where it’s heading. Unfortunately this time around it only succeeded in the tech lead launching into a diatribe about how important delivery was, how much the business trusted us to deliver on time, blah, blah, blah, in fact almost everything that a good test suite enables!

Of course I still went ahead and implemented the entire project TDD-style and easily delivered it on time because I knew the approach was sound and the investment was more than worth it. The subsequent enhancements and repaying of some technical debt also became trivial at that point and meant that anyone, not just me, could safely and quickly make changes to that area of code. It also showed how easy it was to add new tests to cover changes to the older parts of the component when required later.

In the end over 10% of unit tests of the entire system had been written by me during that project for a codebase of probably over 1/2 million lines of code. I also added a command line test harness and a regression testing “framework” [2] in that time too all in an effort to reduce the amount of hoops you needed to go through to diagnose and safely fix any edge cases that showed up later. None of this was rocket science or in the least bit onerous.

Knowledge

I would consider a lack of supporting documentation one further local optimisation too. When only a select few have the knowledge to help support a system you have to continually rely on their help to nurse it through the bad times. This is especially true when the system has enough quirks that the cost of taking the wrong action is quite high (in terms of additional noise). If you need to remember a complex set of conditions and actions you’re going to get it wrong eventually without some form of checklist to work from. Relying on tribal knowledge is a great form of optimisation until core members of the team leave and you unearth the gaping holes in the team’s knowledge.

Better yet, design away the problems entirely, but that’s a different can of worms…

Project Before Product

I believe this was another example of how “projects” are detrimental to the development of a complex system. With the team funded by various projects and those projects being used as a very clear division on the task board through swim lanes [3] it killed the desire to swarm on anything but a production incident because you felt beholden to your specific stakeholders.

For example there were a number of conversations about fixing issues with the system that were slowing down delivery through unreliability that ended with “but who’s going to pay for that?” Although improvements were made they had to be so small as to not really affect the delivery of the project work. Hence the only real choice was to find easier ways to treat the symptoms rather than cure the disease.

Victims of Circumstance

Whenever I bump into this kind of culture my gut instinct is not to assume they are “incompetent” people, on the contrary, they’re clearly intelligent so I’ll assume they are shaped by their environment. Of course we all have our differences, that’s what makes diversity so useful, but we have to remember to stop once in a while and reflect on what we’re doing and question whether it’s still the right approach to take. What works for building Fizz Buzz does not work for a real-time, distributed calculation engine. And even if that approach did work once upon a time the world keeps moving on and so now we might be able to do even better.

 

[1] Pairing was only something you did when you’d already been stuck for some time, and when the mistake was found you went your separate ways again.

[2] I say “framework” because it was really just leveraging a classic technique: a command line tool reading CSV format data which fired requests into a server, the results of which are then diff’d against a known set of results (Golden Master Testing).

[3] The stand-up was originally run in project order, lead by the PM. Unsurprisingly those not involved in the other projects were rarely engaged in the meeting unless it was their turn to speak.

It Compiles, Ship It!

Chris Oldwood from The OldWood Thing

The method was pretty simple and a fairly bog standard affair, it just attempted to look something up in a map and return the associated result, e.g.

public string LookupName(string key)
{
  string name;

  if (!customers.TryGetValue(key, out name)
    throw new Exception(“Customer not found”);

  return name;
}

The use of an exception here to signal failure implied to me that this really shouldn’t happen in practice unless the data structure is screwed up or some input validation was missed further upstream. Either way you know (from looking at the implementation) that the outcome of calling the method is either the value you’re after or an exception will be thrown.

So I was more than a little surprised when I saw the implementation of the method suddenly change to this:

public string LookupName(string key)
{
  string name;

  if (!customers.TryGetValue(key, out name)
    return null;

  return name;
}

The method no longer threw an exception on failure it now returned a null string reference.

This wouldn’t be quite so surprising if all the call sites that used this method had also been fixed-up to account for this change in behaviour. In fact what initially piqued my interest wasn’t that this method had changed (although we’ll see in a moment that it could have been expressed better) but how the calling logic would have changed.

Wishful Thinking

I always approach a change from a position of uncertainty. I’m invariably wrong or have something to learn, either from a patterns perspective or a business logic one. Hence my initial assumption was that I now needed to think differently about what happens when I need to “lookup a name” and that lookup fails. Where before it was truly exceptional and should never occur in practice (perhaps indicating a bug somewhere else) it’s now more likely and something to be formally considered, and resolving the failure needs to be handled on a case-by-case basis.

Of course that wasn’t the case at all. The method had been changed to return a null reference because it was now an implementation detail of another new method which didn’t want to use catching an exception for flow control. Instead they now simply check for null and act accordingly.

As none of the original call sites had been changed to handle the new semantics a rich exception thrown early had now been traded for (at best) a NullReferenceException later or (worse case) no error at all and an incorrect result calculated based on bad input data [1].

The TryXxx Pattern

Coming back to reality it’s easy to see that what the author really wanted here was another method that allowed them to attempt a lookup on a name, knowing that in their scenario it could possibly fail but that’s okay because they have a back-up plan. In C# this is a very common pattern that looks like this:

public bool TryLookupName(string key, out string name)

Success or failure is indicated by the return value and the result of the lookup returned via the final argument. (Personally I’ve tended to favour using ref over out for the return value [2].)

The Optional Approach

While statically types languages are great at catching all sorts of type related errors at compile time they cannot catch problems when you smuggle optional reference-type values in languages like C# and Java by using a null reference. Any reference-type value in C# can inherently be null and therefore the compiler is at a loss to help you.

JetBrains’ ReSharper has some useful annotations which you can use to help their static analyser point out mistakes or elide unnecessary checks, but you have to add noisy attributes everywhere. However expressing your intent in code is the goal and it’s one valid and very useful approach.

Winding the clock into the future we have the new “optional reference” feature to look forward to in C# (currently in preview). Rather than bury their heads in the sand the C# designers have worked hard to try and right an old wrong and reduce the impact of Sir Tony Hoare’s billion dollar mistake by making null references type unsafe.

In the meantime, and for those of us working with older C# compilers, we still have the ability to invent our own generic Optional<> type that we can use instead. This is something I’ve been dragging into C# codebases for many years (whilst standing on my soapbox [3]) in an effort to tame at least one aspect of complexity. Using one of these would have changed the signature of the method in question to:

public Optional<string> LookupName(string key)

Now all the call sites would have failed to compile and the author would have been forced to address the effects of their change. (If there had been any tests you would have hoped they would have triggered the alarm too.)

Fix the Design, Not the Compiler

Either of these two approaches allows you to “lean on the compiler” and leverage the power of a statically typed language. This is a useful feature to have but only if it’s put to good use and you know where the limitations are in the language.

While I would like to think that people listen to the compiler I often don’t think they hear it [4]. Too often the compiler is treated as something to be placated, or negotiated with. For example if the Optional<string> approach had been taken the call sites would all have failed to compile. However this calling code:

var name = LookupName(key);

...could easily be “fixed” by simply doing this to silence the compiler:

var name = LookupName(key).Value;

For my own Optional<> type we’d just have switched from a possible NullReferenceException on lookup failure to an InvalidOperationException. Granted this is better as we have at least avoided the chance of the null reference silently making its way further down the path but it doesn’t feel like we’ve addressed the underlying problem (if indeed there has even been a change in the way we should treat lookup failures).

Embracing Change

While the Optional<> approach is perhaps more composable the TryXxx pattern is more invasive and that probably has value in itself. Changing the signature and breaking compilation is supposed to put a speed bump in your way so that you consider the effects of your potential actions. In this sense the more invasive the workaround the more you are challenged to solve the underlying tension with the design.

At least that’s the way I like to think about it but I’m afraid I’m probably just being naïve. The reality, I suspect, is that anyone who could make such a change as switching an exception for a null reference is more concerned with getting their change completed rather than stopping to ponder the wider effects of what any compiler might be trying to tell them.

 

[1] See Postel’s Law and  consider how well that worked out for HTML.

[2] See “Out vs Ref For TryXxx Style Methods”.

[3] C# already has a “Nullable” type for optional values so I find it odd that C# developers find the equivalent type for reference-type values so peculiar. Yes it’s not integrated into the language but I find it’s usually a disconnect at the conceptual level, not a syntactic one.

[4] A passing nod to the conversation between Woody Harrelson and Wesley Snipes discussing Jimi Hendrix in White Men Can’t Jump.

The Perils of DateTime.Parse()

Chris Oldwood from The OldWood Thing

The error message was somewhat flummoxing, largely because it was so generic, but also because the data all came from a database extract rather than manual input:

Input string was not in a correct format.

Naturally I looked carefully at all the various decimal and date values as I knew this was the kind of message you get when parsing those kind of values when they’re incorrectly formed, but none of them appeared to be at fault. The DateTime error message is actually slightly different [1] but I’d forgotten that at the time and so I eyeballed the dates as well as decimal values just in case.

Then I remembered that empty string values also caused this error, but lo-and-behold I was not missing any optional decimals or dates in my table either. Time to hit the debugger and see what was going on here [2].

The Plot Thickens

I changed the settings for the FormatException error type to break on throw, sent in my data to the service, and waited for it to trip. It didn’t take long before the debugger fired into life and I could see that the code was trying to parse a decimal value as a double but the string value was “0100/04/01”, i.e. the 1st April in the year 100. WTF!

I immediately went back to my table and checked my data again, aware that a date like this would have stood out a mile first time around, but I was happy to assume that I could have missed it. This time I used some regular expressions just to be sure my eyes were not deceiving me.

The thing was I knew what column the parser thought the value was in but I didn’t entirely trust that I hadn’t mucked up the file structure and added or removed an errant comma in the CSV input file. I didn’t appear to have done that and so the value that appeared to be causing this problem was the decimal number “100.04”, but how?

None of this made any sense and so I decided to debug the client code, right from reading in the CSV data file through to sending it across the wire to the service, to see what was happening. The service was invoked via a fairly simple WCF client assembly and as I stepped into that code I came across a method called NormaliseDate()...

The Mist Clears

What this method did was to attempt to parse the input string value as a date and if it was successful it would rewrite it in an unusual (to me) “universal” format – YYYY/MM/DD [3].

The first two parsing attempts it did were very specific, i.e. it used DateTime.ParseExact() to match the intended output format and the “sane” local time format of DD/MM/YYYY. So far, so good.

However the third and last attempt, for whatever reason, just used DateTime.Parse() in its no-frills form and that was happy to take a decimal number like “100.04” and treat it as a date in the format YYY.MM! At first I wondered if it was treating it as a serial or OLE date of some kind but I think it’s just more liberal in its choice of separators than the author of our method intended [4].

Naturally there are no unit tests for this code or any type of regression test suite that shows what kind of scenarios this method was intended to support. Due to lack of knowledge around deployment and use in the wild of the client library I was forced to pad the values in the input file with trailing zeroes in the short term to workaround the issue, yuck! [5]

JSON Parsers

This isn’t the first time I’ve had a run-in with a date parser. When I was working on REST APIs I always got frustrated by how permissive the JSON parser would be in attempting to coerce a string value into a date (and time). All we ever wanted was to keep it simple and only allow ISO-8601 format timestamps in UTC unless there was a genuine need to support other formats.

Every time I started writing the acceptance tests though for timestamp validation I’d find that I could never quite configure the JSON parser to reject everything but the desired format. In the earlier days of my time with ASP.Net even getting it to stop accepting local times was a struggle and even caused us a problem as we discovered a US/UK date format confusion error which the parser was hiding from us.

In the end we resorted to creating our own Iso8601DateTime type which used the .Net DateTimeOffest type under the covers but effectively allowed us to use our own custom JSON serializer methods to only support the exact format we wanted.

More recently JSON.Net has gotten better at letting you control the format and parsing of dates but it’s still not perfect and there are unit tests in past codebases that show variants that would unexpectedly pass, despite using the strictest settings. I wouldn’t be surprised if our Iso8601DateTime type was still in use as I can only assume everyone else is far less pedantic about the validation of datetimes and those that are have taken a similar route to ensure they control parsing.

A Dangerous Game

One should not lose sight though of the real issue here which the attempt to classify string values by attempting to parse them. Even if you limit yourself to a single locale you might get away with it but when you try and do that across arbitrary locales you’re just asking for trouble.

 

[1] “String was not recognized as a valid DateTime.”

[2] This whole fiasco falls squarely in the territory I’ve covered before in my Overload article “Terse Exception Messages”. Fixing this went to the top of my backlog, especially after I discovered it was a problem for our users too.

[3] Why they didn’t just pick THE universal format of ISO-8601 is anyone’s guess.

[4] I still need to go back and read the documentation for this method because it clearly caters for scenarios I just don’t normally see in my normal locale or user base.

[5] That’s what happens with tactical solutions, no one ever quite gets around to documenting anything because they never think it’ll survive for very long...

My Dislike of GOPATH

Chris Oldwood from The OldWood Thing

[This post was written in response to a tweet from Matt Aimonetti which asked “did you ever try #golang? If not, can you tell me why? If yes, what was the first blocker/annoyance you encountered?”]

I’m a dabbler in Go [1], by which I mean I know enough to be considered dangerous but not enough to be proficient. I’ve done a number of katas and even paired on some simple tools my team has built in Go. There is so much to like about it that I’ve had cause to prefer looking for 3rd party tools written in it in the faint hope that I might at some point be able to contribute one day. But, every time I pull the source code I end up wasting so much time trying to get the thing to build because Go has its own opinions about how the source is laid out and tools are built that don’t match the way I (or most other tools I’ve used) work.

Hello, World!

Go is really easy to get into, on Windows it’s as simple as:

> choco install golang

This installs the compiler and standard libraries and you’re all ready to get started. The obligatory first program is as simple as doing:

> pushd \Dev
> mkdir HelloWorld
> notepad HelloWorld.go
> go build HelloWorld.go
> HelloWorld

So far, so good. It’s pretty much the same as if you were writing in any other compiled language – create folder, create source file, build code, run it.

Along the way you may have run into some complaint about the variable GOPATH not being set. It’s easily fixed by simply doing:

> set GOPATH=%CD%

You might have bothered to read up on all the brouhaha, got side-tracked and discovered that it’s easy to silence the complaint without any loss of functionality by setting it to point to anywhere. After all the goal early on is just to get your first program built and running not get bogged down in tooling esoterica.

When I reached this point I did a few katas and used a locally installed copy of the excellent multi-language exercise tool cyber-dojo.org to have a play with the test framework and some of the built-in libraries. Using a tool like cyber-dojo meant that GOPATH problems didn’t rear their ugly head again as it was already handled by the tool and the katas only needed standard library stuff.

The first non-kata program my team wrote in Go (which I paired on) was a simple HTTP smoke test tool that also just lives in the same repo as the service it tests. Once again their was nary a whiff of GOPATH issues here either – still simple.

Git Client, But not Quite

The problems eventually started for me when I tried to download a 3rd party tool off the internet and build it [2]. Normally, getting the source code for a tool from an online repository, like GitHub, and then building it is as simple as:

> git clone https://…/tool.git
> pushd tool
> build

Even if there is no Windows build script, with the little you know about Go at this point you’d hope it to be something like this:

> go build

What usually happens now is that you start getting funny errors about not being able to find some code which you can readily deduce is some missing package dependency.

This is the point where you start to haemorrhage time as you seek in vain to fix the missing package dependency without realising that the real mistake you made was right back at the beginning when you used “git clone” instead of “go get”. Not only that but you also forgot that you should have been doing all this inside the “%GOPATH%\src” folder and not in some arbitrary TEMP folder you’d created just to play around in.

The goal was likely to just build & run some 3rd party tool in isolation but that’s not the way Go wants you to see the world.

The Folder as a Sandbox

The most basic form of isolation, and therefore version control, in software development is the humble file-system folder [3]. If you want to monkey with something on the side just make a copy of it in another folder and you know your original is safe. This style of isolation (along with other less favourable forms) is something I’ve written about in depth before in my C Vu In the Toolbox column, see “The Developer’s Sandbox”.

Unfortunately for me this is how I hope (expect) all tools to work out of the box. Interestingly Go is a highly opinionated language (which is a good thing in many cases) that wants you to do all your coding under one folder, identified by the GOPATH variable. The rationale for this is that it reduces friction from versioning problems and helps ensure everyone, and everything, is always using a consistent set of dependencies – ideally the latest.

Tool User, Not Developer

That policy makes sense for Google’s developers working on their company’s tools, but I’m not a Google developer, I’m a just a user of the tool. My goal is to be able to build and run the tool. If there happens to be a simple bug that I can fix, then great, I’d like to do that, but what I do not have the time for is getting bogged down in library versioning issues because the language believes everyone, everywhere should be singing from the same hymn sheet. I’m more used to a world where dependencies move forward at a pace dictated by the author not by the toolchain itself. Things can still move quickly without having to continually live on the bleeding edge.

Improvements

As a Go outsider I can see that the situation is definitely improving. The use of a vendor subtree to house a snapshot of the dependencies in source form makes life much simpler for people like me who just want to use the tool hassle free and dabble occasionally by fixing things here and there.

In the early days when I first ran into this problem I naturally assumed it was my problem and that I just needed to set the GOPATH variable to the root of the repo I had just cloned. I soon learned that this was a fools errand as the repo also has to buy into this and structure their source code accordingly. However a variant of this has got some traction with the gb tool which has (IMHO) got the right idea about isolation but sadly is not the sanctioned approach and so you’re dicing with the potential for future impedance mismatches. Ironically to build and install this tool requires GOPATH to still be working the proper way.

The latest version of Go (1.8) will assume a default location for GOPATH (a “go” folder under your profile) if it’s not set but that does not fix the fundamental issue for me which is that you need to understand that any code you pull down may be somewhere unrelated on your file-system if you don’t understand how all this works.

Embracing GOPATH

Ultimately if I am going to properly embrace Go as a language, and I would like to do more, I know that I need to stop fighting the philosophy and just “get with the programme”. This is hard when you have a couple of decades of inertia to overcome but I’m sure I will eventually. It’s happened enough times now that I know what the warnings signs are and what I need to Google to do it “the Go way”.

Okay, so I don’t like or agree with all of (that I know) the choices the Go language has taken but then I’m always aware of the popular quote by Bjarne Stroustrup:

“There are only two kinds of languages: the ones people complain about and the ones nobody uses.”

This post is but one tiny data point which covers one of the very few complaints I have about Go, and even then it’s just really a bit of early friction that occurs at the start of the journey if you’re not a regular. Still, better that than yet another language nobody uses.

 

[1] Or golang if you want to appease the SEO crowd :o).

[2] It was winrm-cli as I was trying to put together a bug report for Terraform and there was something going wrong with running a Windows script remotely via WinRM.

[3] Let’s put old fashioned technologies like COM to one side for the moment and assume we have learned from past mistakes.