Are Refactoring Tools Less Effective Overall?

Chris Oldwood from The OldWood Thing

Prior to the addition of automatic refactoring tools to modern IDEs refactoring was essentially a manual affair. You would make a code change, hit build, and then fix all the compiler errors (at least for statically typed languages). This technique is commonly known as “leaning on the compiler”. Naturally the operation could be fraught with danger if you were far too ambitious about the change, but knowing when you could lean on the compiler was part of the art of refactoring safely back then.

A Hypothesis

Having lived through both eras (manual and automatic) and paired with developers far more skilled with the automatic approach I’ve come up with a totally non-scientific hypothesis that suggests automatic refactoring tools are actually less effective than the manual approach, overall.

I guess the basis of this hypothesis pretty much hinges on what I mean by “effective”. Here I’m suggesting that automatic tools help you easily refactor to a local minimum but not to a global minimum [1]; consequently the codebase as a whole ends up in a less coherent state.

Shallow vs Deep Refactoring

The goal of an automatic refactoring tool appears to be to not break your code – it will only let you perform a simple refactoring that can be done safely, i.e. if the tool can’t fix up all the code it can see [2] it won’t allow you to do it in the first place. The consequence of this is that the tool constantly limits you to taking very small steps. Watching someone refactor with a tool can sometimes seem tortuous: they may need so many little refactoring steps to get the code into the desired state because the tool won’t let them make the leaps they want in one go unless they switch to manual mode.

This by itself isn’t a bad thing, after all making a safe change is clearly A Good Thing. No, where I see the problem is that by fixing up all the call sites automatically you don’t get to see the wider effects of the refactoring you’re attempting.

For example, the reason you’d choose to rename a class or method is because the existing name is no longer appropriate. This is probably because you’ve learned something new about the problem domain. However that class or method does not exist in a vacuum: it has dependencies in the guise of variable names and related types. It’s entirely likely that some of these are now inappropriate too, but you won’t easily see them because the tool has likely hidden them from you.

Hence one of the “benefits” of the old manual refactoring approach was that as you visited each broken call site you got to reflect on your change in the context of where it’s used. This often led to further refactorings as you began to comprehend the full nature of what you had just discovered.
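
As a contrived sketch (the names here are entirely made up), imagine a tool-driven rename after learning that the domain actually calls this thing a “policyholder”:

// We learned the domain term is "Policyholder", not "Customer", and used
// the tool to rename the class. Everything still compiles...
public class Policyholder
{
  public string Name = "";
}

public static class Registration
{
  // ...but the method name, parameter name and local variable all still
  // describe the old model, and the tool never brought us here to notice.
  public static void RegisterCustomer(Policyholder customer)
  {
    var customerKey = customer.Name.ToUpperInvariant();
    Store(customerKey, customer);
  }

  static void Store(string key, Policyholder value) { /* persist */ }
}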

Blue or Red Pill?

Of course what I’ve just described could easily be interpreted as the kind of “black hole” that many, myself included, would see as an unbounded unit of work. It’s one of those nasty rabbit holes where you enter and, before you know it, you’re burrowing close to the Earth’s core and have edited nearly every file in the entire workspace.

Yes, like any change, it takes discipline to stick to the scope of the original problem. Just because you keep unearthing more and more code that no longer appears to fit the new model it does not mean you have to tackle it right now. Noticing the disparity is the first step towards fixing it.

Commit Review

It’s not entirely true that you won’t see the entire outcome of the refactoring – at the very least the impact will be visible when you review the complete change before committing. (For a fairly comprehensive list of the things I go through at the point I commit see my C Vu article “Commit Checklist”.)

This assumes of course that you do a thorough review of your commits before pushing them. However by this point, just as writing tests after the fact is considerably less attractive, so is finishing off any refactoring; perhaps even more so because the code is not broken per se, it just might not be the best way of representing the solution.

It’s all too easy to justify the reasons why it’s okay to go ahead and push the change as-is because there are more important things to do. Even if you think you’re aware of technical debt it often takes a fresh pair of eyes to see how you’re living in a codebase riddled with inconsistencies that make it hard to see its true structure. One is then never quite sure, without reviewing the commit logs, what is the legacy and what is the new direction.

Blinded by Tools

Clearly this is not the fault of the tools or their vendors. What they offer now is far more favourable than not having them at all. However once again we need to be reminded that we should not be slaves to our tools but that we are the masters. This is a common theme which is regularly echoed in the software development community and something I myself tackled in the past with “Don’t Let Your Tools Pwn You”.

The Boy Scout Rule (popularised by Uncle Bob) says that we should always leave the camp site cleaner than we found it. While picking up a handful of somebody else’s rubbish and putting it in the bin might meet the goal in a literal sense, it’s no good if the site is acquiring rubbish faster than it’s being collected.

Refactoring is a technique for improving the quality of a software design in a piecewise fashion; just be careful you don’t spend so long on your hands and knees cleaning small areas that you fail to spot the resulting detritus building up around you.

 

[1] I wasn’t sure whether to say minimum or maximum but I felt that refactoring was about lowering entropy in some way so went with the reduction metaphor.

[2] Clearly there are limits around published APIs which it just has to ignore.

Excel-style DDE Requests

Chris Oldwood from The OldWood Thing

Despite being over two decades old, Microsoft’s Dynamic Data Exchange (DDE) still seems to be in use for Windows IPC by a not insignificant number of companies. At least, if the frequency of DDE questions in my inbox is anything to go by [1][2].

Earlier this year I got a question from someone who was trying to use my DDE Command tool (a command line tool for querying DDE servers) to get data out of the MetaTrader 4 platform. Finance is the area I first came across DDE in anger and it still seems to be a popular choice there even to this day.

Curious Behaviour

The problem was that when they used the ddecmd “request” verb to send an XTYP_REQUEST message to the MetaTrader 4 DDE Server (MT4) for a symbol they always got an immediate result of “N/A”. As a workaround they tried using the “advise” verb, which sends an XTYP_ADVSTART, to listen for updates for a short period instead. This worked for symbols which changed frequently but missed those that didn’t change during the interval. Plus this was a dirty hack as they had to find a way to send a CTRL+C to my tool to stop it after this short interval.

Clearly the MetaTrader DDE server couldn’t be this broken, and the proof was that it worked fine with Microsoft Excel – the other stalwart of the finance industry. Hence the question posed to me was why Excel appeared to work, but sending a request from my tool didn’t, i.e. was there a bug in my tool?

Reproducing the Problem

Given the popularity of the MetaTrader platform and Microsoft Excel the application of Occam’s Razor would suggest a bug in my tool was clearly the most likely answer, so I investigated…

Luckily MetaTrader 4 is a free download and they will even give you a demo account to play with which is super welcome for people like me who only want to use the platform to fix interop problems in their own tools and don’t actually want to use it to trade.

I quickly reproduced the problem by sending a DDE request for a common symbol:

> DDECmd.exe request -s MT4 -t QUOTE -i COPPER
N\A

And then I used the DDE advise command to see it working for background updates:

> DDECmd.exe advise -s MT4 -t QUOTE -i COPPER
2.5525 2.5590
2.5520 2.5590
. . . 

I tried it in Excel too and saw that it successfully managed to request the current value, even for slow ticking symbols.

How Excel Requests Data Via DDE

My DDE Command tool has a nice feature where it can also act as a DDE server and logs the different requests sent to it. This was originally added by me to help diagnose problems in my own DDE client code but it’s also been useful to see how other DDE clients behave.

As you can see below, when Excel opens a DDE link (=TEST|TEST!X) it actually sends a number of different XTYP_ADVSTART messages as it tries to find the highest fidelity format to receive the data in:

> DDECmd.exe listen -s TEST -t TEST
XTYP_CONNECT: 'TEST', 'TEST'
XTYP_CONNECT_CONFIRM: 'TEST', 'TEST'
XTYP_ADVSTART: 'T…', 'T…', 'StdDocumentName', '49157'
XTYP_ADVSTART: 'TEST', 'TEST', 'X', '50018'
. . .
XTYP_REQUEST: 'TEST', 'TEST', 'X', '50018'

After it manages to set up the initial advise loop it then goes on to send a one-off XTYP_REQUEST to retrieve the initial value. So, apart from the funky data formats it asks for, there is nothing unusual about the DDE request Excel seems to make.

Advise Before Request

And then it dawned on me, what if the MetaTrader DDE server required an advise loop to be established on a symbol before you’re allowed to request it? Sure enough, I hacked a bit of code into my request command to start an advise loop first and the subsequent DDE request succeeded.
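
In pseudo-code terms (the DdeClient type and its methods below are a hypothetical wrapper, invented purely to show the sequence of transactions, not the tool’s real API) the workaround amounts to this:

// Hypothetical sketch: establish an advise loop before making the one-off
// request, then tear the advise loop down again afterwards.
public static class Mt4Snapshot
{
  public static string Fetch(DdeClient client, string topic, string item)
  {
    using (client.StartAdvise(topic, item))   // XTYP_ADVSTART first...
    {
      return client.Request(topic, item);     // ...then XTYP_REQUEST succeeds
    }                                         // XTYP_ADVSTOP when disposed
  }
}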

I don’t know if this is a bug in the MetaTrader 4 DDE server or the intended behaviour. I suspect the fact that it works with Excel covers the vast majority of users so maybe it’s never been a priority to support one-off data requests. The various other financial DDE servers I coded against circa 2000 never exhibited this kind of requirement – you could make one-off requests for data with a standalone XTYP_REQUEST message.

The New Fetch Command

The original intent of my DDE Command tool was to provide a tool that allows each XTYP_* message to be sent to a DDE server in isolation, mostly for testing purposes. As such the tool’s verbs pretty much have a one-to-one correspondence with the DDE messages you might send yourself.

To allow people to use my tool against the MetaTrader 4 platform to snapshot data would therefore mean making some kind of small change. I did consider adding various special switches to the existing request and advise verbs, either to force an advise first or to force a request if no immediate update was received but that seemed to go against the ethos a bit.

In the end I decided to add a new verb called “fetch” which acts just like “request”, but starts an advise loop for every item first, then sends a request message for the latest value, thereby directly mimicking Excel.

> DDECmd.exe fetch -s MT4 -t QUOTE -i COPPER -i SILVER
COPPER|2.6075 2.6145
SILVER|16.771 16.821

Hey presto it now works!

This feature was released in DDE Command v1.6.

 

[1] This is a bit of artistic licence :o), they are not a daily occurrence but once every couple of months wouldn’t be far off. So yes, “DDE Is Still Alive & Kicking”.

[2] Most recently it seems quite a few people are beginning to discover that Microsoft dropped NetDDE support way back in Windows Vista.

My Dislike of GOPATH

Chris Oldwood from The OldWood Thing

[This post was written in response to a tweet from Matt Aimonetti which asked “did you ever try #golang? If not, can you tell me why? If yes, what was the first blocker/annoyance you encountered?”]

I’m a dabbler in Go [1], by which I mean I know enough to be considered dangerous but not enough to be proficient. I’ve done a number of katas and even paired on some simple tools my team has built in Go. There is so much to like about it that I’ve had cause to prefer looking for 3rd party tools written in it, in the faint hope that I might be able to contribute one day. But, every time I pull the source code I end up wasting so much time trying to get the thing to build because Go has its own opinions about how the source is laid out and tools are built that don’t match the way I (or most other tools I’ve used) work.

Hello, World!

Go is really easy to get into, on Windows it’s as simple as:

> choco install golang

This installs the compiler and standard libraries and you’re all ready to get started. The obligatory first program is as simple as doing:

> pushd \Dev
> mkdir HelloWorld
> notepad HelloWorld.go
> go build HelloWorld.go
> HelloWorld

So far, so good. It’s pretty much the same as if you were writing in any other compiled language – create folder, create source file, build code, run it.

Along the way you may have run into some complaint about the variable GOPATH not being set. It’s easily fixed by simply doing:

> set GOPATH=%CD%

You might have bothered to read up on all the brouhaha, got side-tracked and discovered that it’s easy to silence the complaint without any loss of functionality by setting it to point to anywhere. After all the goal early on is just to get your first program built and running not get bogged down in tooling esoterica.

When I reached this point I did a few katas and used a locally installed copy of the excellent multi-language exercise tool cyber-dojo.org to have a play with the test framework and some of the built-in libraries. Using a tool like cyber-dojo meant that GOPATH problems didn’t rear their ugly head again as it was already handled by the tool and the katas only needed standard library stuff.

The first non-kata program my team wrote in Go (which I paired on) was a simple HTTP smoke test tool that also just lives in the same repo as the service it tests. Once again there was nary a whiff of GOPATH issues here either – still simple.

Git Client, But not Quite

The problems eventually started for me when I tried to download a 3rd party tool off the internet and build it [2]. Normally, getting the source code for a tool from an online repository, like GitHub, and then building it is as simple as:

> git clone https://…/tool.git
> pushd tool
> build

Even if there is no Windows build script, with the little you know about Go at this point you’d hope it to be something like this:

> go build

What usually happens now is that you start getting funny errors about not being able to find some code which you can readily deduce is some missing package dependency.

This is the point where you start to haemorrhage time as you seek in vain to fix the missing package dependency without realising that the real mistake you made was right back at the beginning when you used “git clone” instead of “go get”. Not only that but you also forgot that you should have been doing all this inside the “%GOPATH%\src” folder and not in some arbitrary TEMP folder you’d created just to play around in.
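
What the toolchain actually expects (circa Go 1.8, and with a placeholder import path) is more along these lines:

> set GOPATH=C:\Dev\Go
> go get github.com/example/tool
> pushd %GOPATH%\src\github.com\example\tool
> go build

The “go get” step clones the repository (and its dependencies) under “%GOPATH%\src” and installs the resulting binary under “%GOPATH%\bin”, which is why cloning into an arbitrary folder leaves the toolchain unable to resolve anything.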

The goal was likely to just build & run some 3rd party tool in isolation but that’s not the way Go wants you to see the world.

The Folder as a Sandbox

The most basic form of isolation, and therefore version control, in software development is the humble file-system folder [3]. If you want to monkey with something on the side just make a copy of it in another folder and you know your original is safe. This style of isolation (along with other less favourable forms) is something I’ve written about in depth before in my C Vu In the Toolbox column, see “The Developer’s Sandbox”.

Unfortunately for me this is how I hope (expect) all tools to work out of the box. Interestingly Go is a highly opinionated language (which is a good thing in many cases) that wants you to do all your coding under one folder, identified by the GOPATH variable. The rationale for this is that it reduces friction from versioning problems and helps ensure everyone, and everything, is always using a consistent set of dependencies – ideally the latest.

Tool User, Not Developer

That policy makes sense for Google’s developers working on their company’s tools, but I’m not a Google developer, I’m just a user of the tool. My goal is to be able to build and run the tool. If there happens to be a simple bug that I can fix, then great, I’d like to do that, but what I do not have the time for is getting bogged down in library versioning issues because the language believes everyone, everywhere should be singing from the same hymn sheet. I’m more used to a world where dependencies move forward at a pace dictated by the author, not by the toolchain itself. Things can still move quickly without having to continually live on the bleeding edge.

Improvements

As a Go outsider I can see that the situation is definitely improving. The use of a vendor subtree to house a snapshot of the dependencies in source form makes life much simpler for people like me who just want to use the tool hassle free and dabble occasionally by fixing things here and there.

In the early days when I first ran into this problem I naturally assumed it was my problem and that I just needed to set the GOPATH variable to the root of the repo I had just cloned. I soon learned that this was a fool’s errand as the repo also has to buy into this and structure its source code accordingly. However a variant of this has got some traction with the gb tool, which has (IMHO) got the right idea about isolation but sadly is not the sanctioned approach, and so you’re dicing with the potential for future impedance mismatches. Ironically, to build and install this tool requires GOPATH to still be working the proper way.

The latest version of Go (1.8) will assume a default location for GOPATH (a “go” folder under your profile) if it’s not set, but that does not fix the fundamental issue for me, which is that any code you pull down may end up somewhere unrelated on your file-system if you don’t understand how all this works.

Embracing GOPATH

Ultimately if I am going to properly embrace Go as a language, and I would like to do more, I know that I need to stop fighting the philosophy and just “get with the programme”. This is hard when you have a couple of decades of inertia to overcome but I’m sure I will eventually. It’s happened enough times now that I know what the warning signs are and what I need to Google to do it “the Go way”.

Okay, so I don’t like or agree with all of the choices the Go language has taken (that I know of), but then I’m always aware of the popular quote by Bjarne Stroustrup:

“There are only two kinds of languages: the ones people complain about and the ones nobody uses.”

This post is but one tiny data point which covers one of the very few complaints I have about Go, and even then it’s just really a bit of early friction that occurs at the start of the journey if you’re not a regular. Still, better that than yet another language nobody uses.

 

[1] Or golang if you want to appease the SEO crowd :o).

[2] It was winrm-cli as I was trying to put together a bug report for Terraform and there was something going wrong with running a Windows script remotely via WinRM.

[3] Let’s put old fashioned technologies like COM to one side for the moment and assume we have learned from past mistakes.

Manual Mutation Testing

Chris Oldwood from The OldWood Thing

One of the problems when making code changes is knowing whether there is good test coverage around the area you’re going to touch. In theory, if a rigorous test-first approach is taken no production code should be written without first being backed by a failing test. Of course we all know the old adage about how theory turns out in practice [1]. Even so, just because a test has been written, you don’t know what the quality of it and any related ones are.

Mutation Testing

The practice of mutation testing is one way to answer the perennial question: how do you test the tests? How do you know if the tests which have been written adequately cover the behaviours the code should exhibit? Alternatively, as I described recently in “Overly Prescriptive Tests”, are the tests too brittle because they require too exacting a behaviour?

There are tools out there which will perform mutation testing automatically that you can include as part of your build pipeline. However I tend to use them in a more manual way to help me verify the tests around the small area of functionality I’m currently concerned with [2].

The principle is actually very simple, you just tweak the production code in a small way that would likely mimic a real change and you see what tests fail. If no tests fail at all then you probably have a gap in your spec that needs filling.

Naturally the changes you make on the production code should be sensible and functional in behaviour; there’s no point in randomly setting a reference to null if that scenario is impossible to achieve through the normal course of events. What we’re aiming for here is the simulation of an accidental breaking change by a developer. By tweaking the boundaries of any logic we can also check that our edge cases have adequate coverage too.

There is of course the possibility that this will also unearth some dead code paths, or at least lead you to further simplify the production code to achieve the same expected behaviour.

Example

Imagine you’re working on a service and you spy some code that appears to format a DateTime value using the default formatter. You have a hunch this might be wrong but there is no obvious unit test for the formatting behaviour. It’s possible the value is observed and checked in an integration or acceptance test elsewhere but you can’t obviously [3] find one.

Naturally if you break the production code a corresponding test should break. But how badly do you break it? If you go too far all your tests might fail because you broke something fundamental, so you need to do it in varying degrees and observe what happens at each step.

If you tweak the date format, say, from the US to the UK format nothing may happen. That might be because the tests use a value like 1st January which is 01/01 in both schemes. Changing from a local time format to an ISO format may provoke something new to fail. If the test date is particularly well chosen and loosely verified this could well still be inside whatever specification was chosen.

Moving away from a purely numeric form to a more natural, wordy one should change the length and value composition even further. If we reach this point and no tests have failed it’s a good chance nothing will. We can then try an empty string, nonsense strings and even a null string reference to see if someone only cares that some arbitrary value is provided.
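
As a rough sketch of that progression (the formatting code here is hypothetical), the mutations might look something like this:

using System;

public static class Timestamps
{
  // Hypothetical production code; each commented-out line is one of the
  // increasingly severe mutations described above.
  public static string FormatTimestamp(DateTime value)
  {
    return value.ToString("MM/dd/yyyy HH:mm");      // original (US ordering)
    //return value.ToString("dd/MM/yyyy HH:mm");    // 1: switch to UK ordering
    //return value.ToString("yyyy-MM-ddTHH:mm:ss"); // 2: switch to an ISO form
    //return value.ToString("dddd d MMMM yyyy");    // 3: wordy, different length
    //return string.Empty;                          // 4: empty string
    //return null;                                  // 5: null reference
  }
}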

But what if after all that effort still no lights start flashing and the klaxon continues to remain silent?

What Does a Test Pass or Fail Really Mean?

In the ideal scenario, as you slowly make more and more severe changes, you would eventually hope for one or maybe a couple of tests to start failing. When you inspect them it should be obvious from their name and structure what was being expected, and why. If the test name and assertion clearly specify that some arbitrary value is required then it’s probably intentional. Of course it may still be undesirable for other reasons [4] but the test might express its intent well (to document and to verify).

If we only make a very small change and a lot of tests go red we’ve probably got some brittle tests that are highly dependent on some unrelated behaviour, or are duplicating behaviours already expressed (probably better) elsewhere.

If the tests stay green this does not necessarily mean we’re still operating within the expected behaviour. It’s entirely possible that the behaviour has been left completely unspecified because it was overlooked or forgotten about. It might be that not enough was known at the time and the author expected someone else to “fill in the blanks” at a later date. Or maybe the author just didn’t think a test was needed because the intent was so obvious.

Plugging the Gaps

Depending on the testing culture in the team and your own appetite for well defined executable specifications you may find mutation testing leaves you with more questions than you are willing to take on. You only have so much time and so need to find a way to plug any holes in the most effective way you can. Ideally you’ll follow the Boy Scout Rule and at least leave the codebase in a better state than you found it, even if that isn’t entirely to your own satisfaction.

The main thing I get out of using mutation testing is a better understanding of what it means to write good tests. Seeing how breaks are detected and reasoned about from the resulting evidence gives you a different perspective on how to express your intent. My tests definitely aren’t perfect but by purposefully breaking code up front you get a better feel for how to write less brittle tests than you might by using TDD alone.

With TDD you are the author of both the tests and production code and so are highly familiar with both from the start. Making a change to existing code by starting with mutation testing gives you a better orientation of where the existing tests are and how they perform before you write your own first new failing test.

Refactoring Tests

Refactoring is about changing the code without changing the behaviour. This applies to tests too, in which case mutation testing provides the technique: you start by deliberately breaking the production code so the test goes red, change the test, and then “fix” the production code again so the bar goes back to green. You can then commit the refactored tests before starting on the change you originally intended to make.

 

[1] “In theory there is no difference between theory and practice; in practice there is.” – Jan L. A. van de Snepscheut.

[2] Just as with techniques like static code analysis you really need to adopt this from the beginning if you want to keep the noise level down and avoid exploring too large a rabbit hole.

[3] How you organise your tests is a subject in its own right but suffice to say that it’s usually easier to find a unit test than an acceptance test that depends on any given behaviour.

[4] The author may have misunderstood the requirement, or the requirement was never clear originally and so the behaviour was left loosely specified in the short term.

Automate Only What You Need To

Chris Oldwood from The OldWood Thing

The meme tells us to “automate all the things” and it’s a noble cause which has sprung up as a backlash against the ridiculous amount of manual work we’ve often had to do in the past. However in our endeavour to embrace the meme we should not go overboard and lose sight of what we’re automating and why.

The Value

The main reason we tend to automate things is to save ourselves time (and by extension, money) by leveraging tools that can perform tasks quicker than we can, but also with more determinism and reliability, thereby saving even more time. For example, pasting a complex set of steps off a wiki page into a command prompt to perform a task is slower than an interpreter running a script and is fraught with danger as we might screw up at various points along the way and so end up not doing exactly what we’d intended. Ultimately computers are good at boring repetitive tasks whilst we humans are not.

However if we only do this operation once every six months and there are too many potential points of failure we might spend far longer trying to automate it than it actually takes to do carefully, manually. It’s a classic trade-off and like most things in IT there are some XKCDs for that – “Automation” and “Is It Worth the Time”. They make sobering reading when you’re trying to work out how much time you might save by automating something and therefore also give a good indication of the maximum amount of time you should spend on achieving that.

Orchestration First, Actor Later

Where I think the meme starts to break down is when we get this balance wrong and begin to lose sight of where the real value is, thereby wasting time trying not only to automate all the steps but also to wire the whole thing into some job scheduling system (e.g. a CI server) so that once in a blue moon we can push a button and the whole task from start to finish is executed for us without further intervention.

The dream suggests at that point we can go off and do something else more valuable instead. Whilst this notion of autonomy is idyllic it can also come with a considerable extra up-front cost and any shortcuts are likely to buy us false security (i.e. it silently fails and we lose time investigating downstream failures instead).

For example there are many crude command prompt one-liners I’ve written in the past to pick up common mistakes that are trivial for me to run because they’ve been written to automate the expensive bit, not the entire problem. I often rely on my own visual system to filter out the noise and compensate for the impurities within the process. Removing these wrinkles is often where the proverbial “last 10% that takes 90% of the time” goes [1].

It’s all too easy to get seduced by the meme and believe that no automation task is truly complete until it’s fully automated.

An Example

In .Net when you publish shared libraries as NuGet packages you have a .nuspec file which lists the package dependencies. The library .csproj build file also has project dependencies for use with compilation. However these two sets of dependencies should be kept in sync [2].

Initially with only a couple of NuGet packages it was easy to do manually as I knew it was unlikely to change. However once the monolithic library got split up the dependencies started to grow and manually comparing the relevant sections got harder and more laborious.

Given the text based nature of the two files (XML) it was pretty easy to write a simple shell one-liner to grep the values from the two sets of relevant XML tags, dump them in a file, and then use diff to show a side-by-side comparison. Then it just needed wrapping in a for loop to traverse the solution workspace.
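
The sketch below gives a flavour of it (the element names, paths and switches are from memory, so treat them as assumptions rather than the actual script):

> grep -o "dependency id=[^ ]*" MyLib\Package.nuspec | sort > nuspec.tmp
> grep -o "Reference Include=[^ ]*" MyLib\MyLib.csproj | sort > csproj.tmp
> diff -y nuspec.tmp csproj.tmp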

Because the one-liner was mine I got to take various shortcuts like hardcoding the input path and temporary files along with “knowing” that a certain project was always misreported. At this point a previously manual process has largely been automated and as long as I run it regularly will catch any mistakes.

Of course it’s nice to share things like this so that others can take advantage after I’m gone, and it might be even better if the process can be added as a build step so that it’s caught the moment the problem surfaces rather than later in response to a more obscure issue. Now things begin to get tricky and we start to see diminishing returns.

First, the Gnu on Windows (GoW) toolset I used isn’t standard on Windows so now I need to make the one-liner portable or make everyone else match my tooling choice [3]. I also need to fix the hard coded paths and start adding a bit of error handling. I also need to find a way to remove the noise caused by the one “awkward” project.

None of this is onerous, but this all takes time and whilst I’m doing it I’m not doing something (probably) more valuable. The majority of the value was in being able to scale out this safety check, there is (probably) far less value in making it portable and making it run reliably as part of an automated build. This is because essentially it only needs to be run whenever the project dependencies change and that was incredibly rare once the initial split was done. Additionally the risk of not finding an impedance mismatch was small and should be caught by other automated aspects of the development process, i.e. the deployment and test suite.

Knowing When to Automate More

This scenario of cobbling something together and then finding you need to do it more often is the bread and butter of build & deployment pipelines. You often start out with a bunch of hacked together scripts which do just enough to allow the team to bootstrap itself into an initial fluid state of delivery. This is commonly referred to as a walking skeleton because it forms the basis for the entire development process.

The point of starting with the walking skeleton rather than just diving headlong into features is to try and tackle some of the problems that historically got left until it was too late, such as packaging and deployment. In the modern era of continuous delivery we look to deliver a thin slice of functionality quickly and then build upon it piecemeal.

However it’s all too easy to get bogged down early on in a project and spend lots of time just getting the build pipeline up and running and have nothing functional to show for it. This has always made me feel a little uncomfortable as it feels as though we should be able to get away with far less than perhaps we think we need to.

In “Building the Pipeline - Process Led or Automation Led” and my even earlier post “Layered Builds” I’ve tried to promote a more organic approach that focuses on what I think really matters most which is a consistent and extensible approach. In essence we focus first on producing a simple, repeatable process that can be used locally to enable the application skeleton to safely evolve and then balance the need for automating this further along with the other features. If quality or speed of delivery drops and more automation looks to be the answer then it can be added with the knowledge that it’s being done for deliberate reasons, rather than because we’ve got carried away gold plating the build system based on what other people think it should do (i.e. a cargo cult mentality).

Technical Risk

The one caveat to being leaner about your automation is that you may (accidentally) put off addressing one or more technical risks because you don’t perceive them as risks. This leads us back to why the meme exists in the first place – failing to address certain aspects of software delivery until it’s too late. If there is a technical concern, address it, but only to the extent that the risk is understood, you may not need to do anything about it now.

With a team of juniors there are likely to be far more unknowns [4] than with a team of experienced programmers, therefore the set of perceived risks will be higher. Whilst you might not know the most elegant approach to solving a problem, knowing an approach already reduces the risk because you know that you can trade technical debt in the short term for something else more valuable if necessary.

Everything is Negotiable

The thing I like most about an agile development process is that every trade-off gets put front-and-centre, everything is now negotiable [5]. Every task now comes with an implicit question: is this the most valuable thing we could be doing?

Whilst manually building a private cloud for your production system using a UI is almost certainly not the most scalable approach, neither is starting day one of a project by diving into, say, Terraform when you don’t even know what you’re supposed to be building. There is nothing wrong with starting off manually, you just need to be diligent and ensure that your decision to only automate “enough of the things” is always working in your favour.

 

[1] See “The Curse of NTLM Based HTTP Proxies”.

[2] I’m not aware of Visual Studio doing this yet although there may now be extensions and tools written by others I’m not aware of.

[3] Yes, the Unix command line tools should be ubiquitous and maybe finally they will be with Bash on Windows.

[4] See “Turning Unconscious Incompetence to Conscious Incompetence”.

[5] See “Estimating is Liberating”.

LINQ: Did You Mean First(), or Really Single()?

Chris Oldwood from The OldWood Thing

TL;DR: if you see someone using the LINQ method First() without a comparator it’s probably a bug and they should have used Single().

I often see code where the author “knows” that a sequence (i.e. an IEnumerable<T>) will result in just one element and so they use the LINQ method First() to retrieve the value, e.g.

var value = sequence.First();

However there is also the Single() method which could be used to achieve a similar outcome:

var value = sequence.Single();

So what’s the difference and why do I think it’s probably a bug if you used First?

Both First and Single have the same semantics for the case where the sequence is empty (they throw) and similarly when the sequence contains only a single element (they return it). The difference however is when the sequence contains more than one element – First discards the extra values and Single will throw an exception.
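
A minimal demonstration of the difference:

using System;
using System.Linq;

class Program
{
  static void Main()
  {
    var sequence = new[] { 1, 2, 3 };

    Console.WriteLine(sequence.First());  // prints 1; 2 and 3 are silently discarded

    try
    {
      Console.WriteLine(sequence.Single());
    }
    catch (InvalidOperationException e)
    {
      Console.WriteLine(e.Message);       // "Sequence contains more than one element"
    }
  }
}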

If you’re used to SQL it’s the difference between using “top” to filter and trying extract a single scalar value from a subquery:

select top 1 x as [value] from . . .

and

select a, (select x from . . .) as [value] from . . .

(The latter tends to complain loudly if the result set from the subquery is not just a single scalar value or null.)

While you might argue that in the face of a single-value sequence both methods could be interchangeable, to me they say different things, with Single being the only “correct” choice.

Seeing First says to me that the author knows the sequence might contain multiple values and they have expressed an ordering which ensures the right value will remain after the others have been consciously discarded.

Whereas Single suggests to me that the author knows this sequence contains one (and only one) element and that any other number of elements is wrong.

Hence another big clue that the use of First is probably incorrect is the absence of a comparator function used to order the sequence. Obviously it’s no guarantee as the sequence might be being returned from a remote service or function which will do the sorting instead but I’d generally expect to see the two used together or some other clue (method or variable name, or parameter) nearby which defines the order.
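
Put another way, with some hypothetical names, it’s the presence or absence of an ordering that signals the intent:

// Deliberate: there may be many orders and we want the newest one.
var latest = orders.OrderByDescending(o => o.CreatedAt).First();

// Suspicious: if the query is meant to match exactly one order then
// Single() both documents and enforces that expectation.
var match = orders.Single(o => o.Id == orderId);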

The consequence of getting this wrong is that you don’t detect a break in your expectations (a multi-element sequence). If you’re lucky it will just be a test that starts failing for a strange reason, which is where I mostly see this problem showing up. If you’re unlucky then it will silently fail and you’ll be using the wrong data which will only manifest itself somewhere further down the road where it’s harder to trace back.

Dumbing Down Code

Chris Oldwood from The OldWood Thing

A conversation with a colleague, which was originally sparked off by “C# BAD PRACTICES: Learn how to make a good code by bad example” (an article about writing clear code), reminded me about a recent change I had just made. When I reflected back I began to consider whether I had just “dumbed down” the code instead of keeping my original possibly simpler version. I also couldn’t decide whether the reason I had changed it was because I was using an old idiom which might now be unfamiliar or because I just didn’t think it was worth putting the burden on the reader.

Going Loopy

The change I made was to add a simple retry loop to some integration test clean-up code that would occasionally fail on a laggy VM. The loop just needed to retry a small piece of code a few times with a brief pause in between each iteration.

My gut instinct was to write the loop like this:

int attempt = 5;

while (attempt-- > 0)
{
  . . .
}

Usually when I need to write a “for” loop these days, I don’t. By far the most common use case is to iterate over a sequence and so I’d use LINQ in C# or, say, a pipeline (or foreach) in PowerShell. If I did need to manually index an array (which is rare) I’d probably go straight for a classic for loop.

After I wrote and tested the code I took a step back just to review what I’d written and realised I felt uncomfortable with the while loop. Yes this was only test code but I don’t treat it as a second class citizen, it still gets the same TLC as production code. So what did I not like?

  • While loops are much rarer than for loops these days.
  • The use of the pre and post-decrement operators in a conditional expression is also uncommon.
  • The loop counts down instead of up.

Of these the middle one – the use of the post-decrement operator in a conditional expression – was the most concerning. Using them by themselves (in pre or post form) just to mutate a variable, such as in the final clause of a for loop [1], seems trivial to comprehend, but once you include one in a conditional expression the complexity goes up.

Hence there is no difference between --attempt and attempt-- when used in a simple arithmetic adjustment, but when used in a conditional expression it matters whether the value is decremented before or after the comparison is made. If you get it wrong the loop executes one less time than you expect (or vice-versa) [2].

Which leads me to my real concern – how easy is it to reason about how many times the loop executes? Depending on the placement of the decrement operator it might be one less than the number “attempt” is initialised with at the beginning.
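
To make that concrete:

int attempt = 5;
while (attempt-- > 0)
{
  // executes 5 times (the values tested are 5, 4, 3, 2, 1)
}

int retries = 5;
while (--retries > 0)
{
  // executes only 4 times (the values tested are 4, 3, 2, 1)
}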

Of course you can also place the “while” comparison at the end of the block which means it would execute at least once, so subconsciously this might also cause you to question the number of iterations, at least momentarily.

Ultimately I know I’ve written the loop so that the magic number “5” which is used to initialise the attempt counter represents the maximum number of iterations, but will a reader trust me to have done the same? I think they’ll err on the side of caution and work it out for themselves.

Alternatives

The solution I went with in the end was this:

const int maxAttempts = 5;

for (int attempt = 0; attempt != maxAttempts; ++attempt)
{
  . . .
}

Now clearly this is more verbose, but not massively more verbose than my original “while” loop. However the question is whether (to quote Sir Tony Hoare [3]) it obviously contains fewer bugs than the original. Being au fait with both forms it’s hard for me to decide so I’m trying to guess that the reader would prefer the latter.

Given that we don’t care what the absolute value of “attempt” is, only that we execute the loop N times, I did consider some other approaches, at least momentarily. Both of these examples use methods from the Enumerable class.

The first generates a sequence of 5 numbers starting from the “arbitrary” number 1:

const int maxAttempts = 5;

foreach (var _ in Enumerable.Range(1, maxAttempts))
{
  . . .
}

The second also generates a sequence of 5 numbers but this time by repeating the “arbitrary” number 0:

const int maxAttempts = 5;

foreach (var _ in Enumerable.Repeat(0, maxAttempts))
{
  . . .
}

In both cases I have dealt with the superfluous naming of the loop variable by simply calling it “_”.

I discounted both these ideas purely on the grounds that they’re overkill. It just feels wrong to be bringing so much machinery into play just to execute a loop a fixed number of times. Maybe my brain is addled from too much assembly language programming in my early years but seemingly unnecessary waste is still a hard habit to shake.

As an aside there are plenty of C# extension methods out there which people have written to try and reduce this further so you only need write, say, “5.Times()” or “0.To(5)” but they still feel like syntactic sugar just for the sake of it.
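
For what it’s worth, a minimal sketch of what one of those extension methods might look like (not taken from any particular library):

using System;
using System.Linq;

public static class Int32Extensions
{
  // Executes the given action the specified number of times,
  // e.g. 5.Times(() => TryCleanUp());
  public static void Times(this int count, Action action)
  {
    foreach (var _ in Enumerable.Range(0, count))
      action();
  }
}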

Past Attempts

This is not the first time I’ve questioned whether it’s possible to write code that’s perhaps considered too clever. Way back in 2012 I wrote “Can Code Be Too Simple?” which looked at some C++ code I had encountered 10 years ago and which first got me thinking seriously about the subject.

What separates the audience back then from the one now is the experience level of the programmers who will likely tackle this codebase. A couple of years later in “Will Your Successor Be a Superstar Programmer?” I questioned whether you write code for people of your own calibre or the (inevitable) application support team who have the unenviable task of having to be experts at two disciplines – support and software development. As organisations move towards development teams owning their services this issue is diminishing.

My previous musings were also driven by my perception of the code other people wrote, whereas this time I’m reflecting solely on my own actions. In particular I’m now beginning to wonder if my approach is in fact patronising rather than aiding? Have I gone too far this time and should I give my successors far more credit? Or doesn’t it matter as long as we just don’t write “really weird” code?

 

[1] In C++ iterators can be implemented as simple pointers or complex objects (e.g. STL container iterators in debug builds) and so you tend to be aware of the difference because of the performance impacts it can have.

[2] I was originally going to write while (attempt-- != 0), again because in C++ you normally iterate from “begin” to “!= end”, but many devs seem to be overly defensive and favour using the > and < comparison operators instead.

[3] “There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” – Sir Tony Hoare.

Journey Code

Chris Oldwood from The OldWood Thing

In the move from a Waterfall approach to a more Agile way of working we need to learn to be more comfortable with our software being in a less well defined state. It’s not any less correct per se, it’s just that we may choose to prioritise our work differently now so that we tackle the higher value features first.

Destination Unknown

A consequence of this approach is that conversations between, say, traditional architects (who are talking about the system as they expect it to end up) and those developing it (who know it’s only somewhere between started and one possible future), need interpretation. Our job as programmers is to break these big features down into much smaller units that can be (ideally) independently scoped and prioritised. Being agile is about adapting to changing requirements and that’s impossible if every feature needs to be designed, delivered and tested in its entirety.

In an attempt to bridge these two perspectives I’ve somewhat turned into a broken record by frequently saying “but that’s the destination, we only need to start the journey” [1]. This is very much a statement to remind our (often old and habitual) selves that we don’t build software like that anymore.

Build the Right Thing

The knock on effect is that any given feature is likely only partially implemented, especially in the early days when we are still trying to explore what it is we’re actually trying to build in the first place. For example, data validation is a feature we want in the product before going live, but we can probably make do with only touching on the subject lightly whilst we explore what data even needs validating in the first place.

This partially implemented feature has started to go by the name “journey code”. While you can argue that all code is essentially malleable as the system grows from its birth to its eventual demise, what we are really trying to convey here is code which cannot really be described as having been “designed”. As such it does not indicate in any meaningful way the thoughts and intended direction of the original author – they literally did the simplest possible thing to make the acceptance test pass and that’s all.

Destination in Sight

When we finally come to play out the more detailed aspects of the feature that becomes the time at which we intend to replace what we did along the journey with where we think the destination really is. Journey code does not even have to have been written in response to the feature itself, it may just be some infrastructure code required to bootstrap exploring a different one, such as logging or persistence.

What it does mean is that collaboration is even more essential when the story is eventually picked up for fleshing out to ensure that we are not wasting time trying to fit in with some design that never existed in the first place. Unless you are someone who happily ignores whatever code anyone else ever writes anyway, and they do exist, you will no doubt spend at least some time trying to understand what’s gone before you. Hence being able to just say “journey code” has become a nice shorthand for “it wasn’t designed so feel free to chuck it away and do the right thing now we know what really needs doing”.

 

[1] I covered a different side-effect of this waterfall/agile impedance mismatch in “Confusion Over Waste”.

Weak and Strong Idempotency

Chris Oldwood from The OldWood Thing

In my recent post “PUT vs POST and Idempotency” I touched on the different degrees of idempotency. In retrospect I think it’s better to describe our idempotency guarantee as weak or strong. Either way this post looks at why you might choose one over the other.

Weak Idempotency

At one end of the spectrum we can choose to classify our unique operations with a single scalar value such as a large number, UUID or complex string value (with other embedded values like the date & time). In this scenario we assume that a simple value represents the identity of the operation in much the same way that a reference (or pointer) is the unique address of a value in languages like C# and Java (or C++).

In SQL terms your write with an idempotency check might look something like this:

insert into Operations
select @operationId, . . .
where not exists
(
  select 1
  from Operations
  where OperationId = @operationId
)

For a document oriented database like MongoDB where you might store everything in a single object to achieve atomic updates you could do an equivalent find-and-modify-where style query. If the data model requires multiple writes you might still be able to order them carefully to achieve the desired effect (See “Observable State versus Persisted State”).

Strong Idempotency

In contrast, at the other end, we define an idempotent operation based not only on some unique operation identifier but also the arguments to the operation itself, i.e. the request body in HTTP. In this instance we are not willing to accept that a replay can be distinguished solely by its handle but also wish to ensure the actual details match too, just like a “deep” value equality comparison.

In SQL terms we potentially now have two steps, one to attempt the write and, if a conflict occurs, a subsequent read to validate it:

insert into Operations
select @operationId, . . .
where not exists
(
  select 1
  from Operations
  where OperationId = @operationId
)

select *
from Operations
where OperationId = @operationId

If replays are rare (and I’d expect them to be) you should find this adds only a little extra overhead but also allows you to report broken requests properly.
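
In code the conflict handling might therefore look something like this (the types and store are hypothetical, sketched only to show the two-step check):

// Hypothetical sketch: distinguish a genuine replay from an accidental
// reuse of the same operation ID with a different payload.
public Outcome Apply(Operation incoming)
{
  if (_store.TryInsert(incoming))                    // the conditional write
    return Outcome.Created;

  var existing = _store.Read(incoming.OperationId);  // the follow-up read

  return existing.Payload == incoming.Payload
       ? Outcome.Replayed                            // same request, an idempotent no-op
       : Outcome.Conflict;                           // same ID, different body, report it
}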

Identity and Equality

Although it might seem like these weak and strong guarantees are directly analogous to the concept of identity and equality in normal programming there is a subtle difference.

Whilst in the weak case we are happy for the unique operation identifier to act as our sole definition of equality, in the strong guarantee both the identifier and the input values [1] take part in the comparison. In a programming language we tend to choose either the reference based comparison or the value based approach but not both [2].

Failure Modes

In theory it should be adequate for us to take the simpler approach, and as the provider of a service it’s almost certainly slightly cheaper for us to do that. However the downside is that it makes it harder to diagnose missed updates caused by the service erroneously treating them as replays. But that’s just an edge case right?

During development when your clients are coding against your API for the first time they may make a simple mistake and not correctly generate unique requests. This is not quite as uncommon as you may think.

Another potential source of mistaken uniqueness can come from transforming requests, e.g. from a message queue. If you’re integrating with a legacy system what you may think is unique might in fact turn out to be an artefact of the small data set you sampled and it could turn out only to be mostly unique [3].

Answering these kinds of support queries (“what happened to my request?”) can easily chip away at a team’s time. The more effort you put into making it harder for consumers to incorrectly use your API the less time you’ll need to support them.

Debug Switch

Although the need to verify a replay should be pretty infrequent you could easily wrap the validation logic in a feature switch so that development and test environments provide better diagnostics whilst production has less work to do. But you should measure before considering dropping such a low-cost feature.

Monitoring

What will help both the service and its consumers is if you can afford to capture when replays do occur so that the monitoring can alert if there appears to be a sudden rise in them. For example a client may go into a failure state that keeps replaying the same message over and over again. Alternatively the rise might suggest some configuration problem where upstream traffic is being duplicated. From a functional point of view the entire system could be behaving correctly but wasting resources in the process that could lead to a more catastrophic failure state.

Although a client should never need to behave differently from a functional perspective when a replay occurs, the flip side to the service tracking the event is for the client to report it as well. They could do this if, say, the return code changes in the replay case. In HTTP for example there is 200 OK and 201 Created which could be used to tell the client what happened.
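
For example, a handler might surface that distinction like this (a hedged sketch reusing the hypothetical Outcome type from the earlier snippet rather than any particular framework’s API):

public HttpResult Put(Operation incoming)
{
  switch (Apply(incoming))
  {
    case Outcome.Created:  return HttpResult.Status(201); // 201 Created: first time through
    case Outcome.Replayed: return HttpResult.Status(200); // 200 OK: an idempotent replay
    default:               return HttpResult.Status(409); // 409 Conflict: same ID, different body
  }
}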

Middle Ground

As I described in my previous post there can also be points between the two extremes. One of the problems with modern document oriented databases is that they have limits on the size of a single document. Depending on the anticipated depth of history this may exceed the practical document limit and therefore require some compromises to be made.

If no false replays can be tolerated then perhaps the model needs splitting across documents, which removes the ability to do simple atomic updates. If the history grows slowly and data retention is negotiable then maybe the fine details of much older operations can be discarded to make room for the new.

Brave New World

The new era of databases continues to bring interesting challenges and trade offs due to their differing goals from a traditional big iron RDBMS. Pushing back on some requirements (e.g. “Don’t Be Afraid to Throw Away Data”) is essential if we are to balance their constraints with correctness.

 

[1] You may store server derived properties as well, like a “created at” timestamp. These are of course ignored in any comparison.

[2] Of course breaking an invariant like the relationship between Equals() and GetHashCode() is a popular mistake which can make objects appear to go missing in maps & sets.

[3] I know of one organisation where many people mistakenly believed their customer identifiers were unique, but we discovered they were only truly unique if you took their legacy back-office sources into account too.

The Cost of Long-Lived Feature Branches

Chris Oldwood from The OldWood Thing

Many moons ago I was working at a large financial organisation on one of their back office systems. The ever increasing growth of the business meant that our system, whilst mostly distributed, was beginning to creak under the strain. I had already spent a month tracking down some out-of-memory problems in the monolithic orchestration service [1] and a corporate programme to reduce hardware meant we needed to move to a 3rd party compute platform to save costs by sharing hardware.

Branch Per Project

The system was developed by a team (both on-shore and off-shore) numbering around 50 and the branching strategy was based around the many ongoing projects, each of which typically lasted many months. Any BAU work got done on the tip of whatever the last release branch was.

While this allowed the team to hack around to their heart’s content without bumping into any other projects it also meant merge problems were highly likely when the time came. Most of the file merges were trivial (i.e. automatic) but there were more than a few awkward manual ones. Luckily this was also in a time before “relentless refactoring” too so the changes tended to be more surgical in nature.

Breaking up the Monolith

Naturally any project that involved taking one large essential service apart by splitting it into smaller, more distributable components was viewed as being very risky. Even though I’d managed to help get the UAT environment into a state where parallel running meant regressions were usually picked up, the project was still held at arms length like the others. The system had a history of delays in delivering, which was unsurprising given the size of the projects, and so naturally we would be tarred with the same brush.

After spending a bit of time working out architecturally what pieces we needed and how we were going to break them out we then set about splitting the monolith up. The service really had two clearly distinct roles and some common infrastructure code which could be shared. Some of the orchestration logic that monitored outside systems could also be split out and instead of communicating in-process could just as easily spawn other processes to do the heavy lifting.

Decomposition Approach

The use of an enterprise-grade version control system which allows you to keep your changes isolated means you have the luxury of being able to take the engine to pieces, rebuild it differently and then deliver the new version. This was the essence of the project, so why not do that? As long as at the end you don’t have any pieces left over this probably appears to be the most efficient way to do it, and therefore was the method chosen by some of the team.

An alternative approach, and the one I was more familiar with, was to extract components from the monolith and push them “down” the architecture so they turn into library components. This forces you to create abstractions and decouple the internals. You can then wire them back into both the old and new processes to help verify early on that you’ve not broken anything while you also test out your new ideas. This probably appears less efficient as you will be fixing up code you know you’ll eventually delete when the project is finally delivered.

Of course the former approach is somewhat predicated on things never changing during the life of the project…

Man the Pumps!

Like all good stories we never got to the end of our project as a financial crisis hit which, through various business reorganisations, meant we had to drop what we were doing and make immediate plans to remediate the current version of the system. My protestations that the project we were currently doing would be the answer if we could just finish it, or even just pull in parts of it, were met with rejection.

So we just had to drop months of work (i.e. leave it on the branch in stasis) and look for some lower hanging fruit to solve the impending performance problems that would be caused by a potential three times increase in volumes.

When the knee jerk reaction began to subside the project remained shelved for the foreseeable future as a whole host of other requirements came flooding in. There was an increase in data volume but there were now other uncertainties around the existence of the entire system itself which meant it never got resurrected in the subsequent year, or the year after so I’m informed. This of course was still on our current custom platform and therefore no cost savings could be realised from the project work either.

Epilogue

This project was a real eye opener for me around how software is delivered on large legacy systems. Having come from a background where we delivered our systems in small increments as much out of a lack of tooling (a VCS product with no real branching support) as a need to work out what the users really wanted, it felt criminal to just waste all that effort.

In particular I had tried hard to encourage my teammates to keep the code changes in as shippable state as possible. This wasn’t out of any particular foresight I might have had about the impending economic downturn but just out of the discomfort that comes from trying to change too much in one go.

Ultimately if we had made each small refactoring on the next branch to be delivered (ideally the trunk [2]) then when the project was frozen we would already have been reaping the benefits from the work already done. Then, with most of the work safely delivered to production, the decision to finish it off becomes easier as the risk has largely been mitigated by that point. Even if a short hiatus was required for other concerns, picking the final work up later is still far easier, or it could itself even be broken down into smaller deliverable chunks.

That said, it’s easy for us developers to criticise the actions of project managers when they don’t go our way. Given the system’s history and delivery record the decision was perfectly understandable and I know I wouldn’t want to be the one making them under those conditions of high uncertainty both inside and outside the business.

Looking back it seems somewhat ridiculous to think that a team would split up and go off in different directions on different branches for many months with little real coordination and just hope that when the time comes we’d merge the changes [3], fix the niggles and release it. But that’s exactly how some larger teams did (and probably even still do) work.

 

[1] The basis of this work was written up in “Utilising More Than 4GB of Memory in a 32-bit Windows Process”.

[2] See “Branching Strategies”.

[3] One developer on one project spent most of their time just trying to resolve the bugs that kept showing up due to semantic merge problems. They had no automated tests either to reduce the feedback loop, and a build & deployment to the test environment took in the order of hours.