Good Stories Assure the Architecture

Chris Oldwood from The OldWood Thing

One of the problems a team can run into when they adopt a more agile way of working is that they struggle to frame their backlog in terms of user-focused stories. This is a problem I’ve written about before in “Turning Technical Tasks Into User Stories”, which looked at the problem for smaller units of work. Even if the team can buy into that premise for the more run-of-the-mill features, it can still be a struggle to see how that works for the big ticket items like the system’s architecture.

The Awkward Silence

What I’ve experienced is that the team can start to regress when faced with discussions around what kind of architecture to aim for. With a backlog chock-full of customer-pleasing functionality the architectural conversations can begin to take a back seat as the focus turns to fleshing out the walking skeleton with features. Naturally the nervousness starts to set in as the engineers begin to wonder when the architecture is going to get the attention it rightly deserves. It’s all very well supporting a handful of “friendly” users, but what about when you have real customers who’ve entrusted you with their data and want to make use of it without a moment’s notice at any hour of the day?

The temptation, which should be resisted, is to treat architectural work as outside the scope of the core backlog – creating a separate backlog for stuff “the business does not understand”. This leads to a split, potentially even two entirely separate backlogs: a functional one and a non-functional one, which makes sensible prioritisation impossible. Burying the work this way also kills transparency, eventually erodes trust, and still doesn’t get you the answers you really need.

Instead, the aim should be to frame the architectural concerns in terms the stakeholder does understand, so that the business can be properly informed about their actual benefits. In addition, when “The Architecture” is a journey rather than a single destination there is no longer one set of benefits to aim for; there are multiple trade-offs as the architecture evolves over time, changing at each step to satisfy the ongoing needs of the customer(s) along the way. There is, in essence, no “final solution”, only “what we need for the foreseeable future”.

Tell Me a Story

So, what do I mean by “good stories”? Well, the traditional way this goes is for an analyst to solicit some non-functional requirements for some speculative eventual system behaviour. If we’re really lucky it might end up in the right ballpark at one particular point in the future. What’s missing from this scene is a proper conversation, a proper story – one with a beginning, a middle, and an end – where we are today, the short term and the longer term vision.

But not only do we need to get a feel for their aspirations, we also need quantifiable metrics for how the system needs to perform. Vague statements like “fast enough” are just not helpful. A globally accessible system with an anticipated latency in the tens of milliseconds will need to break the laws of physics unless we trade off something else. We also need to know how exceptional events like Cyber Monday are to be factored into the operational side.
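
As a purely hypothetical illustration of the difference, compare “search must be fast enough” with something the whole team can plan and test against: “customers in Europe and North America see search results within 500 milliseconds for 95% of requests at our expected peak of 200 concurrent users, and on exceptional days like Cyber Monday we will accept two seconds at ten times that load”. The numbers are invented, but having numbers at all turns an aspiration into a target whose cost can actually be weighed.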

It’s not just about performance either. In many cases end users care that their data is secure, both in-flight (over the network) and at rest, although they likely have no idea what this actually means in practice. Patching servers is a technical task, but the bigger story is about how the team responds to a vulnerability, which may make patching irrelevant. Similarly, database backups are not the real issue; it’s about service availability – you cannot be highly available if the loss of an entire data centre potentially means waiting for a database to be restored from scratch elsewhere.

Most of the traditional conversations around non-functional requirements focus entirely on the happy path; for me the conversation doesn’t really get going until you start talking about what needs to happen when the system is down. It’s never a case of “if” but “when” it fails, and therefore mitigating these problems features heavily in our architectural choices. It’s an uncomfortable conversation, as we never like discussing failure, but that’s what having “grown up” conversations means.

Incremental Architecture

Although I’ve used the term “story” in this post’s title, many of the issues that need discussing are really in the realm of “epics”. However, we shouldn’t get bogged down in the terminology; the essence is to focus on the outcome from the user’s perspective. Ask yourselves how fast, how secure, how available, etc. it needs to be now, and how those needs might change in response to the system’s, and the business’s, growth.

With a clearer picture of the potential risks and opportunities we are better placed to design and build in small increments such that the architecture can be allowed to emerge at a sustainable rate.

Don’t Hide the Solution Structure

Chris Oldwood from The OldWood Thing

Whenever you join an existing team and start work on their codebase you need to orientate yourself so that you have a feel for the system’s architecture and design. If you’re lucky there is some documentation, perhaps nice diagrams to give you an overview. Hopefully you also have an extensive suite of tests to tell you how the system behaves.

More than likely there is nothing or very little to go on, and if it’s a truly legacy system any documentation could well be way out of date. At this point you pretty much only have the source code to work from. Whilst this is the source of truth, the amount of code you need to read to become au fait with all the various high-level concepts depends in part on how well it’s laid out.

Static Structure

Irrespective of whether you like to think of your layers in terms of onions or brick walls, all code essentially gets organised on disk and that means the solution structure is hierarchical in nature. In the most popular languages that support namespaces, these are also hierarchical and are commonly laid out on disk to reflect the same hierarchy [1].
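
As a purely illustrative sketch (the company and type names are invented), the common convention in a language like C# is for the namespace to mirror the folder path, so that a file’s position on disk and the logical position of the type it contains tell the same story:

// File: src/Orders/Pricing/DiscountCalculator.cs
namespace Acme.Orders.Pricing
{
    // The folder path and the namespace agree about where this type sits
    // in the overall hierarchy.
    public class DiscountCalculator
    {
    }
}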

Although the compiler is happy to just hoover up source code from the entire solution and largely ignore the relative position of the callers and callees, there are useful conventions which, if honoured, allow you to reason about and refactor the code more easily due to lower coupling. For example, defining an interface in the same source file as a class that implements it suggests a different inheritance use than when the interface sits externally, further up the hierarchy. Also, seeing code higher up the hierarchy referencing types deeper down in an unrelated branch is another smell: that of an abstraction potentially depending on an implementation detail.
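
To make those conventions a little more concrete, here is a hypothetical C# sketch (all the names are invented) showing an interface co-located with its single implementation, an interface hoisted further up the hierarchy, and the kind of reference that hints at an abstraction depending on an implementation detail:

// File: src/Orders/Persistence/OrderArchive.cs
namespace Acme.Orders.Persistence
{
    // Co-located interface and class: the interface largely exists so this
    // one implementation can be substituted, e.g. in tests.
    public interface IOrderArchive { void Archive(int orderId); }
    public class OrderArchive : IOrderArchive { public void Archive(int orderId) { } }
}

// File: src/Orders/IOrderRepository.cs
namespace Acme.Orders
{
    // Hoisted interface: the abstraction sits higher up and anticipates
    // multiple implementations living deeper down the hierarchy.
    public interface IOrderRepository { string FindById(int id); }
}

// File: src/Orders/OrderService.cs
namespace Acme.Orders
{
    // The smell: code high up the hierarchy reaching down into an unrelated
    // implementation branch rather than going via an abstraction.
    public class OrderService
    {
        private readonly Persistence.OrderArchive _archive = new Persistence.OrderArchive();
    }
}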

Navigating the Structure

One of the things I’ve noticed in recent years whilst pairing is that many developers appear to navigate the source code solely through their IDE, and within the IDE by using features like “go to definition (implementation)”. Some very rarely see the solution structure because they hide it to gain more screen real estate for the source file of current interest [2].

Hence the only time the solution structure is visible is when there is a need to add a new source file. My purely anecdotal evidence suggests that this will be added without a great deal of thought, as the code can be easily located later by the author via its class name or some other reference; they never have to consider where it “logically” resides.

Sprawling Suburbs

The net result is that namespaces and packages suffer from urban sprawl as they slowly accrete more and more code. This newer code brings more dependencies with it, and so the package as a whole acquires an ever-increasing number of dependencies. Left unchecked this can lead to horrible cyclic dependencies that are a nightmare to resolve.

I recently had the opportunity to revisit the codebase for a greenfield system I had started a few years before. We initially partitioned the code into a few key assemblies to get ourselves going, and so I was somewhat surprised to still see the same assemblies a few years later, albeit massively overgrown with extra responsibilities. As a consequence, even the team’s simple home-grown tools had bizarre dependencies dragged in through bloated shared libraries [3].

Take a Stroll

So in future, instead of taking the Underground (subway) through your codebase every day, stop and take a stroll around the paths every now and then. The same rules about cohesion within the methods of a class also apply at the higher levels of design – classes in a namespace, namespaces in an assembly, assemblies in a solution, etc. Then you’ll find that as the system grows it’s easier to refactor at the package level [4].

(For more on this topic see my older post “Who’s Maintaining the 100 Foot View?”.)

 

[1] Annoyingly this is not a common practice in the C++ codebases I’ve worked on.

[2] If I was being flippant I might suggest that if you really need the space the code may be too complicated, as I once did on Twitter here.

[3] I once dragged in a project’s shared library for a few useful extension methods to use in a simple console app and found I had pulled in an IoC container and almost a dozen other NuGet dependencies!

[4] In C# the internal access modifier has zero effect if you stick all your code into one assembly.

Are Refactoring Tools Less Effective Overall?

Chris Oldwood from The OldWood Thing

Prior to the addition of automatic refactoring tools to modern IDEs, refactoring was essentially a manual affair. You would make a code change, hit build, and then fix all the compiler errors (at least for statically typed languages). This technique is commonly known as “leaning on the compiler”. Naturally the operation could be fraught with danger if you were far too ambitious about the change, but knowing when you could lean on the compiler was part of the art of refactoring safely back then.
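
As a minimal sketch of the technique (the types here are invented), you make the breaking change first and then let the compiler produce the to-do list of call sites:

public class CustomerId
{
    public CustomerId(int value) { Value = value; }
    public int Value { get; }
}

public class Invoice { }

public class InvoiceService
{
    // Before: public Invoice Create(int customerId) { ... }
    // After: the signature is strengthened first, without touching any callers.
    public Invoice Create(CustomerId customerId)
    {
        return new Invoice();
    }
}

// Hitting build now produces an error at every call site still passing a raw
// int, and each error walks you to a piece of code that needs revisiting.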

A Hypothesis

Having lived through both eras (manual and automatic) and paired with developers far more skilled with the automatic approach I’ve come up with a totally non-scientific hypothesis that suggests automatic refactoring tools are actually less effective than the manual approach, overall.

I guess the basis of this hypothesis pretty much hinges on what I mean by “effective”. Here I’m suggesting that automatic tools help you easily refactor to a local minima but not to a global minima [1]; consequently the codebase as a whole ends up in a less coherent state.

Shallow vs Deep Refactoring

The goal of an automatic refactoring tool appears to be to not break your code – it will only let you perform a simple refactoring that can be done safely, i.e. if the tool can’t fix up all the code it can see [2] it won’t allow you to do it in the first place. The consequence of this is that the tool constantly limits you to taking very small steps. Watching someone refactor with a tool can sometimes seem tortuous, as they may need so many little refactoring steps to get the code into the desired state; you cannot make the leaps you want in one go unless you switch to manual mode.

This by itself isn’t a bad thing; after all, making a safe change is clearly A Good Thing. No, where I see the problem is that by fixing up all the call sites automatically you don’t get to see the wider effects of the refactoring you’re attempting.

For example, the reason you’d choose to rename a class or method is because the existing name is no longer appropriate, probably because you’ve learned something new about the problem domain. However, that class or method does not exist in a vacuum; it has dependencies in the guise of variable names and related types. It’s entirely likely that some of these are now inappropriate too, but you won’t easily see them because the tool has effectively hidden them from you.
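
A hypothetical example (again, the names are invented): an automatic rename of OrderManager to OrderProcessor dutifully fixes every reference, but the surrounding vocabulary is left speaking the old language, and the tool gives you no reason to go and look at it:

// The class itself has been renamed from OrderManager to OrderProcessor.
public class OrderProcessor
{
    public void Process(string order) { }
}

public class Checkout
{
    public void Submit(string order)
    {
        // The tool fixed the type name, but the local variable still says
        // "manager" - a stale name the manual approach would have marched
        // you straight past.
        var orderManager = new OrderProcessor();
        orderManager.Process(order);
    }
}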

Hence one of the “benefits” of the old manual refactoring approach was that as you visited each broken call site you got to reflect on your change in the context of where it’s used. This often led to further refactorings as you began to comprehend the full nature of what you had just discovered.

Blue or Red Pill?

Of course what I’ve just described could easily be interpreted as the kind of “black hole” that many, myself included, would see as an unbounded unit of work. It’s one of those nasty rabbit holes where you enter and, before you know it, you’re burrowing close to the Earth’s core and have edited nearly every file in the entire workspace.

Yes, like any change, it takes discipline to stick to the scope of the original problem. Just because you keep unearthing more and more code that no longer appears to fit the new model does not mean you have to tackle it right now. Noticing the disparity is the first step towards fixing it.

Commit Review

It’s not entirely true that you won’t see the entire outcome of the refactoring – at the very least the impact will be visible when you review the complete change before committing. (For a fairly comprehensive list of the things I go through at the point I commit see my C Vu article “Commit Checklist”.)

This assumes of course that you do a thorough review of your commits before pushing them. However, by this point, just as writing tests after the fact is considerably less attractive, so is finishing off any refactoring; perhaps even more so because the code is not broken per se, it just might not be the best way of representing the solution.

It’s all too easy to justify the reasons why it’s okay to go ahead and push the change as-is because there are more important things to do. Even if you think you’re aware of the technical debt, it often takes a fresh pair of eyes to see how you’re living in a codebase riddled with inconsistencies that make it hard to see its true structure. One is then never quite sure, without reviewing the commit logs, what is legacy and what is the new direction.

Blinded by Tools

Clearly this is not the fault of the tools or their vendors. What they offer now is far more favourable than not having them at all. However, once again we need to be reminded that we should not be slaves to our tools; we are their masters. This is a common theme which is regularly echoed in the software development community and something I myself tackled in the past with “Don’t Let Your Tools Pwn You”.

The Boy Scout Rule (popularised by Uncle Bob) says that we should always leave the camp site cleaner than we found it. While picking up a handful of somebody else’s rubbish and putting it in the bin might meet the goal in a literal sense, it’s no good if the site is acquiring rubbish faster than it’s being collected.

Refactoring is a technique for improving the quality of a software design in a piecewise fashion; just be careful you don’t spend so long on your hands and knees cleaning small areas that you fail to spot the resulting detritus building up around you.

 

[1] I wasn’t sure whether to say minima or maxima but I felt that refactoring was about lowering entropy in some way so went with the reduction metaphor.

[2] Clearly there are limits around published APIs which it just has to ignore.

Unmatched REST Resources – 400, 404 or 405?

Chris Oldwood from The OldWood Thing

There is always a tension in programming between creating something that is hard to misuse but at the same time adheres to standards, so as to leverage the Principle of Least Surprise. One area where I personally struggle with this conflict is how to communicate to a client (of the software kind) that they have made a request for something which doesn’t currently exist, and almost certainly never will.

As a general rule, when someone requests a resource that doesn’t exist you should return a 404 (Not Found). And this makes perfect sense when we’re in production and all the bugs have been ironed out, but during development, when we’re still exploring the API, it’s all too easy to make a silly mistake and not realise that the 404 is down to a bug in our own code.

An Easy Mistake

Imagine you’re looking up all orders for a customer; you might design your API something like this:

GET /orders/customer/12345

For starters you have the whole singular vs plural noun debate, which means you’ll almost definitely try this by accident:

GET /order/customer/12345

or make the inverse mistake:

GET /orders/customers/12345

By the standard HTTP rules you should return a 404 as the resource does not exist at that address. But does it actually help your fellow developers to stick to the letter of the law?

Frameworks

What makes this whole issue much thornier is that if you decide you want to do the right thing by your fellow programmers you will likely have to fight any web framework you’re using because they usually take the moral high ground and do what the standard says.

What then ensues is a fight between the developer and framework as they try their hardest to coerce the framework to send all unmatched routes through to a handler that can return their preferred non-404 choice.

A colleague who is also up for the good fight recently tried to convince the Nancy .NET framework to match the equivalent of “/.*” (the lowest weighted expression), only to find he had to define one route for each possible number of segments, i.e. “/.*”, “/.*/.*”, “/.*/.*/.*”, etc. [1].

Even then he still got some inconsistent behaviour. Frameworks also make it really easy to route based on value types, which gives you a form of validation. For example, if I know my customer ID is always an integer I could express my route like this:

“/orders/customer/{integer}”

That’s great for me, but when someone using my API accidentally formats a URL wrongly and puts the wrong type of value in for the ID, say the customer’s name, they get a 404 because no route matches a non-integer ID. I think this is a validation error and should probably be a 400 (Bad Request), as it’s a client programmer bug, but the framework has caused it to surface in a way that’s no different from a completely invalid route.
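
As a sketch of the kind of workaround being described (this uses ASP.NET Core minimal APIs rather than Nancy, and the route and messages are invented), you can constrain the happy path and push everything that falls through routing into a handler that returns your preferred status rather than the bare 404:

// Program.cs (top-level statements)
var app = WebApplication.CreateBuilder(args).Build();

// The happy path: the {id:int} constraint means only integer IDs match here.
app.MapGet("/orders/customer/{id:int}", (int id) => Results.Ok($"orders for customer {id}"));

// Anything that matched no route - /order/customer/12345, /orders/customers/12345,
// /orders/customer/fred - lands here instead of surfacing as the default 404.
app.MapFallback(() => Results.BadRequest("Unrecognised route or malformed resource identifier."));

app.Run();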

Choice of Status Code

So, assuming we want to return something other than Not Found for what is clearly a mistake on the client’s part, what are our choices?

In the debates I’ve seen on this, 400 (Bad Request) seems like a popular choice as the request, while perhaps not technically malformed, is often synonymous with “client screwed up”. I also like Phil Parker’s suggestion of using 405 (Method Not Allowed), because it feels like less of an abuse of the 4XX status codes and is also perhaps not as common as a 400, so it shows up a bit more.

 

[1] According to this StackOverflow post it used to be possible; maybe our Google-fu was letting us down.