New Directions Of Interpolation – a.k.

a.k. from thus spake a.k.

We have spent a few months looking at how we might interpolate between sets of points (xi, yi), where the xi are known as nodes and the yi as values, to approximate values of y for values of x between the nodes, either by connecting them with straight lines or with cubic curves.
Last time, in preparation for interpolating between multidimensional vector nodes, we implemented the ak.grid type to store ticks on a set of axes and map their intersections to ak.vector objects to represent such nodes arranged at the corners of hyperdimensional rectangular cuboids.
With this in place we're ready to take a look at one of the simplest multidimensional interpolation schemes; multilinear interpolation.

Disaster Recovery: A Dynamic Redundancy Approach

Tim Pizey from Tim Pizey

The problem with disaster planning is that it is not rehearsed. When you need to retrieve a file from backup is when you discover that your backup has been broken for three months.

Modern cloud systems, based upon software defined infrastructure and redundant, auto-scaling fleets of micro-services, come with disaster recovery built in. They are designed to be resilient against DDoS attacks, unexpected peaks in usage and continent wide unavailability.

Some systems have yet to migrate to outsourced infrastructure, some never will migrate. For these systems we need a Disaster Recovery Strategy which can be implemented within reasonable costs and ideally does not suffer from the fails when needed feature of many backup systems. One answer is to do regular fire drills. No one would dispute the importance of fire drills in saving lives and ensuring that people know what to do in the case of a real fire, however we all know there is a big difference between a rehearsal and the real thing.

The key insight in the modern cloud architectures is that every version of a system is the same (at a particular time).

We can reduce this to a minimal redundant system: a pair of identical systems with one designated Primary and the other Secondary, with a standard data mirroring link from Primary to Secondary.

To ensure that both elements of the pair really can function as the Primary you could rehearse a cutover one weekend.

But if the two systems really are identical then there is no reason to reverse the cutover at the end of the rehearsal. The old Secondary is the new Primary, the old Primary is the new Secondary. The Primary can be swapped at a periodicity the business is comfortable with, say twice a year.

This Dynamic Redundancy strategy ensures that your Disaster Recovery works when you need it to and can be adjusted according to the business' appetite for risk.

Adding a new scalar type to C

Derek Jones from The Shape of Code

I think the time has arrived for a new scalar type in C, which for want of a better name I shall call the compendium type.

On today’s processors a compendium type behaves a lot like an integer type, except that nobody really wants to include it in the list of supported integer types, e.g., 128-bit scalars.

Why is a new scalar type needed? The Standard supports extended integer types, why not treat a scalar object that supports integer arithmetic as an integer type?

The C Standard says (section 6.2.5 Types):
“There are five standard signed integer types, designated as signed char, short int, int, long int, and long long int. (These and other types may be designated in several additional ways, as described in 6.7.2.) There may also be implementation-defined extended signed integer types.38) The standard and extended signed integer types are collectively called signed integer types.39)”

There is corresponding wording for unsigned integer types.

The standard header allows implementations to define a whole menagerie of integer types: section Exact-width integer types
“The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.”

This all sounds very feasible, but there is a catch. The Standard defines a greatest-width integer type, section Greatest-width integer types
“The following type designates a signed integer type capable of representing any value of any signed integer type:

and various library functions have an argument type intmax_t (there is also an uintmax_t).

An ‘extra-large’ integer type is not something that can just sit there, in the list of available integer types, waiting to be used. Preprocessor arithmetic and a variety of library are based around the type intmax_t. An extra-large integer type would have a very visible impact on all developers, many of whom would want to ignore it.

GCC supports 128-bit integers, e.g., __int128. But some magic pixie dust is involved, this type has no connection with intmax_t.

What do developers do with these 128- and 256-bit scalar objects? Evaluating graphics algorithms, hashes and cryptographic calculations are obvious candidates; yes, perhaps even calculations involving integers that require this many bits. I have not seen any analysis of the uses of this kind of wide-integer-like type.

Extra-wide scalar types have a variety of uses and the term compendium type, captures this. Hardware support for such extra-width types is growing, with vendors looking to fill major niches.

Contorting existing wording, in the Standard, so accommodate these extra-wide types within the existing integer type machinery is a short term solution. Work on the upcoming revision of the C Standard should either do nothing and allow vendors to take the approach currently used by GCC, or create a new scalar type (perhaps using a TR).

Dealing with unplanned but urgent work

Allan Kelly from Allan Kelly Associates


3) Maintenance and Evolution
To keep a product alive, we choose backlog stories that will bring value, and do them one after the other.
But… as support of the application may take a huge part the work. And when the problem is critical, there is nothing you can do but stop what you do and fix it. This can blow any estimation.
How do you deal with firefighting in a #NoProjects world?
And techniques to avoid it.
How does #NoProject and DevOps work together?

Let me take the last part of this question first. Operations has never been plagued by the project model the way development has. When does a SysAdmin ever say “The project is finished so I’m not going to restart the server” ?

DevOps (aka Continuous Delivery) and Continuous Digital are a natural fit. The team is responsible and accountable: writing the code, deploying it and supporting it there after: “You built it, you operate it” as DevOps people like to say.

Of course the team needs to contain all the skills needed to service this approach. That might mean having an individual specialist on the team or it might mean that team members have multiple skills. A Continuous team is not just a DevOps team, it is also a Business-Technology team – or #BizTech to coin a hashtag. (This week I heard such a team called a BizDevOps team. That is one portmanteau too far for me.)

Which brings us quite nicely to the first part of this question: how do you manage – and perhaps even plan for: unplanned work?

What I would like to happen when unplanned work appears is that it is written on a card and placed in the backlog. It then takes its place with all the other possible work. But… as the questioner states: this work can’t wait, it is urgent.

Unplanned but urgent simply needs to be done. Quite possibly other work, less valuable work or work which is not time critical may even be interrupted.

At this point I was about to refer readers to an old blog post about Yellow Cards. But it turns out that I never wrote that post. Despite talking about Yellow cards for years I’ve never blogged about them. I wrote about them in Xanpan but for some reason or another I never wrote the blog… so here you go…

When a team is mid-sprint and unplanned work appears the team should:

  • First ask “Can this work wait?” – If so then write it on a regular card and put it in the backlog
  • If not then ask, is this more valuable than work we are doing now? – If not then someone needs to find the source of the request and explain why is isn’t going to get done.
  • Assuming it is urgent then it gets written on a Yellow card.
  • If it is really really urgent then someone drops what they are doing and works on the yellow card immediately.
  • If it can wait a little while then the next person who finishes their current work picks up the card and does it.
  • Once the yellow card is done mark it as done as with any other card and work continues as it was before.

Accepting unplanned work into a sprint impacts the other work the team is doing. I’m not a big fan of the commitment protocol so to me it is no big deal if this work displaces something else. But if your team are expected to hold fast to hard commitments while dealing with unplanned work then you have a problem, call me, we need to talk more.

At the end of the iteration we can look at the cards and reason about them. Now we can see the work we can manage it and decide what to do about it.

I count up the yellow cards – and all the planned work cards. That allows me to calculate a ratio of planned versus unplanned work. (Sometimes teams put a retrospective points estimate on a yellow but doing a card count is often sufficient.)

This can be tracked over time – graph it, make it visible again. Now we can look at the work and the pattern of work, reason about it, maybe do some root-cause analysis. Perhaps:

  • Perhaps much of the urgent work isn’t really so urgent, perhaps the team should push back more. Maybe the team, or one of the team leaders, needs to the authority to say No.
  • Perhaps most of the unplanned work comes from a particular person. Maybe this person doesn’t realise the impact of their unplanned requests, or maybe they need to be included in the planning process, or … a million other reasons.
  • Perhaps the unplanned work is coming from the same sub-system, maybe some remedial work on that sub-system could reduce the amount of unexpected work.
  • Perhaps the unplanned work is just the nature of the business and being responsive is valuable.

Looked at this way we can think about reducing the amount of unplanned work. But also, we can plan for unplanned work.

It is likely that over time a pattern will emerge. One team I know found that 20% to 25% of their work in any sprint was unplanned. They simply planned for 20% less work. They now had the capacity to cope with unplanned work. At the least they could expectation manage stakeholders.

One team found that each sprint they were doing about 20% IT support tasks (new PCs, Word problems, etc.) so they hired a support technician.

Another team who agonised about unplanned work found that actually they only had about one unplanned card a week. Their problem was not excessive unplanned work but the fact that unplanned work tended to have a very high profile in the company.

Teams which find they have very high levels of unplanned work on a regular basis (e.g. over 50% of work for several months) may well decide to adopt a full Kanban system. Indeed, Kanban folk probably recognise my description as a very simple example of quality-of-service and policies.

I say more about Yellow Cards for unplanned but urgent in Xanpan so you might like to continue reading there.

This is the third question carried over from the #NoEstimates/#NoProjects August workshop in Zurich.

If you have any questions about Continuous Digital, Project Myopia and #NoProjects please mail them over and I’ll do my best to answer them in this blog.

Receive these posts by e-mail?

Subscribe to my newsletter & receive a free eBook “Xanpan: Team Centric Agile Software Development”

The post Dealing with unplanned but urgent work appeared first on Allan Kelly Associates.

The Nostradamus argument in software engineering research

Derek Jones from The Shape of Code

The Nostradamus argument in software engineering research goes something like: This idea was proposed in a paper by XX, some years ago.

I regularly encounter the Nostradamus argument when discussing what people in industry are doing, with one or more academics. The same argument is probably made in other fields.

The rules of academic research pretty much guarantee that somebody, at sometime, has published a paper containing an idea related to something being discussed today.

The first researcher(s) to publish an idea gets the credit for the idea, and ‘uses it up’ the idea, that is somebody else cannot subsequently publish a paper claiming that idea (it does happen, either through plagiarism or slip-ups during review).

The job of researchers is to find new ideas (well, actually these days it is to quickly find an idea that will get published; researchers are on a publication treadmill). Sometimes a paper will explicitly point out the novel idea they are claiming (usually a sign of a very poor paper; the author(s) obviously don’t feel confident that the reader will see anything of merit). Researchers also talk of gaps in the literature, i.e., some topic where little, if anything, has been published.

Before starting work in an area, researchers are supposed to read all relevant prior publications; this can be an awful lot of work and take a lot of time. In practice people tend to read the papers in the top 10, or so, journals published in the last few years; maybe looking at more journals and going further back in time if the initial search fails to return many results. I have had many conversations with researchers about a paper, or thesis, they are just completing and been told “I’m just finishing off the literature search”, i.e., they are doing the background checks after completing their research, not before (yes, sometimes rather similar work has already been published and some quick footwork is needed).

So the work of prior researchers is venerated in theory, but rarely in practice.

Technical Debt – Conscious Competence

Chris Oldwood from The OldWood Thing

Once upon a time the term Technical Debt seemed to have a very clear meaning but over the last few years that has been diluted to generally mean any crap code or process which is holding back delivery. I’m sure any scholars of Wittgenstein will be at pains to point out that “meaning is use” and therefore if everyone uses it this way who am I to argue?

For me the canonical source of information on the technical debt metaphor comes from the wiki of the person who coined the phrase in the first place – Ward Cunningham. The entry on Technical Debt there suggests to me that the choice to enter into debt is a wholly conscious one, not the unconscious acts of a less professional bunch of programmers.

By way of example I thought I’d take the opportunity to write up one of those occasions where I’ve been involved with taking debt on (in the original spirit of the term) and how we dealt with it, to show where the distinction lies.

The Bug

Soon after going live with v1.0 of a new calculation system in a large financial organisation we discovered that a number of key counterparties were missing from the daily report. The report generator was a late addition and there were various other issues around it’s development and testing which muddied the waters somewhat but suffice to say that this wasn’t delivered as cleanly as the core system was. (You might consider the more recent meaning of the term to apply here.)

More importantly what transpired was that due to various mergers in the company’s history a few counterparties had the same “unique” code in different back-end systems. This wasn’t just news to my team (we were all recent hires) but also to quite a few people in the business too. Due to only dealing with a limited set of “books” the codes were always unique to them in their context, but our new system cut right across them all.

The Root Cause

The generation of calculations was ultimately based around a Cartesian product of two counterparties, however given that most of those were pointless there was an optimisation which used another source of data to reduce that by more than an order of magnitude.

This optimisation should have been fairly simple but due to a need to initially use some existing manually managed counterparty data to ease the cutover (so regression testing should then reconcile exactly) it was somewhat more complicated than first envisaged.

Our system was designed to use the correct source of data eventually, but do a reverse lookup for the time being. It might sound simple but the lookup actually involved multiple lookups using combinations of keys that had to make assumptions about which legacy back-end system might hold the related data. The right person who could explain how we could do what we needed to do correctly also seemed elusive; there were many people with “heuristics”, but nobody who knew for sure.

In total there were ~100 counterparties out of a total of ~15,000 permutations that suffered from this problem. Unfortunately a handful of those 100 had a significant effect on the “bottom line” and therefore the usefulness of the system as a whole was in doubt at that point.

Entering Into Debt

Naturally once we unearthed this clanger we had to decide how to tackle it. After getting our heads around what this all meant and roughly where in the code the missing logic probably needed to go we had to make a decision – do we try and fix the underlying issue right away or try and put a workaround in place (assuming that’s even possible) to mitigate the problem, at least temporarily.

We were all very aware of going down the dark road of putting a tactical fix in place because we’d all seen where that can lead. We had made a concerted effort over the 12 months required to build the system to refactor relentlessly [1] and squash any bugs as soon as possible. This felt like a backwards step.

On the positive side by adopting a Design for Testability approach in most parts of the code we had extra switches on our processes [2] that allowed us to make per-counterparty requests, usually for diagnostic purposes. Hence the workaround took the form of sticking the list of missing counterparties in a simple text file, then using a command prompt FOR loop [3] to read the file and invoke the tool in “single counterparty mode”. Yes it was a little slow due the constant restarting of the process but it was easy to surgically insert into the workflow with the minimum of testing or risk.

Paying Back the Debt

With the hole plugged for now, and an easy mechanism in place for adding any other missing counterparties – update the text file – we were in a position to sort out the root problem without feeling under pressure to get the system working correctly 100%, ASAP.

As you can probably imagine the real solution wasn’t easy, not least because it was one of a few areas of SQL code that didn’t have any unit tests and was a tangled web of tables and views which had grown organically in an attempt to graft the old and the new worlds together [4].

What Did it Cost?

If we assume that the gung-ho approach would have been to just jump in and start fixing the real code, then what did we lose by not doing that? It’s possible that the final fix was simple and a little more investigation may have lead to that solution instead.

In contrast, the risk is that we end up in one of those “have we fixed it or not” scenarios where we spend an indeterminate amount of time being “real close” to getting towards “done”. The old adage about the last 10% also taking 90% of the time springs immediately to mind. Instead we were almost positive we had a simple workaround that could be deployed and get the system running correctly enough in an estimable amount of time. I believe there is a lot of value in having that degree of confidence.

What I think was critical was being able to remove the pressure on finding the right solution as this gave us time to really consider what needed to be done. Any fix done under pressure is not going to be given the attention to detail that it probably deserves. You then run the risk of making the system worse and having an even deeper hole to dig yourself out of.

The customer does not care about strategic versus tactical decisions per-se, they just want the thing to work. We cared about the solution because we knew it would be a burden in the short term as everyone had to remember about the bit “grafted on the side”. The general trust the team had built up by keeping quality at the forefront meant that the business would be more willing to trust us to reconcile the problem appropriately when the time came.

Use Language With Care

I really hope the term Technical Debt doesn’t continue to get watered down even further as it’s a powerful concept which is incredibly useful in the right hands. We already have far too many words for “alternate implementation” that are pejoratives carrying an air of unprofessionalness about them. I would like this one to remain in the hands of the professionals so they can continue to have “grown up” conversations with their customers about when it’s appropriate to consider taking shortcuts for a short term business need without them rolling their eyes, yet again.


[1] See “Relentless Refactoring” for more thoughts around this (unfortunately) contentious topic.

[2] “From Test Harness To Support Tool”, “Building Systems as Toolkits” and “In The Toolbox - Home-Grown Tools” all look at the non-functional side of tooling.

[3] A batch file just wouldn’t be complete without a for loop, see “Every Solution Starts With ‘FOR /F’”.

[4] This one feature seemed destined to plague us forever, see “So Many Wrongs, But No Rights” for another tale of woe.