Delphi and group estimation

Derek Jones from The Shape of Code

A software estimate is a prediction about the future. Software developers were not the first people to formalize processes for making predictions about the future. Starting in the late 1940s, the RAND Corporation’s Delphi project created what became known as the Delphi method, e.g., An Experiment in Estimation, and Construction of Group Preference Relations by Iteration.

In its original form experts were anonymous; there was a “… deliberate attempt to avoid the disadvantages associated with more conventional uses of experts, such as round-table discussions or other milder forms of confrontation with opposing views.”, and no rules were given for the number of iterations. The questions concerned issues whose answers required long-term planning, e.g., how many nuclear weapons did the Soviet Union possess (this study asked five questions, which required five estimates). Experts could provide multiple answers, and had to give a probability for each being true.

One of those involved in the Delphi project (Helmer-Hirschberg) co-founded the Institute for the Future, which published reports about the future based on answers obtained using the Delphi method, e.g., a 1970 prediction of the state-of-the-art of computer development by the year 2000 (Dalkey, a productive member of the project, stayed at RAND).

The first application of Delphi to software estimation was by Farquhar in 1970 (no pdf available), and Boehm is said to have modified the Delphi process to have the ‘experts’ meet together, rather than remain anonymous (I don’t have a copy of Farquhar, and my copy of Boehm’s book is in a box I cannot easily get to); this meeting-together form of Delphi is known as Wideband Delphi.

Planning poker is a variant of Wideband Delphi.

An assessment of Delphi by Sackman (of Grant-Sackman fame) found that: “Much of the popularity and acceptance of Delphi rests on the claim of the superiority of group over individual opinions, and the preferability of private opinion over face-to-face confrontation.” The Oracle at Delphi was one person; have we learned something new since that time?

Group dynamics is covered in section 3.4 of my Evidence-based software engineering book; resource estimation is covered in section 5.3.

The likelihood that a group will outperform an individual has been found to depend on the kind of problem. Is software estimation the kind of problem where a group is likely to outperform an individual? Obviously it will depend on the expertise of those in the group, relative to what is being estimated.

What does the evidence have to say about the accuracy of the Delphi method and its spinoffs?

When asked to come up with a list of issues associated with solving a problem, groups generate longer lists of issues than individuals. The average number of issues per person is smaller, but efficient use of people is not the topic here. Having a more complete list of issues ought to be good for accurate estimating (the validity of the issues is dependent on the expertise of those involved).

There are patterns of consistent variability in the estimates made by individuals; some people tend to consistently over-estimate, while others consistently under-estimate. A group will probably contain a mixture of people who tend to over/under estimate, and an iterative estimation process that leads to convergence is likely to produce a middling result.
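To make the convergence argument concrete, here is a toy Python simulation (the bias factors, noise level and update rule are all invented; it is not a faithful model of any Delphi protocol): each person starts from their own biased estimate and moves half way towards the group median in each round, and the group settles on a middling value.

import numpy as np

rng = np.random.default_rng(1)

actual = 40.0                                  # hypothetical true effort, in hours
bias = np.array([0.5, 0.7, 1.0, 1.4, 2.0])     # invented per-person multiplicative biases
# noisy first-round estimates from each person
estimates = actual * bias * rng.lognormal(0.0, 0.1, size=bias.size)

for _ in range(4):                             # a few Delphi-style iterations
    middle = np.median(estimates)
    estimates += 0.5 * (middle - estimates)    # everyone moves half way towards the median

print(f"converged group estimate: {np.median(estimates):.1f} hours (actual {actual})")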

By how much do some people under/over estimate?

The multiplicative factor values (y-axis) appearing in the plot below are from a regression model fitted to estimate/actual implementation time for a project involving 13,669 tasks and 47 developers (data from a study by Nichols, McHale, Sweeney, Snavely and Volkmann). Each vertical line, or single red plus, is one person (at least four estimates needed to be made for a red plus to occur); the red pluses are the regression model’s multiplicative factor for that person’s estimates of a particular kind of creation task, e.g., design, coding, or testing. Points below the grey line indicate overestimation, points above it underestimation (code+data):

Multiplicative factor for each person’s estimated vs. actual implementation time.
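The model plotted above is the one fitted to the Nichols et al. data; purely as an illustration, one simple way of deriving per-developer multiplicative factors from such data might look like the Python sketch below (the column names developer, estimate and actual are hypothetical):

import numpy as np
import pandas as pd

def multiplicative_factors(tasks: pd.DataFrame, min_estimates: int = 4) -> pd.Series:
    """Per-developer multiplicative factor: geometric mean of actual/estimate."""
    log_ratio = np.log(tasks["actual"] / tasks["estimate"])
    by_dev = log_ratio.groupby(tasks["developer"])
    factors = np.exp(by_dev.mean())               # < 1: tends to overestimate, > 1: underestimate
    return factors[by_dev.count() >= min_estimates]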

What is the probability of a Delphi estimate being more accurate than an individual’s estimate?

If we assume that a middling answer is more likely to be correct, then we need to calculate the probability that the mix of people in a Delphi group produces a middling estimate while the individual produces a more extreme estimate.
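Under some strong, made-up distributional assumptions (lognormal multiplicative errors, a five-person group, the group answer taken as the median), a Monte Carlo version of that calculation might look like the following Python sketch; the real probability depends entirely on how estimation errors are actually distributed among those involved.

import numpy as np

rng = np.random.default_rng(42)
actual, group_size, trials = 1.0, 5, 100_000

# assumed lognormal multiplicative error for every estimator (sigma is a guess)
estimates = actual * rng.lognormal(mean=0.0, sigma=0.5, size=(trials, group_size))

group_estimate = np.median(estimates, axis=1)   # the group's middling answer
individual_estimate = estimates[:, 0]           # one individual, picked arbitrarily

group_wins = np.abs(np.log(group_estimate / actual)) < np.abs(np.log(individual_estimate / actual))
print(f"P(group estimate closer to actual than individual's) ~ {group_wins.mean():.2f}")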

I don’t have any Wideband Delphi estimation data (or rather, I only have tiny amounts); pointers to such data are most welcome.

Our brains want more but less is more

Allan Kelly from Allan Kelly Associates

Less is more – I say it so often but it is a lesson I still have to relearn regularly. Go small. You could say that my most famous blog post is saying just the same thing in fancy language – Software has diseconomies of scale – not economies of scale.

So a couple of weeks ago I was fascinated to see an article in The Economist entitled “Why people forget that less is often more” (paywall) reporting on research published in Nature “People systematically overlook subtractive changes” (another paywall) – being Nature one knows this is serious science.

The research showed that less is more applies mentally as well as physically. That is, people are far more likely to solve problems by adding elements than by subtracting them – even when subtraction is both viable and costs less (some of the experiments introduced the idea of cost). The researchers even go as far as to suggest this isn’t just a case of believing the additive (more) solution is better than the subtractive one; it appears our brains are simply less likely to consider a solution which subtracts elements.

Last year when I was struggling to complete “Succeeding with OKRs in Agile” I found a solution in removing some chapters. Some were little more than notes, some were in draft and a couple were fully written (and edited). Removing those chapters made the book less, but it made my work load less too – which had an immediate benefit.

It also meant I could finish the book sooner. It meant the copy editing process was quicker and cheaper, and it meant that the book could be published on Amazon and start earning sooner. But I also believe people like shorter books: I really believe I’m selling more books because it has fewer than 200 pages than I would if it had over 200, let alone 300.

This has a bearing on the way companies organise themselves and their processes too. When I first started talking about #NoProjects (which became Project Myopia) I really saw this as a “just remove the project model” – keep doing all the other stuff but just drop projects. Part of me still believes that and while I recognise that some places need more structure I also believe that adding projects is simply overhead for many small companies.

I see it too in the “fear of coding” that many companies have – don’t let people code! Plan it, write it down, estimate it, find the cheapest supplier, argue about it – when simply doing it would be cheaper.

I see it too in the way “agile methods” have grown. Scrum, and XP, are barely viable development models. Compared to RUP they are minuscule. But they worked. Before agile we called them “lightweight”.

Now we have SAFe and other frameworks which bring big thinking back. Nobody would call SAFe lightweight – not with its 10 principles, four configurations and five versions. Perhaps we should have stuck with “lightweight development methods.”

I think it was Alistair Cockburn who once said “Traditional methods are tailored by removing elements, agile methods are tailored by adding.” If the research above is right it was only a matter of time before someone created an “agile method” as big as SAFe.

Finally, another example of less is more: I could write more in this blog but less is more.


Subscribe to my blog newsletter and download Continuous Digital for free – normal price $9.99/£9.95/€9.95

The post Our brains want more but less is more appeared first on Allan Kelly Associates.

Estimate variability for the same task

Derek Jones from The Shape of Code

If 100 people estimate the time needed to implement a feature, in software, what is the expected variability in the estimates?

Studies of multiple implementations of the same specification suggest that the standard deviation of the number of lines across implementations is 25% of the mean (based on data from 10 sets of multiple implementations, of various sizes).

The plot below shows lines of code against the number of programs (implementing the 3n+1 problem) containing that many lines (code and data):

3n+1 programs containing various lines of code.
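As a minimal sketch of the statistic being quoted (the line counts below are invented, not taken from the 10 datasets), the figure is just the standard deviation expressed as a fraction of the mean:

import numpy as np

# invented line counts for several implementations of the same specification
loc = np.array([23, 31, 18, 27, 41, 25, 30, 22, 35, 28])

cv = loc.std(ddof=1) / loc.mean()   # standard deviation as a fraction of the mean
print(f"mean = {loc.mean():.1f} lines, sd/mean = {cv:.2f}")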

Might any variability in the estimates for task implementation be the result of individuals estimating their own performance (which is variable)?

To the extent that an estimate is based on a person’s implementation experience, a developer’s past performance will have some impact on their estimate. However, studies have found a great deal of variability between individual estimates and their corresponding performance.

One study asked 14 companies to bid on implementing a system (four were eventually chosen to implement it; see figure 5.2 in my book). The estimated elapsed time varied by a factor of ten. Until last week, this was the only study of this question for which the data was available (and it may have been the only such study).

A study by Alhamed and Storer investigated crowd-sourcing of effort estimates, structured by use of planning poker. The crowd were workers on Amazon’s Mechanical Turk, and the tasks estimated came from the issue trackers of JBoss, Apache and Spring Integration (using issues that had been annotated with an estimate and actual time, along with what was considered sufficient detail to make an estimate). An initial set of 419 issues were whittled down to 30, which were made available, one at a time, as a Mechanical Turk task (i.e., only one issue was available to be estimated at any time).

Worker estimates were given using a time-based category (i.e., the values 1, 4, 8, 20, 40, 80), with each value representing a unit of actual time (i.e., one hour, half-day, day, half-week, week and two weeks, respectively).

Analysis of the results from a pilot study was used to build a model that detected estimates considered to be of low quality, e.g., those providing a poor justification for the estimate. These were excluded from any subsequent iterations.

Of the 506 estimates made, 321 passed the quality check.

Planning poker is an iterative process, with those making estimates in later rounds seeing estimates made in earlier rounds. So estimates made in later rounds are expected to have some correlation with earlier estimates.

Of the 321 estimates that passed the quality check, 153 were made in the first round. Most of the 30 issues have 5 first-round estimates each, one has 4 and two have 6.

Workers have to pick one of the listed values as their estimate, with these values being roughly evenly spaced on a logarithmic scale, i.e., it is not possible to select an estimate from the many possible large values, small values, or intermediate values. Unless most workers pick the same value, the standard deviation is likely to be large. Taking the logarithm of the estimate maps it to a linear scale, and the plot below shows the mean and standard deviation of the log of the estimates for each issue made during the first round (code+data):

Mean against standard deviation for log of estimates of each issue.
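As a sketch of how those per-issue summaries might be computed, assuming a dataframe of first-round estimates with (hypothetical) issue and estimate columns:

import numpy as np
import pandas as pd

def per_issue_log_summary(first_round: pd.DataFrame) -> pd.DataFrame:
    """Mean and standard deviation of log(estimate) for each issue."""
    log_est = np.log(first_round["estimate"])
    grouped = log_est.groupby(first_round["issue"])
    return pd.DataFrame({"mean_log": grouped.mean(), "sd_log": grouped.std()})

# e.g. per_issue_log_summary(estimates).plot.scatter(x="mean_log", y="sd_log")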

The wide spread in the standard deviations across a spread of mean values may be due to small sample size, or it may be real. The only way to find out is to rerun with larger sample sizes per issue.

Now that it has been done once, this study needs to be run lots of times to measure the factors involved in the variability of developer estimates. What would be the impact of asking workers to make hourly estimates (they would not be anchored by experimenter-specified values), or of shifting the numeric values used for the categories (which probably have an anchoring effect)? Asking for an estimate to fix an issue in a large software system introduces the unknown of all kinds of dependencies; would estimates provided by workers who are already familiar with a project be consistently shifted up/down (compared to estimates made by those not familiar with the project)? The problem of unknown dependencies could be reduced by giving workers self-contained problems to estimate, e.g., the 3n+1 problem.

The crowdsourcing idea is interesting, but I don’t think it will scale, and I don’t see many companies making task specifications publicly available.

To mimic actual usage, research on planning poker (which appears to have non-trivial usage) needs to ensure that the people making the estimates are involved during all iterations. What is needed is a dataset of lots of planning poker estimates. Please let me know if you know of one.

A Review of Absolution Gap: Ignore the naysayers. Read this book. Love this book.

Paul Grenyer from Paul Grenyer

Alastair Reynolds ISBN-13 : 978-0575083165

I re-read Absolution Gap a decade or more after the first time in anticipation of the next part coming out in July of this year (2021). It was always the weakest of the trilogy, but not nearly as bad as I remember. In fact this time I devoured it in a relatively, for me, short period of time.

It’s true, as some other reviewers have said, that the story here could have been told in far fewer words, but then much of the texture of the story telling would have been lost and I think this is what makes this such a great book!


The characters and themes are believable in this universe. I think the story could have been very different if certain characters had not been killed off so early or at all. It’s always a shame when a lot of the main thrust of a previous book (Redemption Ark) is undone, but this is often how things play out in the real world.


I achieved what I wanted by re-reading Absolution Gap, I’m up to speed ready for the next instalment. The problem is that there is so much in the Revelation Space universe I now feel the need to go back and read it all again.


Ignore the naysayers. Read this book. Love this book.


Bring Out The Big Flipping GunS – a.k.

a.k. from thus spake a.k.

Last month we took a look at quasi-Newton multivariate function minimisation algorithms which use approximations of the Hessian matrix of second partial derivatives to choose line search directions. We demonstrated that the BFGS rule for updating the Hessian after each line search maintains its positive definiteness if the line searches conform to the Wolfe conditions, ensuring that the locally quadratic approximation of the function defined by its value, the vector of first partial derivatives and the Hessian has a minimum.
Now that we've got the theoretical details out of the way it's time to get on with the implementation.
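For anyone who just wants the headline formula, a minimal numpy sketch of the standard BFGS update of the Hessian approximation (a textbook statement of the rule, not the library implementation discussed in the series) looks like this:

import numpy as np

def bfgs_update(B: np.ndarray, s: np.ndarray, y: np.ndarray) -> np.ndarray:
    """One BFGS update of the Hessian approximation B.

    s = x_new - x_old (the step taken), y = grad_new - grad_old.
    The Wolfe conditions guarantee y @ s > 0, which is what keeps
    the updated matrix positive definite.
    """
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)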

Suspending the computer using Kupfer

Andy Balaam from Andy Balaam's Blog

I have recently started using Kupfer again as my application launcher in Ubuntu MATE, and I found it lacked the ability to suspend the computer.

Here is the plugin I wrote to support this.

To install it, quit Kupfer, create a directory in your home dir called .local/share/kupfer/plugins, and create this file suspend.py inside:

__kupfer_name__ = _("Power management")
__kupfer_sources__ = ("PowerManagementItemsSource", )
__description__ = _("Actions to suspend the computer")
__version__ = "2021-05-05"
__author__ = "Andy Balaam "


from kupfer.plugin import session_support as support


class Suspend (support.CommandLeaf):
    def __init__(self, commands):
        support.CommandLeaf.__init__(self, commands, "Suspend")
    def get_description(self):
        return _("Suspend the computer")
    def get_icon_name(self):
        return "system-suspend"


class PowerManagementItemsSource (support.CommonSource):
    def __init__(self):
        support.CommonSource.__init__(self, _("Power management"))
    def get_items(self):
        return (Suspend((["systemctl", "suspend"],)),)

# Copyright 2021 Andy Balaam, released under the MIT license.

Now restart Kupfer, go to Preferences, Plugins, and tick “Power management”.

You should now see a “Suspend” item if you search for it in the Kupfer interface.

Inspired by: Mate Session Management – Kupfer Plugin.

Reference docs: Kupfer Plugin API

OKRs top-down? bottom-up? or ripples in a pond?

Allan Kelly from Allan Kelly Associates

One of the great things about writing a book is that you get a greater understanding of the subject. So it was with “Succeeding with OKRs in Agile”. In particular, writing the book forced me to think about how OKRs fitted with agile and hierarchical structures. I get the impression that many people are interested in using OKRs to align teams but not everyone has worked out all the nuances.

On the face of it OKRs are hierarchical: it seems “obvious” that someone, somewhere, is going to set a big goal, that goal will cascade down and every team will end up with their own mini version of the goal. As I said, this seems obvious; how else could it be? Especially in a large organization?

After all, that is how the not-so-distant ancestors of OKRs, Management-By-Objectives (MBOs), worked.

This also fits the engineer’s mind: the product team have a goal, and all the supporting teams – whether contributing components or services – make their goals subservient to the one goal. The classic inverted tree with each team doing what the node above asks.

But top-down conflicts directly with reality and with agile. Teams don’t have one goal, and they don’t answer to just one leader but to multiple leaders, multiple customers and multiple stakeholders, and these don’t always agree. Agile folk have long railed against command-and-control from above while advocating self-managing or self-organising teams. Surely OKRs go against this philosophy?

So, if we are to use OKRs in an agile environment these positions need to be reconciled. In Succeeding with OKRs I described the process as more bottom-up than top-down. Thus, rather than a big boss saying what should happen and that being cascaded down the organization to provide goals at every level, I describe the big boss setting out a vision, a goal, an objective but not describing details. Then they say to the teams: “Help, how can you help move us towards that goal?”

Now, only a few months after I wrote that my thinking has moved on. I don’t disagree with myself but I see the need for a more nuanced explanation and a revised model.

First off, I’m guilty of using the language of hierarchy: top-down and bottom-up. In so doing I’m supporting the view that hierarchies are the natural state of things and creating a possibly false dichotomy: one thing or another. For years I’ve been thinking of organisations both as federal entities and as solar systems. While the leadership team may be central to decisions they are not all-powerful. Teams have their own paths; leadership and leadership teams are the sun, and teams/divisions orbit them. (If I recall correctly I picked this idea up in the Henry Mintzberg book Simply Managing.)

Bear in mind, as I say in every one of my books: teams are autonomous. We strive for independent teams with devolved authority. Each team exists to deliver a product or service and each team has multiple stakeholders – of whom the big boss is but one. Each team therefore has to decide how best to deliver benefit to (potentially competing) stakeholders. Sometimes that means co-ordinating with other teams and even other companies.

Put that together: teams are creating OKRs but they are not doing it in isolation; they are listening to the big bosses at the centre but also to the teams they need to work with.

Recently I’ve started to think of these concentric circles less as planetary orbits and more as waves, or ripples to be precise. The big boss at the centre makes big ripples that carry out to the edges.

But leaders are not the only ripple makers. Teams, customers and other stakeholders also have an effect – like rain falling on water. Sometimes these waves come together and magnify each other, other times they cancel each other out; more often than not they are out of sync and disrupt each other in ways too complex to predict.

We think of leaders as single water droplets but in reality there are lots of drops making lots of ripples

OKRs are the messaging system that allows teams to signal what ripples they are creating and which they are reacting to. Teams are iterating – OKRs reset every 13 weeks – which means every quarter teams get a chance to react to other ripples and reset their own.

Thought of like this you also get a scaling model. Not so much a model of “how to do this to scale” but a mental model which describes how to think about scaling.


I have some upcoming presentations and webinars about OKRs if you would like to know more

Or, buy the book “Succeeding with OKRs in Agile”


Subscribe to my blog newsletter and download Continuous Digital for free – normal price $9.99/£9.95/€9.95

The post OKRs top-down? bottom-up? or ripples in a pond? appeared first on Allan Kelly Associates.

Presentation & speaking engagements

Allan Kelly from Allan Kelly Associates

Agile on the beach

The attention Succeeding with OKRs in Agile has got means I’ve had a lot of invites to speak about, well, Agile and OKRs. So I am at a number of meetup groups and conferences, and the odd conversation, over the next few weeks, most of which are free to attend online.

An up-to-date list of public speaking engagements can always be found on my website. In addition to public speaking I regularly deliver private presentations too. So please get in touch and book a date for your team or company.


Reawakening Agile with OKRs

When: 6 May, 5.30pm Sydney, 8.30am London

Online presentation

Organized by SAFe Sydney, free, booking required


Reawakening Agile with OKRs

When: 11 May 2021, online presentation

Organized by BCS Change Management group – booking required (free to attend)


Succeeding with OKRs

An online conversation with Adrian Reed, 20 May 2021.

Organiser and booking with BlackMetric


Reawakening Agile with OKRs

When: 27 May, online presentation

Organized by Cambridge Agile Exchange, free, booking required


Reawakening Agile with OKRs

When: 30 June 2021, online presentation

Organized by Future of Work in Scotland, booking required


Allan Kelly at Agile on the Beach

Reawakening Agile with OKRs at Agile on the Beach conference

First live, in person, event since March 2020 – with beach party!

Sept 2, 2021, tickets on sale now

The post Presentation & speaking engagements appeared first on Allan Kelly Associates.

Claiming that software is AI based is about to become expensive

Derek Jones from The Shape of Code

The European Commission is updating the EU Machinery Directive, which covers the sale of machinery products within the EU. The updates include wording to deal with intelligent robots, and what the commission calls AI software (contained in machinery products).

The purpose of the initiative is to: “… (i) ensuring a high level of safety and protection for users of machinery and other people exposed to it; and (ii) establishing a high level of trust in digital innovative technologies for consumers and users, …”

What is AI software, and how is it different from non-AI software?

Answering these questions requires knowing what is, and is not, AI. The EU defines Artificial Intelligence as:

  • ‘AI system’ means a system that is either software-based or embedded in hardware devices, and that displays behaviour simulating intelligence by, inter alia, collecting and processing data, analysing and interpreting its environment, and by taking action, with some degree of autonomy, to achieve specific goals;
  • ‘autonomous’ means an AI system that operates by interpreting certain input, and by using a set of predetermined instructions, without being limited to such instructions, despite the system’s behaviour being constrained by and targeted at fulfilling the goal it was given and other relevant design choices made by its developer;

‘Simulating intelligence’ sounds reasonable, but actually just moves the problem on, to defining what is, or is not, intelligence. If intelligence is judged on an activity-by-activity basis, will self-driving cars be required to have the avoidance skills of a fly, while other activities might have to be on par with those of birds? There is a commission working document that defines: “Autonomous AI, or artificial super intelligence (ASI), is where AI surpasses human intelligence across all fields.”

The ‘autonomous’ component of the definition is so broad that it covers a wide range of programs that are not currently considered to be AI based.

The impact of the proposed update is that machinery products containing AI software are going to incur expensive conformance costs, which products containing non-AI software won’t have to pay.

Today it does not cost companies anything to claim that their systems are AI based. This will obviously change when a significant cost is involved. There is a parallel here with companies that used to claim that their beauty products provided medical benefits; when the US Food and Drug Administration started requiring companies making such claims to submit their products to the new drug approval process (which is hideously expensive), companies switched to claiming their products provided “… the appearance of …”.

How are vendors likely to respond to the much higher costs involved in selling products that are considered to contain ‘AI software’?

Those involved in the development of products labelled as ‘safety critical’ try to prevent costs escalating by minimizing the amount of software treated as ‘safety critical’. Some of the arguments made for why some software is/is not considered safety critical can appear contrived (at least to me). It will be entertaining watching vendors, who once shouted “our products are AI based”, switching to arguing that only a tiny proportion of the code is actually AI based.

A mega-corp interested in having their ‘AI software’ adopted as an industry standard could fund the work necessary for the library/tool to be compliant with the EU directives. The cost of initial compliance might be within reach of smaller companies, but the cost of maintaining compliance as the product evolves is something that only a large company is likely to be able to afford.

The EU’s updating of its machinery directive is the first step towards formalising a legal definition of intelligence. Many years from now there will be a legal case that creates what later generations will consider to be the first legally accepted definition.

My experience with the NZXT H1 recall so far

Timo Geusch from The Lone C++ Coder's Blog

First, I’m very much a “very occasional” gamer, so I’m usually not the target audience for most gaming-related accessories and parts. I did however want to rebuild my rather large grey box Linux/Windows workstation into something more compact, with a watercooler for the CPU. The NZXT H1 seemed at that point to be a really good match for my requirements and had received good reviews. One was duly ordered, together with a Mini-ITX motherboard, and somehow a better graphics card also snuck onto the shopping list.