Almost all published analysis of fault data is worthless

Derek Jones from The Shape of Code

Faults are the subject of more published papers than any other subject in empirical software engineering. Unfortunately, over 98.5% of these fault-related papers are at best worthless and at worst harmful, i.e., they make recommendations whose impact may increase the number of faults.

The reason most fault papers are worthless is the data they use and the data they don’t use.

The data used

Data on faults in programs used to be hard to obtain; you needed a friend in a company that maintained a fault database. Open source changed this. Public fault tracking systems are now available, containing tens, or even hundreds, of thousands of reported faults. Anybody can report a fault, and unfortunately anybody does; there is a lot of noise mixed in with the signal. One study found that 43% of reported faults were enhancement requests, the same underlying fault is reported multiple times (most eventually get marked as duplicates, at the cost of much wasted time), and …

Fault tracking systems don’t always contain all known faults. One study found that the really important faults are handled via email discussion lists, i.e., they are important enough to require involving people directly.

Other problems with fault data include: biased reporting of problems, reported problems caused by a fault in a third-party library, and reported problems that are intermittent or not reproducible.

Data cleaning is the essential first step that many of those who analyze fault data fail to perform.
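As a minimal illustration of the kind of cleaning involved, here is a small Python sketch that filters the most obvious noise out of an exported issue list before any analysis is attempted. The field names and status values are hypothetical; every tracker has its own schema.

    # Minimal sketch: remove obvious non-faults from an exported issue list.
    # The field names ("type", "status", "duplicate_of") are hypothetical;
    # real trackers each use their own schema.

    def clean_fault_reports(reports):
        """Keep only reports that plausibly describe defects."""
        cleaned = []
        for r in reports:
            if r.get("type") == "enhancement":        # feature request, not a fault
                continue
            if r.get("duplicate_of") is not None:     # already reported elsewhere
                continue
            if r.get("status") in {"invalid", "worksforme"}:  # not a fault, or not reproducible
                continue
            cleaned.append(r)
        return cleaned

    reports = [
        {"id": 1, "type": "defect", "status": "confirmed", "duplicate_of": None},
        {"id": 2, "type": "enhancement", "status": "open", "duplicate_of": None},
        {"id": 3, "type": "defect", "status": "open", "duplicate_of": 1},
    ]
    print(clean_fault_reports(reports))   # only report 1 survives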

The data not used

Users cause faults, i.e., if nobody ever used the software, no faults would be reported. This statement is as accurate as saying: “Source code causes faults”.

Reported faults are the result of software being used with a set of inputs that causes the execution of some sequence of tokens in the source code to have an effect that was not intended.

The number and kind of reported faults in a program depends on the variety of the input and the number of faults in the code.

Most fault related studies do not include any user related usage data in their analysis (the few that do really stand out from the crowd), which can lead to very wrong conclusions being drawn.

User usage data is very hard to obtain, but without it many kinds of evidence-based fault analysis are doomed to fail (giving completely misleading answers).

Conclusion: Who works on what – Comparative advantage part 3 of 3

Allan Kelly from Allan Kelly Associates


In my last two posts – Who should work on what? part 1 and part 2 – I’ve tried to apply the comparative advantage model from economics to the question of which software developer should work on what. The model has come up with two different answers:

  • If productivity (measured by quantity of features) is the goal, then it probably makes sense for everyone to work on the product where they are comparatively most productive (comparatively being the key word here).
  • If value produced is the goal, then it may well make sense for everyone to work on the most valuable features (or product) regardless of personal strengths.

Along the way I’ve highlighted a number of difficulties in applying this model:

  • If common resources are being used, or if doing one piece of work impacts another, then the model doesn’t work.
  • There is no consideration of time or urgency in the model. When urgency enters the picture then productivity may well suffer.
  • Over time things may change: backlogs will stratify and people will learn.
  • Operating this model in practice requires data which is usually unavailable and so getting the data would itself take time.

At this point it is tempting to throw one’s hands up in the air and say: “We’ve learned nothing!”

But I don’t think so. I think there are lessons in here.

Right at the start of this I knew it was a difficult question to answer; trying to answer it has shown just how hard it is to get a definitive answer. There are still more assumptions which could be relaxed in this model and still more variables that could be added.

The model has also shown how important it is to have a sense of value. Not only between products but between features. That in turn demonstrates the importance of both valuing work in the backlog and regularly reviewing those valuations.

However, the first big lesson I think that needs learning here is: you have to know what your intention is.

You need to know what you are trying to optimise.
You need a strategy.

For example:

  • Do you want to maximise the quantity of features delivered?
  • Do you want to maximise the value delivered? (probably measured in money)
  • How much do you want to allow for urgent work? And to what standard are you going to hold those requests?
  • Do you want to promote specific knowledge (so one person can become more productive in one domain) or spread knowledge around (so many people can work on many different things)?

In many ways this is going to be a self-fulfilling prophecy: the result will be what you put in. That is, if people only work on one product then moving people between products will get harder and less productive. If people follow the value then value delivered will increase as people become more productive in the products with the higher value.

Knowing what your intention is should be the first step to formulating a strategy. And having a strategy is important because answering that question – “who should work on what?” – is hard.

To answer that question rationally one needs to create a model, a model far more complex than my model, then calculate every variable in the model – plus keep the variables up to date as they change. Then to apply that model to every work question which arises.

Phew.

Alternatively one can formulate a rule of thumb, a heuristic, a rough guideline, a “good enough” decision process. This might sound a bit amateurish but as Gerd Gigerenzer says in Risk Savvy:

“To make good decisions in an uncertain world, one has to ignore part of the information, which is exactly what rules of thumb do. Doing so can save time and effort and lead to better decisions.”

To build up such rules of thumb requires experience and reflection, something which might be described as intuition.

So to answer my original question in terms an economist would recognise: It depends.



The first compiler was implemented in itself

Derek Jones from The Shape of Code

I have been reading about the world’s first actual compiler (i.e., not a paper exercise), described in Corrado Böhm’s PhD thesis (French version from 1954, an English translation by Peter Sestoft). The thesis, submitted in 1951 to the Federal Technical University in Zurich, takes some untangling; when you are inventing a new field, ideas tend to be expressed using existing concepts and terminology, e.g., computer peripherals are called organs and registers are denoted by the symbol pi.

Böhm had worked with Konrad Zuse and must have known about his language, Plankalkül. The language also has an APL feel to it (but without the vector operations).

Böhm’s language does not have a name, his thesis is really about translating mathematical expressions to machine code; the expressions are organised by what we today call basic blocks (Böhm calls them groups). The compiler for the unnamed language (along with a loader) is written in itself; a Java implementation is being worked on.

Böhm’s work is discussed in Donald Knuth’s “The Early Development of Programming Languages”, but there is nothing like reading the actual work (if only in translation) to get a feel for it.

Update (3 days later): Correspondence with Donald Knuth.

Update (3 days later): A January 1949 memo from Haskell Curry (he of Curry fame and more recently of Haskell association) also uses the term organ. Might we claim, based on two observations on different continents, that it was in general use?

Adding value – Who works on what? – part 2 of comparative advantage

Allan Kelly from Allan Kelly Associates


In my previous post I tried to use the economic theory of comparative advantage to answer the question:

Who should work on what? or Shouldn’t every developer work on the software where they are most productive?

The economic model gave an answer but more importantly it provided a framework for answering the question. As I examined the assumptions behind the model it became clear there are many other considerations which deserve attention.

Perhaps the most important one is: value.

The basic economic model looks, perhaps naively, at quantity of goods produced. Really, one should consider the value of the goods produced. Not only did the model assume that every feature is the same size but it also assumed that all features have the same value.

Flipping back to the basic model, let’s assume that each Bonds feature generates $10,000 in revenue while each Equities feature generates $20,000. Now the options are:

  1. Jenny and Joe both work on Equities, they produce seven features and generate $140,000 in revenue.
  2. Jenny and Joe both work on Bonds, they produce seven features and generate $70,000 in revenue.
  3. Joe works on Equities and Jenny on Bonds, the six features they produce generate $80,000 in revenue.
  4. Joe works on Bonds and Jenny on Equities, the eight features they produce generate $130,000 in revenue.

Clearly option #1 is the one to choose because it generates the greatest revenue even though Joe would be more productive if he were to work on Bonds. Adding value to the basic model changes the answer.

Now, again there is an assumption here: all features produce the same value. That is unlikely to be true.

Indeed, over time if no work is done on Bonds it would be reasonable to assume the value of the features would increase. Not that all features would increase in value but failure to do any would mean some of those in the backlog would become more valuable. In addition new requests might arise which may be more valuable than existing requests.

Further, while the value of Bonds features would be increasing the value of Equities might be falling. This follows another economic theory, the law of diminishing marginal utility. This law states that as one consumes more of a given product the added utility (i.e. value) derived from one more unit will be less and less.

So now we have exposed another assumption in the model: the model is static. The model does not consider the effects over time of how things change – I’ll come back to this in another context later too.

Over time the backlogs for both products will stratify, each will contain some items which are higher in value than average and some which are lower in value.

Let’s suppose each product has its own backlog:

  • Equities backlog contains seven features with the values: $60,000, $54,000, $48,000, $42,000, $36,000, $30,000 and $24,000.
  • Bonds backlog contains another seven features with the values: $32,500, $10,000, $7,000, $6,000, $5,000, $4,000 and $3,000.

Now there are (at least) four options open:

  1. Equities: both Jenny and Joe work on the equities product. Together they will deliver seven features and a total of $294,000 of value.
  2. Bonds: both Jenny and Joe work on the bonds product. Together they will deliver seven features and a total of $67,500 of value.
  3. Specialise: Jenny does five equities features ($240,000) and Joe three bonds features ($49,500) delivering a total of eight features and $289,500.
  4. Value seeking: Jenny does her five equities features but Joe delivers one bonds feature, one equities feature and gets to go home early. In total they deliver seven features and $302,500.

[Chart: value delivered under each of the four options]

The highest value option is #4, which delivers $13,000 more than if they specialise. That might seem counterintuitive: the option that delivers the most money is not the one that delivers the most features. And again it shows that deciding who works on what in the absence of value can be misleading.

The second best option is for both to do Equities only, which delivers $4,500 more than specialisation (and $8,500 less than option #4). Adding value to the basic model isn’t a big change but it has changed the answer: when output was measured in features, specialisation looked to be the best option.
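For anyone who wants to check the arithmetic, here is a minimal Python sketch of the four options, using the weekly feature rates from part 1 and the backlog values above. It simply assumes each developer picks the most valuable remaining items in whichever backlog they are assigned to.

    # Backlog values, highest first, and the plans from the four options above.
    EQUITIES = [60000, 54000, 48000, 42000, 36000, 30000, 24000]
    BONDS    = [32500, 10000, 7000, 6000, 5000, 4000, 3000]

    def value_of(plan):
        """plan is a list of (developer, product, number_of_features); each entry
        takes the most valuable items still left in that product's backlog."""
        taken = {"Equities": 0, "Bonds": 0}
        total = 0
        for developer, product, n in plan:
            backlog = EQUITIES if product == "Equities" else BONDS
            total += sum(backlog[taken[product]:taken[product] + n])
            taken[product] += n
        return total

    options = {
        "1. both on Equities": [("Jenny", "Equities", 5), ("Joe", "Equities", 2)],
        "2. both on Bonds":    [("Jenny", "Bonds", 4),    ("Joe", "Bonds", 3)],
        "3. specialise":       [("Jenny", "Equities", 5), ("Joe", "Bonds", 3)],
        "4. value seeking":    [("Jenny", "Equities", 5), ("Joe", "Bonds", 1),
                                ("Joe", "Equities", 1)],
    }
    for name, plan in options.items():
        print(name, "->", "$" + format(value_of(plan), ","))

Running it reproduces the $294,000, $67,500, $289,500 and $302,500 figures above.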

Returning to the question of the static model, there is one more assumption to relax: learning. Economist J.K. Galbraith pointed out that comparative advantage neglects to factor in learning, and so far I’ve done the same thing.

Assuming Joe specialises in Bonds and spends most of his time working there he will learn and in time he will become more productive. Suppose after a year he can produce 5 bonds features in the time he takes to produce 2 equities features – a 66% improvement.

Now how do the numbers stack up? What is the revenue maximising choice now?

And perhaps more importantly, how long would it take before Joe’s increased output paid for all the time he spent learning?

But, another what-if, what if Joe had specialised in Equities instead? He would now be more productive on a product with higher value features.

Again the question “Who should work on what?” needs to consider intent. Which product do you want Joe to learn? Which product is expected to have the highest value? Are you maximising value or quantity?

As usual, you can argue with my model and question my assumptions but I think that only demonstrates my point: these things need thinking about.

If you want you can continue relaxing the assumptions and do more what-if calculations – for example I’ve assumed Jenny and Joe cost the same. Nor have I factored in risk or cost-of-delay. This model can get a lot more complicated. I’ve also assumed that partially done features have no value at all, each week starts afresh and no work carries over.



Who should work on what? – Comparative advantage part 1

Allan Kelly from Allan Kelly Associates


Returning to my theme of numerical and economic analysis of software development, I’d like to address that old chestnut:

Shouldn’t every developer work on the software where they are most productive?

We can model this question using a bit of economic theory called Comparative advantage – which is also the economics that justifies free trade. However, while this model will give us an answer it also raises a number of questions which are outside the model. In this case the model gives us a structure for examining the issues rather than providing an answer.

By the way, this discussion is going to span two blog posts, or perhaps three.

Let’s set up the model with a simple case. As before there are some assumptions needed; it’s when we examine these assumptions that things get really interesting.

Imagine a small trading desk. The desk invests in corporate bonds and equities. Jenny has been working for the desk for some years and has written two trading applications, imaginatively called Equities and Bonds. She wrote Equities after Bonds, prefers it, and is more productive on it.

Measured in features Jenny can produce 5 new Equities features or 4 new Bonds features in one week. (We’ll assume that all features are the same size for now.)

The company hires a new developer, Joe. He is new to both code bases, so he can only produce 2 Equities features or 3 Bonds features a week. Thus Jenny is the more productive developer on both apps.

Features per week   Equities   Bonds
Jenny               5          4
Joe                 2          3

Now comparative advantage theory tells us not to look at the total output of either party but at the relative output. In other words:

  • For Jenny every bond feature costs 1.25 (5/4) equities features. Equally, Jenny can produce one equities feature at the cost of 0.8 (4/5ths) bonds features.
  • For Joe every bond feature costs 0.66 (2/3rds) equities features. Or, to put it the other way round, Joe’s equities features cost 1.5 bond features.

Looked at this way, relatively, Jenny is the better (more productive) Equities developer and Joe is the more productive Bonds developer.

Think about that.

During one week Jenny can produce more Bonds features than Joe, but when measured in terms of the alternative Joe is the more productive Bonds developer. This is the important point. You might say “look at everyone’s individual strengths.” Relatively, Joe is better at Bonds.

Together Jenny and Joe could produce 7 features for either product. If Jenny works where she is stronger, Equities, and Joe works where he is stronger, Bonds, then together they will produce 8 features. If they both work on their weaker product they will only produce 6 features combined, although four of those six would be Bonds features.

So, it seems the case is solved: everyone should specialise and work on the product where they are relatively strongest. Although this is not necessarily the same as “who is the best developer” for a product.
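A small Python sketch of the model so far may help; it computes each developer’s opportunity costs and the combined weekly output for each way of assigning the two developers to the two products, using the figures from the table above.

    # Features per week, from the table above.
    RATES = {"Jenny": {"Equities": 5, "Bonds": 4},
             "Joe":   {"Equities": 2, "Bonds": 3}}

    # Opportunity cost: how many features of the other product one feature "costs".
    for developer, rate in RATES.items():
        print(developer,
              "- one bonds feature costs", round(rate["Equities"] / rate["Bonds"], 2),
              "equities features; one equities feature costs",
              round(rate["Bonds"] / rate["Equities"], 2), "bonds features")

    # Combined weekly output for each way of assigning developers to products.
    assignments = {
        "both on Equities":                       {"Jenny": "Equities", "Joe": "Equities"},
        "both on Bonds":                          {"Jenny": "Bonds",    "Joe": "Bonds"},
        "specialise (Jenny Equities, Joe Bonds)": {"Jenny": "Equities", "Joe": "Bonds"},
        "swap (Jenny Bonds, Joe Equities)":       {"Jenny": "Bonds",    "Joe": "Equities"},
    }
    for name, who_does_what in assignments.items():
        total = sum(RATES[dev][product] for dev, product in who_does_what.items())
        print(name, "->", total, "features per week")

The last loop reproduces the 7, 7, 8 and 6 feature totals discussed above.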

But… things are more complex. Now we have the model we can start changing the assumptions and see what happens.

First off, we could relax the assumption that all features are the same size. However this doesn’t make any real difference. It doesn’t matter how big a feature is: Jenny is always 25% more productive on Equities than Bonds and similarly Joe is 50% more productive on Bonds than Equities. Using different sized features complicates the model without creating new insights.

Varying the size of features doesn’t change the integrity of the model but it does make a difference if we start to look at throughput and consider time.

So let’s relax the time assumption. What happens if Joe is in the middle of a Bonds feature and another feature gets flagged up as urgent? Should Joe drop what he is doing and pick up the urgent Bonds feature?

The model doesn’t answer this question. The model is only measuring output. If we are attempting to maximise output then changing work part way through the week only makes sense if both pieces of work – the partly done original and the urgent interrupt – can still be completed by the end of the week.

So one needs to ask: is the feature urgent enough to justify Joe halting his current work and doing the new feature? Then perhaps returning to his current work?

Possibly, but in making one feature arrive faster another would be delayed. Statistically there is little difference because the differences cancel each other out. Which itself demonstrates how managing by numbers can be misleading.

And what if Joe couldn’t finish both pieces by the end of the week? Would it make sense to reduce overall efficiency to expedite some work?

What if Jenny becomes available, should she work on Bonds? Even though she is relatively less productive at Bonds and would thus delay even more Equities features?

These questions can be answered in many different ways but answering them depends on what you are trying to maximise. And let’s also note that in real life the data is unlikely to be so clear cut.

On average Joe takes two and a half days to complete an Equities feature while Jenny completes one Equities feature a day. On average Jenny can complete her current feature and a second one before Joe could. But it doesn’t take much to invalidate that answer, in particular if feature sizes vary things change.

What if Jenny is working on an over-sized feature? We’ll call it urgent #1. Suppose urgent #1 is twice as big as urgent #2 and she has just started #1. Jenny would take three days to finish both features herself. If Joe starts urgent #2 he will have it finished in 2.5 days, and during that time Jenny will have urgent #1 finished. Looked at this way it makes sense for Joe to pick up the urgent work even if it takes him longer than it would take Jenny.
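A quick check of those timings, assuming urgent #2 is a normal-sized Equities feature (one day for Jenny, two and a half for Joe) and urgent #1 is twice that size:

    # Days each developer needs per normal-sized Equities feature.
    JENNY_DAYS = 1.0
    JOE_DAYS = 2.5

    urgent1_size = 2.0   # urgent #1 is twice a normal feature
    urgent2_size = 1.0

    # Jenny does both, one after the other.
    jenny_alone = {"urgent #1 done (day)": urgent1_size * JENNY_DAYS,
                   "urgent #2 done (day)": (urgent1_size + urgent2_size) * JENNY_DAYS}

    # Jenny stays on #1 while Joe picks up #2 in parallel.
    joe_helps = {"urgent #1 done (day)": urgent1_size * JENNY_DAYS,
                 "urgent #2 done (day)": urgent2_size * JOE_DAYS}

    print(jenny_alone)   # {'urgent #1 done (day)': 2.0, 'urgent #2 done (day)': 3.0}
    print(joe_helps)     # {'urgent #1 done (day)': 2.0, 'urgent #2 done (day)': 2.5}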

And what happens if Equities has three, or more, urgent features? Even with Joe working more slowly than Jenny all the urgent features will be delivered sooner if Joe works on Equities too. Again, total productivity would be impacted but what is more important: total productivity or rapid delivery?

If efficiency is your objective then all is well: simply understand the relative efficiency of individuals and do the maths. (Except, of course, understanding the efficiency of any individual isn’t that straightforward.) Adding time-dependent features complicates things; the comparative advantage model helps show the cost of urgency although it cannot answer the question.

It is entirely possible, even likely, that efficiency is not the only concern, it may not even be the primary concern. Rather the timeliness of feature delivery may be more important.

Specifically, I have assumed not only that all features take about the same effort but also that they have the same value. Efficiency measured as the quantity of units produced is a poor measurement compared with efficiency measured as value delivered. I’ll turn my attention to value in the next blog post.

But before I leave this post, one more assumption to surface.

In this model Joe and Jenny are completely independent. Their work does not impact the other’s and they share no resources. What if they did?

What if both Joe and Jenny handed their completed work to the same Tester? Or they both needed use of a single test environment? Or their work needed to be bundled into a common release?

In such cases the shared resource – the tester, the environment, the release schedule – would become the constraint on productivity. This is getting towards Theory of Constraints space.

For Joe and Jenny to work at their most productive, not only would that bottleneck need enough capacity to service them both, it would actually need more capacity to cope with the variation and peak load (when Jenny and Joe deliver at the same time).

Providing that extra capacity at the bottleneck would allow Joe and Jenny to work at their maximum throughput but would introduce waste because the extra capacity would sometimes be idle. To tackle that question one needs a far more complex theory: Queuing Theory – which I’ve discussed in previous posts, Utilisation and non-core team members and Kanban: efficient or predictable, you decide.



Finally On A Calculus Of Differences – student

student from thus spake a.k.

My fellow students and I have spent much of our spare time this past year investigating the similarities between the calculus of functions and that of sequences, which we have defined for a sequence s_n with the differential operator

  Δ s_n = s_n - s_{n-1}

and the integral operator

  Δ⁻¹ s_n = Σ_{i=1}^{n} s_i

where Σ is the summation sign, adopting the convention that terms with non-positive indices equate to zero.

We have thus far discovered how to differentiate and integrate monomial sequences, found product and quotient rules for differentiation, a rule of integration by parts and figured solutions to some familiar-looking differential equations, all of which bear a striking resemblance to their counterparts for functions. To conclude our investigation, we decided to try to find an analogue of Taylor's theorem for sequences.
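A minimal Python sketch of the two operators, and of the fact that differencing undoes summation (Δ applied to Δ⁻¹ s_n gives back s_n), might look like this; the sequence used, s_n = n², is just an example.

    # A minimal sketch of the two operators, using the convention that terms
    # with non-positive indices are zero.

    def delta(s, n):
        # Difference operator: delta s_n = s_n - s_{n-1}
        return s(n) - (s(n - 1) if n >= 2 else 0)

    def delta_inverse(s, n):
        # Summation operator: delta^-1 s_n = s_1 + s_2 + ... + s_n
        return sum(s(i) for i in range(1, n + 1))

    def square(n):           # an example monomial sequence, s_n = n^2
        return n * n

    def partial_sums(n):     # delta^-1 applied to the squares
        return delta_inverse(square, n)

    # Differencing the partial sums recovers the original sequence.
    print([delta(partial_sums, n) for n in range(1, 6)])   # [1, 4, 9, 16, 25]
    print([square(n) for n in range(1, 6)])                # [1, 4, 9, 16, 25]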

Wit Limits

Chris Oldwood from The OldWood Thing

I’ve used the lightning talks at the last two ACCU conferences as a means of subjecting a captive audience to my dreadful array of programming / IT / geek one liners. (My previous two ACCU stand-up routines are published on this blog as “The Daily Stand-Up” and “Stand-Up and Deliver”.) This year was no different, but I wasn’t sure if I had enough “decent” new or unused material to survive the whole 5 minutes; unluckily for the audience I had...

Hence, here are the 34 one-liners I delivered under the title “Wit Limits”  [1] at this year’s ACCU conference:

“I thought it was odd when the doctor prescribed ‘programming’ to help me cope with my migraine; then I realised he said ‘codeine’.”

“These news reports of drone strikes are quite disturbing, but what I don’t understand is why we allowed delivery bots to form unions in the first place.”

“When we have chips at the seaside and I run out of ketchup I like to go round dipping them in other people’s. I call it crowd saucing.”

“The marketing department said we needed to be more disruptive, so I dropped the production database and deleted all the source code.”

“Our product doesn’t have a road map, it has a star map. Each release depends on whatever new shiny thing the developers become infatuated with next.”

“We’ve recently started using CRC cards. We now add a 32-bit checksum to each user story to stop the product owner messing with it mid-sprint.”

“Our Scrum Master is forever asking what we did yesterday, what we’re doing today, and what our impediments are. He’s a big fan of continuous interrogation.”

“I’ve always been envious of the autonomy granted to James Bond, but I guess that’s what you get when you’re M-powered.”

“Teams that refuse to do planning poker have really gone up in my estimation.”

“I’ve always felt it’s important to allow slack time in a schedule. I mean, how else are you going to keep up with all the instant messages?”

“The problem with people who are Prince certified is that they want to manage projects like it’s 1999.”

“Someone recently told me there is a new build system written entirely in F#, but I reckon it’s just Fake news.”

“I know he invented object-orientation, but was the Hexagonal Architecture also invented by Alan Key?”

“Guido seemed somewhat subdued when I asked him about how the Python enhancement process was going, so I gave him a PEP talk.”

“I recently went to see beauty and the beast; a system where the back-end was written in Python and the front-end in JavaScript.”

“I once worked at an online china shop. The CEO said we needed to move fast and break things, so I hired a bull.”

“The problem with Amazon’s Dynamo DB is that it stops working when they stop peddling it.”

“Companies that securely store my important data in offsite data centres really get my back up.”

“Vampires never use database replication as they can’t see their data in the mirror.”

“The other day a sysadmin asked me what I was using to provision hardware; he said that he was using Terraform. I replied, ‘Application Form’.”

“Whenever I provision some new hardware I like to do it in batches of a hundred. My motto is ‘infra-penny, infra-pound’.”

“Calvin Klein once offered me a modelling contract but I had to turn it down when I discovered they still used Rational Rose.”

“The other day I felt really uncomfortable after we had a massive disagreement about whether to use dashes or slashes to prefix our console app switches. I hate command line arguments.”

“I like to think of myself as a pragmatist. When the code doesn’t compile due to warnings, I just pragma them out.“

“I reckon Vim should be classified as a Class A drug on the grounds that it’s impossible to quit.”

“I’m pretty disappointed that my ZX81 based mule racing game keeps falling over. I guess I shouldn’t have called it 1K Donkey.”

“Surely to create safe self-driving cars we first have to solve the Halting Problem?”

“Never use someone that can’t write regular expressions to perform job interviews – they tend to be a bad judge of character.”

“When Robocop eats breakfast in the morning does he use his cereal port?”

“If you hit the Levis REST API twice, on endpoints they haven’t implemented, you’ll get a pair of 501’s.”

“The last time my wife and I tried to plait my daughter’s hair concurrently it ended in dreadlock.”

“Someone has been sending me tiny photos of my bank’s login page. I think I’m being subjected to a micro-fiching attack.”

“The last time I hired a rowing boat I could turn left and turn right, but not move forwards or backwards. I reckon it must have had exclusive oars.”

“I’ve always felt it’s important that my kids are well grounded so when they go to bed at night I attach a wire from their ear to the radiator.”

 

[1] I also used this title for an “agile” focused routine at Agile in the City: Birmingham the month before. However the less said about this performance the better...

I’m delighted – I’m in the 20 TOP Agile Blogs

Allan Kelly from Allan Kelly Associates

I’m delighted, this blog has been listed in the “20 TOP Agile Blogs for Scrum Masters (2017 edition)”.

I recognise most of the other bloggers in this list and frankly it is an honour to be classed with them.

(The news also gives me something to publish in this blog because I’m really and truly stalled on the next economics piece! It requires some analysis.)



Running Emacs from inside Emacs

Timo Geusch from The Lone C++ Coder's Blog

I’m experimenting with screen recordings at the moment and just out of curiosity decided to see if I could load and edit a text file inside the main Emacs process, from inside an ansi-term, using emacsclient. Spoiler alert - yes, you can. At least the way it is set up on my system, emacsclient doesn’t play well with text mode (-nw) as it doesn’t recognise eterm-color as a valid terminal type, but loading and editing the file in the GUI works flawlessly.