While the content is very different from my original thoughts, 10-years ago, the original aim of discussing all the publicly available software engineering data has been carried through (in some cases more detailed data, in greater quantity, has supplanted earlier less detailed/smaller datasets).
The aim of only discussing a topic if public data is available, has been slightly bent in places (because I thought data would turn up, and it didn’t, or I wanted to connect two datasets, or I have not yet deleted what has been written).
The outcome of these two aims is that the flow of discussion is very disjoint, even disconnected. Another reason might be that I have not yet figured out how to connect the material in a sensible way. I’m the first person to go through this exercise, so I have no idea where it’s going.
The roughly 620+ datasets is three to four times larger than I thought was publicly available. More data is good news, but required more time to analyse and discuss.
Depending on the quantity of issues raised, updates of the beta release will happen.
As always, if you know of any interesting software engineering data, please tell me.
The lockdown hasn’t effected my work environment that much. This is a picture of my home office, my garden office, my “man cave.” I am very lucky. When I’m not on site with clients I spend most of my time working here. Still, I miss visiting clients and conferences – although I did squeeze in Agile on the Beach New Zealand in the closing days of the old world.
A friend of mine works for Canonical – the Ubunto Linux people. The vast majority of the people at Ubunto work remotely. So while my friend lives in New Zealand he works with a team spread out across the world.
Once or twice a year, his whole team co-locates. They all fly together and work together for a couple of weeks, a sprint. Then they return home and work remotely with each other for another six months, then repeat. Once or twice I’ve seen Tim as he works in, or passes through, London. But as luck would have it the week I was in New Zealand he was on his travels.
In all this talk of how wonderful working-from home can be I don’t think there has been enough discussion about what makes a good experience and what makes it bad. The Ubunto story highlights a couple of points which I think are missing from much of the current discussion over remote/home working.
Firstly the team are equal: they are all remote.
Over the years I have observed big differences in the way teams which are all distributed operate and the way teams with only one or two members operate. When the bulk of a team are co-located and only one or two are remote there is an unequal relationship. My intuition says teams are better off when they are either all remote or all co-located. When five of the team are co-located and a sixth member is remote then things are more difficult.
The way lockdown came about this year teams went from one to the other overnight, literally. Which also means it was very egalitarian. Everyone was in the same position. That can only be a good thing.
The second thing I take from Ubunto is that face-to-face time is still valued. In fact, as we unlock I expect home working will be more common, more remote tools will be used but we will value co-locating and face-to-face contact all the more.
Where companies keep people remote I expect we will see the emergence of deliberate co-location. This might be a two week block like my friend or it might be a few hours or a whole day. I can easily imagine benefit from an agile team coming together for one day a fortnight to close out a sprint, run a retrospective and then plan the next sprint. (That will have a knock on effect in the office market as companies stop renting offices for years and rent meeting rooms by the day.)
In fact this will be essential. It is easier to build social capital – trust, comradeship – when you are face-to-face. For the last three months we have been running on the social capital and experience teams built up over years working together. The social capital account now needs toping up.
Existing teams, existing social capital and egalitarian distribution helped people work during the lockdowns. As we slowly move back to our offices that is going to change and we will need to make deliberate efforts to replenish social capital, keep people equal and build new teams.
New teams will benefit from working together face-to-face to get to know one another. Build some social capital and norms.
New employees in particular will need time together. Especially those, like fresh graduates, who are new to work altogether. Such people need to learn how to work. Doing that in a distributed environment is a challenge.
Those new to the workforce face an additional challenge: space.
Home working is OK for me, I have a garden big enough for an office (and I had the money to buy one.) How many people have been working of their kitchen table? or bedroom? Mixing your sleeping space with your work space can damage sleep patterns and add to stress.
How many new workers have that space?
I remember when I was a fresh graduate finding my way in the world I lived in house shares. I had a room in a shared house with similar people. Some of them I counted as friends but not all of them. Some of them had bad habits. Going to an office for work was important, staying in the house – especially when they were all staying in! – would have been too much.
Then there is the question of broadband: I have 200Mb fibre to the door but many people make do with a lot less and that can be variable. Not that 200Mb comes cheap…
If a company expect someone to work from home then I think they should pay for good broadband internet. They should also provide the equipment – I have heard of only a few employers who have given staff money to buy equipment for home.
In February and March many people all over the world found themselves having to work from home. Its been a better experience than many expected but I know people are flagging, I know people keen to get back to the office. In future I expect we’ll value face-to-face contact even as many people stay at home.
In the coming months people aren’t just going to walk back to the office and pick up where they left off. Work will be more varied. But now the initial pleasure and surprise of working for home is over we need to have a conversation about what companies need to do to support working from home in future and how it can be made sustainable.
The picture above is a time-value profile: it shows how value changes with time. It is a graphic illustration of cost of delay.
A fine wine might increase in value over time but most things – think product, project, feature or just story – decay with time. Having something today is worth more than having it next week.
I invented Time-Value profiles – although I’m happy to acknowledge Don Rienertsen’s influence. I’ve included time-value profiles in many presentations and courses (they are a key part of my value workshop) but oddly, while I’ve mentioned them in this blog before I’ve never described them. So here goes…
Imagine we want to build a feature for a product. Naturally we ask: “what is this worth?”
Money is the obvious way to measure value but strictly speaking money is not itself valuable – unless you happen to want small colourful pieces of paper or decorated lumps of metal. Money is a store of value. The value of money is not the money itself but what you can exchange the money for. And because money can be traded for a wide variety of things, which are themselves valuable, money is a useful medium for comparison and measurement.
So the question “what is this worth?” may be answered qualitatively (“a vaccine is valuable because it saves lives”) or quantitatively (“a vaccine is worth $10 trillion because it allows life to return to normal”). In order to compare competing opportunities and valuations, and in order to draw a graph, giving value a numerical quantity helps greatly.
A time-value profile shows quantitive value over time when value is measured numerically: maybe in hard money like dollars or yen, or an abstract measure like business points, wooden dollars or Atlantic shillings (I just invented that but it works).
The graph starts today: we say “If we had feature X today it would be worth 100,000 shillings”. Maybe it is worth 100,000 because that is what a customer would pay for it, or maybe because we could sell 100,000 units at 1 shilling each, or so on.
But we don’t have X today. “If we get feature X next month it will be worth 90,000 shillings.” One month delay, one month late to the customer, one month later on Amazon, costs 10,000 shillings.
“If feature X is 3 months away then it is worth less than 50,000 shillings.” And so on.
Now, the unit of value – dollars, francs, shillings, wood – is of little important. Sure $1,000,000 is very different to ₽1,000,000 (Russian roubles if you don’t know) but as long as you don’t mix currencies the actual currency is unimportant.
What is important is the shape of the curve and, especially, where abrupt changes happen. Look again at the graph above: between months two and three there is a sudden drop in value. That has scheduling implications.
Once you start to think about time-value profiles then it becomes obvious that value is a function of time and we need to understand what that function looks like for any given work – project, product, initiative, feature, story, just anything in fact.
It should also become clear that the question “how long will it take to build X” needs to be inverted: “how long have we got to build X?”, “how much of X could we build?” or “in the time we have what could create something to satisfy need X”
And then “how much of the available value can we capture?”. Having X might be worth 100,000 but having a half of X might still be worth 50,000 more than not having X.
As I’ve written before: to any given problem there are multiple solutions. Engineering is not about creating the best possible solution, it is about creating a solution within constraints – as my widgets exercise shows.
Not that I wish to ignore costs – and effort estimates – but they are secondary, and the subject of another blog. I’ll write more about this, and perhaps put something into a workshop, in the meantime my value workshop is the best place to find out more.
Which of the following two if-statements do you think will be processed by readers in less time, and with fewer errors, when given the value of x, and asked to specify the output?
// First - sequence of subexpressions
if (x > 0 && x < 10 || x > 20 && x < 30)
// Second - nested ifs
if (x > 0 && x < 10)
else if (x > 20 && x < 30)
Ok, the behavior is not identical, in that the else if-arm produces different output than the preceding if-arm.
The paper Syntax, Predicates, Idioms — What Really Affects Code Complexity? analyses the results of an experiment that asked this question, including more deeply nested if-statements, the use of negation, and some for-statement questions (this post only considers the number of conditions/depth of nesting components). A total of 1,583 questions were answered by 220 professional developers, with 415 incorrect answers.
Based on the coefficients of regression models fitted to the results, subjects processed the nested form both faster and with fewer incorrect answers (code+data). As expected performance got slower, and more incorrect answers given, as the number of intervals in the if-condition increased (up to four in this experiment).
I think short-term memory is involved in this difference in performance; or at least I can concoct a theory that involves a capacity limited memory. Comprehending an expression (such as the conditional in an if-statement) requires maintaining information about the various components of the expression in working memory. When the first subexpression of x > 0 && x < 10 || x > 20 && x < 30 is false, and the subexpression after the || is processed, there is no now forget-what-went-before point like there is for the nested if-statements. I think that the single expression form is consuming more working memory than the nested form.
Does the result of this experiment (assuming it is replicated) mean that developers should be recommended to write sequences of conditions (e.g., the first if-statement example) about as:
if (x > 0 && x < 10)
else if (x > 20 && x < 30)
Duplicating code is not good, because both arms have to be kept in sync; ok, a function could be created, but this is extra effort. As other factors are taken into account, the costs of the nested form start to build up, is the benefit really worth the cost?
Answering this question is likely to need a lot of work, and it would be a more efficient use of resources to address questions about more commonly occurring conditions first.
A commonly occurring use is testing a single range; some of the ways of writing the range test include:
if (x > 0 && x < 10) ...
if (0 < x && x < 10) ...
if (10 > x && x > 0) ...
if (x > 0 && 10 > x) ...
Does one way of testing the range require less effort for readers to comprehend, and be more likely to be interpreted correctly?
There have been some experiments showing that people are more likely to give correct answers to questions involving information expressed as linear syllogisms, if the extremes are at the start/end of the sequence, such as in the following:
A is better than B
B is better than C
and not the following (which got the lowest percentage of correct answers):
B is better than C
B is worse than A
Your author ran an experiment to find out whether developers were more likely to give correct answers for particular forms of range tests in if-conditions.
Out of a total of 844 answers, 40 were answered incorrectly (roughly one per subject; it was a paper and pencil experiment, so no timings). It's good to see that the subjects were so competent, but with so few mistakes made the error bars are very wide, i.e., too few mistakes were made to be able to say that one representation was less mistake-prone than another.
I hope this post has got other researchers interested in understanding developer performance, when processing if-statements, and that they will be running more experiments help shed light on the processes involved.
You will no doubt recall my telling you of my fellow students' and my latest pastime of employing Professor B------'s Experimental Clockwork Mathematical Apparatus to explore the behaviours of cellular automata, which may be thought of as simplistic mathematical simulacra of animalcules such as amoebas.
Specifically, if we put together an infinite line of imaginary boxes, some of which are empty and some of which contain living cells, then we can define a set of rules to determine whether or not a box will contain a cell in the next generation depending upon its own, its left and its right neighbours contents in the current one.
I was once responsible for coaching a Product Owner called Jac. It was product owner for an “enterprise product” – a product sold to big companies to use internally. Sprint to sprint Jac got to decide what got done. In my book Product Owner authority went further: what was to be done in future sprints, what kind of things would be done in months to come, and the “roadmap” where the product was going. But… Jac seldom met customers, Jac might talk to them on the phone when they had a problem but Jac didn’t talk to the more senior people who signed the purchase orders.
Being an enterprise product there were consultants in the mix too: installing the product on site, configure the product, train customers and hold their hands. They went to customer sites and they met all sorts of people. Naturally the consultants had a view on what the product should do, what should be developed next and what should be done in coming months.
And all a consultant had to do was name a customer “Anna at Credit…” or “The CFO of first…. told me” and you could just see the product owner being undermined. I used to groan internally, I hated it – I hated it for my PO – but they were right.
When you meet customers you carry extra authority.
And here I mean Customers, and/or Users, and/or Stakeholders, I mean: people – and it is people – who have expectations, needs or wants for your product. Even if they don’t know your product could help address those needs and wants.
And when I say “meet,” well yes, I really would like you to meet them, shake them by the hand and share a coffee. I think you are both more likely to open up, share more and understand more. If possible spend some time observing them work.
I’m not stupid, I know we all work from home now, I know physical meetings are more difficult than ever so please substitute “meet” with Zoom, Skype, WhatsApp or plain old telephone. But as soon as you can get out and meet people.
In meeting people Product Owners get two big benefits.
The most obvious benefit is that Product Owners understand their customers and their needs better. They can learn how the product could help more, or where the product trips them up. They can learn how the product is beneficial and valuable and how that value might be increased.
Product Owners will also learn things they might not like to hear, things which might not be said over the telephone: where the product stinks, what alternatives might be considered and where the product gets in the way.
“what the customer buys is seldom what the company sells” Peter Drucker
People don’t like giving bad news and it is easier to hide such facts phone call, let alone in an e-mail.
The second benefit, as described in my opening story is authority and legitimacy.
Product Owners are specialists in what customer problems need solving, how enhancements to the product can add value and therefore what needs to be done to the product. They can only do that when they meet customers.
Meeting customers brings authority, one can say “Anna at Credit Bank told me…”
Meeting customers brings legitimacy because most other people never interact with actual customers, or only interact with a small subset when they have problems.
And as my example shows: if a Product Owner is not meeting customers they leave themselves vulnerable to those who do. When someone else, like a consultant, comes along and says “Anna from … says… ” they have authority, if the PO can not say “Mez from … had a different perspective…”
Direct access to customers – speaking to them and visiting them – confers authority and legitimacy. Product Owners need that authority and the knowledge that comes with it.
Recommendations for/against particular programming constructs have one thing in common: there is no evidence backing up any of the recommendations. Running experiments to measure the impact of particular language features on developer performance is not something that researchers do (there have been a handful of experiments looking at the impact of strong typing on developer performance; the effect measured was tiny).
The experiment involved 180 workers on Mechanical Turk (to be accepted, workers had to correctly answer four or five questions about regular expressions). Workers/subjects performed two different tasks, matching and composition.
In the matching task workers saw a regex and a list of five strings, and had to specify whether the regex matched (or not) each string (there was also an unsure response).
In the composition task workers saw a regular expression, and had to create a string matched by this regex. Each worker saw 10 different regexs, which were randomly drawn from a set of 60 regexs (which had been created to be representative of various regex characteristics). I have not analysed this data yet.
What were the results?
For the matching task: given each of the pairs of regexs below, which one (of each pair) would you say workers were most likely to get correct?
The percentages correct for (1) were essentially the same, at 94.0 and 93.2 respectively. The percentages for (2) were 93.3 and 87.2, which is odd given that the regex is essentially the same as (1). Is this amount of variability in subject response to be expected? Is the difference caused by letters being much less common in text, so people have had less practice using them (sounds a bit far-fetched, but its all I could think of). The percentages for (3) are virtually identical, at 93.3 and 93.7.
The percentages for (4) were 58 and 73.3, which surprised me. But then I have been using regexs since before \D support was generally available. The MTurk generation have it easy not having to use the ‘hard stuff’
This matching data might be analysed using Item Response theory, which can take into account differences in question difficulty and worker/subject ability. The plot below looks complicated, but only because there are so many lines. Each numbered colored line is a different regex, worker ability is on the x-axis (greater ability on the right), and the y-axis is the probability of giving a correct answer (code+data; thanks to Peipei Wang for fixing the bugs in my code):
Yes, for question 51 the probability of a correct answer decreases with worker ability. Heads are being scratched about this.
There might be some patterns buried in amongst all those lines, e.g., particular kinds of patterns require a given level of ability to handle, or correct response to some patterns varying over the whole range of abilities. These are research questions, and this is a blog article: answers in the comments
This is the first experiment of its kind, so it is bound to throw up more questions than answers. Are more incorrect responses given for longer regexs, particularly if they cannot be completely held in short-term memory? It is convenient for the author to use a short-hand for a range of characters (e.g., a-f), and I was expecting a difference in performance when all the letters were enumerated (e.g., abcdef); I had theories for either one being less error-prone (I obviously need to get out more).
Final call: Adding Value to Stories, starts 5 June
New: User Stories Masterclass waitlist, 29 June – waitlist available
When started writing what became “Little Book of Requirements & User Stories” I thought I was writing a few thousand words about the relatively straight forward subject of User Stories in agile development. A “few thousand” became a dozen online articles for Agile Journal and a few tens of thousands of words in the book.
Now those same lessons form the basis for my online workshop “User Stories Masterclass”. Still I find new perspectives and insights on User Stories. I was wrong, there is more to the simple “as a… I want… so that…” than I ever thought possible.
But for once I think we need to draw some boundaries, we need to be clear on what User Stories are NOT.
User Stories are not promises or commitments. Stories are ideas. Stories are tokens for work that might be done. Little more.
Simply writing a User Story, even adding it to a backlog in no way commits anyone to doing anything. Indeed I regularly recommend deleting stories from a backlog. I hear from start-ups who tell me that 30% to 50% of stories are never done.
User Stories are not functional specifications, although if you add enough acceptance criteria they may start to look like that.
User Stories are not functional specifications because User Stories are not complete. They are intended as a “placeholder for a conversation” or “a token for work to be done”. If they were complete there would be no need for that conversation and they would be much more than a token. They are deliberately incomplete to allow the conversation to happen. If the story was complete what can a conversation add? There is more value in the conversation than the story.
User Stories are not unambiguous, or rather: User Stories may contain ambiguities. In the same way User Stories are not complete they can be ambiguous. Again, if stories were clean cut and unambiguous there would be little to talk about in the famous conversation.
And by the way, the User Story conversation is not necessarily a one-off event: if you have the conversation then later, say, during development or testing, then talk some more.
User Stories are not user documentation, a collection of stories does not make a User Guide. If you need to create a user guide then the stories might serve as a reminder of what the software does but using them as user documentation would make for a very poor user guide.
If the system you are building needs a User Guide then the act of writing the User Guide is probably a story itself, a token for work to be done:
As a new customer using the system for the first time
I want a guide to get started
So that I can quickly get the benefits of the new system
All the usual rules of user stories apply here: be specific about who is using the system and what they need, include a so that clause to explain why they want to do this, leave some space for discussion (do the really want a user guide or a quick start guide?) and ensure the story has value in itself.
Similarly, User Stories are not any sort of system documentation, e.g. architecture document, maintenance guide. Again these are work to be done and may be the subject of a User Story.
In general User Stories are not part of the contractual commitment a team enters into. User Stories are poorly suited to be contractual documentation. Because User Stories are not complete and because they are ambiguous they perform poorly when used contractually.
Similarly, User Stories are not good as a record of what has been done. They can be forced into this role if they are updated after they have been developed – and preferably delivered. But that requires more work.
As with any documentation – user guide, system guide, a record of what was done, etc. – you should always ask: what value does this documentation have? and who will complain if this document does not exist?
Documentation is not free. Indeed documentation is very expensive – both to write and to read. And documentation is itself work to be done. Which means: if someone is writing a document then they are not doing something else, e.g. if a developer is documenting the system they are not adding new functionality.
Even when there is a technical author on staff to write documentation other work is displaced. The money used to pay a technical author could be spent elsewhere, another programmer or a tester maybe.
Documentation is expensive. Documentation is another deliverable.
User Stories are not intended to form part of the deliverable. They are transient, they come into existence to represent work that needs doing. They change as understanding improves and can, even should, be thrown away when the work is done.
Rather one session a week this time the class will be one afternoon, or possibly one and a half afternoons. Tickets will be on-sale soon, in the meantime you can join the waitlist and be the first to know when tickets go on sale.
As most readers will already know – and everyone else will have guessed – Agile on the Beach 2020 isn’t happening. Gather than try and move the whole conference online we are running some smaller events. Two of these are now booking: an online talk and an online TDD workshop.
Sander Hoogendoorn will be giving his Agile on the Beach 2020 online on July 8 – yes, that is the day he would have been giving it live in Falmouth if you-know-what hadn’t happened.
Generics are a programming construct that allow an algorithm to be coded without specifying the types of some variables, which are supplied later when a specific instance (for some type(s)) is instantiated. Generics sound like a great idea; who hasn’t had to write the same function twice, with the only difference being the types of the parameters.
All of today’s major programming languages support some form of generic construct, and developers have had the opportunity to use them for many years. So, how often generics are used in practice?
In C++, templates are the language feature supporting generics.
As its name suggests, the Standard Template Library (STL) is a collection of templates implementing commonly used algorithms+other stuff (some algorithms were commonly used before the STL was created, and perhaps some are now commonly used because they are in the STL).
It is to be expected that most uses of templates will involve those defined in the STL, because these implement commonly used functionality, are documented and generally known about (code can only be reused when its existence is known about, and it has been written with reuse in mind).
The template instantiation measurements show a 17:1 ratio for STL vs. developer-defined templates (i.e., 149,591 vs. 8,887).
What are the usage characteristics of developer defined templates?
Around 25% of developer defined function templates are only instantiated once, while 15% of class templates are instantiated once.
Most templates are defined by a small number of developers. This is not surprising given that most of the code on a project is written by a small number of developers.
The plot below shows the percentage instantiations (of all developer defined function templates) of each developer defined function template, in rank order (code+data):
Lines are each a fitted power law, whose exponents vary between -1.5 and -2. Is it just me, or are these exponents surprising close?
The following is for developer defined class templates. Lines are fitted power law, whose exponents vary between -1.3 and -2.6. Not so close here.
What processes are driving use of developer defined templates?
Every project has its own specific few templates that get used everywhere, by all developers. I imagine these are tailored to the project, and are widely advertised to developers who work on the project.
Perhaps some developers don’t define templates, because that’s not what they do. Is this because they work on stuff where templates don’t offer much benefit, or is it because these developers are stuck in their ways (if so, is it really worth trying to change them?)