## Multiple estimates for the same project

The first question I ask, whenever somebody tells me that a project was delivered on schedule (or within budget), is: which schedule (or budget)?

New schedules are produced for projects that are behind schedule, and costs get re-estimated.

What patterns of behavior might be expected to appear in a project’s reschedulings?

It is to be expected that as a project progresses, subsequent schedules become successively more accurate (in the sense of having a completion date and cost that is closer to the final values). The term cone of uncertainty is sometimes applied as a visual metaphor in project management, with the schedule becoming less uncertain as the project progresses.

The only publicly available software project rescheduling data, from Landmark Graphics, is for completed projects, i.e., cancelled projects are not included (121 completed projects and 882 estimates).

The traditional project management slide has some accuracy metric improving as work on a project approaches completion. The plot below shows the percentage of a project completed when each estimate is made, against the ratio Estimate/Actual; the y-axis uses a log scale so that under/over estimates appear symmetrical (code+data):

The closer a point is to the blue line, the more accurate the estimate. The red line shows maximum underestimation, i.e., estimating that the project is complete when there is still more work to be done. A new estimate must be greater than (or equal to) the work already done, i.e., Estimate >= work_done, and work_done = Actual*fraction_complete.

Rearranging, we get: Estimate/Actual >= fraction_complete (plotted in red). The top of the ‘cone’ does not represent management’s increasing certainty with project progress; it represents the mathematical upper bound on the possible inaccuracy of an estimate.
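This bound can be made concrete with a few lines of Python (a toy sketch; the function and variable names are mine, not anything from the Landmark Graphics data):

```python
# Toy sketch of the estimate-accuracy bound discussed above.
# At any point, the work already done is Actual * fraction_complete,
# and a credible new estimate cannot be below that.

def min_estimate(actual: float, fraction_complete: float) -> float:
    """Smallest defensible estimate once fraction_complete of the work is done."""
    return actual * fraction_complete

def min_ratio(fraction_complete: float) -> float:
    """Lower bound on Estimate/Actual, i.e., the red line in the plot."""
    return fraction_complete

# Example: a project that will actually take 100 days, currently 60% complete.
# The most optimistic honest estimate is 60 days, giving a ratio of 0.6.
print(min_estimate(100, 0.6))   # 60.0
print(min_ratio(0.6))           # 0.6
```

As the fraction complete approaches 1, the lower bound on the ratio approaches 1, which is what produces the narrowing ‘cone’ regardless of anything management does.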

In theory there is no limit on overestimating (i.e., points appearing below the blue line), but in practice management are under pressure to deliver as early as possible and to minimise costs. If management believe they have overestimated, they have an incentive to hang onto the time/money allocated (the future is uncertain).

Why does management invest time creating a new schedule?

If information about schedule slippage leaks out, project management looks bad, which creates an incentive to delay rescheduling for as long as possible (i.e., let’s pretend everything will turn out as planned). The Landmark Graphics data comes from an environment where management made weekly reports and estimates were updated whenever the core teams reached consensus (on average, eight times per project).

The longer a project is being worked on, the greater the opportunity for more unknowns to be discovered and the schedule to slip, i.e., longer projects are expected to acquire more re-estimates. The plot below shows the number of estimates made, for each project, against the initial estimated duration (red/green) and the actual duration (blue/purple); lines are loess fits (code+data):

What might be learned from any patterns appearing in this data?

When presented with data on the sequence of project estimates, my questions revolve around the reasons for spending time creating a new estimate, and the amount of time spent on the estimate.

A lot of time may have been invested in the original estimate, but how much time is invested in subsequent estimates? Are later estimates simply calculated as a percentage increase, a politically acceptable value (to the stakeholder funding for the project), or do they take into account what has been learned so far?

The information needed to answer these questions is not present in the data provided.

However, this evidence of the consistent provision of multiple project estimates drives another nail into the coffin of estimation research based on project totals (e.g., if data on project estimates is provided, one estimate per project, were all estimates made during the same phase of the project?).

## [fix dev diary] Week 5:

In my dev diary blog post series, I document the minutiae of what I am doing for my toy project Fix. The diary can also be found in smaller bites […]

The post [fix dev diary] Week 5: appeared first on Simplify C++!.

## Unplanned work after the sprint starts?

“Should unplanned work be allowed after the sprint starts?”

One of those questions which comes up again and again. And it came up last week when I visited a client’s offices – yes, I actually visited a client! The answer to this question is, as often happens: It depends. So let me give you my thinking.

First, many teams have a rule that work must be scheduled in the sprint planning meeting, after which the sprint’s content is fixed. Teams have a right to make this rule, so if this is a team rule – what Kanban folk call a policy – then work is not allowed in.

This rule is based on a strict interpretation of Scrum. The thinking – particularly in early implementations of Scrum – was that changing priorities was a big problem for teams, and therefore fixing the work to be done for a few weeks made sense. In the event that things did change, the team would declare an “abnormal termination of sprint” and start a new sprint with new priorities.

Now for some teams this makes complete sense: barring work from entering the sprint after planning, and, equally, team members only doing work scheduled in the sprint and refusing all other work. So, it depends… when a team is troubled by new work appearing and priorities changing, and when the team is expected to deliver something new – when their overarching priority is not support but building something new – this approach makes complete sense.

But don’t follow this rule just because you think Scrum says so. I just had a quick look, and the latest Scrum Guide doesn’t actually mention abnormal termination of sprint. It does say “No changes are made that would endanger the Sprint Goal”, which then leads us into a conversation about the sprint goal, but let’s hold that for now.

Now, ring-fencing the team and the sprint like this solves one set of problems but creates another. If the team are aiming to be reactive, why would they not pick up new work?

And as teams increasingly become DevOps or SecDevOps, or BizDev, or whatever, things get more complicated. It would be irresponsible to hold a “no work enters the sprint” line if the live server was down or a security hole had been found. But at the same time, being hyper-reactive has a downside, because the team would be constantly distracted.

Ultimately it is the Product Owner who should have the final say on whether unplanned work is accepted or not, but when you have a customer on the phone, someone else may be forced into a decision.

I apply two tests. First, is the unplanned work really urgent? Or could it wait a few days and be considered in the next sprint? (Or even be queued in the backlog for longer.)

Second, is the unplanned work valuable? Namely, is it more valuable than the work the team are doing that would be displaced? Ideally it would be valuable enough to justify the disruption caused by late entry too.

Hence I like to talk about urgent but valuable unplanned work. Just because something appears after sprint planning doesn’t mean it is not valuable. If the work is urgent, and if it is valuable enough, then it deserves to enter the sprint and get done.
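These two tests can be sketched as a toy decision rule (the function, parameter names, and threshold logic here are mine, purely illustrative – nothing from Scrum or Xanpan prescribes this):

```python
def accept_unplanned(urgent: bool, value: float,
                     displaced_value: float, disruption_cost: float = 0.0) -> bool:
    """Toy encoding of the two tests: the work must be urgent, and more
    valuable than the work it displaces plus the disruption it causes."""
    if not urgent:
        return False  # not urgent: queue it for the next sprint (or the backlog)
    return value > displaced_value + disruption_cost

# Urgent and clearly more valuable than the displaced work: let it in.
print(accept_unplanned(True, value=10, displaced_value=5, disruption_cost=2))   # True
# Valuable but not urgent: it can wait for the next sprint planning.
print(accept_unplanned(False, value=10, displaced_value=5))                     # False
```

In practice the “values” are judgement calls rather than numbers, but writing the rule down makes clear that urgency alone is not enough – the work must also beat what it displaces.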

However I like to build in two feedback loops. First, as the work arises, does the person raising the work understand the disruption this will cause? Are they prepared to accept that some other work may not get done?

I like to make this real: pull up your board and show the requester the consequences. Let them prioritise the work against the current planned work. This can make the unplanned work go away.

Second, mark the late-breaking work so you can track it through the system – on a physical board I would use a yellow card. At the end of the sprint, review how many yellow cards you have and talk about whether the right decisions were made.

Over time, as you build up your data – and your stock of done yellow cards – you can reason about the cards and decide your long-term actions. For example:

- You might want to make an allowance in sprint planning for unplanned work: suppose your team averages 3 yellow cards a sprint; then, when you are planning the sprint, allow space for them.

- If you regularly have many yellow cards, you might even want to move to a Kanban model or split the team.

- Review the requests: what are the common themes? Is there a module which is particularly troublesome? Would some remedial work help reduce the unplanned work?

- Or is there someone in particular who raises unplanned work? Should the team leaders talk to this person and see if they could change their behaviour – perhaps they could make their requests a few days earlier?

- Maybe you want to ring-fence a team member to deal with unplanned work while the rest of the team pushes on with the main work.
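The first suggestion – reserving a planning allowance for unplanned work – is easy to compute from a few sprints of history (a sketch; the numbers are invented):

```python
def yellow_card_allowance(cards_per_sprint: list[int]) -> int:
    """Average number of unplanned (yellow-card) items per sprint,
    rounded to whole cards, to reserve space for in sprint planning."""
    return round(sum(cards_per_sprint) / len(cards_per_sprint))

# Six sprints of history, averaging 3 yellow cards: reserve space for 3.
history = [2, 4, 3, 3, 2, 4]
print(yellow_card_allowance(history))   # 3
```

A simple mean is enough to start with; if the counts are very spiky, that is itself a signal worth discussing in the review.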

As I said at the top of this post, the unplanned work question comes up a lot. I discussed it in Xanpan so if you want more examples go there. And if you have any other suggestions please comment on this post.

The post Unplanned work after the sprint starts? appeared first on Allan Kelly.

Think of this as a free offer: let me look at your data and I’ll tell you if I find anything interesting.

When I work with clients I often download the Jira data and crunch it in Excel to see if I can find any patterns or information in the mass of tickets and dates. I know there are tools out there which will do this, but I’m never quite sure what these tools are telling me, so I like to do it myself. Also, it’s a bit of a “fishing trip” – I don’t know what I might find. Having done this a few times I’ve developed a bit of a pattern myself – nothing I can describe yet, but who knows.

So, if you would like me to crunch your data, please send it over. I say Jira, but I’m happy to work with data from other systems – I’ll learn something new.

You will need to export all the issues as a CSV or Excel file. And I suggest you anonymise the data: just delete the columns with names, and even delete the card descriptions. The more you can send me the better, but the columns that interest me most have to do with dates (created and closed), ticket types (story, bug, task/sub-task, etc.), status and, if they are recorded, estimates and actual times.
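As a sketch of the sort of crunching involved, here is a minimal pass over such an export. The column names (“Created”, “Resolved”, “Issue Type”) and the ISO date format are my assumptions for the sketch – a real Jira export uses its own date format, so adjust accordingly:

```python
import csv
from collections import defaultdict
from datetime import datetime

def cycle_times_by_type(path: str) -> dict[str, float]:
    """Mean days from creation to resolution, per ticket type,
    from a CSV export with Created/Resolved/Issue Type columns."""
    durations = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if not row.get("Resolved"):
                continue  # still open: no cycle time yet
            created = datetime.strptime(row["Created"], "%Y-%m-%d")
            resolved = datetime.strptime(row["Resolved"], "%Y-%m-%d")
            durations[row["Issue Type"]].append((resolved - created).days)
    return {t: sum(d) / len(d) for t, d in durations.items()}
```

Even this much is often enough to show patterns: bugs closing in days while stories take weeks, or a long tail of tickets that never close at all.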

I won’t share the data with anyone else – I’ll even delete it when I am finished if you wish. I would like to document some of my findings in a blog post but I can give you first sight if you like.

Apart from finding patterns, and perhaps learning something, what interests me is what I might be able to tell about a team I know nothing about. It is an experiment. I’m allan AT allankelly.net – or use the contact page.

## TIL that org-mode has an exporter for ODT

I’m by no means an Emacs org-mode power user – in fact, anything but – but I do use org-mode a lot for note taking and also when I need an outliner to try and arrange ideas in a suitable manner. It excels at both, and usually does what I need, including exporting to HTML, which covers about 90% of my use cases. As much as I’d like to, LaTeX does not feature in my needs, but I needed to export an org-mode file for use with Microsoft Word.

## Readability: a scientific approach

Readability, as applied to software development today, is a meaningless marketing term. Readability is promoted as a desirable attribute, and is commonly claimed for favored programming languages, particular styles of programming, or ways of laying out source code.

Whenever somebody I’m talking to, or listening to in a talk, makes a readability claim, I ask what they mean by readability, and how they measured it. The speaker invariably fumbles around for something to say, with some dodging and weaving before admitting that they have not measured readability. There have been a few studies that asked students to rate the readability of source code (no guidance was given about what readability might be).

If somebody wanted to investigate readability from a scientific perspective, how might they go about it?

The best way to make immediate progress is to build on what is already known. There has been over a century of research on eye movement during reading, and two models of eye movement now dominate: the E-Z Reader model and the SWIFT model. Using eye-tracking to study developers is slowly starting to be adopted by researchers.

Our eyes don’t smoothly scan the world in front of us; rather, they jump from point to point (these jumps are known as saccades), remaining fixed long enough to acquire information and calculate where to jump next. The image below is an example from an eye tracking study, where subjects were asked to read a sentence (see figure 770.11). Each red dot appears below the center of each saccade, and the numbers show the fixation time (in milliseconds) for that point (code):

Models of reading are judged by the accuracy of their predictions of saccade landing points (within a given line of text), and fixation time between saccades. Simulators implementing the E-Z Reader and SWIFT models have found that these models have comparable performance, and the robustness of these models is compared by looking at the predictions they make about saccade behavior when reading what might be called unconventional material, e.g., mirrored or scrambled text.

Studies have found that fixation duration increases with text difficulty, and decreases with word frequency and word predictability.

It has been said that attention is the window through which we perceive the world, and our attention directs what we look at.

A recent study of the SWIFT model found that its predictions of saccade behavior, when reading mirrored or inverted text, agreed well with subject behavior.

I wonder what behavior SWIFT would predict for developers reading a line of code where the identifiers were written in camelCase or using underscores (sometimes known as snake_case)?

If the SWIFT predictions agreed with developer saccade behavior, a raft of further ‘readability’ tests spring to mind. If the SWIFT predictions did not agree with developer behavior, how might the model be updated to support the reading of lines of code?

Until recently, the few researchers using eye tracking to investigate software engineering behavior seemed to be having fun playing with their new toys. Things are starting to settle down, with some researchers starting to pay attention to existing models of reading.

What do I predict will be discovered?

Lots of studies have found that given enough practice, people can become proficient at handling some apparently incomprehensible text layouts. I predict that given enough practice, developers can become equally proficient at most of the code layout schemes that have been proposed.

The important question concerning text layout is: which one enables an acceptable performance from a wide variety of developers who have had little exposure to it? I suspect the answer will be the one that is closest to the layout they have the most experience with, i.e., prose text.
