Rob Smallshire from Good With Computers
In 1968 Melvin Conway pointed out a seemingly inevitable symmetry
between organisations and the software systems they construct.
Organisations today are more fluid than 40 years ago, with short
developer tenure, and frequent migration of individuals between projects
and employers. In this article we’ll examine data on the tenure and
productivity of programmers and use this to gain insight into codebases,
by simulating their growth with simple stochastic models. From such
models, we can make important predictions about the maintainability and
long-term viability of software systems, with implications for how we
approach software design, documentation and how we assemble teams.
Legacy systems
I've always been interested in legacy software systems, primarily
because legacy software systems are those which have proven to be
valuable over time. They become legacy – and get old – because
they continue to be useful.
I have an interest in that as a software engineer, having worked on
legacy systems as an employee for various software product companies,
and more recently as a consultant with Sixty North, helping out with the
problems that such systems inevitably raise.
I called myself a "software engineer", although I use the term somewhat
loosely. To call what many developers do "engineering" is a bit of a
stretch. Engineer or not, my academic training was as a scientist, which
is perhaps reflected in the content of this article. Most readers will
be familiar with the structure of the scientific method: We ask
questions. We formulate hypotheses which propose answers to those
questions. We design experiments to test our hypotheses. We collect data
from the experiments. And we draw conclusions from the data. This done,
having learned something about how our world works, we go round again.
I would like to be able to apply this powerful tool to what we do as
"software engineers" or developers. Unfortunately for our industry,
it's very difficult to do experimental science – still less randomised
controlled trials – on the process of software development, for a
whole host of reasons: Developers don't like to be watched. We can't
eliminate extraneous factors. The toy problems we use in experiments
aren't realistic. No two projects are the same. The subjects are often
students who have little experience.
Even on the rare occasions we do perform experiments, there are many
threats to the validity of such experiments, so the results tend not to be
taken very seriously. Addressing the weaknesses of the experimental
design would be prohibitively expensive, if possible at all.
The role of models
Fortunately, there's another way of doing science, which doesn't rely on
the version of the scientific method just mentioned. It's the same type
of science we do in astronomy or geology, where we can't run experiments
because we don't have enough time, we don't have enough money, or we
just don't have anywhere big enough to perform the experiment.
Experimentally colliding galaxies, or experimenting with the initiation
of plate tectonics, are simply in the realm of science fiction on the
money, time and space axes.
In such cases, we have to switch to a slightly different version of the
scientific method, which looks like this: We make a prediction about how
the universe works, where our 'universe' could be galactic collisions,
or the more prosaic world of software development. We then make a model
of that situation either through physical analogy or in a computer. By
executing this model we can predict the outcome based on the details of
a specific scenario. Lastly, we compare the results from the model with
reality and either reject the model completely if it just doesn't work,
or tune the model by refining, updating or tweaking it, until we have a
model that is a good match for reality.
The aim here is to come up with a useful model. Models are by their
very nature simplifications or abstractions of reality. So long as we
bear this in mind, even a simple (although not simplistic) model can
have predictive power.
Essentially, all models are wrong, but some are useful
—George E. P. Box [Box, G. E. P., and Draper, N. R., (1987), Empirical Model Building and Response Surfaces, John Wiley & Sons, New York, NY.]
A model of software development
A key factor in modelling the development of legacy software systems is
the fact that although such systems may endure for decades, we
developers tend not to endure them for decades. In other words, the
tenure of software developers is typically much shorter than the life
span of software systems.
But how much shorter?
Whenever I speak publicly on this topic with an audience of developers, I like
to perform a simple experiment with them. My assumption is that the
turnover of developers can be modelled as if developers have a half-life within
organisations. The related concept of residence time is probably a better
approach, but most people have a grasp of half-life, and it avoids a tedious
digression into explaining something that is ultimately tangential to the main
discussion. In any case, a catchy hook is important when you're going for
audience participation, so half-life it is.
I start by asking everyone in the audience who has moved from working on
one codebase – for example a product – to another (including the
transition to working on their first codebase), at any time in the
preceding 32 years to raise their hands. This starting point is intended
to catch the vast majority of typical tech conference audience members,
and indeed it does, although arguably in the spirit of inclusiveness I
should start with 64 years. Now all of the audience have raised hands.
Next I ask the audience to keep their hands raised if this is still true
for 16 years: Most of the hands are still raised. Now eight years: Some
hands are going down. Now four years: A small majority of hands are
still raised. Then two years: At this point, a large minority still have
raised hands, but we have crossed the half-way threshold and established
that the 'half-life' of developers is somewhere between two and four
years. This result has been consistent on over 90% of the occasions I've
performed this experiment. The only notable deviation was a large
Swedish software consultancy where we established the half-life was
around six months!
In fact, what little research there has been into developer tenure
indicates that the half-life across the industry is about 3.2 years,
which fits nicely with what I see out in the field.
One way to think about this result concretely is as follows: If you work
on a team that numbers ten developers in total, you can expect half of
them – five in this case – to leave at some point in the next 3.2 years.
Obviously, if the team size is to remain constant, they will need to be
replaced.
Note that saying that turnover of half of a population of developers
will take 3.2 years is not the same as claiming that the average tenure
of a developer is 3.2 years. In fact, mean tenure will be \(3.2 / \ln 2\)
years, which is about 4.6 years. You might want to compare that
figure against your own career so far.
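The arithmetic behind these figures is short enough to write down. The sketch below assumes exponential turnover, so the fraction of a team still present after \(t\) years is \(2^{-t/3.2}\), and the mean tenure is the half-life divided by \(\ln 2\). The function names are ours and are purely illustrative.

```python
import math


def mean_tenure(half_life_years: float) -> float:
    """Mean time to departure for exponential turnover with the given half-life.

    Survival S(t) = 2 ** (-t / half_life) = exp(-t / tau) with tau = half_life / ln 2,
    and the mean of an exponential distribution is tau."""
    return half_life_years / math.log(2)


def expected_departures(team_size: int, years: float, half_life_years: float) -> float:
    """Expected number of a team's current members who will have left after `years`."""
    return team_size * (1.0 - 0.5 ** (years / half_life_years))


print(f"{mean_tenure(3.2):.1f} years")                   # 4.6 years
print(f"{expected_departures(10, 3.2, 3.2):.0f} of 10")  # 5 of 10
```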
If you're concerned that developers don't behave very much like
radionuclides, then rest assured that the notion of half-life follows
directly from an assumption that the decay of a particle (or departure
of a developer) follows exponential decay, which in turn follows from
assuming a constant probability of decay (or departure) per unit time.
All we're saying is that in a given time interval there is a fixed
probability that a particle will decay (or a developer will depart), so
it is actually a very simple model.
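A fixed daily probability really does produce a half-life, and a minimal Monte Carlo sketch shows this. It assumes 365 calendar days per year and the 3.2-year half-life, and derives a constant daily departure probability from them (roughly 0.0006). It then samples many tenures and checks three things: about half of the cohort has gone after one half-life, about a quarter remains after two, and about an eighth remains after three. All names are hypothetical.

```python
import math
import random
import statistics

DAYS_PER_YEAR = 365                     # assumption: calendar days
HALF_LIFE_DAYS = 3.2 * DAYS_PER_YEAR    # the 3.2-year half-life quoted above

# Constant per-day departure probability, chosen so that
# (1 - P_LEAVE) ** HALF_LIFE_DAYS == 0.5: half a cohort is gone after one half-life.
P_LEAVE = 1.0 - 0.5 ** (1.0 / HALF_LIFE_DAYS)


def days_until_departure(rng: random.Random) -> int:
    """Day on which a developer leaves when every day carries probability P_LEAVE.

    Equivalent to flipping a biased coin each day until it comes up 'leave',
    but sampled in one step by inverting the geometric distribution."""
    v = 1.0 - rng.random()  # uniform on (0, 1], so log(v) is defined
    return math.floor(math.log(v) / math.log1p(-P_LEAVE)) + 1


rng = random.Random(42)
tenures = [days_until_departure(rng) for _ in range(200_000)]

print(f"daily departure probability: {P_LEAVE:.5f}")
print(f"median tenure: {statistics.median(tenures) / DAYS_PER_YEAR:.2f} years")
for k in (1, 2, 3):
    still_here = sum(t > k * HALF_LIFE_DAYS for t in tenures) / len(tenures)
    # expected to be close to 0.5, 0.25 and 0.125 respectively
    print(f"still present after {k} half-life(s): {still_here:.3f}")
```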
Notably, the half-life of developers is shorter than the half-life of
almost anything else in our industry, including CEOs, lines of code,
megacorps or classes.
Productivity
If we're going to have a stab at modelling software developers as part of
the software development process, we're going to need some measure of
productivity. I'm going to use – and you can save your outrage for later
– lines of code. To repurpose a famous phrase by Winston Churchill:
"Lines of code is the worst programmer productivity metric, except for
all the others". Now I know, as well as you do, that what ultimately
matters to the customers of software systems is value for money, return
on investment, and all those other good things. The problem is that
it's notoriously hard to tie any of those things back in a rigorous way
to what individual developers do on a day-to-day basis, which should be
to design and code software systems based on an understanding of the
problem at hand. On the other hand, I think I'm on fairly safe ground in
assuming that software systems with zero lines of code deliver no value,
and that proportionally more complex problems can be solved (and hence more
value delivered) by larger software systems.
Furthermore, there's some evidence that the number of lines of code cut
by a particular developer per day is fairly constant irrespective of
which programming language they're working in. So five lines of F# might
do as much 'work' as 10 lines of Python or 20 lines of C++. This
is an alternative phrasing of the notion of 'expressiveness' in
programming languages, and it is why we tend to feel that expressiveness –
or semantic density – is important: we can
often deliver as much value with 5 lines of F# as with 20 lines of C++,
yet it will take a quarter of the effort to put together.
Now, however you choose to measure productivity, not all developers are
equally productive on the same code base, and the same developer will
demonstrate different productivity on different code bases, even if they
are in the same programming language. In fact, as I'm sure many of us
have experienced, the principal control on our productivity is simply
the size of the code base at hand. Writing a hundred lines of code for
a small script is usually much less work than adding 100 lines to a
one-million-line system.
We can capture this variance by looking to what little literature there is on
the topic, and using this admittedly sparse data to build some simple developer
productivity distributions.
For example, we know that on a small 10,000 line code base, the least
productive developer will produce about 2000 lines of debugged and
working code in a year, the most productive developer will produce about
29,000 lines of code in a year, and the typical (modal) developer
will produce about 3200 lines of code in a year. Notice that the
distribution is highly skewed: most developers sit near the
low-productivity end, with a long tail towards high productivity, and the
multiple of about nine between the typical and most productive developers
corresponds to the fabled 10x programmer.
Given only these three numbers and in the absence of any more
information on the shape of the distribution, we'll follow a
well-trodden path and use them to erect a triangular probability density
function (PDF) characterised by the minimum, modal and maximum
productivity values. Based on this PDF it's straightforward to compute
the corresponding cumulative distribution function (CDF) which we can
use to construct simulated "teams" of developers, by using the CDF to
transform uniformly distributed samples on the cumulative probability
axis into samples on the productivity axis. In a real simulation where we
wanted to generate many typical teams, we would generate uniform random
numbers between zero and one and transform them into productivity values
using the CDF, although for clarity in the illustration that follows,
I've used evenly distributed samples from which to generate the
productivity values.
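The inverse-CDF construction described above can be sketched in a few lines of Python. The function maps a cumulative probability onto the triangular distribution defined by the 10,000-line figures. Using evenly spaced quantiles at \((i + 0.5)/n\) is our assumption about how the illustration was built, so the totals printed need not match the figures quoted in this article. The sketch also prints the mean of the distribution, \((a + c + b)/3\), which sits far above the mode because of the long tail towards high productivity.

```python
import math

# Productivity on a 10,000-line code base, in lines of code per year (from the text)
MIN_LOC, MODE_LOC, MAX_LOC = 2_000.0, 3_200.0, 29_000.0


def triangular_inverse_cdf(u: float, a: float, c: float, b: float) -> float:
    """Map cumulative probability u in [0, 1] to the triangular distribution
    with minimum a, mode c and maximum b (requires a <= c <= b and a < b)."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    f_c = (c - a) / (b - a)  # cumulative probability at the mode
    if u < f_c:
        return a + math.sqrt(u * (b - a) * (c - a))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))


def evenly_spaced_team(size: int) -> list[float]:
    """Productivities read off at evenly spaced quantiles (i + 0.5) / size."""
    return [triangular_inverse_cdf((i + 0.5) / size, MIN_LOC, MODE_LOC, MAX_LOC)
            for i in range(size)]


team = evenly_spaced_team(10)
print("team:", [round(p) for p in team])
print(f"team total: {sum(team):,.0f} lines/year")
print(f"distribution mean: {(MIN_LOC + MODE_LOC + MAX_LOC) / 3:,.0f} lines/year")
```

For simulated teams, pass `random.random()` draws in place of the evenly spaced quantiles.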
As you can see, the resulting productivity values for a team of ten
developers cluster around the modal productivity value, with comparatively
few developers of very high productivity.
Perhaps more intuitively, a software development team of ten
developers looks like this:
This typical team has only a couple of people responsible for
the majority of the output. Again, it might be interesting to compare
this to your own situation. At the very least, it shows how the 'right'
team of two developers can be competitive with a much larger team; a
phenomenon you may have witnessed for yourselves.
Overall, this team produces about 90,000 lines of code in a year.
Incorporating growth of software
Of course, the story doesn't end there. Once our team has written 90,000
lines of code, they're no longer working on a 10,000 line code base,
they're working on a 100,000 line code base! This causes their
productivity to drop, so we now have a modified description of their
productivities and a new distribution from which to draw a productivity
if somebody new joins the team. But more of that in a moment. We don't
have much in the way of published data for productivity on different
sizes of code base, but we can interpolate between and extrapolate from
the data we do have, without any of the assumptions involved in such
extrapolation looking too outlandish. As you can see, we can put three
straight lines through the minima, modes and maxima respectively to
determine a productivity distribution for any code
size up to about 10 million lines of code. (Note that we shouldn't infer
from these straight lines anything about the mathematical nature of how
productivity declines with increasing code size – there be dragons!)
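One way to turn anchor values like these into a productivity distribution for any code size is to fit a separate straight line for the minimum, the mode and the maximum. The sketch below fits those lines in log-log space with `numpy.polyfit`. Two choices here are our own assumptions, not necessarily what lies behind the chart. The log-log fit keeps extrapolated productivities positive. The `floor` cut-off handles very small code bases. Apart from the 10,000-line figures quoted earlier, the anchor values would have to come from the published data, so none are hard-coded here.

```python
import numpy as np


def fit_productivity_lines(anchor_sizes, anchor_params, floor: float = 1_000.0):
    """Return a function mapping code size -> (min, mode, max) productivity.

    anchor_sizes:  code sizes (lines) at which published figures exist.
    anchor_params: matching (min, mode, max) productivities, lines per year.
    One straight line per parameter is fitted in log-log space, which both
    interpolates between and extrapolates beyond the anchors. `floor` is an
    assumed cut-off: smaller code bases (including an empty one, where log10
    is undefined) are treated as being of size `floor`."""
    log_sizes = np.log10(np.asarray(anchor_sizes, dtype=float))
    log_params = np.log10(np.asarray(anchor_params, dtype=float))  # shape (n, 3)
    # With a 2-D y, np.polyfit fits each column independently and returns
    # shape (2, 3): the first row holds slopes, the second intercepts.
    slopes, intercepts = np.polyfit(log_sizes, log_params, deg=1)

    def params_for(code_size: float):
        x = np.log10(max(code_size, floor))
        low, mode, high = 10.0 ** (slopes * x + intercepts)
        return float(low), float(mode), float(high)

    return params_for
```

The three lines are fitted independently, so they can cross when extrapolated far enough. Check that the minimum stays below the mode, and the mode below the maximum, wherever you use the fitted lines.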
When performing a simulation of growth of software in the computer, we
can get more accurate results by reducing the time-step on which we
adjust programmer productivity downwards from once per year as in the
example above, to just once per day: At the end of every simulated day,
we know how much code we have, so we can predict the productivity of our
developers on the following day, and so on.
Incorporating turnover
We've already stated our assumption that the probability of a developer
departing is constant per unit time, together with our half-life figure
of 3.2 years. Given this, it's straightforward to compute the
probability of a developer leaving on any given day, which is about
0.0006 per calendar day (roughly 0.001 per working day). As we all know, when a particular
developer leaves an organisation and is replaced by a new recruit,
there's no guarantee that their replacement will have the same level of
productivity. In this event, our simulation will draw a new developer at
random from the distribution of developer productivity for the current
code base size, so it's likely that a very low productivity developer
will be replaced with a higher productivity developer and that a very
high productivity developer will be replaced with a lower productivity
developer; an example of regression to mediocrity.
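A small NumPy sketch shows this regression towards the middle of the distribution. It assumes a daily departure probability derived from the 3.2-year half-life over 365 calendar days. As a simplification, it keeps the productivity distribution fixed at its 10,000-line shape rather than letting it shift as code accumulates. Each team starts with every member at one extreme value: 25,000 lines per year for an "all-star" team or 2,000 for a "weak" one, both chosen only for illustration. As leavers are replaced by fresh random draws, both teams drift towards the distribution mean of 11,400 lines per year. The function name is hypothetical.

```python
import numpy as np

DAYS_PER_YEAR = 365                    # assumption: calendar days
HALF_LIFE_DAYS = 3.2 * DAYS_PER_YEAR
P_LEAVE = 1.0 - 0.5 ** (1.0 / HALF_LIFE_DAYS)  # fixed daily departure probability

# Productivity on a 10,000-line code base (lines/year), from the text. Simplification:
# the distribution is held fixed here instead of shifting as the code base grows.
LEFT, MODE, RIGHT = 2_000.0, 3_200.0, 29_000.0


def mean_productivity_after(start: float, years: int = 10, team_size: int = 7,
                            runs: int = 2_000, seed: int = 0) -> float:
    """Average productivity, after `years` of daily turnover, of teams whose
    members all start with productivity `start`."""
    rng = np.random.default_rng(seed)
    teams = np.full((runs, team_size), float(start))
    for _ in range(years * DAYS_PER_YEAR):
        leaving = rng.random(teams.shape) < P_LEAVE
        n = int(leaving.sum())
        if n:
            # Each leaver is replaced by a fresh draw from the distribution.
            teams[leaving] = rng.triangular(LEFT, MODE, RIGHT, size=n)
    return float(teams.mean())


print(f"distribution mean:           {(LEFT + MODE + RIGHT) / 3:,.0f}")
print(f"'all-star' team after 10 yr: {mean_productivity_after(25_000):,.0f}")
print(f"'weak' team after 10 yr:     {mean_productivity_after(2_000):,.0f}")
```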
Simulating a project
With a model of the variance in developer productivity, its relationship
to code base size and a simple model of developer turnover, we're ready
to run a simulation of a project. To do so, we initialise the model with
the number of developers in the development team, and set it running.
The simulator starts by randomly drawing a team of developers of the
required size from the productivity distribution for a zero-size code
base, and computes how much code they will have produced after one day.
At the end of the time step, the developer productivities are updated to
the new distribution; each developer's quantile within the distribution
remains fixed, but the underlying CDF is updated to yield a new
productivity value. The next time step for day two then begins, with
each developer producing a little less code than on the previous day.
On each day, there is a fixed probability that a developer will leave
the team. When this occurs, they are immediately replaced the following
day by a new hire whose productivity will be drawn anew from the
productivity distribution. For small teams, this can shift the overall
team productivity significantly, and more often than not towards the
mean.
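The simulation loop described above can be sketched in Python. Each developer keeps a fixed quantile, and their daily output is read from the distribution for the current code size. Leavers are replaced by hires with freshly drawn quantiles. Several details are our assumptions rather than the author's:

- a year's output is spread over 365 calendar days;
- turnover is applied at the start of each day after the first, so every hire writes at least one day's code;
- `placeholder_params` is an invented decline curve standing in for the fitted productivity lines.

All names are hypothetical, and the numbers the sketch prints say nothing about real projects until the placeholder is replaced.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Params = Tuple[float, float, float]  # (min, mode, max) productivity, lines per year
DAYS_PER_YEAR = 365                  # assumption: output spread over calendar days


def triangular_inverse_cdf(u: float, a: float, c: float, b: float) -> float:
    """Value at cumulative probability u of a triangular(min a, mode c, max b)."""
    f_c = (c - a) / (b - a)
    if u < f_c:
        return a + ((b - a) * (c - a) * u) ** 0.5
    return b - ((b - a) * (b - c) * (1.0 - u)) ** 0.5


@dataclass
class Developer:
    quantile: float     # fixed position within the productivity distribution
    lines: float = 0.0  # code contributed so far


@dataclass
class Outcome:
    total_lines: float
    contributors: int
    fraction_by_current_team: float


def simulate_project(team_size: int, days: int,
                     params_for_size: Callable[[float], Params],
                     p_leave: float, seed: Optional[int] = None) -> Outcome:
    rng = random.Random(seed)
    team = [Developer(rng.random()) for _ in range(team_size)]
    everyone = list(team)
    code_size = 0.0
    for day in range(days):
        if day > 0:
            # Turnover: each developer leaves with a fixed probability and is
            # replaced by a hire whose quantile is drawn afresh.
            for i in range(team_size):
                if rng.random() < p_leave:
                    team[i] = Developer(rng.random())
                    everyone.append(team[i])
        # Today's distribution depends on how much code exists; each developer
        # keeps their quantile while the distribution shifts beneath them.
        a, c, b = params_for_size(code_size)
        for dev in team:
            output = triangular_inverse_cdf(dev.quantile, a, c, b) / DAYS_PER_YEAR
            dev.lines += output
            code_size += output
    by_current = sum(dev.lines for dev in team)
    fraction = by_current / code_size if code_size else 0.0
    return Outcome(code_size, len(everyone), fraction)


def placeholder_params(code_size: float) -> Params:
    """HYPOTHETICAL decline curve, for illustration only: the 10,000-line
    figures from the text scaled by an invented power law. Replace it with
    parameters fitted to published data before drawing any conclusions."""
    scale = (10_000 / max(code_size, 10_000)) ** 0.5
    return 2_000 * scale, 3_200 * scale, 29_000 * scale


result = simulate_project(team_size=7, days=5 * DAYS_PER_YEAR,
                          params_for_size=placeholder_params,
                          p_leave=1 - 0.5 ** (1 / (3.2 * DAYS_PER_YEAR)), seed=1)
print(result)
```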
Let's look at an example: If we configure a simulation with a team of
seven developers, and let it run for five years, we get something like
this:
This figure has time running from left to right, and the coloured
streams show the growing contributions over time of individual
developers. We start on the left with no code and the original seven
developers in the team, from top to bottom sporting the colours brown,
orange, green, red, blue, purple and yellow. The code base grows quickly
at first, but soon slows. About 180 days into the project the purple
developer quits, indicated by a black, vertical bar across their stream.
From this point on, their contribution remains static and is shown in a
lighter shade. Vertically below this terminator we see a new stream
beginning, coloured pink, which represents the contribution of the
recruit who is purple's replacement. As you can see, they are about
three times as productive as their predecessor (measured in lines of
code, at least), although pink only sticks around for about 200 days
before moving on and being replaced by the upper blue stream.
In this particular scenario, at the end of the five year period, we find
our team of seven has churned through a grand total of 19 developers. In
fact the majority of the extant code was written by people no longer
with the organisation; only 37% of the code was programmed by people
still present at the end. This is perhaps motivation for getting
documentation in place as systems are developed, while the people who
are doing the development are still around, rather than at the end of
the effort – if at all – as is all too common.
Monte Carlo simulation
Being drawn randomly, each scenario such as the one outlined above is
different, although in aggregate they vary in a predictable way
according to the distributions we are using. The above scenario was
typical, insofar as it produced, compared to all identically configured
simulations, an average amount of code, although it did happen to get
through a rather high number of developers. Of course, individual
scenarios such as this, although interesting, can never be indicative of
what will actually happen. For that, we need to turn to Monte Carlo
modelling: Run many thousands of simulations – all with configurations
drawn randomly from identical distributions – and look at the results in
aggregate either graphically or using various statistical tools.
When we run 1000 simulations of a seven-person project over three
years, the following statistics emerge: We can expect our team of seven
to see four people leave and be replaced during the project. In fact,
the total number of contributors will be 11 ± 2 at one standard
deviation (1σ). The total body of code produced in three years will be
157,000 ± 23,000 @ 1σ. The proportion of the code written by
contributors present at the end will be 70% ± 14% @ 1σ.
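The contributor count is one statistic we can reproduce without any productivity data at all. In the turnover model, each developer slot on each day is an independent trial with the same departure probability, and every leaver is replaced. The number of departures in a run is therefore binomially distributed. The NumPy sketch below samples that distribution directly; its names are hypothetical, and it assumes 365 calendar days per year. With seven developers over three years, it should come out at roughly 11.5 ± 2.1 contributors, in line with the figure above.

```python
import numpy as np

DAYS_PER_YEAR = 365  # assumption: calendar days


def contributor_counts(team_size: int = 7, years: int = 3,
                       half_life_years: float = 3.2,
                       runs: int = 10_000, seed: int = 0) -> np.ndarray:
    """Monte Carlo sample of the total number of people who contribute.

    Every developer slot on every day is an independent trial with the same
    departure probability, and each leaver is replaced, so the number of
    departures in a run is binomial over team_size * days trials."""
    days = years * DAYS_PER_YEAR
    p_leave = 1.0 - 0.5 ** (1.0 / (half_life_years * DAYS_PER_YEAR))
    rng = np.random.default_rng(seed)
    departures = rng.binomial(team_size * days, p_leave, size=runs)
    return team_size + departures


counts = contributor_counts()
print(f"contributors: {counts.mean():.1f} ± {counts.std():.1f} (1σ)")
```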
Perhaps a more useful question might be to ask "How long is it likely to
take to produce 100,000 lines of code?" By answering this question for
each simulation, we can build a histogram (actually we use a kernel
density estimate here, to give a smooth, rather than binned, result).
Although this gives a good intuitive sense of when the team will reach
the 100,000-line threshold, a more useful chart is the cumulative distribution
of finishing time, which allows us to easily recognise that while there
is a 20% probability of finishing within 330 days, for a much more secure
80% probability we should allow for 470 days – some 42% longer and
correspondingly more costly.
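Reading a finishing time off the cumulative distribution amounts to inverting the empirical CDF of the Monte Carlo results. Below is a minimal sketch. The `finish_days` argument is a hypothetical list holding, for each simulation, the day on which the code base first reached 100,000 lines. Both function names are ours.

```python
import math


def time_for_confidence(finish_days, probability: float):
    """Smallest simulated finishing time t whose empirical probability
    P(finish <= t) is at least `probability`: the inverse of the empirical CDF."""
    xs = sorted(finish_days)
    if not xs or not 0.0 <= probability <= 1.0:
        raise ValueError("need samples and a probability in [0, 1]")
    k = max(1, math.ceil(probability * len(xs)))  # 1-based rank
    return xs[k - 1]


def schedule_margin(optimistic_days: float, safe_days: float) -> float:
    """Extra time, as a fraction, needed to move from the optimistic estimate
    to the safer one."""
    return safe_days / optimistic_days - 1.0


print(f"{schedule_margin(330, 470):.0%}")  # 42%
```

With real simulation output, `time_for_confidence(finish_days, 0.2)` and `time_for_confidence(finish_days, 0.8)` give the two ends of the comparison made above.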
Finally, looking at the proportion of the code base that was, at any
time, written by the current team, we see an exponential decline in this
fraction, leaving us with a headline figure of 20% after 20 years.
That's right, on a 20 year old code base only one fifth of the code will
have been created by the current team. This resonates with my own
experience, and quantitatively explains why working on large legacy
systems can be a lonely, disorienting and confusing experience.
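We can make a back-of-the-envelope check on these figures under two simplifying assumptions of our own. First, code is written at a constant rate. Second, each author's departure is independent and exponential, with a 3.2-year half-life. Because departure is memoryless, code written \(t\) years ago still has its author around with probability \(e^{-\lambda t}\), where \(\lambda = \ln 2 / 3.2\). Averaging over the life of the code base gives \((1 - e^{-\lambda T}) / (\lambda T)\). That yields about 74% after three years and 23% after twenty, in the same neighbourhood as the simulated 70% and 20%. The full simulation's productivity decline front-loads code into the early years, which would tend to push its figures lower.

```python
import math


def fraction_by_current_team(age_years: float, half_life_years: float = 3.2) -> float:
    """Expected share of a code base written by people who are still present.

    Assumes code is written at a constant rate and authors leave independently
    with exponential tenure. Code written t years ago still has its author
    around with probability exp(-lam * t); averaging over t in [0, T] gives
    (1 - exp(-lam * T)) / (lam * T)."""
    lam = math.log(2) / half_life_years
    x = lam * age_years
    if x == 0:
        return 1.0  # a brand-new code base was written entirely by the current team
    return -math.expm1(-x) / x  # expm1 keeps 1 - exp(-x) accurate for small x


for age in (3, 5, 10, 20):
    print(f"{age:>2} years: {fraction_by_current_team(age):.0%}")
```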
A refinement of Conway's Law?
Any organization that designs a system (defined broadly) will
produce a design whose structure is a copy of the organization's
communication structure
—Melvin Conway
This remark, which has become known as Conway's Law, was later
interpreted, a little more playfully, by Eric Raymond as "If you have
four groups working on a compiler, you'll get a 4-pass compiler". My own
experience is that Conway's Law rings true, and I've often said that
"Conway's Law is the one thing we know about software engineering that
will still be true 1000 years from now".
However, over the long-term development efforts that lead to large
legacy software systems, the structure and organisation of the system
isn't necessarily congruent with the organisation at present. After all,
we all know that reorganisations of people are all too frequent compared
to major reorganisation of software! Rather, the state of a system
reflects not only the organisation, but the organisational history and
the flow of people through those organisations over the long term. What
I mean is that the structure of the software reflects the organisational
structure integrated over time.
Simulations such as those presented in this article allow us to get a
sense of how large software systems as we see them today are the
fossilised footprints of developers past. Perhaps we can use this
improved, and quantitative, understanding to improve planning, costing
and ongoing guidance of large software projects.