Evidence-based software research requires access to data, and Github has become the primary source of raw material for many (most?) researchers.
Parallels are starting to emerge between today’s researchers exploring Github and biologists exploring nature centuries ago.
Centuries ago scientific expeditions undertook difficult and hazardous journeys to various parts of the world, collecting and returning with many specimens which were housed and displayed in museums and botanical gardens. Researchers could then visit the museums and botanical gardens to study these specimens, without leaving the comforts of their home country. What is missing from these studies of collected specimens is information on the habitat in which they lived.
Github is a living museum of specimens that today’s researchers can study without leaving the comforts of their research environment. What is missing from these studies of collected specimens is information on the habitat in which the software was created.
Github researchers are starting the process of identifying and classifying specimens into species types, based on their defining characteristics, much like the botanist Carl_Linnaeus identified stamens as one of the defining characteristics of flowering plants. Some of the published work reads like the authors did some measurements, spotted some differences, and then invented a plausible story around what they had found. As a sometime inhabitant of this glasshouse I will refrain from throwing stones.
The ecological definition of a biome, as the community of plants and animals that have common characteristics for the environment they exist in, maps to the end-user use of software systems. There does not appear to be a generic name for people who study the growth of plants and animals (or at least I cannot think of one).
There is only so much useful information that can be learned from studying specimens in museums, no matter how up to date the specimens are.
Studying the development and maintenance of software systems in the wild (i.e., dealing with the people who do it), requires researchers to forsake their creature comforts and undertake difficult and hazardous journeys into industry. While they are unlikely to experience any physical harm, there is a real risk that their egos will be seriously bruised.
I want to do what I can to prevent evidence-based software engineering from just being about mining Github. So I have a new policy for dealing with PhD/MSc student email requests for data (previously I did my best to point them at the data they sought). From now on, I will tell students that they need to behave like real researchers (e.g., Charles Darwin) who study software development in the wild. Charles Darwin is a great role model who should appeal to their sense of adventure (alternative suggestions welcome).
Letter I sent to my MP today on the overseas aid budget. Let’s not be foolish.
Dear Ben Spencer,
Please use your influence to persuade the government to maintain our overseas aid budget commitment at 0.7% of national income.
I believe that changing this policy would be a mistake, increasing the risks of extremism and forced migration around the world.
The policy was established when the budget was very tight, and I think the reasons for it remain compelling: to prevent selfishness and short-termism from hurting our own and others’ interests.
While it’s great that so much data was uncovered during the writing of the Evidence-based software engineering book, trying to locate data on a particular topic can be convoluted (not least because there might not be any). There are three sources of information about the data:
the paper(s) written by the researchers who collected the data,
my analysis and/or discussion of the data (which is frequently different from the original researchers),
the column names in the csv file, i.e., data is often available which neither the researchers nor I discuss.
At the beginning I expected there to be at most a few hundred datasets; easy enough to remember what they are about. While searching for some data, one day, I realised that relying on memory was not a good idea (it was never a good idea), and started including data identification tags in every R file (of which there are currently 980+). This week has been spent improving tag consistency and generally tidying them up.
How might data identification information be extracted from the paper that was the original source of the data (other than reading the paper)?
Tools are available for extracting text from pdf file, and 10-lines of Python later we have a list of named entities:
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")
file_name = 'eseur.txt'
soft_eng_text = open(file_name).read()
soft_eng_doc = nlp(soft_eng_text)
for ent in soft_eng_doc.ents:
print(ent.text, ent.start_char, ent.end_char,
The catch is that en_core_web_sm is a general model for English, and is not software engineering specific, i.e., the returned named entities are not that good (from a software perspective).
While it’s easy to train a spaCy NER model, the time-consuming bit is collecting and cleaning the text needed. I have plenty of other things to keep me busy. But this would be a great project for somebody wanting to learn spaCy and natural language processing
What information is contained in the undiscussed data columns? Or, from the practical point of view, what information can be extracted from these columns without too much effort?
The number of columns in a csv file is an indicator of the number of different kinds of information that might be present. If a csv is used in the analysis of X, and it contains lots of columns (say more than half-a-dozen), then it might be assumed that it contains more data relating to X.
Column names are not always suggestive of the information they contain, but might be of some use.
Many of the csv files contain just a few rows/columns. A list of csv files that contain lots of data would narrow down the search, at least for those looking for lots of data.
Another possibility is to group csv files by potential use of data, e.g., estimating, benchmarking, testing, etc.
More data is going to become available, and grouping by potential use has the advantage that it is easier to track the availability of new data that may supersede older data (that may contain few entries or apply to circumstances that no longer exist)
My current techniques for locating data on a given subject is either remembering the shape of a particular plot (and trying to find it), or using the pdf reader’s search function to locate likely words and phrases (and then look at the plots and citations).
Suggestions for searching or labelling the data, that don’t require lots of effort, welcome.
Like most programmers I’ve generally tried to steer well clear of getting involved in management duties. The trouble is that as you get older I think this becomes harder and harder to avoid. Once you get the mechanics of programming under control you might find you have more time to ponder about some of those other duties which go into delivering software because they begin to frustrate you.
The Price of Success
Around the turn of the millennium I was working in a small team for a small financial organisation. The management structure was flat and we had the blessing of the owner to deliver what we thought the users needed and when. With a small but experienced team of programmers we could adapt to the every growing list of feature requests from our users. Much of what we were doing at the time was trying to work out how certain financial markets were being priced so there was plenty of experimentation which lead to the writing and rewriting of the pricing engine as we learned more.
The trouble with the team being successful and managing to reproduce prices from other more expensive 3rd party pricing software was that we were then able to replace it. But of course it also has some other less important features that users then decided they needed too. Being in-house and responsive to their changes just means the backlog grows and grows and grows…
The Honeymoon Must End
While those users at the front of the queue are happy their needs are being met you’ll end up pushing others further down the queue and then they start asking when you’re going to get around to them. If you’re lucky the highs from the wins can outweigh the lows from those you have to disappoint.
The trouble for me was that I didn’t like having to keep disappointing people by telling them they weren’t even on the horizon, let alone next on the list. The team was doing well at delivering features and reacting to change but we effectively had no idea where we stood in terms of delivering all those other features that weren’t being worked on.
MS Project Crash Course
The company had one of those MSDN Universal licenses which included a ton of other Microsoft software that we never used, including Microsoft Project. I had a vague idea of how to use it after seeing some plans produced by previous project managers and set about ploughing through our “backlog”  estimating every request with a wild guess. I then added the five of us programmers in the team as the “resources”  and got the tool to help distribute the work amongst ourselves as best as possible.
I don’t remember how long this took but I suspect it was spread over a few days while I did other stuff, but at the end I had a lovely Gantt Chart that told us everything we needed to know – we had far too much and not enough people to do it in any meaningful timeframe. If I remember correctly we had something like a year’s worth of work even if nothing else was added to the “TODO list” from now on, which of course is ridiculous – software is never done until it’s decommissioned.
For a brief moment I almost felt compelled to publish the plan and even try and keep it up-to-date, after all I’d spend all that effort creating it, why wouldn’t I? Fortunately I fairly quickly realised that the true value in the plan was knowing that we had too much work and therefore something had to change. Maybe we needed more people, whether that was actual programmers or some form of manager to streamline the workload. Or maybe we just needed to accept the reality that some stuff was never going to get done and we should ditch it. Product backlogs are like the garage or attic where “stuff” just ends up, forgotten about but taking up space in the faint hope that one day it’ll be useful.
The truth was uncomfortable and I remember it lead to some very awkward conversations between the development team and the users for a while . There is only so long that you can keep telling people “it’s on the list” and “we’ll get to it eventually” before their patience wears out. It was unfair to string people along when we pretty much knew in our hearts we’d likely never have the time to accommodate them, but being the eternal optimists we hoped for the best all the same.
During that period of turmoil having the plan was a useful aid because it allowed is to have those awkward conversations about what happens if we take on new work. Long before we knew anything about “agility” we were doing our best to respond to change but didn’t really know how to handle the conflict caused by competing choices. There was definitely an element of “he who shouts loudest” that had a bearing on what made its way to the top of the pile rather than a quantitative approach to prioritisation.
Even today, some 20 years on, it’s hard to convince teams to throw away old backlog items on the premise that if they are important enough they’ll bubble up again. Every time I see an issue on GitHub that has been automatically closed because of inactivity it makes me a little bit sad, but I know it’s for the best; you simply cannot have a never-ending list of bugs and features – at some point you just have to let go of the past.
On the flipside, while I began to appreciate the futility of tracking so much work, I also think going through the backlog and producing a plan made me more tolerant of estimates. Being that person in the awkward situation of trying to manage someone’s expectations has helped me get a glimpse of what questions some people are trying to answer by creating their own plans and how our schedule might knock onto them. I’m in no way saying that I’d gladly sit through sessions of planning poker simply for someone to update some arbitrary project plan because it’s expected of the team, but I feel more confident asking the question about what decisions are likely to be affected by the information I’m being asked to provide.
Naturally I’d have preferred someone else to be the one to start thinking about the feature list and work out how we were going to organise ourselves to deal with the deluge of work, but that’s the beauty of a self-organising team. In a solid team people will pick up stuff that needs doing, even if it isn’t the most glamourous task because ultimately what they want is to see is the team succeed , because then they get to be part of that shared success.
 B.O.R.I.S (aka Back Office Request Information System) was a simple bug tracking database written with Microsoft Access. I’m not proud of it but it worked for our small team in the early days :o).
 Yes, the air quotes are for irony :o).
 A downside of being close to the customer is that you feel their pain. (This is of course a good thing from a process point of view because you can factor this into your planning.)
I don’t really know what piloting a plane is like. I’m not a pilot. I have only ever been in the cockpit at museums (sitting in an SR-71 Blackbird was amazing). But, whenever I hear of software teams who need to work together – perhaps because they deliver different parts of the same product or perhaps because one supplies the other, or just because they all work for the same company – I always imagine its like synchronised flying.
In my mind I look at software teams and see the Red Arrows or Blue Angels. Now you could argue that software teams are nothing like an acrobatic team because those teams perform the same routines over and over again, and because those teams plan their routines in advance and practice, practice, practice.
But equally, while the routine may be planned in depth each plane has to be piloted by someone. That individual may be following a script but they are making hundreds of decisions a minute. Each plane is its own machine with its own variations, each plane encounters turbulence differently, each pilot has a different view through their window. And if any one pilot miscalculates…
As for the practice, one has to ask: why don’t software teams practice? – In many other disciplines practice, and rehearsal, is a fundamental part of doing the work. Thats why I’ve long aimed to make my own training workshops a form of rehearsal.
Software teams don’t perform the same routines again and again but in fact, software teams synchronise in common reoccurring ways: through APIs, at release times, at deadlines, at planning sessions. What the teams do in between differs but coordination happens in reoccurring forms.
While acrobatic teams may be an extreme example of co-ordination the same pilots don’t spend their entire lives flying stunts. Fighter pilots need to synchronise with other fighter pilots in battle situations.
OK, I’m breaking my own rule here – using a metaphor from a domain I know little of – but, at the same time I watch these displays and this image is what pops into my head.
Anyone got a better metaphor?
Or anyone know about flying and care to shoot down my metaphor?
My old ACCU friend Derek Jones has been beavering away at his Evidence Based Software Engineering book for a few years now. Derek takes an almost uniquely hard nosed evidence driven view of software engineering. He works with data. This can make the book hard going in places – and I admit I’ve only scratched the surface. Fortunately Derek also blogs so I pick up many a good lead there.
At first this finding worried me: so much of what I’ve been preaching about software living for a long time is potentially rubbish. But then I remembered: what I actually say, when I have time, when I’m using all the words is “Successful software lives” – or survives, even is permanent. (Yes its “temporary” at some level but so are we, as Keynes said “In the long run we are all dead”).
My argument is: software which is successful lives for a long time. Unsuccessful software dies.
Successful software is software which is used, software which delivers benefit, software that fills a genuine need and continues filling that need; and, most importantly, software which delivers more benefit than it costs to keep alive survives. If it is used it will change , that means people will work on it.
So actually, Derek’s observation and mine are almost the same thing. Derek’s finding is almost a corollary to my thesis: Most software isn’t successful and therefore dies. Software which isn’t used or doesn’t generate enough benefit is abandoned, modifications cease and it dies.
Actually, I think we can break Derek’s observation into two parts, a micro and a macro argument.
At the micro level are lines of code and functions. I read Derek’s analysis as saying: at the function level code changes a lot at certain times. An awful lot of that change happens at the start of the code’s life when it is first written, refactored, tested, fixed, refactored, and so on. Related parts of the wider system are in flux at the same time – being written and changed – and any given function will be impacted by those changes.
While many lines and functions come and go during the early life of software, eventually some code reaches a stable state. One might almost say Darwinian selection is at work here. There is a parallel with our own lives there: during our first 5 years we change a lot, we start school, things slow down but still, until about the age of 21 our lives change a lot, after 30 things slow down again. As we get older life becomes more stable.
Assuming software survives and reaches a stable state it can “rest” until such time as something changes and that part of that system needs rethinking. This is Kevlin Henney’s “Stable Intermediate Forms” pattern again (also is ACCU Overload).
At a macro level Derek’s observation applies to entire systems: some are written, used a few times and thrown away – think of a data migration tool. Derek’s data has little to say about whether software lifetimes correspond to expected lifetimes; that would be an interesting avenue to pursue but not today.
There is a question of cause and effect here: does software die young because we set it up to die young or because it is not fit enough to survive? Undoubtedly both cases happen but let me suggest that a lot of software dies early because it is created under the project model and once the project ends there is no way for the software to grown and adapt. Thus it stops changing, usefulness declines and it is abandoned.
The other question to pondering is: what are the implications of Derek’s finding?
The first implication I see is simply: the software you are working on today probably won’t live very long. Sure you may want it to live for ever but statistically it is unlikely.
Which leads to the question: what practices help software live longer?
Or should we acknowledge that software doesn’t live long and dispense with practices intended to help it live a long time?
Following our engineering handbook one should: create a sound architecture, document the architecture, comment the code, reduce coupling, increase cohesion, and other good engineering practices. After all we don’t want the software to fall down.
But does software die because it fails technically? Does software stop being used because programmers can no longer understand the code? I don’t think so. Big ball of mud suggests poor quality software is common.
When I was still coding I worked on lots of really crummy software that didn’t deserve to live but it did because people found it useful. If software died because it wasn’t written for old age then one wouldn’t hear programmers complaining about ‘technical debt” (or technical liabilities as I prefer).
Let me suggest: software dies because people no longer use it.
Thus, it doesn’t matter how many comments or architecture documents one writes, if software is useful it will survive, and people will demand changes irrespective of how well designed the code is. Sure it might be more expensive to maintain because that thinking wasn’t put in but…
For every system that survives to old age many more systems die young. Some of those systems are designed and documented “properly”.
I see adverse selection at work: systems which are built “properly” take longer and cost more but in the early years of life those additional costs are a hinderance. Maybe engineering “properly” makes the system more likely to die early. Conversely, systems which forego those extra costs stand a better chance of demonstrating their usefulness early and breaking-even in terms of cost-benefit.
Something like this happened with Multics and Unix. Multics was an ambitious effort to deliver a novel OS but failed commercially. Unix was less ambitious and was successful in ways nobody ever expected. (The CPL, BCPL, C story is similar.)
Finally, what about tests – is it worth investing in automated tests?
Arguably writing test so software will be easier to work on in future is waste because the chances are your software will not live. However, at the unit test level, and even at the acceptance test level, that is not the primary aim of such tests. At this level tests are written so programmers create the correct result faster. Once someone is proficient writing test-first unit tests is faster than debug-later coding.
To be clear: the primary driver for writing automated unit tests in a test first fashion is not a long term gain to test faster, it is delivering working code faster in the short term.
However, writing regression tests probably doesn’t make sense because the software is unlikely to be around long enough for them to pay back. Fortunately, if you write solid unit and acceptance tests these double as regression tests.
The code 15USERSTORY9YP should get you 15% off on the Edinburgh Agile website and there are some early bird offers too.
These are all half-day workshops which run online with Zoom. As well as the online class attendees receive one of my books to accompany the course, the workshop slides, a recording of the workshop and have the option of doing an online exam to receive a certificate.
These workshops are also available for private delivery with your teams. We ran our first client specific course last month and have another two booked before Christmas.
We are also working on a User Stories Masterclass 2 which should be available in the new year.