Source code chapter of ‘evidence-based software engineering’ reworked

Derek Jones from The Shape of Code

The Source code chapter of my evidence-based software engineering book has been reworked (draft pdf).

When writing the first version of this chapter, I was not certain whether source code was a topic warranting a chapter to itself, in an evidence-based software engineering book. Now I am certain. Source code is the primary product delivery, for a software system, and it is takes up much of the available cognitive effort.

What are the desirable characteristics that source code should have, to minimise production costs per unit of functionality? This is what an evidence-based chapter on source code is all about.

The release of this chapter completes my second pass over the material. Readers will notice the text still contains ... and ?‘s. The third pass will either delete these, or say something interesting (I suspect mostly the former, because of lack of data).

Talking of data, February has been a bumper month for data (apologies if you responded to my email asking for data, and it has not appeared in this release; a higher than average number of people have replied with data).

The plan is to spend a few months getting a beta release ready. Have the beta release run over the summer, with the book in the shops for Christmas.

I’m looking at getting a few hundred printed, for those wanting paper.

The only publisher that did not mind me making the pdf freely available was MIT Press. Unfortunately one of the reviewers was foaming at the mouth about the things I had to say about software engineering researcher (it did not help that I had written a blog post containing a less than glowing commentary on academic researchers, the week of the review {mid-2017}); the second reviewer was mildly against, and the third recommended it.

If any readers knows the editors at MIT Press, do suggest they have another look at the book. I would rather a real publisher make paper available.

Next, getting the ‘statistics for software engineers’ second half of the book ready for a beta release.

Source code has a brief and lonely existence

Derek Jones from The Shape of Code

The majority of source code has a short lifespan (i.e., a few years), and is only ever modified by one person (i.e., 60%).

Literate programming is all well and good for code written to appear in a book that the author hopes will be read for many years, but this is a tiny sliver of the source code ecosystem. The majority of code is never modified, once written, and does not hang around for very long; an investment is source code futures will make a loss unless the returns are spectacular.

What evidence do I have for these claims?

There is lots of evidence for the code having a short lifespan, and not so much for the number of people modifying it (and none for the number of people reading it).

The lifespan evidence is derived from data in my evidence-based software engineering book, and blog posts on software system lifespans, and survival times of Linux distributions. Lifespan in short because Packages are updated, and third-parties change/delete APIs (things may settle down in the future).

People who think source code has a long lifespan are suffering from survivorship bias, i.e., there are a small percentage of programs that are actively used for many years.

Around 60% of functions are only ever modified by one author; based on a study of the change history of functions in Evolution (114,485 changes to functions over 10 years), and Apache (14,072 changes over 12 years); a study investigating the number of people modifying files in Eclipse. Pointers to other studies that have welcome.

One consequence of the short life expectancy of source code is that, any investment made with the expectation of saving on future maintenance costs needs to return many multiples of the original investment. When many programs don’t live long enough to be maintained, those with a long lifespan have to pay the original investments made in all the source that quickly disappeared.

One benefit of short life expectancy is that most coding mistakes don’t live long enough to trigger a fault experience; the code containing the mistake is deleted or replaced before anybody notices the mistake.

Source code chapter added to “Evidence-based software engineering using R”

Derek Jones from The Shape of Code

The Source Code chapter of my evidence-based software engineering book has been added to the draft pdf (download here).

This chapter has suffered from coming last and there is still lots of work to be done. Almost all the source code related data has been plundered to fill up earlier chapters. Some data did not make the cut-off for release of the draft; a global review will probably result in some data migrating back to this chapter.

When talking to developers about the book I am constantly being asked ‘what is empirical software engineering?’ My explanation uses the phrase ‘evidence-based’, which everybody seems to immediately understand. It is counterproductive having a title that has to be explained, so I have changed the title to “Evidence-based Software Engineering using R”.

What is the purpose of a chapter discussing source code in a book on evidence-based software engineering? Source code is obviously an essential component of the topics discussed in the other chapters, but what is so particular to source code that it could not be said elsewhere? Having spent most of my professional life studying source code, first as a compiler writer and then involved with static analysis, am I just being driven by an attachment to the subject?

My view of source code is very different from most other developers: when developers talk about code, they spend most of the time talking about how they do things, when I talk about code I spend most of the time talking about how other developers do things (I’m a mongrel writer of code). Developers’ blinkered view of code prevents them seeing bigger pictures. I take a Gricean view of code and refrain from using meaningless marketing terms such as maintainability, readability and testability.

I have lots of source code data of interest to compiler writers (who are not the target audience) and I have lots of data related to static analysis (tool developers are not the audience). The target audience is professional software developers and hopefully what has been written is of interest to that readership.

I have been promised all sorts of data. Hopefully some of it will arrive. If somebody tells you they promised to send me data, please encourage them to take some time to sort out the data and send it.

As always, if you know of any interesting software engineering data, please tell me.

Finalizing the statistical analysis material in the second half of the book (released almost two years ago) next.

First use of: software, software engineering and source code

Derek Jones from The Shape of Code

While reading some software related books/reports/articles written during the 1950s, I suddenly realized that the word ‘software’ was not being used. This set me off looking for the earliest use of various computer terms.

My search process consisted of using pfgrep on my collection of pdfs of documents from the 1950s and 60s, and looking in the index of the few old computer books I still have.

Software: The Oxford English Dictionary (OED) cites an article by John Tukey published in the American Mathematical Monthly during 1958 as the first published use of software: “The ‘software’ comprising … interpretive routines, compilers, and other aspects of automotive programming are at least as important to the modern electronic calculator as its ‘hardware’.”

I have a copy of the second edition of “An Introduction to Automatic Computers” by Ned Chapin, published in 1963, which does a great job of defining the various kinds of software. Earlier editions were published in 1955 and 1957. Did these earlier edition also contain various definitions of software? I cannot find any reasonably prices copies on the second-hand book market. Do any readers have a copy?

Software engineering: The OED cites a 1966 “letter to the ACM membership” by Anthony A. Oettinger, then ACM President: “We must recognize ourselves … as members of an engineering profession, be it hardware engineering or software engineering.”

The June 1965 issue of COMPUTERS and AUTOMATION, in its Roster of organizations in the computer field, has the list of services offered by Abacus Information Management Co.: “systems software engineering”, and by Halbrecht Associates, Inc.: “software engineering”. This pushes the first use of software engineering back by a year.

Source code: The OED cites a 1965 issue of Communications ACM: “The PUFFT source language listing provides a cross reference between the source code and the object code.”

The December 1959 Proceedings of the EASTERN JOINT COMPUTER CONFERENCE contains the article: “SIMCOM – The Simulator Compiler” by Thomas G. Sanborn. On page 140 we have: “The compiler uses this convention to aid in distinguishing between SIMCOM statements and SCAT instructions which may be included in the source code.”

Running pdfgrep over the archive of documents on bitsavers would probably throw up all manners of early users of software related terms.