Author: Derek Jones

  • Formal methods and LLM generated mathematical proofs

    Formal methods have been popping up in the news again, or at least on the technical news sites I follow. Both mathematics and software share the same pattern of usage of formal methods. The input text is mapped to some output text. Various characteristics of the output text are checked using proof assistant(s). Assuming the…

  • Distribution of small project completion times

    Records of project estimates and actual task times show that round numbers are very common. Various possible reasons have been suggested for why actual times are often reported as a round number. This post analyses the impact of round number reports of actual times on the accuracy of estimates. The plot below shows the number…

  • Modelling time to next reported fault

    After the arrival of a fault report for a program, what is the expected elapsed time until the next fault report arrives (assuming that the report relates to a coding mistake and is not a request for enhancement or something the user did wrong, and the number of active users remains the same and the…

  • My 2025 in software engineering

    Unrelenting talk of LLMs now infests all the software ecosystems I frequent. Almost all the papers published (week) daily on the Software Engineering arXiv have an LLM themed title. Way back when I read these LLM papers, they seemed to be more concerned with doing interesting things with LLMs than doing software engineering research. Predictions…

  • Programming Punched card machines

    Punched card machines, or Tabulating machines, or Unit Record equipment, or according to a 1931 article Super Computing machines, were electromechanical devices that summarised information contained on punched cards (aka tabulating cards). These machines date from 1884, with the publication of Herman Hollerith’s patent application 18840923. In 1948 the electronic valve based IBM 603 calculating…

  • Naming convergence in a network of pairwise interactions

    While naming and categorizing things are perhaps the two most contentious issues in software engineering, there is often a great deal of similarity in the names and categories used by unconnected groups. These characteristics of naming and categorization are general observed behaviors across cultures and languages, with software engineering being a particular example. Studies have…

  • Christmas books for 2025

    My rate of book reading has remained steady this year, however, my ability to buy really interesting books has declined. Consequently, the list of honourable mentions is longer than the main list. Hopefully my luck/skill will improve next year. As is usually the case, most books were not published in this year. Liberal Fascism: The…

  • Lifetime of coding mistakes in the Linux kernel

    What is the lifetime of coding mistakes in the Linux kernel? Some coding mistakes result in fault reports (some of which are fixed), while many are removed when the source that contains them is deleted/changed during ongoing development. After fixing the coding mistake(s) in the kernel that generated a reported fault, developer(s) log the commit…

  • Decline in downloads of once popular packages

    What happens to the popularity of Open source packages, measured in monthly downloads, once they cease to be updated or attract new users? If the software does not have any competition within its domain, there is no reason why its popularity should decline. In practice, there are usually alternative packages offering the same or similar…

  • Occurrence of binary operator overloading in C++

    Operator overloading, like many programming language constructs, was first supported in the 1960s (Algol 68 also provided a means to specify a precedence for the operator). C++ is perhaps the most widely used language supporting operator overloading; but not redefining their precedence. I have always thought that operator overloading was more talked about than actually…

  • Fifth anniversary of Evidence-based Software Engineering book

    Yesterday was the 5th anniversary of the publication of my book Evidence-based Software Engineering. The general research trajectory I was expecting in the 2020s (e.g., more sophisticated statistical analysis and more evidence based studies) has been derailed by the arrival of LLMs three years ago. Almost all software engineering researchers have jumped on the LLM…

  • Best tool for measuring lots of source code

    Human written source code contains various common usage patterns. This blog has analysed a variety of these patterns, and in a few cases built models of processes that replicate these patterns. The data for this analysis has primarily come from programs written in C and Java, because these are the languages that researchers most often…

  • Distribution of method chains in Java and Python

    Some languages support three different ways of organizing a sequence of functions/methods, with calls taking as their first argument the value returned by the immediately prior call. For instance, Java supports the following possibilities:

        r1=f1(val); r2=f2(r1); r3=f3(r2); // Sequential calls
        r3=f3(f2(f1(val)));               // Nested calls, read right to left
        r3=val.f1().f2().f3();            // Method chain, read left to…

  • A process to find and extract data-points from graphs in pdf files

    Ever since I discovered that it’s sometimes possible to extract the x/y values of the points/circles/diamonds appearing in a graph, within a pdf, I have been trying to automate the process. Within a pdf there are two ways of encoding an image, such as the one below. The information can be specified using a graphics…

  • After 55.5 years the Fortran Specialist Group has a new home

    In the 1960s and 1970s, new developments in Cobol and Fortran language standards and implementations regularly appeared on the front page of the weekly computer papers (Algol 60 news sometimes appeared). Various language user groups were created, which produced newsletters and held meetups (this term did not become common until a decade or two ago).…

  • Why is actual implementation time often reported in whole hours?

    Estimates of the time needed to implement a software task are often given in whole hours (i.e., no minutes), with round numbers being preferred. Surprisingly, reported actual implementation times also share this ‘preference’ for whole hours and round numbers (around a third of short task estimates are accurate, so it is to be expected that…

  • When task time measurements are not reported by developers

    Measurements of the time taken to complete a software development task usually rely on the values reported by the person doing the work. People often give round number answers to numeric questions. This rounding has the effect of shifting start/stop/duration times to 5/10/15/20/30/45/60 minute boundaries. To what extent do developers actually start/stop tasks on round…
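    The share of reported durations that fall on round boundaries can be measured with a few lines of code. The following is a minimal sketch, not the analysis used in the post: it assumes durations are recorded in whole minutes, treats any multiple of 5 minutes as "round" (an assumption based on the 5/10/15/20/30/45/60 boundaries listed above), and uses a hypothetical function name and made-up sample values.

```python
def round_fraction(durations_min, step=5):
    """Return the fraction of durations that are exact multiples of `step` minutes.

    `step=5` is an illustrative assumption about what counts as a round value.
    """
    if not durations_min:
        return 0.0
    on_boundary = sum(1 for d in durations_min if d % step == 0)
    return on_boundary / len(durations_min)


# Made-up sample of reported task durations, in minutes.
sample = [30, 47, 60, 15, 22, 45, 90, 13]
print(round_fraction(sample))
```

    Comparing this fraction for self-reported and automatically logged times is one way to see how much of the rounding comes from the person doing the reporting.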

  • An attempt to shroud text from LLMs

    Describe the items discussed in the following sentences: “phashyon es cycklyq. chuyldren donth wanth tew weywr chloths vat there pairent weywr. pwroggwrammyng languij phashyon hash phricksionz vat inycially inqloob impleementaision suppoort, lybrareyz (whych sloa doun adopsion, ant wunsh establysht jobz ol avaylable too suppourt ecksysting kowd (slowyng doun va demighz ov a langguij).” I was…

  • Evolution has selected humans to prefer adding new features

    Assume that clicking within any of the cells in the image below flips its color (white/green). Which cells would you click on to create an image that is symmetrical along the horizontal/vertical axis? In one study, 80% of subjects added a block of four green cells in each of the three white corners. The other…

  • ClearRoute x Le Mans 24h Hackathon 2025

    This weekend, Team Awesome (Sam, Frank and yours truly) took part in the [London] ClearRoute x Le Mans 24h Hackathon 2025 (ClearRoute is an engineering consultancy and Le Mans is an endurance-focused sports car race). London hackathons have been thin on the ground during the last four years. I suspect that the chilling of the…

  • One code path dominates method execution

    A recurring claim is that most reported faults are the result of coding mistakes in a small percentage of a program’s source code, with the 80/20 ‘rule’ being cited for social confirmation. I think there is something to this claim, but that the percentages are not so extreme. A previous post pointed out that reported…

  • The inconvenient history of Liberal Fascism

    Based purely on its title, Liberal Fascism: The secret history of the Left from Mussolini to the Politics of Meaning by Jonah Goldberg, published in 2007, is not a book that I would usually consider buying. The book traces the promotion and application of fascistic ideas by activists and politicians, from their creation by Mussolini…

  • A safety-critical certification of the Linux kernel

    This week there was an announcement on the system-safety mailing list that the Red Hat In-Vehicle Operating System (a version of the Linux kernel, plus a few subsystems) had been certified as being “… capable for use in ASIL B applications, …”. The Automotive Safety Integrity Levels (ASIL A is the lowest level, with D…

  • Software_Engineering_Practices = Morals+Theology

    Including the word science in the term used to describe a research field creates an aura of scientific enterprise. Universities name departments “Computer Science” and creationists have adopted the term “Creation Science”. The word engineering is used when an aura with a practical hue is desired, e.g., “Software Engineering” and “Consciousness Engineering”. Science and engineering…

  • Long term growth of programming language use

    The names of files containing source code often include a suffix denoting the programming language used, e.g., .c for C source code. These suffixes provide a cheap and cheerful method for estimating programming language use within a file system directory. This method has its flaws, with two main factors introducing uncertainty into the results: The…
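    Counting file suffixes within a directory tree takes only a few lines. The following is a minimal sketch under stated assumptions, not the procedure used in the post: the function name is hypothetical, and folding suffixes to lowercase is my own simplification (it merges .C with .c, which may or may not be wanted).

```python
from collections import Counter
from pathlib import Path


def count_suffixes(root):
    """Count regular files under `root`, grouped by lowercase filename suffix."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Skip directories and files with no suffix (e.g. README, Makefile).
        if path.is_file() and path.suffix:
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # Print the ten most common suffixes below the current directory.
    for suffix, n in count_suffixes(".").most_common(10):
        print(suffix, n)
```

    The counts are of files, not lines of code, so a few large files and many small ones are weighted very differently from a LOC-based estimate.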

  • Deciding whether a conclusion is possible or necessary

    Psychologists studying human reasoning have primarily focused on syllogistic reasoning, i.e., the truthfulness of a necessary conclusion from two stated premises, as in the following famous example: All men are mortal. Socrates is a man. Therefore, Socrates is mortal. Another form of reasoning is modal reasoning, which deals with possibilities and necessities; for example: All…

  • CPU power consumption and bit-similarity of input

    Changing the state of a digital circuit (i.e., changing its value from zero to one, or from one to zero) requires more electrical power than leaving its state unchanged. During program execution, the power consumed by each instruction depends on the value of its operand(s). The plot below, from an earlier post, shows how the…
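    One simple proxy for how much state changes between two successive operand values is the number of bit positions in which they differ (the Hamming distance). The following is a minimal sketch, assuming non-negative integer values; the function name is hypothetical, and this is not necessarily the similarity measure used in the post.

```python
def bits_changed(prev, curr):
    """Number of bit positions that differ between two non-negative integers."""
    # XOR sets a bit wherever the two values differ; count the set bits.
    return bin(prev ^ curr).count("1")


print(bits_changed(0b1100, 0b1010))  # two bit positions differ
print(bits_changed(0x00, 0xFF))      # all eight low-order bits differ
```

    Identical successive values give zero, the case in which no bits need to change state.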

  • Procedure nesting a once common idiom

    Structured programming was a popular program design methodology that many developers/managers claimed to be using in the 1970s and 1980s. Like all popular methodologies, everybody had/has their own idea about what it involves, and as always, consultancies sprang up to promote their take on things. The 1972 book Structured programming provides a taste of the…

  • Functions reduce the need to remember lots of variables

    What, if any, are the benefits of adding bureaucracy to a program by organizing a file’s source code into multiple function/method definitions (rather than a single function)? Having a single copy of a sequence of statements that need to be executed at multiple points in a program reduces implementation effort, and any updates only need…

  • Remotivating data analysed for another purpose

    The motivation for fitting a regression model has a major impact on the model created. Some of the motivations include: practicing software developers/managers wanting to use information from previous work to help solve a current problem, researchers wanting to get their work published, who seek to build a regression model that shows they have discovered something…

  • Benchmarking string search algorithms

    Searching a sequence of items for occurrences of a specific pattern is a common operation, and researchers are still discovering faster string search algorithms. While skimming the paper Efficient Exact Online String Matching Through Linked Weak Factors by M. N. Palmer, S. Faro, and S. Scafiti, Tables 1, 2 and 3 jumped out at me.…

  • Half-life of Microsoft products is 7 years

    I get a lot of pushback from developers/managers when I tell them that the average application has a relatively short lifetime, i.e., half-life of 4-8 years. The pushback kicks in when I start citing data, up until then my listeners appear surprised/skeptical. The fact that source code has a brief and lonely existence is accepted,…
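    A half-life translates directly into the fraction of applications expected to still be in use after a given number of years. The following is a minimal sketch, assuming a constant-rate (exponential) decline, which is what quoting a single half-life implies; the function name is hypothetical, and the 4 and 8 year values are the ends of the range quoted above.

```python
def surviving_fraction(years, half_life):
    """Fraction still in use after `years`, assuming exponential decay."""
    return 0.5 ** (years / half_life)


# Fraction of applications expected to survive 10 years,
# for half-lives at either end of the 4-8 year range.
for half_life in (4, 8):
    print(half_life, surviving_fraction(10, half_life))
```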

  • How has the price of a computer changed over time?

    We are told that computers are now orders of magnitude cheaper than they once were. Computers have changed an awful lot over the last 70 years; how is the functionality supported by different computers normalised such that the price of computers from long ago can be compared with today’s computers? One approach is to narrow…

  • Repo of software estimation datasets

    I have finally gotten around to creating a GitHub repository for the publicly available software estimation datasets. My reasons for doing this include increasing the visibility of the large datasets, having something to reference when I tell people about the minuscule size of most of the datasets modeled in research papers (one of my most…

  • Deep dive looking for good enough reliability models

    A previous post summarised the main highlights of my trawl through the software reliability research papers/reports/data, which failed to find any good enough models for estimating the reliability of a software system. This post summarises a deep dive into the technical aspects of the research papers. I am now a lot more confident that better…

  • Zig is the next fashionable language

    New programming languages are constantly being created, with most remaining unknown outside a small circle of friends. Every 5-10 years or so, a few of these languages break out to become fashionable to use. In the early 1980s, I was a fan of Pascal and had conversations with developers trying to figure out why they…

  • Apollo guidance computer software development process

    MIT’s Draper Lab implemented the primary Guidance, Navigation and Control System (GNCS) for the Apollo spacecraft, i.e., the hardware+software (the source code is now available on GitHub). Project Apollo ran from 1961 to 1972, and many MIT project reports are available (the five volume set: “MIT’s Role in Project Apollo” probably contains more than you…

  • Creating a global Standard requires being politically neutral

    Governments actively promote Standards because following them saves their citizens time and money. The UK and US have contrasting rationales, with the UK focusing on savings achieved through repeated use of standardized items and the US focusing on the repeated use of skills people acquired through using a standardized item (i.e., reduced training costs). Manufacturers…

  • Comparing developer/LLM coding performance

    Lots of claims are being made about how LLMs will soon outperform developers on coding tasks. Given the lack of any effective measure of developer performance, these claims are meaningless. At some point, lower costs will entice management to accept good enough LLM performance as a replacement for human developers, i.e., LLMs don’t need to…

  • Extracting information from duplicate fault reports

    Duplicate fault reports (that is, reports whose cause is the same underlying coding mistake) are an underused source of information. I sometimes email the authors of a paper analysing fault data asking for information about duplicates. Duplicate information is rarely available, because the authors don’t bother to record it. If a program’s coding mistakes are…

  • Changing development culture and practices: LLM edition

    The popular perception of creating software systems is that it mainly involves writing code. In the 1950s, management treated writing code as a clerical task that just mapped the detailed requirements specified by someone with knowledge of the problem to something a computer could execute. Job titles reflected this division of labour, e.g., coder/programmer, systems…

  • My 2024 in software engineering

    Readers are unlikely to have noticed something that has not been happening during the last few years. The plot below shows, by year of publication, the number of papers cited (green) and datasets used (red) in my 2020 book Evidence-Based Software Engineering. The fitted red regression lines suggest that the 20s were going to be…

  • Small business programs: A dataset in the research void

    My experience is that most of the programs created within organizations are very short, i.e., around 50–100 lines. Sometimes entire businesses are run using many short programs strung together in various ways. These short programs invariably make extensive use of the functionality provided by a much larger package that handles all the complicated stuff. In…

  • Good enough reliability models: still an unknown

    Estimating the likelihood that a software system will operate as intended, for some period of time, is one of the big problems within the field of software reliability research. When software does not operate as intended, a fault, or bug, or hallucination is said to have occurred. Three events need to occur for a user…

  • Christmas books for 2024

    My rate of book reading has picked up significantly this year. The following are the really interesting books I read, as is usually the case, most were not published in this year. I have enjoyed Grayson Perry’s TV programs on the art world, so I bought his book “Playing to the Gallery: Helping Contemporary Art…

  • 21 Algol 60 compilers in 1962

    The specification of ALGOL 60 was published in May 1960. Unlike today, where the creators of a new language release the source of a corresponding compiler, people were expected to write their own compiler. The June 1962 paper: The Replies to the AB14 Questionnaire lists implementation details on 21’ish compilers (it’s not clear whether some…

  • The Norden-Rayleigh model: some history

    Since it was created in the 1960s, the Norden-Rayleigh model of large project manpower has consistently outperformed, or been a close runner-up to, other models in benchmarks (a large project is one requiring two or more man-years of effort). The accuracy of the Norden-Rayleigh model comes with a big limitation: a crucial input value to the calculation is…

  • Putnam’s software equation debunked

    The implementation of a project has a lifecycle that starts and finishes with zero people working on it. Between starting and finishing, the number of staff quickly grows to a peak before slowly declining. In a series of very hard to obtain papers during the early 1960s (chapter 5), Peter Norden created a large project…

  • Employment in the software business: we know nothing

    Tens of millions of people get paid to work on the creation and maintenance of software systems, by companies employing thousands of developers to those employing a single developer (in the UK there are almost 300K registered software companies; 5% of registered companies). This huge ecosystem is almost completely ignored by the software engineering research…

  • C compiler conformance testing: with ChatGPT assistance

    How can developers check that a compiler correctly implements all the behavior requirements contained in the corresponding language specification? An obvious approach is to write lots of test cases for each distinct behavior; such a collection of tests is known as a validation suite, when used by a standard’s organization to test compilers/OS interfaces/etc. The…

  • Modelling estimate/actual including uncertainty in the estimate

    What is an effective technique for modelling the relationship between the time estimated to implement a task and the actual time taken to implement that task? A regression model is the obvious approach. However, an important assumption made by the commonly used regression techniques is not met by estimate/actual project data. The commonly used regression…

  • if statement conditions, some basic measurements

    The conditions contained in if-statements control all the decisions a program makes, yet relatively little is known about their characteristics. A condition contains one or more clauses, for instance, the condition (a && b) contains two clauses that both need to be true, for the condition to be true. An earlier post modelled the number…

  • MC/DC a step towards safety critical Open source software

    Open source projects and safety critical software are at opposite ends of the development process spectrum. From the user perspective, when an Open source project becomes very widely used within its application domain, there is a huge incentive to run it within safety critical domains. How might software that was not originally developed using a…

  • Modeling program LOC growth with recurrence equations

    Models predicting the growth, in lines of code, of a program are based on the assumption that future growth follows the same pattern of behavior as past growth. One such model is the recurrence relation: , where: is LOC at time , is the LOC carried over from release , and is the LOC added…

  • Discussing new language features is more fun than measuring feature usage in code

    How often are the features supported by a programming language used by developers in the code that they write? This fundamental question is rarely asked, let alone answered (my contribution). Existing code is what developers spend their time reading, compilers translating to machine code, and LLMs use as training data. Frequently used language features are…

  • Number of statement sequences possible using N if-statements

    I recently read a post by Terence Tao describing how he experimented with using ChatGPT to solve a challenging mathematical problem. A few of my posts contain mathematical problems I could not solve; I assumed that solving them was beyond my maths pay grade. Perhaps ChatGPT could help me solve some of them. To my…

  • Measuring non-determinism in the Linux kernel

    Developers often assume that it’s possible to predict the execution path a program will take, for a given set of input values, i.e., program behavior is deterministic. The execution path may be very complicated, and may depend on the contents of certain files (e.g., database…), but it’s deterministic. There is one kind of program where…

  • Survival of CVEs in the Linux kernel

    Software contained in safety related applications has to have a very low probability of failure. How is a failure rate for software calculated? The people who calculate these probabilities, or at least claim that some program has a suitably low probability, don’t publish the details or make their data publicly available. People have been talking…

  • 1970s: the founding decade of software reliability research

    Reliability research is a worthwhile investment for very large organizations that fund the development of many major mission-critical software systems, where reliability is essential. In the 1970s, the US Air Force’s Rome Air Development Center probably funded most of the evidence-based software research carried out in the previous century. In the 1980s, Rome fell, and…

  • The units of measurement for software reliability

    How do people define software reliability? One answer can be found by analyzing defect report logs: one study found that 42.6% of fault reports were requests for an enhancement, changes to documentation, or a refactoring request; a study of NASA spaceflight software found that 63% of reports in the defect tracking system were change…

  • Student projects for 2024/2025

    It will soon be that time of year when university students are looking for an interesting idea for a project. On an irregular basis, I post some ideas for thesis projects (here and here); primarily for students studying computing. In a change of direction, this post suggests software related ideas for business student projects. Two…

  • Memory bandwidth: 1991-2009

    The Stream benchmark is a measure of sustained memory bandwidth; the target systems are high performance computers. Sustained in the sense of distance running, rather than a short sprint (the term for this is peak memory bandwidth and occurs when the requested data is in cache), and bandwidth in the sense of bytes of memory…

  • The 2024 update to my desktop system

    I have just upgraded my desktop system. As you can see from the picture below, it is a bespoke system; the third system built using the same chassis. The 11 drive bays on the right are configured for six 5.25-inch and five 3.5-inch disks/CD/DVD/tape drives, there is a drive cage that fits above the power…

  • A new NASA software dataset from the 1970s

    When modeling the process of software development, to optimise the creation of new projects, the best measurement data to use are those relating to whatever developers are doing today. Unfortunately, measurement data for software engineering processes is very hard to find; few development groups record anything about what they do, and even when they do…

  • Program fault reports are caused by its users

    Faults are generated by users of the software; no users, no fault reports. Fault reports will be generated for software that is free of coding mistakes; one study found that 42.6% of fault reports were misclassified as either requests for an enhancement, changes to documentation, or a refactoring request, or not requiring changes to the…

  • Confidence intervals: my one recommended practice

    What recommended practices should developers/managers follow when analysing data they have collected/available? The scientific approach to data analysis mandates specifying a hypothesis before gathering the data, let alone analysing it. This approach has proven workable when researchers are familiar with an evidence-based body of knowledge. In this environment it’s acceptable, even required, to criticise anyone…

  • Techniques used for analyzing basic performance measurements

    The statistical design and analysis of experiments is a relatively recent invention (around 150 years old; verifying scientific hypotheses using experiments was first proposed over 1,000 years ago). Once an experiment has been run, and performance measurements collected, what techniques are available to analyse the data? Before electronic computers were invented, the practical statistical techniques…

  • Learning General relativity at a rudimentary mathematical level

    For the longest time, I have wanted to have an understanding of Einstein’s theory of General relativity at a rudimentary mathematical level. Because General relativity has never been a mainstream topic in undergraduate physics, there are almost no books at this level. Also, the mathematics of General relativity is based on tensor calculus, which until…

  • Distribution of program sizes

    Program size, in lines of code (LOC), used to be a topic of conversation among developers and managers. Program size is an issue when computer memory is measured in kilobytes. Large programs would be organized into overlays such that only small subsets needed to be held in memory at any time, i.e., programmer defined memory…