ACCU World of Code

Author: Derek Jones

Modelling estimate/actual including uncertainty in the estimate

What is an effective technique for modelling the relationship between the time estimated to implement a task and the actual time taken to implement that task? A regression model is the obvious approach. However, an important assumption made by the commonly used regression techniques is not met by estimate/actual project data. The commonly used regression […]

October 20, 2024
if statement conditions, some basic measurements

The conditions contained in if-statements control all the decisions a program makes, yet relatively little is known about their characteristics. A condition contains one or more clauses; for instance, the condition (a && b) contains two clauses, both of which need to be true for the condition to be true. An earlier post modelled the number […]
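
As a rough illustration of the clause counting described in the excerpt above, the following Python sketch counts the clauses in a C-style condition. It assumes that clauses are separated only by `&&` and `||`; the function name `count_clauses` is hypothetical, and a real measurement would need a proper parser rather than a regular expression.

```python
import re


def count_clauses(condition: str) -> int:
    """Count clauses in a C-style condition.

    Assumption: clauses are separated only by && or ||.
    Bitwise & and | are deliberately not treated as separators.
    """
    parts = re.split(r"&&|\|\|", condition)
    # Ignore fragments that are empty once spaces and parentheses are removed.
    return sum(1 for part in parts if part.strip(" ()"))


if __name__ == "__main__":
    for cond in ["(a && b)", "x", "(a && b) || !c"]:
        print(cond, "->", count_clauses(cond))
```

For `(a && b)` the sketch reports two clauses, matching the example in the excerpt.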

October 13, 2024
MC/DC a step towards safety critical Open source software

Open source projects and safety critical software are at opposite ends of the development process spectrum. From the user perspective, when an Open source project becomes very widely used within its application domain, there is a huge incentive to run it within safety critical domains. How might software that was not originally developed using a […]

October 6, 2024
Modeling program LOC growth with recurrence equations

Models predicting the growth, in lines of code, of a program are based on the assumption that future growth follows the same pattern of behavior as past growth. One such model is the recurrence relation: $L_{t+1} = C_t + A_{t+1}$, where: $L_t$ is LOC at time $t$, $C_t$ is the LOC carried over from release $t$, and $A_{t+1}$ is the LOC added […]
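
The idea in the excerpt above (LOC at the next release is the LOC carried over plus the LOC added) can be sketched in Python. This is a minimal sketch under stated assumptions, not the model from the post: the function name `loc_growth`, the fixed carry-over fraction, and the constant amount added per release are all hypothetical choices made for illustration.

```python
def loc_growth(initial_loc, carry_fraction, added_per_release, releases):
    """Iterate a simple recurrence: next LOC = LOC carried over + LOC added.

    Assumptions (not from the post): a fixed fraction of the current LOC is
    carried over to the next release, and a constant number of lines is added.
    """
    loc = [float(initial_loc)]
    for _ in range(releases):
        carried = carry_fraction * loc[-1]  # LOC surviving into the next release
        loc.append(carried + added_per_release)
    return loc


if __name__ == "__main__":
    for release, size in enumerate(loc_growth(10_000, 0.95, 2_000, 5)):
        print(f"release {release}: {size:,.0f} LOC")
```

With a carry-over fraction below one, the sequence approaches the fixed point `added_per_release / (1 - carry_fraction)` instead of growing without bound.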

September 29, 2024
Discussing new language features is more fun than measuring feature usage in code

How often are the features supported by a programming language used by developers in the code that they write? This fundamental question is rarely asked, let alone answered (my contribution). Existing code is what developers spend their time reading, compilers translating to machine code, and LLMs use as training data. Frequently used language features are […]

September 22, 2024
Number of statement sequences possible using N if-statements

I recently read a post by Terence Tao describing how he experimented with using ChatGPT to solve a challenging mathematical problem. A few of my posts contain mathematical problems I could not solve; I assumed that solving them was beyond my maths pay grade. Perhaps ChatGPT could help me solve some of them. To my […]

September 15, 2024
Measuring non-determinism in the Linux kernel

Developers often assume that it’s possible to predict the execution path a program will take, for a given set of input values, i.e., program behavior is deterministic. The execution path may be very complicated, and may depend on the contents of certain files (e.g., database…), but it’s deterministic. There is one kind of program where […]

September 8, 2024
Survival of CVEs in the Linux kernel

Software contained in safety related applications has to have a very low probability of failure. How is a failure rate for software calculated? The people who calculate these probabilities, or at least claim that some program has a suitably low probability, don’t publish the details or make their data publicly available. People have been talking […]

September 1, 2024
1970s: the founding decade of software reliability research

Reliability research is a worthwhile investment for very large organizations that fund the development of many major mission-critical software systems, where reliability is essential. In the 1970s, the US Air Force’s Rome Air Development Center probably funded most of the evidence-based software research carried out in the previous century. In the 1980s, Rome fell, and […]

August 25, 2024
The units of measurement for software reliability

How do people define software reliability? One answer can be found by analyzing defect report logs: one study found that 42.6% of fault reports were requests for an enhancement, changes to documentation, or a refactoring request; a study of NASA spaceflight software found that 63% of reports in the defect tracking system were change […]

August 18, 2024
Student projects for 2024/2025

It will soon be that time of year when university students are looking for an interesting idea for a project. On an irregular basis, I post some ideas for thesis projects (here and here); primarily for students studying computing. In a change of direction, this post suggests software related ideas for business student projects. Two […]

August 11, 2024
Memory bandwidth: 1991-2009

The Stream benchmark is a measure of sustained memory bandwidth; the target systems are high performance computers. Sustained in the sense of distance running, rather than a short sprint (the term for this is peak memory bandwidth and occurs when the requested data is in cache), and bandwidth in the sense of bytes of memory […]

August 4, 2024
The 2024 update to my desktop system

I have just upgraded my desktop system. As you can see from the picture below, it is a bespoke system; the third system built using the same chassis. The 11 drive bays on the right are configured for six 5.25-inch and five 3.5-inch disks/CD/DVD/tape drives, there is a drive cage that fits above the power […]

July 28, 2024
A new NASA software dataset from the 1970s

When modeling the process of software development, to optimise the creation of new projects, the best measurement data to use are those relating to whatever developers are doing today. Unfortunately, measurement data for software engineering processes is very hard to find; few development groups record anything about what they do, and even when they do […]

July 21, 2024
Program fault reports are caused by its users

Faults are generated by users of the software; no users, no fault reports. Fault reports will be generated for software that is free of coding mistakes; one study found that 42.6% of fault reports were misclassified as either requests for an enhancement, changes to documentation, or a refactoring request, or not requiring changes to the […]

July 14, 2024
Confidence intervals: my one recommended practice

What recommended practices should developers/managers follow when analysing data they have collected/available? The scientific approach to data analysis mandates specifying a hypothesis before gathering the data, let alone analysing it. This approach has proven workable when researchers are familiar with an evidence-based body of knowledge. In this environment it’s acceptable, even required, to criticise anyone […]

July 7, 2024
Techniques used for analyzing basic performance measurements

The statistical design and analysis of experiments is a relatively recent invention (around 150 years old; verifying scientific hypotheses using experiments was first proposed over 1,000 years ago). Once an experiment has been run, and performance measurements collected, what techniques are available to analyse the data? Before electronic computers were invented, the practical statistical techniques […]

June 30, 2024
Learning General relativity at a rudimentary mathematical level

For the longest time, I have wanted to have an understanding of Einstein’s theory of General relativity at a rudimentary mathematical level. Because General relativity has never been a mainstream topic in undergraduate physics, there are almost no books at this level. Also, the mathematics of General relativity is based on tensor calculus, which until […]

June 23, 2024
Distribution of program sizes

Program size, in lines of code (LOC), used to be a topic of conversation among developers and managers. Program size is an issue when computer memory is measured in kilobytes. Large programs would be organized into overlays such that only small subsets needed to be held in memory at any time, i.e., programmer defined memory […]

June 16, 2024