Author: Derek Jones

  • Modelling estimate/actual including uncertainty in the estimate

    What is an effective technique for modelling the relationship between the time estimated to implement a task and the actual time taken to implement it? A regression model is the obvious approach. However, an important assumption made by the commonly used regression techniques is not met by estimate/actual project data. The commonly used regression […] (a minimal regression sketch appears after this list)

  • if statement conditions, some basic measurements

    The conditions contained in if-statements control all the decisions a program makes, yet relatively little is known about their characteristics. A condition contains one or more clauses; for instance, the condition (a && b) contains two clauses that both need to be true for the condition to be true. An earlier post modelled the number […] (a clause-counting sketch appears after this list)

  • MC/DC a step towards safety critical Open source software

    Open source projects and safety critical software are at opposite ends of the development process spectrum. From the user perspective, when an Open source project becomes very widely used within its application domain, there is a huge incentive to run it within safety critical domains. How might software that was not originally developed using a […]

  • Modeling program LOC growth with recurrence equations

    Models predicting the growth, in lines of code, of a program are based on the assumption that future growth follows the same pattern of behavior as past growth. One such model is the recurrence relation: $L_{t+1} = K_t + A_t$, where: $L_t$ is LOC at time $t$, $K_t$ is the LOC carried over from release $t$, and $A_t$ is the LOC added […] (an illustrative iteration of this recurrence appears after this list)

  • Discussing new language features is more fun than measuring feature usage in code

    How often are the features supported by a programming language used by developers in the code that they write? This fundamental question is rarely asked, let alone answered (my contribution). Existing code is what developers spend their time reading, what compilers translate to machine code, and what LLMs use as training data. Frequently used language features are […]

  • Number of statement sequences possible using N if-statements

    I recently read a post by Terence Tao describing how he experimented with using ChatGPT to solve a challenging mathematical problem. A few of my posts contain mathematical problems I could not solve; I assumed that solving them was beyond my maths pay grade. Perhaps ChatGPT could help me solve some of them. To my […]

  • Measuring non-determinism in the Linux kernel

    Developers often assume that it’s possible to predict the execution path a program will take, for a given set of input values, i.e., program behavior is deterministic. The execution path may be very complicated, and may depend on the contents of certain files (e.g., database…), but it’s deterministic. There is one kind of program where […]

  • Survival of CVEs in the Linux kernel

    Software contained in safety related applications has to have a very low probability of failure. How is a failure rate for software calculated? The people who calculate these probabilities, or at least claim that some program has a suitably low probability, don’t publish the details or make their data publicly available. People have been talking […]

  • 1970s: the founding decade of software reliability research

    Reliability research is a worthwhile investment for very large organizations that fund the development of many major mission-critical software systems, where reliability is essential. In the 1970s, the US Air Force’s Rome Air Development Center probably funded most of the evidence-based software research carried out in the previous century. In the 1980s, Rome fell, and […]

  • The units of measurement for software reliability

    How do people define software reliability? One answer can be found by analyzing defect report logs: one study found that 42.6% of fault reports were requests for an enhancement, changes to documentation, or a refactoring request; a study of NASA spaceflight software found that 63% of reports in the defect tracking system were change […]

  • Student projects for 2024/2025

    It will soon be that time of year when university students are looking for an interesting idea for a project. On an irregular basis, I post some ideas for thesis projects (here and here); primarily for students studying computing. In a change of direction, this post suggests software related ideas for business student projects. Two […]

  • Memory bandwidth: 1991-2009

    The Stream benchmark is a measure of sustained memory bandwidth; the target systems are high performance computers. Sustained in the sense of distance running, rather than a short sprint (the term for the sprint case is peak memory bandwidth, which occurs when the requested data is in cache), and bandwidth in the sense of bytes of memory […] (a rough bandwidth-measurement sketch appears after this list)

  • The 2024 update to my desktop system

    I have just upgraded my desktop system. As you can see from the picture below, it is a bespoke system; the third system built using the same chassis. The 11 drive bays on the right are configured for six 5.25-inch and five 3.5-inch disks/CD/DVD/tape drives, there is a drive cage that fits above the power […]

  • A new NASA software dataset from the 1970s

    When modeling the process of software development, to optimise the creation of new projects, the best measurement data to use are those relating to whatever developers are doing today. Unfortunately, measurement data for software engineering processes is very hard to find; few development groups record anything about what they do, and even when they do […]

  • Program fault reports are caused by its users

    Fault reports are generated by users of the software; no users, no fault reports. Fault reports will be generated for software that is free of coding mistakes; one study found that 42.6% of fault reports were misclassified as either requests for an enhancement, changes to documentation, or a refactoring request, or not requiring changes to the […]

  • Confidence intervals: my one recommended practice

    What recommended practices should developers/managers follow when analysing data they have collected or have available? The scientific approach to data analysis mandates specifying a hypothesis before gathering the data, let alone analysing it. This approach has proven workable when researchers are familiar with an evidence-based body of knowledge. In this environment it’s acceptable, even required, to criticise anyone […]

  • Techniques used for analyzing basic performance measurements

    The statistical design and analysis of experiments is a relatively recent invention (around 150 years old; verifying scientific hypotheses using experiments was first proposed over 1,000 years ago). Once an experiment has been run, and performance measurements collected, what techniques are available to analyse the data? Before electronic computers were invented, the practical statistical techniques […]

  • Learning General relativity at a rudimentary mathematical level

    For the longest time, I have wanted to have an understanding of Einstein’s theory of General relativity at a rudimentary mathematical level. Because General relativity has never been a mainstream topic in undergraduate physics, there are almost no books at this level. Also, the mathematics of General relativity is based on tensor calculus, which until […]

  • Distribution of program sizes

    Program size, in lines of code (LOC), used to be a topic of conversation among developers and managers. Program size was an issue when computer memory was measured in kilobytes. Large programs would be organized into overlays such that only small subsets needed to be held in memory at any time, i.e., programmer-defined memory […]
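
The sketches below expand on a few of the items above. For the estimate/actual modelling item: a minimal sketch of fitting a regression to estimate/actual pairs. The sample values, the log transformation, and the use of ordinary least squares are illustrative assumptions, not the technique the post settles on.

```python
# Minimal sketch: ordinary least squares fit of actual time against estimated time,
# on a log scale (power-law form).  The data values and the choice of transformation
# are illustrative assumptions, not the post's recommended technique.
import numpy as np

estimate = np.array([1, 2, 4, 8, 16, 40], dtype=float)    # hypothetical estimates (hours)
actual   = np.array([1.5, 2, 5, 7, 20, 60], dtype=float)  # hypothetical actuals (hours)

# Fit log(actual) = intercept + slope*log(estimate), i.e. actual ~ e^intercept * estimate^slope.
slope, intercept = np.polyfit(np.log(estimate), np.log(actual), 1)
print(f"actual ~= {np.exp(intercept):.2f} * estimate^{slope:.2f}")
```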
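
For the if-statement conditions item: a minimal clause-counting sketch, assuming Python source and the standard library ast module; whether this matches how the post counts clauses in its measured code is an assumption.

```python
# Minimal sketch: count clauses in the condition of each if-statement in a Python file.
# Treating each operand of an and/or expression as a clause (so `a and b` has two
# clauses, mirroring the (a && b) example above) is an assumption for illustration.
import ast
import sys

def clause_count(test: ast.expr) -> int:
    # An and/or expression contributes one clause per operand, recursing into
    # nested boolean expressions; any other expression counts as a single clause.
    if isinstance(test, ast.BoolOp):
        return sum(clause_count(value) for value in test.values)
    return 1

source = open(sys.argv[1]).read()
counts = [clause_count(node.test)
          for node in ast.walk(ast.parse(source))
          if isinstance(node, ast.If)]
print("clauses per if-condition:", counts)
```

Running it over a file, e.g. `python clause_count.py some_module.py`, prints one clause count per if-statement.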
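
For the LOC growth item: iterating the recurrence as reconstructed above, $L_{t+1} = K_t + A_t$. Modelling the carried-over LOC as a fixed fraction of the previous release, and the added LOC as a constant, are illustrative assumptions, not the post's fitted model.

```python
# Minimal sketch: iterate a LOC growth recurrence L[t+1] = K[t] + A[t], where K[t]
# is the LOC carried over from release t and A[t] is the LOC added.  The fixed
# carry-over fraction and constant addition are illustrative assumptions.

def loc_growth(releases: int, loc0: float, carry_frac: float, added: float) -> list:
    loc = [loc0]
    for _ in range(releases):
        carried = carry_frac * loc[-1]   # K[t]: LOC surviving into the next release
        loc.append(carried + added)      # L[t+1] = K[t] + A[t]
    return loc

print(loc_growth(releases=10, loc0=10_000, carry_frac=0.95, added=2_000))
```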
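
For the memory bandwidth item: a rough sketch of a Stream-Copy-style measurement, illustrating "sustained" by using arrays much larger than cache. The real Stream benchmark is written in C/Fortran, runs several kernels, and has stricter timing rules; NumPy and the array size here are assumptions.

```python
# Rough sketch of a Stream-Copy-style sustained bandwidth measurement (illustrative only).
import time
import numpy as np

n = 50_000_000                  # 8-byte doubles, ~400 MB per array: far larger than cache
src = np.random.rand(n)
dst = np.empty_like(src)

start = time.perf_counter()
np.copyto(dst, src)             # copy kernel: read src, write dst
elapsed = time.perf_counter() - start

bytes_moved = 2 * n * src.itemsize      # one read plus one write per element
print(f"sustained copy bandwidth ~= {bytes_moved / elapsed / 1e9:.1f} GB/s")
```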