Source code discovery, skipping over the legal complications

Derek Jones from The Shape of Code

The 2020 US elections introduced the issue of source code discovery, in legal cases, to a wider audience. People wanted to (and still do) check that the software used to register and count votes works as intended, but the companies who wrote the software wouldn’t make it available and the courts did not compel them to do so.

I was surprised to see that there is even a section on “Transfer of or access to source code” in the EU-UK trade and cooperation agreement, agreed on Christmas Eve.

I have many years of experience in discovering problems in the source code of programs I did not write. This experience derives from my time as a compiler implementer (e.g., a big customer is being held up by a serious issue in their application, and the compiler is being blamed), and as a static analysis tool vendor (e.g., managers want to know what serious mistakes may exist in the code of their products). In all cases those involved wanted me there; I could talk to some of those involved in developing the code, and there were known problems with the code. In court cases, the defence does not want the prosecution looking at the code, and I assume that all conversations with the people who wrote the code go via the lawyers. I have intentionally stayed away from this kind of work, so my practical experience of working on legal discovery is zero.

The most common reason companies give for not wanting to make their source code available is that it contains trade-secrets (they can hardly say that it’s because they don’t want any mistakes in the code to be discovered).

What kind of trade-secrets might source code contain? Most code is very dull, and for some programs the only trade-secret is that if you put in the implementation effort, the obvious way of doing things works, i.e., the secret sauce promoted by the marketing department is all smoke and mirrors (I have had senior management, who have probably never seen the code, tell me about the wondrous properties of their code, which I had seen and knew contained nothing special).

Comments may detail embarrassing facts, aka trade-secrets. Sometimes the code interfaces to a proprietary format that the company wants to keep secret, or uses some formula that required a lot of R&D (management gets very upset when told that the ‘secret’ formula can be reverse engineered from the executable code).

Why does a legal team want access to source code?

If the purpose is to check specific functionality, then reading the source code is probably the fastest technique. For instance, checking whether a particular set of input values can cause a specific behavior to occur, tracing through the logic to understand the circumstances under which a particular behavior occurs, or, in software patent litigation, checking what algorithms or formulas are being used (this is where trade-secret claims appear to be valid).

If the purpose is a fishing expedition looking for possible incorrect behaviors, having the source code is probably not that useful. The quantity of source contained in modern applications can be huge, e.g., tens to hundreds of thousands of lines.

In ancient times (i.e., the 1970s and 1980s) programs were short (because most computers had tiny amounts of memory, compared to post-2000), and it was practical to read the source to understand a program. Customer demand for more features, and the fact that greater storage capacity removed the need to spend time reducing code size, means that source code has ballooned. The following plot shows the lines of code contained in the collected algorithms of the Transactions on Mathematical Software; the red line is a fitted regression model of the form: LOC ≈ e^{0.0003 × Day} (code+data):

Lines of code contained in the collected algorithms of the Transactions on Mathematical Software, over time.

How, by reading the source code, does anybody find mistakes in a 10+ thousand line program? If the program only occasionally misbehaves, finding a coding mistake by reading the source is likely to be very, very time-consuming, i.e., months. Work it out yourself: 10K lines of code is around 200 pages. How long would it take you to remember all the details, and their interdependencies, of a 200-page technical discussion well enough to spot an inconsistency likely to cause a fault experience? And, yes, the source may very well be provided as a printout, or as a PDF on a protected memory stick.

From my limited reading of accounts of software discovery, the time available to study the code may be just days or maybe a week or two.

Reading large quantities of code, to discover possible coding mistakes, is an inefficient use of human time. Some form of analysis tool might help. Static analysis tools are one option; these cost money and might not be available for the language or dialect in which the source is written (there are some good tools for C because it has been around so long and is widely used).

Character assassination, or guilt by innuendo, is another approach; the code just cannot be trusted to behave in a reasonable manner (this approach is regularly used in the software business). Software metrics are deployed to give the impression that mistakes are likely to exist, without identifying specific mistakes in the code, e.g., this metric is much higher than is considered reasonable. Where did these reasonable values come from? Someone, somewhere said something, the Moon aligned with Mars, and these values became accepted ‘wisdom’ (no, reality is not allowed to intrude; the case is made by arguing from authority). McCabe’s complexity metric is a favorite, and I have written about how use of this metric is essentially accounting fraud (I have had emails from several people who are very unhappy about me saying this). Halstead’s metrics are another favorite, and at least Halstead and others at the time did some empirical analysis (the results showed how ineffective the metrics were; the metrics don’t calculate the quantities claimed).

The software development process used to create software is another popular means of character assassination. People seem to take comfort in the idea that software was created using a defined process, and use of ad-hoc methods provides an easy target for ridicule. Some processes work because they include lots of testing, and doing lots of testing will of course improve reliability. I have seen development groups use a process and fail to produce reliable software, and I have seen ad-hoc methods produce reliable software.

From what I can tell, some expert witnesses are chosen for their ability to project an air of authority and having impressive sounding credentials, not for their hands-on ability to dissect code. In other words, just the kind of person needed for a legal strategy based on character assassination, or guilt by innuendo.

What is the most cost-effective way of finding reliability problems in software built from 10k+ lines of code? My money is on fuzz testing, a term that should send shivers down the spine of a defense team. Source code is not required, and the output is a list of real fault experiences. There are a few catches: 1) the software probably needs to be run in the cloud (perhaps the only cost/time-effective way of running the many thousands of tests), and the defense is going to object over licensing issues (they don’t want the code fuzzed), 2) having lots of test harnesses interacting with a central database is likely to be problematic, 3) support for emulating embedded CPUs, even commonly used ones like the Z80, is currently poor (this is a rapidly evolving area, so check current status).

Fuzzing can also be used to estimate the numbers of so-far undetected coding mistakes.
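
To make the brute-force idea concrete, here is a minimal, self-contained sketch of a fuzzing loop (real fuzzers such as AFL or libFuzzer generate inputs far more cleverly); parse_record is a made-up stand-in for the code under test, with a planted indexing mistake:

use std::panic;

// Tiny xorshift pseudo-random generator, so the sketch needs no external crates.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

// Hypothetical code under test: the planted mistake is an out-of-bounds index,
// triggered only by inputs of length 3 to 10 that start with '#'.
fn parse_record(bytes: &[u8]) -> usize {
    if bytes.len() > 2 && bytes[0] == b'#' {
        return bytes[10] as usize;
    }
    bytes.len()
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // keep the output quiet while fuzzing
    let mut state = 0x1234_5678_u64;
    let mut fault_experiences = 0;
    for _ in 0..100_000 {
        // build a random input of random length and feed it to the code under test
        let len = (xorshift(&mut state) % 16) as usize;
        let input: Vec<u8> = (0..len).map(|_| xorshift(&mut state) as u8).collect();
        // any panic counts as a real fault experience, found without reading the source
        if panic::catch_unwind(|| parse_record(&input)).is_err() {
            fault_experiences += 1;
        }
    }
    println!("inputs tried: 100000, fault experiences: {}", fault_experiences);
}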

Shutdown order consistency: how Rust helps

Andy Balaam from Andy Balaam's Blog

Some Java code with bugs

Here’s my main method (in Java). Can you guess the bug?

Db db = new Db();
Monitoring monitoring = new Monitoring();
Monitoring mon2 = new Monitoring();
Billing billing = new Billing(db, monitoring);
monitoring.setDb(db);

runMainLoop(billing, mon2);

db.stop();
billing.stop();
monitoring.stop();

If you would like to hunt down the 2 bugs manually, try reading the full code here: ShutdownOrder.java

But maybe you have an idea already? Maybe you’ve seen code like this before? If you have, you probably have an instinct that there’s some kind of bug, even if you can’t say for sure what it is. Code like this almost always has bugs!

This code compiles fine, but it contains two bugs.

First, we forgot to setDb() on mon2. This causes a NullPointerException, because Monitoring expects always to have a working Db.

Second, and in general harder to spot, we shut down our services in the wrong order. It turns out that Monitoring uses its Db during shutdown, so we get an exception. Even worse, if some other code needed to run after monitoring.stop(), it won’t, because the exception prevents us getting any further.

Of course, this is toy code, but this kind of problem is common (and much harder to spot) in real-life code. In fact, my team dealt with a similar bug this week.

It’s fundamentally hard to figure out your shutdown order. It’s complicated further if classes have start() methods too, which I have seen in lots of Java code.

Given that this is just a hard problem, maybe there’s no point looking for tools to make it easier?

Some Rust code without those bugs

Let’s try writing this code in Rust. Here’s the main method:

let db = Db::new();
let monitoring = Monitoring::new(&db);
let mon2 = Monitoring::new(&db);
let billing = Billing::new(&db, &monitoring);

run_main_loop(&billing, &mon2);

// drop() is called automatically on all objects here

Here’s the full code: shutdown_order.rs

This code shuts down all the services automatically at the end, and any mistakes we make in the order are compile errors, not things we find later when our code is running.

The code to shut down each service looks like this:

impl Drop for Monitoring<'_> {
    fn drop(&mut self) {
        // [Disconnect from monitoring API]
        self.db.add_record("MonitorShutDown");
    }
}

This is us implementing the Drop trait for the struct Monitoring (traits are a bit like Java Interfaces). The Drop trait is special: it indicates what to do when an instance of this struct is dropped. In Rust, this is guaranteed to happen when the instance goes out of scope, which is why our comment at the end of the main method sounds so confident.

Furthermore, Rust’s compiler shuts down everything in the reverse order in which it was created, and guarantees that nothing gets used after it has been dropped.

Rust’s lovely world gives us two relevant treats: no unexpected nulls, and lifetimes.

Treat number 1: no unexpected nulls

First, in Rust, as in other modern languages such as Kotlin, we have to be explicit about items that could be missing. In our example, we were able to re-arrange the code so that db can never be missing (or null), and the compiler encouraged us to do so. If we really needed it to be missing some of the time, we could have used the Option type, and the compiler would have forced us to handle the case when it was missing, instead of unexpectedly getting a NullPointerException like we did in Java. (In fact, if we’d structured our code to use final in as many places as possible, we could have been encouraged towards basically the same solution in Java too.)
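
As an illustration only (this is not from shutdown_order.rs), here is roughly what Monitoring might look like if the Db genuinely could be absent: the field becomes an Option, and the compiler insists that the missing case is handled:

struct Db;

impl Db {
    fn add_record(&self, record: &str) {
        println!("record: {}", record);
    }
}

// Hypothetical variant: the Db really can be absent, so the field is an Option.
struct Monitoring<'a> {
    db: Option<&'a Db>,
}

impl Monitoring<'_> {
    fn log(&self, msg: &str) {
        // The compiler forces us to say what happens when there is no Db.
        match self.db {
            Some(db) => db.add_record(msg),
            None => eprintln!("no db configured, dropping: {}", msg),
        }
    }
}

fn main() {
    let db = Db;
    let with_db = Monitoring { db: Some(&db) };
    let without_db = Monitoring { db: None };
    with_db.log("MonitorStarted");
    without_db.log("MonitorStarted"); // handled explicitly, no NullPointerException possible
}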

Treat number 2: lifetimes

Second, if you look a bit more closely at the full code of shutdown_order.rs you’ll see lots of confusing-looking annotations like <'a> and &'a:

struct Monitoring<'a> {
    db: &'a Db,
}

The approximate meaning of those annotations is: a Monitoring holds a reference to a Db, and that Db must last longer than the Monitoring.

This “lasts longer than” wording is what Rust Lifetimes are for. Lifetimes are a way of saying how long something lasts.

Lifetimes are really confusing when you start with Rust, and have caused me a lot of pain. Code like this is where they are both most painful and most helpful. As I mentioned earlier, the problem of shutdown order is fundamentally hard. Rust gives you that pain at the beginning, and until you understand what’s going on, the pain is very confusing and acute. But, once your code compiles, it is correct, at least as far as problems like this are concerned.

I love the sense of security it gives me to write Rust code and know the compiler has checked my code for this kind of problem, meaning it can’t crop up at 3am on Christmas Day…

Final note/caveat

This Rust code is probably over-simplified, because all the references are immutable (you can’t change the objects they point to). In practice, we may well have mutable references, and if we do we’re going to have to deal with the further difficulty that Rust won’t allow two different objects to hold references to an object if any of those references are mutable. So it would object to Billing and Monitoring using the Db object at the same time. We’d need to keep it immutable (as we have here), or find a different way of structuring the code: for example, we could hold the Db instance only within the run_main_loop code, and pass it in temporarily to the Billing and Monitoring objects when we called their methods. A large part of the art, fun and pain of learning Rust is finding new patterns for your code that do what you need to do and also keep the compiler happy. When you manage it, you get amazing benefits!
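
Here is a rough sketch of the alternative structure mentioned above, assuming the Db now needs mutation; the method names (charge, heartbeat) are invented for illustration, and each call borrows the Db mutably only for its own duration:

struct Db {
    records: Vec<String>,
}

impl Db {
    fn new() -> Db {
        Db { records: Vec::new() }
    }
    fn add_record(&mut self, record: &str) {
        self.records.push(record.to_string());
    }
}

struct Billing;

impl Billing {
    // The mutable borrow of the Db lasts only for this call.
    fn charge(&self, db: &mut Db) {
        db.add_record("Charged");
    }
}

struct Monitoring;

impl Monitoring {
    fn heartbeat(&self, db: &mut Db) {
        db.add_record("Heartbeat");
    }
}

fn run_main_loop(db: &mut Db, billing: &Billing, monitoring: &Monitoring) {
    // The two mutable borrows of db never overlap, so the borrow checker is happy.
    billing.charge(db);
    monitoring.heartbeat(db);
}

fn main() {
    let mut db = Db::new();
    let billing = Billing;
    let monitoring = Monitoring;
    run_main_loop(&mut db, &billing, &monitoring);
    println!("{} records written", db.records.len());
}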

Many coding mistakes are not immediately detectable

Derek Jones from The Shape of Code

Earlier this week I was reading a paper discussing one aspect of the legal fallout around the UK Post Office’s Horizon IT system, and was surprised by the view on computer evidence cited in the Law Commission’s Evidence in Criminal Proceedings: Hearsay and Related Topics (page 204): “most computer error is either immediately detectable or results from error in the data entered into the machine”.

What? Do I need to waste any time explaining why this is nonsense? It’s obvious nonsense to anybody working in software development, but this view is being expressed in law-related documents; what do lawyers know about software development?

Sometimes fallacies become accepted as fact, and a lot of effort is required to expunge them from cultural folklore. Regular readers of this blog will have seen some of my posts on long-standing fallacies in software engineering. It’s worth collecting together some primary evidence that most software mistakes are not immediately detectable.

A paper by Professor Tapper of Oxford University is cited as the source (yes, Oxford, home of mathematical orgasms in software engineering). Tapper’s job title is Reader in Law, and on page 248 he does say: “This seems quite extraordinarily lax, given that most computer error is either immediately detectable or results from error in the data entered into the machine.” So this is not a case of his words being misinterpreted or taken out of context.

Detecting many computer errors is resource intensive, in elapsed time, manpower, and compute time. The following general summary provides some of the evidence for this assertion.

Two events need to occur for a fault experience to occur when running software:

  • a mistake has been made when writing the source code. Mistakes include: a misunderstanding of what the behavior should be, using an algorithm that does not have the desired behavior, or a typo,
  • the program processes input values that interact with a coding mistake in a way that produces a fault experience.

That people can make different mistakes is general knowledge. It is my experience that people underestimate the variability of the range of values that are presented as inputs to a program.

A study by Nagel and Skrivan shows how variability of input values results in faults being experienced at different times, and that different people make different coding mistakes. The study had three experienced developers independently implement the same specification. Each of these three implementations was then tested, multiple times. The iteration sequence was: 1) run the program until a fault is experienced, 2) fix the fault, 3) if fewer than five faults have been experienced, go to step (1). This process was repeated 50 times, always starting with the original (uncorrected) implementation; the replications varied this, along with the number of inputs used.

How many input values needed to be processed, on average, before a particular fault is experienced? The plot below (code+data) shows the numbers of inputs processed, by one of the implementations, before individual faults were experienced, over 50 runs (sorted by number of inputs needed before the fault was experienced):

Number of inputs processed before particular fault experienced

The plot illustrates that some coding mistakes are more likely to produce a fault experience than others (because they are more likely to interact with input values in a way that generates a fault experience), and it also shows how the number of input values processed before a particular fault is experienced varies between coding mistakes.
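
A toy simulation (my own sketch, not the study’s code or data) shows the same effect: give each hypothetical coding mistake a probability of being triggered by any one input, and the number of inputs processed before a fault is experienced varies enormously between mistakes and between runs:

fn inputs_until_fault(trigger_probability: f64, seed: &mut u64) -> u64 {
    let mut count = 0;
    loop {
        count += 1;
        // crude xorshift draw, mapped to a uniform value in [0, 1)
        *seed ^= *seed << 13;
        *seed ^= *seed >> 7;
        *seed ^= *seed << 17;
        let draw = (*seed >> 11) as f64 / (1u64 << 53) as f64;
        if draw < trigger_probability {
            return count;
        }
    }
}

fn main() {
    let mut seed: u64 = 0xdead_beef;
    // three hypothetical coding mistakes: one easy to trigger, two much harder
    for &p in &[0.1, 0.001, 0.000_01] {
        let runs: Vec<u64> = (0..50).map(|_| inputs_until_fault(p, &mut seed)).collect();
        let mean = runs.iter().sum::<u64>() as f64 / runs.len() as f64;
        println!("per-input trigger probability {}: mean inputs before fault {:.0}", p, mean);
    }
}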

Real-world evidence of the impact of user input on reported faults is provided by the Ultimate Debian Database, which provides information on the number of reported faults and the number of installs for 14,565 packages. The plot below shows how the number of reported faults increases with the number of times a package has been installed; one interpretation is that with more installs there is a wider variety of input values (increasing the likelihood of a fault experience), another is that with more installs there is a larger pool of people available to report a fault experience. Green line is a fitted power law, faultsReported=1.3*installs^{0.3}, blue line is a fitted loess model.

Number of reported faults against number of package installs.

The source containing a mistake may be executed without a fault being experienced; reasons for this include:

  • the input values don’t result in the incorrect code behaving differently from the correct code. For instance, given the made-up incorrect code if (x < 8) (i.e., 8 was typed rather than 7), the comparison only produces behavior that differs from the correct code when x has the value 7 (a small sketch after this list makes this concrete),
  • the input values result in the incorrect code behaving differently than the correct code, but the subsequent path through the code produces the intended external behavior.
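
A minimal sketch of that first case, using the made-up comparison from the list above: the mistaken condition and the intended one disagree for exactly one input value, so most inputs sail past the mistake without producing a fault experience:

fn correct(x: i32) -> bool {
    x < 7
}

fn mistaken(x: i32) -> bool {
    x < 8 // 8 typed rather than 7
}

fn main() {
    // Count how many of these inputs expose the coding mistake.
    let exposing: Vec<i32> = (0..100).filter(|&x| correct(x) != mistaken(x)).collect();
    println!("inputs exposing the mistake: {:?}", exposing); // prints [7]
}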

Some of the studies that have investigated the program behavior after a mistake has deliberately been introduced include:

  • checking the later behavior of a program after modifying the value of a variable in various parts of the source; the results found that some parts of a program were more susceptible to behavioral modification (i.e., runtime behavior changed) than others (i.e., runtime behavior did not change),
  • checking whether a program compiles and if its runtime behavior is unchanged after random changes to its source code (in this study, short programs written in 10 different languages were used),
  • 80% of radiation induced bit-flips have been found to have no externally detectable effect on program behavior.

What are the economic costs and benefits of finding and fixing coding mistakes before shipping vs. waiting to fix just those faults reported by customers?

Checking that a software system exhibits the intended behavior takes time and money, and the organization involved may not be receiving any benefit from its investment until the system starts being used.

In some applications the cost of a fault experience is very high (e.g., lowering the landing gear on a commercial aircraft), and it is cost-effective to make a large investment in gaining a high degree of confidence that the software behaves as expected.

In a changing commercial world software systems can become out of date, or superseded by new products. Given the lifetime of a typical system, it is often cost-effective to ship a system expected to contain many coding mistakes, provided the mistakes are unlikely to be executed by typical customer input in a way that produces a fault experience.

Beta testing provides selected customers with an early version of a new release. The benefit to the software vendor is targeted information about remaining coding mistakes that need to be fixed to reduce customer fault experiences, and the benefit to the customer is checking compatibility of their existing work practices with the new release (also, some people enjoy being able to brag about being a beta tester).

  • One study found that source containing a coding mistake was more likely to be changed for reasons unrelated to the mistake (which had the effect of causing the mistake to disappear) than to be changed in order to fix the mistake,
  • Software systems don't live forever; systems are replaced or cease being used. The plot below shows the lifetime of 202 Google applications (half-life 2.9 years) and 95 Japanese mainframe applications from the 1990s (half-life 5 years; code+data).

    Number of software systems having a given lifetime, in days

Not only are most coding mistakes not immediately detectable, there may be sound economic reasons for not investing in detecting many of them.

Finally On A Very Cellular Process – student

student from thus spake a.k.

Over the course of the year my fellow students and I have been utilising our free time to explore the behaviour of cellular automata, which are mechanistic processes that crudely approximate the lives and deaths of unicellular creatures such as amoebas. Specifically, they are comprised of unending lines of boxes, some of which contain cells that are destined to live, die and reproduce according to the occupancy of their neighbours.
Most recently we have seen how we can categorise automata by the manner in which their populations evolve from a primordial state of each box having equal chances of containing or not containing a cell, be they uniform, constant, cyclical, migratory, random or strange. It is the latter of these, which contain arrangements of cells that interact with each other in complicated fashions, that has lately consumed our attention and I shall now report upon our findings.
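
By way of illustration, one generation of such a line of boxes may be computed by examining the occupancy of each box together with that of its two neighbours, as in the following sketch (which is but a crude approximation of our workings, and not one of our programs); the rule employed is an arbitrary choice:

// One generation of a one-dimensional cellular automaton: each box's next
// occupancy is decided by its own state and that of its two neighbours
// (the line is wrapped into a ring here rather than being truly unending).
fn step(cells: &[bool], rule: u8) -> Vec<bool> {
    let n = cells.len();
    (0..n)
        .map(|i| {
            let left = cells[(i + n - 1) % n] as u8;
            let centre = cells[i] as u8;
            let right = cells[(i + 1) % n] as u8;
            let neighbourhood = (left << 2) | (centre << 1) | right;
            ((rule >> neighbourhood) & 1) == 1
        })
        .collect()
}

fn main() {
    // Primordial state: each box has an equal chance of containing a cell
    // (a crude linear congruential generator stands in for a proper one).
    let mut seed: u64 = 42;
    let mut cells: Vec<bool> = (0..64)
        .map(|_| {
            seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            ((seed >> 33) & 1) == 1
        })
        .collect();
    for _ in 0..20 {
        let row: String = cells.iter().map(|&c| if c { '#' } else { '.' }).collect();
        println!("{}", row);
        cells = step(&cells, 110); // rule 110, one whose arrangements interact in complicated fashions
    }
}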

Edge computing providers

Andy Balaam from Andy Balaam's Blog

I’m looking into Edge computing at work. By Edge computing I mean running WASM programs in lots and lots of smallish computers in places near to actual people (rather than in huge cloud data centres). I think it’s cool because I love Rust, and Rust is the leading language to compile to WASM.
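
As a tiny illustration of the Rust-to-WASM part (hypothetical code, not any particular provider’s API), a function like this can be compiled with the wasm32-unknown-unknown target and then loaded into an edge provider’s WASM engine:

// Built with, for example: cargo build --release --target wasm32-unknown-unknown
// The function name and logic are invented purely for illustration.
#[no_mangle]
pub extern "C" fn clamp_request_size(requested: i32) -> i32 {
    // pretend edge logic: cap the size of a request before passing it on
    if requested > 1024 {
        1024
    } else {
        requested
    }
}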

Here are some companies providing Edge computing services:

  • Fastly – good links with WASM community (hired Mozilla devs), and early adopters – custom WASM engine wasmtime.
  • Cloudflare – huge, and early adopters – WASM engine is Google V8.
  • AWS Lambda@Edge – docs are light on detail, but it looks like a real offering, probably.

Also-rans:

Who did I miss?

Schema upgrades should be reversible (also other transformations, actually)

Andy Balaam from Andy Balaam's Blog

Are you writing schema upgrade code? Then I humbly suggest you take the time to write schema downgrade code too.

“Why would I do that?” you might well ask, “I won’t ever need to downgrade.”

Now, I imagine you’re expecting me to say you actually will need to downgrade, but that isn’t what I’m saying.

Can you please get on with what you are actually saying?

Whenever you write code to transform something, be it a schema upgrade, some serialisation, or something else, I would highly recommend that you write code to transform it in both directions.

Reasons:

  • It makes testing easier. The best kinds of tests for things like this are round-trips, where you transform something in both directions and check it hasn’t changed. It’s really hard to mess up tests like that.
  • It often uncovers bugs, because it enforces clear thinking about what the transformation actually means.
  • It may improve your code, because it gets annoying writing similar-but-different code to transform in both directions, so you are pushed towards some kind of abstraction.

Also:

  • You almost certainly are going to need it. Sometimes things go wrong and you need to back up.
  • It will be incredibly useful for testing other parts of your code.

Bidirectional schema up/downgrades are not easy in SQL, but probably worth it. If you’re writing transformation code in a normal programming language, it’s really not that difficult, and I predict it will be worth it.
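
For example, a round-trip test of the kind described above might look something like this sketch (the record types and field names are invented, and a real migration would read and write the actual storage format):

#[derive(Clone, Debug, PartialEq)]
struct RecordV1 {
    name: String,
}

#[derive(Debug, PartialEq)]
struct RecordV2 {
    name: String,
    active: bool,
}

fn upgrade(v1: RecordV1) -> RecordV2 {
    RecordV2 { name: v1.name, active: true } // the new field gets a default value
}

fn downgrade(v2: RecordV2) -> RecordV1 {
    RecordV1 { name: v2.name } // drop the field added in v2
}

fn main() {
    // Round-trip test: downgrading an upgraded record should give back the original.
    let original = RecordV1 { name: "alice".to_string() };
    let round_tripped = downgrade(upgrade(original.clone()));
    assert_eq!(original, round_tripped);
    println!("round-trip ok: {:?}", round_tripped);
}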

The “people problem” problem and the great agile divide

AllanAdmin from Allan Kelly Associates

“It’s always a people problem.” Jerry Weinberg, The Secrets of Consulting, 1985

The great, unspoken, divide in agile is between those who believe the individual is all powerful and the centre of everything, and those who believe the individual is the product of the system.

Weinberg’s “law” is taken as unquestionable truth by most people in the agile community. Whatever the conversation, whatever the problem a wise old-sage will stand back and say “It’s always a people problem.” And in a way they are right.

People made the modern society and economy. People work in organisations, people make the rules, people enforce the rules, and ultimately someone in that organisation made it the way it is. If only those people would act differently, make different rules, if only those people had greater foresight.

The problem is, once people have put all the pieces together the result is a system, not necessarily a technology system but a system of rules, norms, standards, accepted practices and “the way things work around here.” Which puts me in mind of Winston Churchill:

“We shape our buildings, and afterwards our buildings shape us.” Churchill, October 1943.

Yes, people shape our organisations, our processes and our culture, so it is always a people problem. But people are as much prisoner to those decisions as they are controllers of those decisions. Changing those things means changing the system; while that change needs to be made by people – and therefore is a people problem – the power to make that change is distributed.

For example, Donald Trump has tried to change the US system in so many ways during the last four years. Often he has been frustrated by the rules of the system. He’s made some changes, and some of his actions will create changes in future. Some of us are glad the system constrained him, others are unhappy. Despite being the most powerful man in the world Trump was constrained.

So while it is “always a people problem”, in that people created the system and operate the system, doing something about it isn’t just a case of asking the system operators to do things differently.

This is the great agile divide.

There are many, perhaps most, in the agile community who believe that every change, every improvement, is rooted in people. People are the centre of agile and all energies should be directed to help them do great work and create a system they can work within.

Then there are others who believe that it is the system which needs to be centre stage: because people work within a system.

Watch Stockless production and ask yourself: in the first simulation, when the waste is piling up, is there anything the men on the production line can do to improve it? I don’t think so because they are inside the system and the system is controlled by others.

I see the great divide again and again in the “agile” advice given out. The community doesn’t recognise the divide, but every speaker, writer, consultant and coach is biased one way or another. Actually, “coaches” tend to put people first while “consultants” work with the system. Regurgitating “it’s a people problem” hides the divide.

Generally I align myself with the second group – it’s one of the reasons I associate with the Kanban community. But the process group has a problem.

In the days before agile there was a widespread belief that one could define the process, turn the handle and everything would be alright. That logic led to much of what fell under ISO-9000, TickIT, CMM(I) and other “process improvement initiatives.” I suffered under some of those regimes and I see the scars on others.

The problem was that this thinking led to process experts who decided the process for others to follow, and process police who enforced it. So again, it is a “people problem”. But those process people are as much prisoners to the process as prison guards are. (I don’t want to be one of those people but I probably look like one of them on occasions.)

So, where does that leave us?

I no longer agree with Jerry Weinberg: people may create the problem, people may be key to fixing the problem but fixing a system is more than people.

I see my role, as an Agile Guide, as creating systems people can thrive in, where they are not just cogs in a process, places where people can do their best work, where people problems can be seen and addressed. The system needs changing to respect the people; equally, those people deserve respect and involvement while changing the system.



Videos: ITIL & the Product Owner

Allan Kelly from Allan Kelly Associates

Last month I appeared in two videos now available on YouTube.

First I was interviewed by Adrian Reed about the Product Manager and Owner roles for The BA Fringe. My interview appears about 12 minutes into the programme and lasts about 10 minutes.

I also joined an expert panel discussing the ITIL 4 High Velocity IT – aligning agile and lean. It was a great conversation and a lot of fun to record; we hardly mentioned ITIL, but #NoProjects did get a look in.

I know, ITIL is not something I’m usually associated with, but digital and agile mean ITIL is changing, and I contributed chapters on Product Owner and Continuous Digital (aka #NoProjects) to the recent ITIL High Velocity IT book.


Survival rate of WG21 meeting attendance

Derek Jones from The Shape of Code

WG21, the C++ Standards committee, has a very active membership, with lots of people attending the regular meetings; there are three or four meetings a year, with an average meeting attendance of 67 (between 2004 and 2016).

The minutes of WG21 meetings list those who attend, and a while ago I downloaded these for meetings between 2004 and 2016. Last night I scraped the data and cleaned it up (or at least the attendee names).

WG21 had its first meeting in 1992, and continues to have meetings (eleven physical meetings at the time of writing). This means the data is both left and right censored, known as interval censored data. Some people will have attended many meetings before the scraped data starts, and some people listed in the data may not have attended another meeting since.

What can we say about the survival rate of a person being a WG21 attendee in the future, e.g., what is the probability they will attend another meeting?

Most regular attendees are likely to miss a meeting every now and again (six people attended all 30 meetings in the dataset, with 22 attending more than 25), and I assumed that anybody who attended a meeting after 1 January 2015 was still attending. Various techniques are available to estimate the likelihood that known attendees were attending meetings prior to those in the dataset (I’m going with whatever R’s survival package does). The default behavior of R’s Surv function is to handle right censoring, the common case. Extra arguments are needed to handle interval censored data, and I think I got these right (I had to cast a logical argument to numeric for some reason; see code+data).

The survival curves in days since 1 Jan 2004, and meetings based on the first meeting in 2004, with 95% confidence bounds, look like this:


Meeting survival curve of WG21 attendees.

I was expecting a sharper initial reduction, and perhaps wider confidence bounds. Of the 374 people listed as attending a meeting, 177 (47%) only appear once and 36 (10%) appear twice; there is a long tail, with 1.6% appearing at every meeting. But what do I know, my experience of interval censored data is rather limited.

The half-life of attendance is 9 to 10 years, suspiciously close to the interval of the data. Perhaps a reader will scrape the minutes from earlier meetings :-)

Within the time interval of the data, new revisions of the C++ standard occurred in 2011 and 2014; there had also been a new release in 2003, and one was being worked on for 2017. I know some people stop attending meetings after a major milestone, such as a new standard being published. A fancier analysis would investigate the impact of standards being published on meeting attendance.

People also change jobs. Do WG21 attendees change jobs to ones that also require/allow them to attend WG21 meetings? The attendee’s company is often listed in the minutes (and is in the data). Something for intrepid readers to investigate.