Visual Lint 6.5.3.296 has been released

Products, the Universe and Everything from Products, the Universe and Everything

This is a recommended maintenance update for Visual Lint 6.5. The following changes are included:
  • Added the ability to override the analysis tool installation folder on a per solution/workspace or project basis [Visual Lint Enterprise and Build Server Editions].
  • The analysis tool executable pathname can now be overridden on a per project basis as well as per solution/workspace [Visual Lint Enterprise and Build Server Editions].
  • Fixed a minor resizing bug in the Analysis Configuration Dialog.
  • Fixed a minor bug in the "Active Analysis Tool" dialog.
Download Visual Lint 6.5.3.296

How to write a programming language articles

Andy Balaam from Andy Balaam's Blog

Recent Overload journal issues contain my new articles on How to Write a Programming Language.

Part 1: How to Write a Programming Language: Part 1, The Lexer

Part 2: How to Write a Programming Language: Part 2, The Parser

PDF of the latest issue: Overload 146 containing part 2.

This is all Creative Commons licensed and developed in public at github.com/andybalaam/articles-how-to-write-a-programming-language

Maximum team size before progress begins to stall

Derek Jones from The Shape of Code

On multi-person projects people have to talk to each other, which reduces the amount of time available for directly working on writing software. How many people can be added to a project before the extra communications overhead is such that the total amount of code, per unit time, produced by the team decreases?

A rarely cited paper by Robert Tausworthe provides a simple, but effective analysis.

Activities are split between communicating and producing code.

Assume the communications overhead is given by $t_0(S^{\alpha}-1)$, where $t_0$ is the percentage of one person’s time spent communicating in a two-person team, $S$ is the number of developers, and $\alpha$ is a constant greater than zero (I’m using Tausworthe’s notation).

The maximum team size, before adding people reduces total output, is given by: $S=\left(\frac{1+t_0}{(1+\alpha)t_0}\right)^{1/\alpha}$.
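
A sketch of where this maximum comes from (my reading of the model, not a quote from Tausworthe’s paper): assume total team output is proportional to the number of developers multiplied by the fraction of each person’s time not spent communicating, then differentiate with respect to $S$:

\[
\text{Output}(S) = S\bigl(1 - t_0(S^{\alpha} - 1)\bigr) = (1 + t_0)S - t_0 S^{\alpha+1}
\]
\[
\frac{d\,\text{Output}}{dS} = (1 + t_0) - (1 + \alpha)t_0 S^{\alpha} = 0
\quad\Rightarrow\quad
S = \left(\frac{1 + t_0}{(1 + \alpha)t_0}\right)^{1/\alpha}
\]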

If $\alpha=1$ (i.e., everybody on the project has the same communications overhead), then $S=\frac{1+t_0}{2t_0}$, which for small $t_0$ is approximately $S=\frac{1}{2t_0}$. For example, if everybody on a team spends 10% of their time communicating with every other team member: $S=\frac{1+0.1}{2\times 0.1}\approx 5$.

In this team of five, 50% of each person’s time will be spent communicating.

If $\alpha=0.8$, then we have $S=\left(\frac{1+0.1}{(1+0.8)\times 0.1}\right)^{1/0.8}\approx 10$.

What if the percentage of time a person spends communicating with other team members has an exponential distribution? That is, they spend most of their time communicating with a few people and very little with the rest; the (normalised) communications overhead is $1-e^{-(S-1)t_1}$, where $t_1$ is a constant found by fitting data from the two-person team (before any more people are added to the team).

The maximum team size is now given by $S=1/t_1$, and if $t_1=0.1$, then $S=1/0.1=10$.

In this team of ten, 63% of each person’s time will be spent communicating (team size can be bigger, but each member will spend more time communicating compared to the linear overhead case).
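
A minimal Python sketch of both calculations (the function names are mine, not Tausworthe’s); it reproduces the rounded figures quoted above:

def max_team_size_power_law(t0, alpha):
    # Maximum of S*(1 - t0*(S**alpha - 1)): S = ((1 + t0)/((1 + alpha)*t0))**(1/alpha)
    return ((1 + t0) / ((1 + alpha) * t0)) ** (1 / alpha)

def max_team_size_exponential(t1):
    # Maximum of S*exp(-(S - 1)*t1): S = 1/t1
    return 1 / t1

print(max_team_size_power_law(0.1, 1.0))   # 5.5, i.e. a team of around five
print(max_team_size_power_law(0.1, 0.8))   # about 9.6, i.e. roughly ten
print(max_team_size_exponential(0.1))      # 10.0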

Having done this analysis, what is now needed is some data on the distribution of individual communications overhead. Is the distribution linear, square-root, exponential? I am not aware of any such data (there is a chance I have encountered something close and not appreciated its importance).

I have only ever worked on relatively small teams, and am inclined to think the distribution of time spent communicating was not constant. Was it exponential or a power-law? I would not like to say.

Could a communications time distribution be reverse engineered from email logs? The cc’ing of people who might have an interest in a topic complicates the data analysis; time spent in meetings is another complication.

Pointers to data are most welcome, as is any alternative analysis using data likely to have a higher signal/noise ratio.

Test Language: Behaviours, Not Examples

Chris Oldwood from The OldWood Thing

Naming is hard, as we know from the old adage about the two hardest problems in Computer Science, and naming in tests is no different. I’ve documented my own journey around how I structure tests in two previous posts: “Unit Testing Evolution Part II – Naming Conventions” and “Other Test Naming Conventions”. I’ve also covered similar ground quite recently in “Overly Prescriptive Tests”, but that was more about the content of the tests themselves, whereas here I’m trying to focus more on the language aspects.

Describing the Example

Something which I’ve observed, both from reviewing Fizz Buzz submissions with tests [1] and from real tests, is that there is often a missing leap from writing a test which describes a single example to generalising the language to describe the effective behaviour [2]. For example, imagine you’re writing a unit test for a calculator; if you literally encode your example as your test name you might write:

[Test]
public void two_plus_two_is_equal_to_four()

Given that you could accidentally implement it with multiplication and still make the test pass, you might add another scenario to be sure you don’t fall into that trap:

[Test]
public void three_plus_seven_is_equal_to_ten()

The problem with these test names is that they only tell you about the specific scenario covered by the test, not about the bigger picture. One potential refactoring might be to parameterise the test thereby forcing you to generalise the name:

[TestCase(2, 2, 4)]
[TestCase(3, 7, 10)]
public void adding_two_numbers_together_returns_their_sum(. . .)

One way this often shows up in FizzBuzz tests is with examples for the various rules, e.g.

[Test]
public void three_returns_the_word_fizz()

[Test]
public void five_returns_the_word_buzz()

The rules of a basic calculator are already known to pretty much everyone, but here, unless you know the rules of the game Fizz Buzz, you would not be able to derive them from these examples alone, and one very important role of tests is to document, nay specify, the behaviour of our code.

Describing the Behaviour

Hence to encode the rules you need to think more generally:

a_number_divisible_by_three_returns_the_word_fizz

There are a couple of issues here, namely that technically any number is divisible by three (just not wholly), and also that it won’t be true once we start bringing in the more advanced rules. It’s not easy trying to be precise and yet also somewhat vague at the same time, but we can try:

a_number_wholly_divisible_by_three_generally_returns_the_word_fizz

Once we bring in the “divisible by three and divisible by five” rule it becomes much harder to be precise in our test names as we’d have to include the overriding rules too which likely makes them harder to read and comprehend:

a_number_wholly_divisible_by_three_but_not_also_wholly_divisible_by_five_returns_the_word_fizz

You might just get away with it this time, but it’s not really scalable, and test names, much like code comments, often have a habit of getting out of sync with reality. Even when they break due to new functionality it’s easy to end up fixing the test and forgetting to check whether the “documentation” aspect still reflects the new behaviour.

Hence I personally prefer to use words in test names that suggest “broad strokes” when necessary and guide the reader (top to bottom) from the more general scenarios to the more specific. This, in my mind, is similar to putting the happy path cases before the various error handling ones.

Validating Collections

These examples might be a little too trivial but the impetus for this post came from similar scenarios where the test language talked about the outcome of the example itself rather than the behaviour of the logic in general. The knock-on effect of doing this, apart from making the intent of the example harder to comprehend in the future, was that it also became brittle as the specific scenario outcome was encoded in the test and any change in logic that might be orthogonal to it could break it unnecessarily. (As mentioned earlier, “Overly Prescriptive Tests” looks at brittle tests from a different angle.)

A common place where this shows up is when asserting behaviours around collections. For example, imagine writing tests for querying the seats available in a cinema where there are seats in different price bands. When testing the “seat query” method for an exhausted price band, you might be inclined to write:

[TestFixture]
public class when_querying_for_seats_and_none_left_in_band
{
  [Test]
  public void then_the_result_is_empty()
  {
    auditorium.Add("Posh Seats", new Seats[0]);

    var seats = auditorium.FindAvailableSeats();

    Assert.That(seats, Is.Empty);
  }
}

The example, being minimal in nature, means that technically in this scenario the result will be empty. However that is an artefact of the way the example is expressed and the test has been written. If I were to change the test set-up and add the following line, the test would break:

auditorium.Add("Cheap Seats", new Seats[100]);

While the outcome of the example above might be “empty”, that is not the general behaviour of the logic under test and our test language should be changed to describe that:

[Test]
public void then_no_seats_in_that_band_are_returned()

Now we’re not making a statement about what else might or might not be in that result, only what our expectations are for seats in the band in question. Once we have fixed the test language we can address how we validate that in the example. Instead of looking at what is in the collection we should be looking at what isn’t there as the test name tells us to expect that something should be absent, and the assert should reflect that language:

Assert.That(seats.Where(s => s.Band == "Posh Seats"), Is.Empty);

Now I should only be able to break this test by changing the data or logic specific to the example; orthogonal behaviours should not break it by accident. (See “Manual Mutation Testing” for more on how you can test the quality of your tests.)

Invest in Tests

If you’ve ever worked on a codebase with brittle tests you’ll know how frustrating it can be when your feature mushrooms because you broke a bunch of badly written tests. If we’re lucky we see the failed assertion and if it’s not obvious then we can look back at the test name to see if the scenario rings any bells. If we’re unlucky we have to immediately reach for the debugger and likely add “refactor tests” to the yak stack.

If you “pay it forward” by taking the time to write good tests up front you’ll find it easier to sustain delivery in the future.

 

[1] A company I once worked for used Fizz Buzz in their candidate early screening process. Despite being overkill in practice (as was pointed out to candidates), a suite of tests was requested as part of the submission to help get a feel for what style they used. IMHO the tests said much more about candidates than the production code.

[2] Yes, “property based testing” takes this entire concept a step further so that it exercises the behaviour with multiple examples generated differently each time. That’s the destination, this post is about one possible journey.

StatsModels: the first nail in R’s coffin

Derek Jones from The Shape of Code

In 2012, when I decided to write a book on evidence-based software engineering, R was the obvious system to use for data analysis. At the time, lots of new books had “using R” or “with R” added at the end of their titles; I chose “using R”.

When developers tell me they need to do some statistical analysis, and ask whether they should use Python or R, I tell them to use Python if statistics is a small part of the program, otherwise use R.

If I started work on the book today, I would still choose R. If I were starting five years from now, I could be choosing Python.

To understand why I think Python will eventually take over the niche currently occupied by R, we need to understand the unique selling points of both systems.

R’s strengths are that it supports a way of thinking that is a good fit for doing data analysis and has an extensive collection of packages that simplify the task of applying a wide variety of analysis techniques to data.

Python also has packages supporting the commonly used data analysis techniques. But nearly all the Python packages provide a developer-mentality interface (i.e., they provide an API like any other package), whereas R provides data-analysis-mentality interfaces. R supports a way of thinking that data analysts can identify with.

Python’s strengths, over R, are a much larger base of developers and language support for writing large programs (R is really a scripting language). Yes, Python has a package ecosystem supporting the full spectrum of application domains, but this is not what matters when analysing a successful invasion of R’s niche market (it is, however, relevant for enticing new developers who are still making up their minds).

StatsModels is a Python package based around R’s data-analysis-mentality interface. When I discovered this package a few months ago, I realised the first nail had been hammered into R’s coffin.
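
As a rough illustration of what a data-analysis-mentality interface looks like (a sketch using invented data and column names, not a real data set), StatsModels accepts R-style model formulas instead of requiring the design matrix to be assembled by hand:

import pandas as pd
import statsmodels.formula.api as smf

# Invented project measurements, purely for illustration.
projects = pd.DataFrame({
    "effort":     [10, 25, 31, 47, 60, 72, 95, 130],
    "size":       [1.2, 2.9, 3.5, 5.1, 6.8, 7.4, 9.9, 14.0],
    "complexity": [3, 4, 6, 7, 9, 10, 12, 15],
})

# Roughly the Python equivalent of R's lm(effort ~ size + complexity, data=projects).
model = smf.ols("effort ~ size + complexity", data=projects).fit()
print(model.summary())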

Yes, today R has nearly all the best statistical analysis packages and a large chunk of the leading edge stuff. But packages can be reimplemented (C code can be copy-pasted, the R code mapped to Python); there is no magic involved. Leading edge has a short shelf life, and what proves to be useful can be duplicated; the market for leading edge code in a mature market (e.g., data analysis) is tiny.

A bunch of bright young academics looking to make a name for themselves will see the major trees in the R forest have been felled. The trees in the Python data-analysis-mentality forest are still standing; all it takes is a few people wanting to be known as the person who implemented the Python package that everybody uses for XYZ analysis.

A collection of packages supporting the commonly (and eventually not so commonly) used data analysis techniques, with a data-analysis-mentality interface, removes a major selling point for using R. Python is a bigger developer market with support for many other application domains.

The flow of developers starting out with R will slow down, and casual R users will have nothing to lose from switching when the right project comes along. There will be groups where everybody uses R, and they will continue to use R because that is what everybody else in the group uses. Ten to twenty years from now, R developers could be working in a ghost town.

We’re Not For Turning – a.k.

a.k. from thus spake a.k.

We have seen how it is possible to smoothly interpolate between a set of points $(x_i, y_i)$, with the $x_i$ known as nodes and the $y_i$ as values, by specifying the gradients $g_i$ at the nodes and calculating values between adjacent pairs using the uniquely defined cubic polynomials that match the values and gradients at them.
We have also seen how extrapolating such polynomials beyond the first and last nodes can yield less than satisfactory results, which we fixed by specifying the first and last gradients and then adding new first and last nodes to ensure that the first and last polynomials would represent straight lines.
Now we shall see how cubic spline interpolation can break down rather more dramatically and how we might fix it.
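
For readers who have not followed the series, the building block being described is cubic Hermite interpolation between adjacent nodes; a minimal Python sketch (my own illustration, not code from the a.k. library) looks something like this:

def hermite(x, x0, x1, y0, y1, g0, g1):
    # Cubic through (x0, y0) and (x1, y1) with gradients g0 and g1 at the nodes.
    h = x1 - x0
    t = (x - x0) / h            # map x onto [0, 1]
    h00 = 2*t**3 - 3*t**2 + 1   # Hermite basis functions
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    return h00*y0 + h10*h*g0 + h01*y1 + h11*h*g1

# Halfway between nodes at x=0 and x=1, values 0 and 1, zero gradients at both.
print(hermite(0.5, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0))  # 0.5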

What statistical techniques are useful for software engineering data?

Derek Jones from The Shape of Code

What statistical techniques are of general usefulness for analyzing software engineering data?

The answer depends on the kinds of data likely to be encountered in software engineering, and the questions likely to be asked.

When I started working on a book, aiming to cover all worthwhile publicly available software engineering data, I was hoping to refer readers to a book (or two) that they ought to read to learn the appropriate techniques. Kabacoff’s “R in Action” comes closest to the book I had in mind as a basic introduction, but there was nothing covering a wider range of topics, so I ended up writing something myself; I found Crawley’s “The R Book” to be the best book on the subject.

My answer to the kinds of data likely to be available was to work with all the software engineering data I could obtain (around 600 data sets to date).

What questions should be asked about the data? My selection of questions was driven by whether the data was used in the software engineering half of the book, or the statistical analysis techniques half.

The software engineering material consists of the chapters: Introduction, Human cognitive characteristics, Cognitive capitalism, Ecosystems, Projects, Reliability and Source code. The data appeared in one of these chapters if it could be used to make (what I thought was) a practical point about the topic being discussed.

Data appeared in the statistical analysis techniques chapters, if it could be used to illustrate the technique under discussion.

What happened in practice was that the software engineering material was worked on for a year or two; on realizing that bespoke statistical analysis material was needed, the existing data was plundered to create the necessary chapters. After this was released, work switched back to the software engineering material (using unplundered and newly acquired data), and of course the earlier chapters plundered data from the yet-to-be-written chapters.

This seems to have worked surprisingly well, at least from my perspective of keeping the production line going.

Now that most of the data has been analyzed, it’s time to take a global overview and, where necessary, shuffle things around. I may find that everything is a complete mess; we shall see.

What techniques have I found to be useful?

The number 1, most useful data analysis technique is building a regression model. The one thing I have been consistently able to do, when analyzing other people’s data, is extract more information from it than they did (unless they also built a regression model); at times it has been embarrassing.

At number 2 is bootstrapping. Many widely used techniques only give accurate answers if the data has a normal/Gaussian distribution, and use of these techniques can involve a lot of arm waving about the data having a good-enough Gaussian-like distribution. This arm waving was necessary before computers became available, because the practical manual techniques required a Gaussian distribution. Now we have computers, techniques that don’t require any particular distribution can be used, and in some cases these are more powerful than those designed for manual implementation.
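
As a reminder of how little machinery the bootstrap needs (a minimal sketch using invented fault-fixing times, not data from the book), resample with replacement and look at the spread of the statistic of interest:

import numpy as np

rng = np.random.default_rng(0)
# Invented "hours to fix a fault" data; heavily skewed, no Gaussian assumption needed.
fix_times = np.array([0.5, 1.2, 1.3, 2.0, 2.4, 3.1, 4.8, 7.5, 11.0, 19.0])

medians = [np.median(rng.choice(fix_times, size=fix_times.size, replace=True))
           for _ in range(10_000)]

# 95% bootstrap confidence interval for the median fix time.
print(np.percentile(medians, [2.5, 97.5]))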

Sitting here, I cannot think of a number 3; there might be one.

What techniques are not generally useful? The various tests containing some combination of the names Wilcoxon, Mann and Whitney are well past their sell-by date. Searching the source of the book I see these names still appear in one or two places; this is a hangover from the early versions from many years ago (when I was following the clueless herd) and will soon be gone.

I thought that extreme value theory might apply to some data, but have only found one data-set to which it might be applied (so not generally useful).

I spent a lot of time watching out for zero-inflated data (data containing more zero values than expected by the common probability distributions). I saw four or five papers containing plots of data that looked zero-inflated and emailed the authors, who kindly sent me the data. None of the data turned out to be zero-inflated (I’m not sure what the authors thought about being asked for data that somebody thought was zero-inflated). This does not mean that software engineering data is not zero-inflated, only that it is not common.
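
A quick way to eyeball possible zero-inflation (a sketch with invented counts, not one of the data sets mentioned above) is to compare the observed fraction of zeros with the fraction a Poisson distribution having the same mean would produce:

import numpy as np

# Invented count data, e.g. faults reported per module.
counts = np.array([0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 6, 9])

observed_zeros = np.mean(counts == 0)
poisson_zeros = np.exp(-counts.mean())   # P(count == 0) under a Poisson with the same mean

print(f"observed fraction of zeros: {observed_zeros:.2f}")
print(f"Poisson-expected fraction:  {poisson_zeros:.2f}")
# A much larger observed fraction than expected would suggest zero-inflation.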

My zero-inflated search was motivated by the occasional appearance of zero-truncated data (data that does not contain zero values). Zero-truncated data occurs when counting starts at one, rather than zero (I have one data-set that is 0/1 truncated; the counting starts at 2).

I was surprised that time-series did not turn out to be widely useful.

Sometimes we are all clueless button pushers, so machine learning gets a few pages. Anybody who knows what they are doing builds regression models.

I will eventually get around to counting how many times each technique is used on the data I have (watch this blog, but don’t hold your breath).

Source code chapter added to “Evidence-based software engineering using R”

Derek Jones from The Shape of Code

The Source Code chapter of my evidence-based software engineering book has been added to the draft pdf (download here).

This chapter has suffered from coming last and there is still lots of work to be done. Almost all the source code related data has been plundered to fill up earlier chapters. Some data did not make the cut-off for release of the draft; a global review will probably result in some data migrating back to this chapter.

When talking to developers about the book I am constantly being asked ‘what is empirical software engineering?’ My explanation uses the phrase ‘evidence-based’, which everybody seems to immediately understand. It is counterproductive having a title that has to be explained, so I have changed the title to “Evidence-based Software Engineering using R”.

What is the purpose of a chapter discussing source code in a book on evidence-based software engineering? Source code is obviously an essential component of the topics discussed in the other chapters, but what is so particular to source code that it could not be said elsewhere? Having spent most of my professional life studying source code, first as a compiler writer and then involved with static analysis, am I just being driven by an attachment to the subject?

My view of source code is very different from most other developers: when developers talk about code, they spend most of the time talking about how they do things, when I talk about code I spend most of the time talking about how other developers do things (I’m a mongrel writer of code). Developers’ blinkered view of code prevents them seeing bigger pictures. I take a Gricean view of code and refrain from using meaningless marketing terms such as maintainability, readability and testability.

I have lots of source code data of interest to compiler writers (who are not the target audience) and I have lots of data related to static analysis (tool developers are not the audience). The target audience is professional software developers and hopefully what has been written is of interest to that readership.

I have been promised all sorts of data. Hopefully some of it will arrive. If somebody tells you they promised to send me data, please encourage them to take some time to sort out the data and send it.

As always, if you know of any interesting software engineering data, please tell me.

Next: finalizing the statistical analysis material in the second half of the book (released almost two years ago).

Perl’s failure to grow and Python takes over

Derek Jones from The Shape of Code

Perl, once the most widely used scripting language, has been in decline for many years; the decline now looks terminal (though the end is many decades away, when its die-hard users have died). What happened?

Python is what happened. Why was this? Did Perl have a major fail, did Python acquire pixie dust that could not be replicated, or something else?

Some commentators point to the failure to produce a timely release of Perl 6; a major reworking of the language announced in 2000 with a stumbling release made available around 2015.

I think the real issue is Perl’s failure to take off outside its core use as a systems language. Perl is famous for its one-liners, but not for writing large programs (yes, it can be done, but would many developers really want to?); a glance at the categories in its module library shows that those 174,970 modules (at the time of writing) are not widely spread over application domains (i.e., they do not cater to a wide audience).

Perl 5 was failing to grow outside its base before Perl 6 began its protracted failure to launch.

Language use is a winner-takes-all game: developers create more packages, support tools and new users, which combine to attract more developers. Continuing support for minority languages comes from die-hard users, existing software that is worth paying somebody to maintain, and niche advantages.

These days, language success is founded on the associated package ecosystem (Go and Rust have minuscule package ecosystems, which is why they are living on borrowed time; other languages will eventually take away their sheen of trendiness). Developers use languages to build stuff; the days of writing the code for almost everything yourself are long gone; interesting software is created by taking advantage of packages written by others. Python was in the right place, at the right time, to acquire a wide variety of commercial-grade packages.

It’s difficult to see Python being displaced as the lingua franca of software development. Its language features are almost irrelevant, its package ecosystem is everything. The winner will eventually take all.

I’m sure the cycle of languages becoming popular for a few years, before disappearing, will continue. There have always been, and will always be, fashionable languages.