Software engineering research is a field of dots

Derek Jones from The Shape of Code

Software engineering research is a field of dots; people are fully focused on publishing papers about their chosen tiny little subject.

Where are the books joining the dots into even a vague outline?

Several software researchers have told me that writing books is not a worthwhile investment of their time, i.e., the number of citations they are likely to attract makes writing papers the only cost-effective medium (books containing an edited collection of papers continue to be published).

Butterfly collecting has become the method of study for many researchers. The butterflies in question often being Github repos that are collected together, based on some ‘interestingness’ metric, and then compared and contrasted in a conference paper.

The dots being collected are influenced by the problems that granting agencies consider to be important topics to fund (picking a research problem that will attract funding is a major consideration for any researcher). Fake research is one consequence of incentivizing people to use particular techniques in their research.

Whatever you think the aims of research in software engineering might be, funding the random collecting of dots does not seem like an effective strategy.

Perhaps it is just a matter of waiting for the field to grow up. Evidence-based software engineering research is still a teenager, and the novelty of butterfly collecting has yet to wear off.

My study of particular kinds of dots did not reveal many higher level patterns, although a number of folk theories were shown to be unfounded.

A review: Inhibitor Phase by Alastair Reynolds

Paul Grenyer from Paul Grenyer

Inhibitor Phase
by Alastair Reynolds
 ISBN-13 ‏ : ‎ 978-0316462761

* * * Warning Spoilers * * *

To say I was excited at the prospect of another core Revelation Space novel, more than a decade since Absolutely Gap, wouldn’t come close. In preparation I reread Absolution Gap and loved it on the second reading.

I wasn’t inspired by the description of the Miguel character hiding from the Wolves on an unknown planet, but it turns out this was just a minor distraction at the beginning and that The Inhibitor phase plays a major part in advancing the story. The scope and breadth, as you would expect from Alistair Reynolds is vast and intricate.

I was a little disappointed that the characters were ping ponging between some of the same old worlds, Ararat and Yellowstone, and the evolution of some of the survivors from Redemption Ark into Merpeople, but this didn’t detract in any way. It either wasn’t clear or I missed what happened to Ana Khouri - maybe she’s still on Hela. It was sad, but probably necessary to see the end to the Nostalgia for Infinity. I also missed how, following her death on Mars, Glass and Warren had reencountered each other and swum with the pattern Jugglers prior to Sun Hollow.

Overall I loved this story. I literally could not put it down! Alastair Reynolds is the master of descriptive exploration and constantly hints at more facets to the story I just have to know and have to keep reading for! The Inhibitor Phase does of course leave questions unanswered and sets up the next story, which I cannot wait for either!

How can I pin dependent packages when using use-package?

Timo Geusch from The Lone C++ Coder's Blog

I’ve been trying to up my use-package game recently and converted my hand rolled package check and installer to use-package. I usually prefer to use packages from melpa-stable so I pin the default package source used by use-package to melpa-stable and override it where necessary That’s working well in general and looks something like this: (setq use-package-always-pin "melpa-stable") (use-package js2-mode :ensure t :defer t :custom (progn (js-indent-level 2) (js2-include-node-externs t))) (use-package kotlin-mode :ensure t :pin melpa) So in other words, if I’m on a machine that doesn’t have js2-mode and kotlin-mode installed, use-package will install js2-mode from melpa-stable and kotlin-mode from melpa.

Estimation experiments: specification wording is mostly irrelevant

Derek Jones from The Shape of Code

Existing software effort estimation datasets provide information about estimates made within particular development environments and with particular aims. Experiments provide a mechanism for obtaining information about estimates made under conditions of the experimenters choice, at least in theory.

Writing the code is sometimes the least time-consuming part of implementing a requirement. At hackathons, my default estimate for almost any non-trivial requirement is a couple of hours, because my implementation strategy is to find the relevant library or package and write some glue code around it. In a heavily bureaucratic organization, the coding time might be a rounding error in the time taken up by meeting, documentation and testing; so a couple of months would be considered normal.

If we concentrate on the time taken to implement the requirements in code, then estimation time and implementation time will depend on prior experience. I know that I can implement a lexer for a programming language in half-a-day, because I have done it so many times before; other people take a lot longer because they have not had the amount of practice I have had on this one task. I’m sure there are lots of tasks that would take me many days, but there is somebody who can implement them in half-a-day (because they have had lots of practice).

Given the possibility of a large variation in actual implementation times, large variations in estimates should not be surprising. Does the possibility of large variability in subject responses mean that estimation experiments have little value?

I think that estimation experiments can provide interesting information, as long as we drop the pretence that the answers given by subjects have any causal connection to the wording that appears in the task specifications they are asked to estimate.

If we assume that none of the subjects is sufficiently expert in any of the experimental tasks specified to realistically give a plausible answer, then answers must be driven by non-specification issues, e.g., the answer the client wants to hear, a value that is defensible, a round number.

A study by Lucas Gren and Richard Berntsson Svensson asked subjects to estimate the total implementation time of a list of tasks. I usually ignore software engineering experiments that use student subjects (this study eventually included professional developers), but treating the experiment as one involving social processes, rather than technical software know-how, makes subject software experience a lot less relevant.

Assume, dear reader, that you took part in this experiment, saw a list of requirements that sounded plausible, and were then asked to estimate implementation time in weeks. What estimate would you give? I would have thrown my hands up in frustration and might have answered 0.1 weeks (i.e., a few hours). I expected the most common answer to be 4 weeks (the number of weeks in a month), but it turned out to be 5 (a very ‘attractive’ round number), for student subjects (code+data).

The professional subjects appeared to be from large organizations, who I assume are used to implementations including plenty of bureaucratic stuff, as well as coding. The task specification did not include enough detailed information to create an accurate estimate, so subjects either assumed their own work environment or played along with the fresh-faced, keen experimenter (sorry Lucas). The professionals showed greater agreement in that the range of value given was not as wide as students, but it had a more uniform distribution (with maximums, rather than peaks, at 4 and 7); see below. I suspect that answers at the high end were from managers and designers with minimal coding experience.

What did the experimenters choose weeks as the unit of estimation? Perhaps they thought this expressed a reasonable implementation time (it probably is if it’s not possible to use somebody else’s library/package). I think that they could have chosen day units and gotten essentially the same results (at least for student subjects). If they had chosen hours as the estimation unit, the spread of answers would have been wider, and I’m not sure whether to bet on 7 (hours in a working day) or 10 being the most common choice.

Fitting a regression model to the student data shows estimates increasing by 0.4 weeks per year of degree progression. I was initially baffled by this, and then I realized that more experienced students expect to be given tougher problems to solve, i.e., this increase is based on self-image (code+data).

The stated hypothesis investigated by the study involved none of the above. Rather, the intent was to measure the impact of obsolete requirements on estimates. Subjects were randomly divided into three groups, with each seeing and estimating one specification. One specification contained four tasks (A), one contained five tasks (B), and one contained the same tasks as (A) plus an additional task followed by the sentence: “Please note that R5 should NOT be implemented” (C).

A regression model shows that for students and professions the estimate for (A) is about 1-2 weeks lower than (B), while (A) estimates are 3-5 weeks lower than (C) estimated.

What are subjects to make of an experimental situation where the specification includes a task that they are explicitly told to ignore?

How would you react? My first thought was that the ignore R5 sentence was itself ignored, either accidentally or on purpose. But my main thought is that Relevance theory is a complicated subject, and we are a very long way away from applying it to estimation experiments containing supposedly redundant information.

The plot below shows the number of subjects making a given estimate, in days; exp0to2 were student subjects (dashed line joins estimate that include a half-hour value, solid line whole hour), exp3 MSc students, and exp4 professional developers (code+data):

Number of subjects making a given estimate.

I hope that the authors of this study run more experiments, ideally working on the assumption that there is no connection between specification and estimate (apart from trivial examples).

Another quick Isso setup tweak

Timo Geusch from The Lone C++ Coder's Blog

While I was implementing a few more changes on my web server - mostly adding the sorely needed blacklistd configuration for sshd - I noticed that NGINX’s log was showing occasional errors when trying to contact the Isso process. They all had one thing in common, namely that they were all trying to contact ISSO via IPV6 as the server has both stacks enabled. Turns out that isso only listens on an IPV4 socket and I could not find an obvious way to get it to listen on both.

Visual Lint has been released

Products, the Universe and Everything from Products, the Universe and Everything

Visual Lint has now been released.

This is a maintenance update for Visual Lint 8.0, and includes the following changes:

  • Fixed a bug in the handling of preprocessor symbol properties in Visual Studio projects.

  • VisualLintGui will now open files dropped on its main window.

  • Updated the PC-lint Plus compiler indirect files co-rb-vs2019.lnt and co-rb-vs2022.lnt to filter out errors in <xutility> when analysing some Visual Studio 2019 and 2022 projects.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2022.lnt to support Visual Studio 2022 v17.0.5.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2019.lnt to support Visual Studio 2019 v16.11.9.

  • Minor updates to the online help.

Download Visual Lint

semgrep: the future of static analysis tools

Derek Jones from The Shape of Code

When searching for a pattern that might be present in source code contained in multiple files, what is the best tool to use?

The obvious answer is grep, and grep is great for character-based pattern searches. But patterns that are token based, or include information on language semantics, fall outside grep‘s model of pattern recognition (which does not stop people trying to cobble something together, perhaps with the help of complicated sed scripts).

Those searching source code written in C have the luxury of being able to use Coccinelle, an industrial strength C language aware pattern matching tool. It is widely used by the Linux kernel maintainers and people researching complicated source code patterns.

Over the 15+ years that Coccinelle has been available, there has been a lot of talk about supporting other languages, but nothing ever materialized.

About six months ago, I noticed semgrep and thought it interesting enough to add to my list of tool bookmarks. Then, a few days ago, I read a brief blog post that was interesting enough for me to check out other posts at that site, and this one by Yoann Padioleau really caught my attention. Yoann worked on Coccinelle, and we had an interesting email exchange some 13-years ago, when I was analyzing if-statement usage, and had subsequently worked on various static analysis tools, and was now working on semgrep. Most static analysis tools are created by somebody spending a year or so working on the implementation, making all the usual mistakes, before abandoning it to go off and do other things. High quality tools come from people with experience, who have invested lots of time learning their trade.

The documentation contains lots of examples, and working on the assumption that things would be a lot like using Coccinelle, I jumped straight in.

The pattern I choose to search for, using semgrep, involved counting the number of clauses contained in Python if-statement conditionals, e.g., the condition in: if a==1 and b==2: contains two clauses (i.e., a==1, b==2). My interest in this usage comes from ideas about if-statement nesting depth and clause complexity. The intended use case of semgrep is security researchers checking for vulnerabilities in code, but I’m sure those developing it are happy for source code researchers to use it.

As always, I first tried building the source on the Github repo, (note: the Makefile expects a git clone install, not an unzipped directory), but got fed up with having to incrementally discover and install lots of dependencies (like Coccinelle, the code is written on OCaml {93k+ lines} and Python {13k+ lines}). I joined the unwashed masses and used pip install.

The pattern rules have a yaml structure, specifying the rule name, language(s), message to output when a match is found, and the pattern to search for.

After sorting out various finger problems, writing C rather than Python, and misunderstanding the semgrep output (some of which feels like internal developer output, rather than tool user developer output), I had a set of working patterns.

The following two patterns match if-statements containing a single clause (if.subexpr-1), and two clauses (if.subexpr-2). The option commutative_boolop is set to true to allow the matching process to treat Python’s or/and as commutative, which they are not, but it reduces the number of rules that need to be written to handle all the cases when ordering of these operators is not relevant (rules+test).

- id: if.subexpr-1
  languages: [python]
  message: if-cond1
   - pattern: |
      if $COND1:  # we found an if statement
   - pattern-not: |
      if $COND2 or $COND3: # must not contain more than one condition
   - pattern-not: |
      if $COND2 and $COND3:
  severity: INFO

- id: if.subexpr-2
  languages: [python]
   commutative_boolop: true # Reduce combinatorial explosion of rules
  message: if-cond2
   - patterns:
      - pattern: |
         if $COND1 or $COND2: # if statement containing two conditions
      - pattern-not: |
         if $COND3 or $COND4 or $COND5: # must not contain more than two conditions
      - pattern-not: |
         if $COND3 or $COND4 and $COND5:
   - patterns:
      - pattern: |
         if $COND1 and $COND2:
      - pattern-not: |
         if $COND3 and $COND4 and $COND5:
      - pattern-not: |
         if $COND3 and $COND4 or $COND5:
  severity: INFO

The rules would be simpler if it were possible for a pattern to not be applied to code that earlier matched another pattern (in my example, one containing more clauses). This functionality is supported by Coccinelle, and I’m sure it will eventually appear in semgrep.

This tool has lots of rough edges, and is still rapidly evolving, I’m using version 0.82, released four days ago. What’s exciting is the support for multiple languages (ten are listed, with experimental support for twelve more, and three in beta). Roughly what happens is that source code is mapped to an abstract syntax tree that is common to all supported languages, which is then pattern matched. Supporting a new language involves writing code to perform the mapping to this common AST.

It’s not too difficult to map different languages to a common AST that contains just tokens, e.g., identifiers and their spelling, literals and their value, and keywords. Many languages use the same operator precedence and associativity as C, plus their own extras, and they tend to share the same kinds of statements; however, declarations can be very diverse, which makes life difficult for supporting a generic AST.

An awful lot of useful things can be done with a tool that is aware of expression/statement syntax and matches at the token level. More refined semantic information (e.g., a variable’s type) can be added in later versions. The extent to which an investment is made to support the various subtleties of a particular language will depend on its economic importance to those involved in supporting semgrep (Return to Corp is a VC backed company).

Outside of a few languages that have established tools doing deep semantic analysis (i.e., C and C++), semgrep has the potential to become the go-to static analysis tool for source code. It will benefit from the network effects of contributions from lots of people each working in one or more languages, taking their semgrep skills and rules from one project to another (with source code language ceasing to be a major issue). Developers using niche languages with poor or no static analysis tool support will add semgrep support for their language because it will be the lowest cost path to accessing an industrial strength tool.

How are the VC backers going to make money from funding the semgrep team? The traditional financial exit for static analysis companies is selling to a much larger company. Why would a large company buy them, when they could just fork the code (other company sales have involved closed-source tools)? Perhaps those involved think they can make money by selling services (assuming semgrep becomes the go-to tool). I have a terrible track record for making business predictions, so I will stick to the technical stuff.

The difficulties of cascading OKRs

Allan Kelly from Allan Kelly

I almost despair when I hear people advocate cascading OKRs: the idea that someone, some team, some central planning department, can set OKRs which then flow down the organization with each “lower” group implementing some small part of some “higher” ask. What could be more waterfall like?

I admit, when I started working with OKRs I kind-of-expected to be shown the OKRs of the “above” before my team wrote theirs. But when I thought about it, and the more I thought about it, the more I realised if you did do it that way then it is decided unAgile. How can a team be really autonomous, self-organising and self-managing if they have goals handed down to them?

There was a point when I was wracked with self-doubt: am I interpretting OKRs differently to the rest of the world? How do I reconcile agile and cascading OKRs? What am I missing? – but, when you look around, I am not the only one. In fact, if you read, watch and listen to OKR commentators the majority agree with me: the teams delivering OKRs need the latitude to set their own OKRs.

Reconciling OKRs with agile is far from the biggest problem. In fact there are, at least, two bigger problems, one concerns team motivation. Can a team ever be motivated to do something they have no say in? Perhaps some can, I can’t and I know others who don’t. At the very least team members need to be asked.

Motivation becomes especially problematic if you want OKRs to be stretching. If you set someone a stretching goal and ask them to hit it without involving them then don’t be surprised if they shrug their shoulders.

Still, we haven’t got to the biggest problem.

The biggest problem with OKRs is not the metaphysical issues of motivation and whether one is truly agile or not. The biggest difficulty is simply: cascading OKRs are not practical.

First think about the timetable.

If every team is waiting for the team above them to issue OKRs before they set their own then you have a delay built into the system. And the more levels of hierarchy you have the greater the delay is going to be.

For example, suppose you have an executive team, and middle management team and several delivery teams. Then each cycle the exec team need to set some OKRs, once they have set their the middle management can set theirs, and then the delivery teams can set theirs. At each cascade point there needs to be communication, and each point creates the possibility of misunderstanding and mistakes.

Setting OKRs isn’t instantaneous, I think you need about a week to have a think, reflect overnight, iterate once or twice but, if you are well practices, and don’t hit any delays, you might do it in two days. Either way it is going to take at least a week, and possibly three, to get all three layers set. And if anyone runs late then it has a knock on effect.

I’ve heard it said that the Key Results of higher levels become the objectives of the next layer down. The key results of this layer the become the objectives of the one below them. But that assume that the OKRs themselves are a series of “items to do” and that each objective is made up of several pieces which are themselves things to do.

Sure, it sometimes happens that way. I may even have been guilty of interpreting them that way sometimes. But these days I see Key Results not as small pieces of work which, lego style, build into a bigger objective but as Acceptance Criteria: the parameters which the outcome needs to satisfy.

Now to some degree acceptance criteria can be translated into work items to do, and vice versa, but not always. Consider this:

Objective: Improve overnight batch processing to save 10% of work processing costs
Key result #1: Shorten batch processing time by 1 hour so staff do not need to wait for run to complete in the morning
Key result #2: Reduce false positive alerts by 100 per day so that staff waste less time

Now these key results could be packaged as individual work to do but perhaps they are the same piece or work. Perhaps a database upgrade could address both issues in one go. Which path you take is a design decision.

Seeing key results as acceptance criteria changes them from work to do into bounding conditions.

In Succeeding with OKRs in Agile I advise against having domino key results: don’t set key results so that failing to hit one makes others impossible to hit. So, for example, if the DB upgrade had been added to that previous example as key result #1 then the team would have been committed to doing it. And if the upgrade had failed then the other key results would have been lost. Leaving it out gives the team the decision on how to proceed: the people doing the work decide the best way of meeting the objective.

That advice is given within teams but it also applies between teams. If, the Middle Management team require three lesser teams to deliver work to build their own objective then, if any one team fail the middle management team will not only miss one key result but will therefore miss their objective.

Done like this the OKRs become fragile and a dependency nightmare. That will have two effects, first more time will be needed when setting OKRs to identify and mitigate the dependencies, then more time will be needed to manage the dependencies. Progress will only occur at the speed of the slowest.

Second, these problems will encourage people to play it safe and not set stretching and ambitious OKRs. Predictability and safety will be prioritised.

Now if we take the alternative approach and each team sets its OKRs independently then the time lag is removed, teams set OKRs in parallel and if someone is late it doesn’t matter. Dependencies may still exist but they have not been baked into the OKRs so teams can put effort into removing dependencies (reducing coupling and increasing cohesion) rather than putting that energy into managing the dependencies.

So, while we might argue about whether OKRs should, or should not, cascade down; and while we might argue about the psychological effects of being given an OKR by another, simply remember: cascading OKRs mean setting OKRs is going to be more complicated and take longer.

Photo by Alexander Hipp on Unsplash

Subscribe to my blog newsletter and download Continuous Digital for free

The post The difficulties of cascading OKRs appeared first on Allan Kelly.