Modular vs. monolithic programs: a big performance difference

Derek Jones from The Shape of Code

For a long time now I have been telling people that no experiment has found a situation where the treatment (e.g., use of a technique or tool) produces a performance difference that is larger than the performance difference between the subjects.

The usual results are that differences between people is the source of the largest performance difference, successive runs are the next largest (i.e., people get better with practice), and the smallest performance difference occurs between using/not using the technique or tool.

This is rather disheartening news.

While rummaging through a pile of books I had not looked at in many years, I (re)discovered the paper “An empirical study of the effects of modularity on program modifiability” by Korson and Vaishnavi, in “Empirical Studies of Programmers” (the first one in the series). It’s based on Korson’s 1988 PhD thesis, with the same title.

There were four experiments, involving seven people from industry and nine students, each involving modifying a 900(ish)-line program in some way. There were two versions of each program, they differed in that one was written in a modular form, while the other was monolithic. Subjects were permuted between various combinations of program version/problem, but all problems were solved in the same order.

The performance data was published in the paper, so I fitted various regressions models to it (code+data). There is enough information in the data to separate out the effects of modular/monolithic, kind of problem and subject differences. Because all subjects solved problems in the same order, it is not possible to extract the impact of learning on performance.

The modular/monolithic performance difference was around twice as large as the difference between subjects (removing two very poorly performing subjects reduces the difference to 1.5). I’m going to have to change my slides.

Would the performance difference have been so large if all the subjects had been experienced developers? There is not a lot of well written modular code out there, and so experienced developers get lots of practice with spaghetti code. But, even if the performance difference is of the same order as the difference between developers, that is still a very worthwhile difference.

Now there are lots of ways to write a program in modular form, and we don’t know what kind of job Korson did in creating, or locating, his modular programs.

There are also lots of ways of writing a monolithic program, some of them might be easy to modify, others a tangled mess. Were these programs intentionally written as spaghetti code, or was some effort put into making them easy to modify?

The good news from the Korson study is that there appears to be a technique that delivers larger performance improvements than the difference between people (replication needed). We can quibble over how modular a modular program needs to be, and how spaghetti-like a monolithic program has to be.

Gradle: what is a task, and how can I make a task depend on another task?

Andy Balaam from Andy Balaam's Blog

In an insane world, Gradle sometimes seems like the sanest choice for building a Java or Kotlin project. But what on Earth does all the stuff inside build.gradle actually mean? And when does my code run? And how do you make a task? And how do you persuade a task to depend on another task?

Setting up

To use Gradle, get hold of any version of it for long enough to create a local gradlew file, and the use that.
$ mkdir gradle-experiments
$ cd gradle-experiments
$ sudo apt install gradle  # Briefly install the system version of gradle
...
$ gradle wrapper --gradle-version=5.2.1
$ sudo apt remove gradle   # Optional - uninstalls the system version
$ ./gradlew tasks
... If all is good, this should ...
... print a list of available tasks. ...
It is normal for gradlew and the whole gradle directory it creates to be checked into source control. This means everyone who fetches the code from source control will have a predictable Gradle version.

What is build.gradle?

build.gradle is a Groovy program that Gradle runs within a context that it has set up for you. That context means that you are actually calling methods of a Project object, and modifying its properties. The fact that Groovy lets you miss out a lot of punctuation makes that harder to see, but it's true. The first thing to get your head around is that Gradle actually runs your code immediately, so if your build.gradle looks like this (and only this):
println("Hello")
when you run Gradle your code runs:
$ ./gradlew -q 
Hello
... more guff ...
So that code runs even if you don't ask Gradle to run a task containing that code. It runs at "configuration time" - i.e. when Gradle is understanding your build.gradle file. Actually, "understanding" it means executing it. Remember when I said this code runs in the context of a Project? What that means is that if you have something like this in your build.gradle:
repositories {
    jcenter()
}
what it really means is something like this:
project.repositories(
    {
        it.jcenter()
    }
)
You are calling the repositories method on the project object. The argument to the repositories method is a Groovy closure, which is a blob of code that will get run later. I've used the magic it name above to demonstrate that jcenter is just a method being called on the object that is the context for the closure when it is run. When does it run? Let's find out:
println("before")
project.repositories( {
    println("within")
    jcenter()
})
println("after")
$ ./gradlew -q
before
within
after
... more guff ...
This surprised me - it means the closure you pass in to repositories is actually run immediately, as part of running repositories, before execution gets to the line after that call. As we'll see later, some closures you create do not run immediately like this one. Once you know that build.gradle is actually modifying a Project object, you have starting point for understanding the Gradle reference documentation.

How do you make a task

You probably shouldn't do it very often, but it was instructive for me to understand how to make my own custom task. Here's an example:
tasks.register("mytask") {
    doLast {
        println("running mytask")
    }
}
This creates a new task by calling the register method on the tasks property of the Project object. Register takes two arguments: a name for the task ("mytask" here), and a closure with some code in it to run when we decide we need this task. That closure gets run in a context that can't see the Project object, but instead can see a Task object which it is helping to make. That Task object has a doLast method that we call, and passing it a closure that will be run when the task is actually executed (not immediately). If we remove some of the syntactic sugar the above build.gradle looks like this:
tasks.register(
    "mytask",
    {
        it.doLast {
            println("running mytask")
        }
    }
)
Above we can see that register really does take two arguments as I said above - the first version uses a Groovy feature where if you miss out the last argument and write a closure immediately afterwards the closure is passed as the last argument. Confusing, eh? Again, notice that doLast is a method on the Task object that is implicitly available when the closure is run. So we have created a task that we can run:
 ./gradlew -q mytask
running mytask

How do you make a task depend on another task?

If I want to run my code formatting before my compile (for example) I sometimes need to modify a task to make it depend on another one. This can done for tasks you create or for pre-existing ones. Here's an example:
plugins {
    id "java"
}
tasks.register("mytask") {
    doLast {
        println("running mytask")
    }
}
compileJava {
    dependsOn tasks.named("mytask")
}
So, calling the plugins on the Project at the top with a closure that ran the id method on something modified the Project so that it had a new method called compileJava which we called at the bottom, passing it a closure to run. That closure ran in the context of a Task object (similar to when we created a task, but now allow us to modify a pre-existing one). We called the dependsOn method of the Task object, passing in another Task object which we had got by calling the named method on the tasks object. [Side note: the register method actually returns a Task object that we could have passed to dependsOn without looking it up again using named, but Groovy doesn't provide a very convenient way of holding on to that reference, so we did do it. The Kotlin example below shows that this is quite simple in Kotlin.]

How do I do all this in Kotlin?

Because one DSL that hides what's really going on wasn't enough for you, Gradle now provides a second DSL that hides what's going on in subtly different ways, which is a program written in Kotlin instead of Groovy. This is marginally better, because Kotlin doesn't let you do quite so many stupid tricks as Groovy does. Below are all our examples in Kotlin. You get started exactly the same way, by following "Setting up" above. Remember to name your build file build.gradle.kts.

Say hello in Gradle Kotlin

println("Hello")
This is identical to the Groovy version.)

Use jcenter repo in Gradle Kotlin

repositories {
    jcenter()
}
This is identical to the Groovy version, and with the same meaning: repositories is a method on the implicitly-available Project object. The "unsugared" version looks like this in Kotlin:
this.repositories(
    {
        this.jcenter()
    }
)
[Note that the word this is used to access the implicit context. The word it has a different meaning in Kotlin from in Groovy. In Groovy it means the implicit context, but in Kotlin it means the first argument. We didn't pass any arguments to jcenter when we called it, so we can't use it, but we were being run in a context, which we can refer to using this. Simple. huh?]

Execution order in Gradle Kotlin

We this build.gradle.kts:
println("before")
project.repositories( {
    println("within")
    jcenter()
})
println("after")
We see this behaviour:
$ ./gradlew -q
before
within
after
which is all identical to the Groovy version.

Making a new task in Gradle Kotlin

tasks.register("mytask") {
    doLast {
        println("running mytask")
    }
}
This is identical to the Groovy version, but slightly different when unsugared:
tasks.register(
    "mytask",
    {
        this.doLast(
            {
                println("running mytask")
            }
        )
    }
)
Notice that Kotlin lets you do the same trick as Groovy: providing an extra argument to a function that is a closure by writing it immediately after it looks like you've finished calling it. It's good for people who dislike closing brackets hanging around longer than they're welcome. As someone who likes Lisp, I'm OK with it, but what do I know?

One task depending on another in Gradle Kotlin

plugins {
    java
}
val mytask = tasks.register("mytask") {
    doLast {
        println("running mytask")
    }
}
tasks.compileJava {
    dependsOn(mytask)
}
This differs slightly from the Groovy version, even though the meaning is the same: we start off in the context of a Project object that we call methods on. The code to make one task depend on another gets hold of the Task object called compileJava from inside the tasks property of the Project, and calls it (because it's a callable object). We pass in a closure that runs in the context of this Task object, calling its dependsOn method, and passing in a reference to the mytask object, which is a Task and was created in the code above.

Corrections and clarifications welcome

The above is what I have worked out by experimentation and trying to read the Gradle documentation. Please add comments that clear up confusions and correct mistakes.

Visual Lint 6.5.7.304 has been released

Products, the Universe and Everything from Products, the Universe and Everything

This is a recommended maintenance update for Visual Lint 6.0 and 6.5. The following changes are included:

  • Added support for Texas Instruments Code Composer Studio 7.x and 8.x. Note that when a project for CCS 6.x or later is opened, the latest installed version of the IDE will be assumed.
  • Fixed a bug in the parsing of Texas Instruments Code Composer 5.x/6.x/7.x/8.x projects which could cause them to be inconsistently classified as vanilla Eclipse projects in some circumstances.
  • Fixed a crash which could occur in VisualLintGui if saved PC-lint or PC-lint Plus issue category data became corrupted.
  • The generated PC-lint/PC-lint Plus command line now includes a -width(0) directive at the beginning. This ensures that any fatal errors are reported on a single line.
  • Added a PC-lint Plus indirect file (co-rb-gcc.lnt) for GCC to the installer.
  • Added a PC-lint Plus indirect file (co-rb-ti-arm.lnt) for the Texas Instruments ARM compiler to the installer.
  • Added details of the /silent and /verbose switches to the VisualLintConsole help screen and documentation.

Download Visual Lint 6.5.7.304

Visual Lint 6.5.7.304 has been released

Products, the Universe and Everything from Products, the Universe and Everything

This is a recommended maintenance update for Visual Lint 6.0 and 6.5. The following changes are included:
  • Added support for Texas Instruments Code Composer Studio 7.x and 8.x. Note that when a project for CCS 6.x or later is opened, the latest installed version of the IDE will be assumed.
  • Fixed a bug in the parsing of Texas Instruments Code Composer 5.x/6.x/7.x/8.x projects which could cause them to be inconsistently classified as vanilla Eclipse projects in some circumstances.
  • Fixed a crash which could occur in VisualLintGui if saved PC-lint or PC-lint Plus issue category data became corrupted.
  • The generated PC-lint/PC-lint Plus command line now includes a -width(0) directive at the beginning. This ensures that any fatal errors are reported on a single line.
  • Added a PC-lint Plus indirect file (co-rb-gcc.lnt) for GCC to the installer.
  • Added a PC-lint Plus indirect file (co-rb-ti-arm.lnt) for the Texas Instruments ARM compiler to the installer.
  • Added details of the /silent and /verbose switches to the VisualLintConsole help screen and documentation.
Download Visual Lint 6.5.7.304

Code your way out of a paper bag

Frances Buontempo from BuontempoConsulting


I attended nor(dEV):con, a tech conference in Norwich, last week. I gave a 45 minute talk which I called "Code your way out of a paper bag". A majority of my recent talks involve getting out of, and once into, a paper bag. I've used this as a vehicle to demonstrate some machine learning, AI and other algorithms.

I interviewed a candidate for a role a while ago, and the other interviewer commented the interviewee couldn't code their way out of a paper bag afterwards. Jeff Atwood, the co-founder of Stack Overflow, has made similar comments:

We're tired of talking to candidates who can't program their way out of a paper bag.

It's a shorthand way for saying people can't do something simple. First, programming isn't always simple. Second, how do you prove you can code your way out of a paper bag?

Use Genetic Algorithms

You could do something straightforward, like make a line zoom out of a paper bag, provided you have a way to draw things.



Alternatively, you could do something more complicated. I used genetic algorithms. If you fire a cannon ball, choosing the angle and initial velocity upfront, it may, or may not end up outside a paper bag, provided you use a tiny, virtual cannon, placed at the bottom of a paper bag. The cannon balls might just fire through the bag, but perhaps you can get them to go over the side if you choose the right numbers. There's a free excerpt from my book here talking this through.

Now, you could guess a few (angle, velocity) pairs and see what happens.  I got three people in the audience to do this. You could then swap the angle and velocity of the better tries, and maybe change one or both of the numbers slightly. This covers the essence of how genetic algorithms work. The swap is called crossover, drawing on the idea of chromosomes recombining when living organisms breed. The slight tweak is called mutation, and draws on the idea of random fluctuations in DNA . Darwin's evolution says these random changes give rise to new species. The fitter ones survive. All you then need is a function to decide how fit a solution is, often called a fitness function. Start with some random pairs, select some fitter pairs, use crossover and mutation for a while. In the end, the attempts might get better.

Over-engineered?


You could argue my approach is somewhat over-engineered. You'd be right. However, it's a nice self-contained example to demonstrate gentic algorithms. It was filmed, so will be up online soon. I'm not sure how clear the audience coming out with their attempts will be on film. Hopefully enough for you to get the idea.

How do you   out of a paper bag? Tell twitter, or tell me.

Have a look at my book too.



Audio & free eBook of Project Myopia

Allan Kelly from Allan Kelly Associates

A little bit of time sensitive news, albeit at the risk of overwhelming my regular readers – especially those who take this blog on the mailing list. Sorry.

The #NoProjects book, Project Myopia is available as a free ebook from Amazon (Amazon USA) for two days, Tuesday 26 February and Wednesday 27 – well, strictly speaking midnight Monday/Tuesday till midnight Wednesday/Thursday Pacific time, so 8am Tuesday to 8am Thursday GMT.

This is to celebrate the release of the audio version of Project Myopia which is now available on Audible and Amazon (where the free ebook is too), and possibly some other places where Audible has distributed it.

Unfortunately, Project Myopia will not be free in every Amazon. My full apologies, I think it is going to be free for UK and US Amazon customers, and probably Germany customer too. Elsewhere I’m not sure, Amazon don’t make it easy to find out.

I know I have readers elsewhere so I’m really sorry about this. At some time I’ll see what I can do about this restriction.

The post Audio & free eBook of Project Myopia appeared first on Allan Kelly Associates.

Evidence-based election campaigning

Derek Jones from The Shape of Code

I was at a hackathon on evidence-based election campaigning yesterday, organized by Campaign Lab.

My previous experience with political oriented hackathons was a Lib Dem hackathon; the event was only advertised to party members and I got to attend because a fellow hackathon-goer (who is a member) invited me along. We spent the afternoon trying to figure out how to extract information on who turned up to vote, from photocopies of lists of people eligible to vote marked up by the people who hand out ballot papers.

I have also been to a few hackathons where the task was to gather and analyze information about forthcoming, or recent, elections. There did not seem to be a lot of information publicly available, and I had assumed that the organization, and spending power, of the UK’s two main parties (i.e., Conservative and Labour) meant that they did have data.

No, the main UK political parties don’t have lots of data, in fact they don’t have very much at all, and make hardly any use of what they do have.

I had a really interesting chat with Campaign Lab’s Morgan McSweeney, about political campaigning, and how they have not been evidence-based. There were lots of similarities with evidence-based software engineering, e.g., a few events (such the Nixon vs. Kennedy and Bill Clinton elections) created campaigning templates that everybody else now follows. James Moulding drew diagrams showing Labour organization and Conservative non-organization (which looked like a Dalek) and Hannah O’Rourke spoiled us with various kinds of biscuits.

An essential component of evidence-based campaigning is detailed knowledge of the outcome, such as: how many votes did each candidate get? Based on past hackathon experience, I thought this data was only available for recent elections, but Morgan showed me that Wikipedia had constituency level results going back many years. Here was a hackathon task; collect together constituency level results, going back decades, in one file.

Following the Wikipedia citations led me to Richard Kimber’s website, which had detailed results at the constituency level going back to 1945. The catch was that there was a separate file for each constituency; I emailed Richard, asking for a file containing everything (Richard promptly replied, the only files were the ones on the website).

Pivot.

The following plot was created using some of the data made available during a hackathon at the Office of National Statistics (sometime in 2015). We (Pavel+others and me) did not make much use of this plot at the time, but it always struck me as interesting. I showed it to the people at this hackathon, who sounded interested. The plot shows the life-expectancy for people living in a constituency where Conservative(blue)/Labour(red) candidate won the 2015 general election by a given percentage margin, over the second-placed candidate.

Life-expectancy for people living in a constituency where Conservative/labour won by a given percentage margin.

Rather than scrape the election data (added to my TODO list), I decided to recreate the plot and tidy up the associated analysis code; it’s now available via the CampaignLab Github repo

My interpretation of the difference in life-expectancy is that the Labour strongholds are in regions where there is (or once was) lots of heavy industry and mining; the kind of jobs where people don’t live long after retiring.

Begin and End with range-based for loops

Anthony Williams from Just Software Solutions Blog

On slack the other day, someone mentioned that lots of companies don't use range-based for loops in their code because they use PascalCase identifiers, and their containers thus have Begin and End member functions rather than the expected begin and end member functions.

Having recently worked in a codebase where this was the case, I thought it would be nice to provide a solution to this problem.

The natural solution would be to provide global overloads of the begin and end functions: these are always checked by range-based for if the member functions begin() and end() are not found. However, when defining global function templates, you need to be sure that they are not too greedy: you don't want them to cause ambiguity in overload resolution or be picked in preference to std::begin or std::end.

My first thought was to jump through metaprogramming hoops checking for Begin() and End() members that return iterators, but then I thought that seemed complicated, so looked for something simpler to start with.

The simplest possible solution is just to declare the functions the same way that std::begin() and std::end() are declared:

template <class C> constexpr auto begin(C &c) -> decltype(c.Begin()) {
    return c.Begin();
}
template <class C> constexpr auto begin(const C &c) -> decltype(c.Begin()) {
    return c.Begin();
}

template <class C> constexpr auto end(C &c) -> decltype(c.End()) {
    return c.End();
}
template <class C> constexpr auto end(const C &c) -> decltype(c.End()) {
    return c.End();
}

Initially I thought that this would be too greedy, and cause problems, but it turns out this is fine.

The use of decltype(c.Begin()) triggers SFINAE, so only types which have a public member named Begin which can be invoked with empty parentheses are considered; for anything else these functions are just discarded and not considered for overload resolution.

The only way this is likely to be a problem is if the user has also defined a begin free function template for a class that has a suitable Begin member, in which case this would potentially introduce overload resolution ambiguity. However, this seems really unlikely in practice: most such function templates will end up being a better match, and any non-template functions are almost certainly a better match.

So there you have it: in this case, the simplest solution really is good enough! Just include this header and you're can freely use range-based for loops with containers that use Begin() and End() instead of begin() and end().

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Clean git blame history

Tim Pizey from Tim Pizey

Place the following in a command file, run it within your repo:

#!/bin/sh

git filter-branch --env-filter '

an="$GIT_AUTHOR_NAME"
am="$GIT_AUTHOR_EMAIL"
cn="$GIT_COMMITTER_NAME"
cm="$GIT_COMMITTER_EMAIL"

if [ "$GIT_AUTHOR_EMAIL" = "timp@paneris.org" ]
then
an="Tim Pizey"
am="timp21337@paneris.org"
fi

if [ "$GIT_COMMITTER_EMAIL" = "timp@paneris.org" ]
then
cn="Tim Pizey"
cm="timp21337@paneris.org"
fi

export GIT_AUTHOR_NAME="$an"
export GIT_AUTHOR_EMAIL="$am"
export GIT_COMMITTER_NAME="$cn"
export GIT_COMMITTER_EMAIL="$cm"
'
Then git push -f origin master Note that this process is very slow so do not repeat for every change: add all the changes you want to make and perform in one pass.