Replication: not always worth the effort

Derek Jones from The Shape of Code

Replication is the means by which mistakes get corrected in science. A researcher does an experiment and gets a particular result, but unknown to them one or more unmeasured factors (or just chance) had a significant impact. Another researcher does the same experiment and fails to get the same results, and eventually many experiments later people have figured out what is going on and what the actual answer is.

In practice replication has become a low status activity, journals want to publish papers containing new results, not papers backing up or refuting the results of previously published papers. The dearth of replication has led to questions being raised about large swathes of published results. Most journals only published papers that contain positive results, i.e., something was shown to some level of statistical significance; only publishing positive results produces publication bias (there have been calls for journals that publishes negative results).

Sometimes, repeating an experiment does not seem worth the effort. One such example is: An Explicit Strategy to Scaffold Novice Program Tracing. It looks like the authors ran a proper experiment and did everything they are supposed to do; but, I think the reason that got a positive result was luck.

The experiment involved 24 subjects and these were randomly assigned to one of two groups. Looking at the results (figures 4 and 5), it appears that two of the subjects had much lower ability that the other subjects (the authors did discuss the performance of these two subjects). Both of these subjects were assigned to the control group (there is a 25% chance of this happening, but nobody knew what the situation was until the experiment was run), pulling down the average of the control, making the other (strategy) group appear to show an improvement (i.e., the teaching strategy improved student performance).

Had one, or both, low performers been assigned to the other (strategy) group, no experimental effect would have shown up in the results, significantly reducing the probability that the paper would have been accepted for publication.

Why did the authors submit the paper for publication? Well, academic performance is based on papers published (quality of journal they appear in, number of citations, etc), a positive result is reason enough to submit for publication. The researchers did what they have been incentivized to do.

I hope the authors of the paper continue with their experiments. Life is full of chance effects and the only way to get a solid result is to keep on trying.

ACCU Conference 2018

Products, the Universe and Everything from Products, the Universe and Everything

We had an absolute blast at this year's ACCU Conference, and if you were there we imagine you did too.

For us the highlight had to be the launch of #include <C++>, a new global, inclusive, and diverse community for developers interested in C++.

#include C++ logo

This is an initiative that has been brewing for a while, and we're very happy to be a part of. Above all else #include <C++> is designed to be a safe place for developers irrespective of their background, ethnicity, gender identity or sexuality. The group runs a Discord server which is moderated to ensure that it remains a safe space and which you are welcome to join.


On the technical front, one unexpected highlight of the conference was Benjamin Missel's wonderful short talk on writing a C compiler for a BBC Micro, during which he demonstrated SSHing into a BBC Model B through the serial port!

Most conference sessions were recorded so even if you weren't there you can still watch them.

See you at ACCU 2019!

ACCU Conference 2018

Products, the Universe and Everything from Products, the Universe and Everything

We had an absolute blast at this year's ACCU Conference, and if you were there we imagine you did too. For us the highlight had to be the launch of #include <C++>, a new global, inclusive, and diverse community for developers interested in C++.
#include C++ logo
This is an initiative that has been brewing for a while, and we're very happy to be a part of. Above all else #include <C++> is designed to be a safe place for developers irrespective of their background, ethnicity, gender identity or sexuality. The group runs a Discord server which is moderated to ensure that it remains a safe space and which you are welcome to join.
On the technical front, one unexpected highlight of the conference was Benjamin Missel's wonderful short talk on writing a C compiler for a BBC Micro, during which he demonstrated SSHing into a BBC Model B through the serial port! Most conference sessions were recorded so even if you weren't there you can still watch them. See you at ACCU 2019!

ResOrg 2.0.7.27 has been released

Products, the Universe and Everything from Products, the Universe and Everything

ResOrg 2.0.7.27 has just been released. This is a maintenance update for ResOrg 2.0, and is compatible with all ResOrg 2.0 licence keys.

The following changes are included:

  • Fixed a bug in the Symbols Display which could cause some "OK" symbols to be incorrectly shown in the "Problem Symbols Only" view.
  • Corrected the upper range limit for control symbols from 28671 (0x6FFF) to 57343 (0xDFFF).
  • ResOrg binaries are now dual signed with both SHA1 and SHA256.
  • Added support for Visual Studio 2017.
  • Corrected the File Save Dialog filters used by the ResOrgApp "File | Export" command.
  • The ResOrgApp "File | Export", "File | Save", "File | Save As" and "File | Properties" commands (which apply only to symbol file views) are now disabled when the active view is a report.
  • Fixed a crash in the Symbol File Properties Dialog.
  • Fixed a typo on the Symbol File "Next Values" page.
  • Various minor improvements to the installer.

Download ResOrg 2.0.7.27

ResOrg 2.0.7.27 has been released

Products, the Universe and Everything from Products, the Universe and Everything

ResOrg 2.0.7.27 has just been released. This is a maintenance update for ResOrg 2.0, and is compatible with all ResOrg 2.0 licence keys. The following changes are included:
  • Fixed a bug in the Symbols Display which could cause some "OK" symbols to be incorrectly shown in the "Problem Symbols Only" view.
  • Corrected the upper range limit for control symbols from 28671 (0x6FFF) to 57343 (0xDFFF).
  • ResOrg binaries are now dual signed with both SHA1 and SHA256.
  • Added support for Visual Studio 2017.
  • Corrected the File Save Dialog filters used by the ResOrgApp "File | Export" command.
  • The ResOrgApp "File | Export", "File | Save", "File | Save As" and "File | Properties" commands (which apply only to symbol file views) are now disabled when the active view is a report.
  • Fixed a crash in the Symbol File Properties Dialog.
  • Fixed a typo on the Symbol File "Next Values" page.
  • Various minor improvements to the installer.
Download ResOrg 2.0.7.27

The Product Owner is dead, long live the Product Owner!

Allan Kelly from Allan Kelly Associates

3ProductOwners-2018-04-26-17-33.jpg

For years I have been using this picture to describe the Product Owner role. For years I have been saying:

“The title Product Owner is really an alias. Nobody should have Product Owner on their business cards. Product Owner is a Scrum defined role which is usually filled by a Product Manager or Business Analyst, sometimes it is filled by a Domain Expert (also known as a Subject Matter Expert, SME) and sometimes by someone else.”

Easy right?

In telling us about the Product Owner Scrum tells us what one of these people will be doing within the Scrum setting. Scrum doesn’t tell us how the Product Owner knows what they need to know to make those decisions – that comes by virtue of the fact that underneath they are really a Product Manager, BA or expert in the field.

In the early descriptions of Scrum there was a tangible feel that the Product Owner really had the authority to make decisions – they were the OWNER. I still hope that is true but more often than not these days the person playing Product Owner is more likely to be a proxy for one or more real customers.

I go on to say:

“In a software company, like Microsoft or Adobe, Product Managers normally fill the role of Product Owner. The defining feature of the Product Manager role is that their customers are not in the building. The first task facing a new Product Manager is to work out who their customers are – or should be – and then get out to meet them. By definition customers are external.”

“Conversely in a corporate setting, like HSBC, Lufthansa, Proctor and Gamble, a Product Owner is probably a Business Analyst. There job is to analyse some aspect of the business and make it better. By definition their customers are in the building.”

With me so far?

Next I point out that having set up this nice model these roles are increasingly confused because software product companies increasingly sell their software as a service. And corporate customer interact with their customers online, which means customer contact is now through the computer.

Consider the airline industry: twenty years ago the only people who interacted with airline systems from United, BA, Lufthansa, etc. were airline employees. If you wanted to book a flight you went to a travel agent and a nice lady used a green screen to tell you what was available.

Today, whether you book with Lufthansa, SouthWest or Norwegian may well come down to which has the best online booking system.

Business Analyst need to be able to think like Product Managers and Product Managers need to be able to think like Business Analysts.

I regularly see online posts proclaiming “Product Managers are not Product Owners” or “Business Analysts are not Product Owners.” I’ve joined in with this, my alias argument says “they might be but there is so much more to those roles.”

It makes me sad to see the Product Manager role reduced to a Product Owner: the Product Owner role as defined by Scrum is a mere shadow of what a good Product Manager should be.

But the world has moved on, things have changed.

The world has decided that Product Owner is the role, the person who deals with the demand side, the person decides what is needed and what is to be built.

I think its time to change my model. The collision between the world of Business Analysts and Product Managers is now complete. The result is an even bigger mess and a new role has appeared: “Digital Business Analyst” – the illegitimate love child of Business Analysis and Product Management.

The Product Owner is now a superset of Product Manager and Business Analyst.

ProductOwnerSkills-2018-04-26-17-33-1.jpg

Product Owners today may well need the skills of business analysis. They are even more likely to need the skills of Product Management. And they are frequently expected to know about the domain.

Today’s Product Owner may well have a Subject Matter Expert background, in which case they quickly need to learn about Product Ownership, Product Management and Business Analysis.

Or they may have a Business Analysis background and need to absorb Product Management skills. Conversely, Product Owners may come from a Product Management background and may quickly need to learn some Business Analysis. In either case they will learn about the domain but they may want to bring in a Subject Matter Expert too.

To make things harder, exactly which skills they need, and which skills are most important is going to vary from team to team and role to role.

The post The Product Owner is dead, long live the Product Owner! appeared first on Allan Kelly Associates.

Examples of SQL join types (LEFT JOIN, INNER JOIN etc.)

Andy Balaam from Andy Balaam&#039;s Blog

I have 2 tables like this:

> SELECT * FROM table_a;
+------+------+
| id   | name |
+------+------+
|    1 | row1 |
|    2 | row2 |
+------+------+

> SELECT * FROM table_b;
+------+------+------+
| id   | name | aid  |
+------+------+------+
|    3 | row3 |    1 |
|    4 | row4 |    1 |
|    5 | row5 | NULL |
+------+------+------+

INNER JOIN cares about both tables

INNER JOIN cares about both tables, so you only get a row if both tables have one. If there is more than one matching pair, you get multiple rows.

> SELECT * FROM table_a a INNER JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
+------+------+------+------+------+

It makes no difference to INNER JOIN if you reverse the order, because it cares about both tables:

> SELECT * FROM table_b b INNER JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
+------+------+------+------+------+

You get the same rows, but the columns are in a different order because we mentioned the tables in a different order.

LEFT JOIN only cares about the first table

LEFT JOIN cares about the first table you give it, and doesn’t care much about the second, so you always get the rows from the first table, even if there is no corresponding row in the second:

> SELECT * FROM table_a a LEFT JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
|    2 | row2 | NULL | NULL | NULL |
+------+------+------+------+------+

Above you can see all rows of table_a even though some of them do not match with anything in table b, but not all rows of table_b – only ones that match something in table_a.

If we reverse the order of the tables, LEFT JOIN behaves differently:

> SELECT * FROM table_b b LEFT JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
|    5 | row5 | NULL | NULL | NULL |
+------+------+------+------+------+

Now we get all rows of table_b, but only matching rows of table_a.

RIGHT JOIN only cares about the second table

a RIGHT JOIN b gets you exactly the same rows as b LEFT JOIN a. The only difference is the default order of the columns.

> SELECT * FROM table_a a RIGHT JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
| NULL | NULL |    5 | row5 | NULL |
+------+------+------+------+------+

This is the same rows as table_b LEFT JOIN table_a, which we saw in the LEFT JOIN section.

Similarly:

> SELECT * FROM table_b b RIGHT JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
| NULL | NULL | NULL |    2 | row2 |
+------+------+------+------+------+

Is the same rows as table_a LEFT JOIN table_b.

No join at all gives you copies of everything

If you write your tables with no JOIN clause at all, just separated by commas, you get every row of the first table written next to every row of the second table, in every possible combination:

> SELECT * FROM table_b b, table_a;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    3 | row3 | 1    |    2 | row2 |
|    4 | row4 | 1    |    1 | row1 |
|    4 | row4 | 1    |    2 | row2 |
|    5 | row5 | NULL |    1 | row1 |
|    5 | row5 | NULL |    2 | row2 |
+------+------+------+------+------+

Another way to use Emacs to convert DOS/Unix line endings

Timo Geusch from The Lone C++ Coder&#039;s Blog

I’ve previously blogged about using Emacs to convert line endings and use it as an alternative to the dos2unix/unix2dos tools. Using set-buffer-file-coding-system works well and has been my go-to conversion method. That said, there is another way to do the same conversion by using M-x recode-region. As the name implies, recode-region works on a region. […]

The post Another way to use Emacs to convert DOS/Unix line endings appeared first on The Lone C++ Coder's Blog.

Custom Alias Analysis in LLVM

Simon Brand from Simon Brand

At Codeplay I currently work on a compiler backend for an embedded accelerator. The backend is the part of the compiler which takes some representation of the source code and translates it to machine code. In my case I’m working with LLVM, so this representation is LLVM IR.

It’s very common for accelerators like GPUs to have multiple regions of addressable memory – each with distinct properties. One important optimisation I’ve implemented recently is extending LLVM’s alias analysis functionality to handle the different address spaces for our target architecture.

This article will give an overview of the aim of alias analysis – along with an example based around address spaces – and show how custom alias analyses can be expressed in LLVM. You should be able to understand this article even if you don’t work with compilers or LLVM; hopefully it gives some insight into what compiler developers do to make the tools you use generate better code. LLVM developers may find the implementation section helpful, but may want to read the documentation or examples linked at the bottom for more details.

What is alias analysis

Alias analysis is the process of determining whether two pointers can point to the same object (alias) or not. This is very important for some valuable optimisations.

Take this C function as an example:

int foo (int __attribute__((address_space(0)))* a,
         int __attribute__((address_space(1)))* b) {
    *a = 42;
    *b = 20;
    return *a;
}

Those __attribute__s specify that a points to an int in address space 0, b points to an int in address space 1. An important detail of the target architecture for this code is that address spaces 0 and 1 are completely distinct: modifying memory in address space 0 can never affect memory in address space 1. Here’s some LLVM IR which could be generated from this function:

define i32 @foo(i32 addrspace(0)* %a, i32 addrspace(1)* %b) #0 {
entry:
  store i32 42, i32 addrspace(0)* %a, align 4
  store i32 20, i32 addrspace(1)* %b, align 4
  %0 = load i32, i32* %a, align 4
  ret i32 %0
}

For those unfamiliar with LLVM IR, the first store is storing 42 into *a, the second storing 20 into *b. The %0 = ... line is like loading *a into a temporary variable, which is then returned in the final line.

Optimising foo

Now we want foo to be optimised. Can you see an optimisation which could be made?

What we really want is for that load from a (the line beginning %0 = ...) to be removed and for the final statement to instead return 42. We want the optimised code to look like this:

define i32 @foo(i32 addrspace(0)* %a, i32 addrspace(1)* %b) #0 {
entry:
  store i32 42, i32 addrspace(0)* %a, align 4
  store i32 20, i32 addrspace(1)* %b, align 4
  ret i32 42
}

However, we have to be very careful, because this optimisation is only valid if a and b do not alias, i.e. they must not point at the same object. Forgetting about the address spaces for a second, consider this call to foo where we pass pointers which do alias:

int i = 0;
int result = foo(&i, &i);

Inside the unoptimised version of foo, i will be set to 42, then to 20, then 20 will be returned. However, if we carry out desired optimisation then the two stores will occur, but 42 will be returned instead of 20. We’ve just broken the behaviour of our function.

The only way that a compiler can reasonably carry out the above optimisation is if it can prove that the two pointers cannot possibly alias. This reasoning is carried out through alias analysis.

Custom alias analysis in LLVM

As I mentioned above, address spaces 0 and 1 for our target architecture are distinct. However, this may not hold for some systems, so LLVM cannot assume that it holds in general: we need to make it explicit.

One way to achieve this is llvm::AAResultBase. If our target is called TAR then we can create a class called TARAAResult which inherits from AAResultBase<TARAAResult>1:

class TARAAResult : public AAResultBase<TARAAResult> {
public:
  explicit TARAAResult() : AAResultBase() {}
  TARAAResult(TARAAResult &&Arg) : AAResultBase(std::move(Arg)) {}

  AliasResult alias(const MemoryLocation &LocA, const MemoryLocation &LocB);
};

The magic happens in the alias member function, which takes two MemoryLocations and returns an AliasResult. The result indicates whether the locations can never alias, may alias, partially alias, or precisely alias. We want our analysis to say “If the address spaces for the two memory locations are different, then they can never alias”. The resulting code is surprisingly close to this English description:

AliasResult TARAAResult::alias(const MemoryLocation &LocA,
                               const MemoryLocation &LocB) {
  auto AsA = LocA.Ptr->getType()->getPointerAddressSpace();
  auto AsB = LocB.Ptr->getType()->getPointerAddressSpace();

  if (AsA != AsB) {
    return NoAlias;
  }

  // Forward the query to the next analysis.
  return AAResultBase::alias(LocA, LocB);
}

Alongside this you need a bunch of boilerplate for creating a pass out of this analysis (I’ll link to a full example at the end), but after that’s done you just register the pass and ensure that the results of it are tracked:

void TARPassConfig::addIRPasses() {
  addPass(createTARAAWrapperPass());
  auto AnalysisCallback = [](Pass &P, Function &, AAResults &AAR) {
    if (auto *WrapperPass = P.getAnalysisIfAvailable<TARAAWrapper>()) {
      AAR.addAAResult(WrapperPass->getResult());
    }
  }; 
  addPass(createExternalAAWrapperPass(AliasCallback));
  TargetPassConfig::addIRPasses();
}

We also want to ensure that there is an optimisation pass which will remove unnecessary loads and stores which our new analysis will find. One example is Global Value Numbering.

  addPass(createNewGVNPass());

After all this is done, running the optimiser on the LLVM IR from the start of the post will eliminate the unnecessary load, and the generated code will be faster as a result.

You can see an example with all of the necessary boilerplate in LLVM’s test suite. The AMDGPU target has a full implementation which is essentially what I presented above extended for more address spaces.

Hopefully you have got a taste of what alias analysis does and the kinds of work involved in writing compilers, even if I have not gone into a whole lot of detail. I may write some more short posts on other areas of compiler development if readers find this interesting. Let me know on Twitter or in the comments!


  1. This pattern of inheriting from a class and passing the derived class as a template argument is known as the Curiously Recurring Template Pattern