Another way to use Emacs to convert DOS/Unix line endings

Timo Geusch from The Lone C++ Coder's Blog

I’ve previously blogged about using Emacs to convert line endings and use it as an alternative to the dos2unix/unix2dos tools. Using set-buffer-file-coding-system works well and has been my go-to conversion method. That said, there is another way to do the same conversion by using M-x recode-region. As the name implies, recode-region works on a region. […]

The post Another way to use Emacs to convert DOS/Unix line endings appeared first on The Lone C++ Coder's Blog.

Custom Alias Analysis in LLVM

Simon Brand from Simon Brand

At Codeplay I currently work on a compiler backend for an embedded accelerator. The backend is the part of the compiler which takes some representation of the source code and translates it to machine code. In my case I’m working with LLVM, so this representation is LLVM IR.

It’s very common for accelerators like GPUs to have multiple regions of addressable memory – each with distinct properties. One important optimisation I’ve implemented recently is extending LLVM’s alias analysis functionality to handle the different address spaces for our target architecture.

This article will give an overview of the aim of alias analysis – along with an example based around address spaces – and show how custom alias analyses can be expressed in LLVM. You should be able to understand this article even if you don’t work with compilers or LLVM; hopefully it gives some insight into what compiler developers do to make the tools you use generate better code. LLVM developers may find the implementation section helpful, but may want to read the documentation or examples linked at the bottom for more details.

What is alias analysis

Alias analysis is the process of determining whether two pointers can point to the same object (alias) or not. This is very important for some valuable optimisations.

Take this C function as an example:

int foo (int __attribute__((address_space(0)))* a,
         int __attribute__((address_space(1)))* b) {
    *a = 42;
    *b = 20;
    return *a;
}

Those __attribute__s specify that a points to an int in address space 0, b points to an int in address space 1. An important detail of the target architecture for this code is that address spaces 0 and 1 are completely distinct: modifying memory in address space 0 can never affect memory in address space 1. Here’s some LLVM IR which could be generated from this function:

define i32 @foo(i32 addrspace(0)* %a, i32 addrspace(1)* %b) #0 {
entry:
  store i32 42, i32 addrspace(0)* %a, align 4
  store i32 20, i32 addrspace(1)* %b, align 4
  %0 = load i32, i32* %a, align 4
  ret i32 %0
}

For those unfamiliar with LLVM IR, the first store is storing 42 into *a, the second storing 20 into *b. The %0 = ... line is like loading *a into a temporary variable, which is then returned in the final line.

Optimising foo

Now we want foo to be optimised. Can you see an optimisation which could be made?

What we really want is for that load from a (the line beginning %0 = ...) to be removed and for the final statement to instead return 42. We want the optimised code to look like this:

define i32 @foo(i32 addrspace(0)* %a, i32 addrspace(1)* %b) #0 {
entry:
  store i32 42, i32 addrspace(0)* %a, align 4
  store i32 20, i32 addrspace(1)* %b, align 4
  ret i32 42
}

However, we have to be very careful, because this optimisation is only valid if a and b do not alias, i.e. they must not point at the same object. Forgetting about the address spaces for a second, consider this call to foo where we pass pointers which do alias:

int i = 0;
int result = foo(&i, &i);

Inside the unoptimised version of foo, i will be set to 42, then to 20, then 20 will be returned. However, if we carry out desired optimisation then the two stores will occur, but 42 will be returned instead of 20. We’ve just broken the behaviour of our function.

The only way that a compiler can reasonably carry out the above optimisation is if it can prove that the two pointers cannot possibly alias. This reasoning is carried out through alias analysis.

Custom alias analysis in LLVM

As I mentioned above, address spaces 0 and 1 for our target architecture are distinct. However, this may not hold for some systems, so LLVM cannot assume that it holds in general: we need to make it explicit.

One way to achieve this is llvm::AAResultBase. If our target is called TAR then we can create a class called TARAAResult which inherits from AAResultBase<TARAAResult>1:

class TARAAResult : public AAResultBase<TARAAResult> {
public:
  explicit TARAAResult() : AAResultBase() {}
  TARAAResult(TARAAResult &&Arg) : AAResultBase(std::move(Arg)) {}

  AliasResult alias(const MemoryLocation &LocA, const MemoryLocation &LocB);
};

The magic happens in the alias member function, which takes two MemoryLocations and returns an AliasResult. The result indicates whether the locations can never alias, may alias, partially alias, or precisely alias. We want our analysis to say “If the address spaces for the two memory locations are different, then they can never alias”. The resulting code is surprisingly close to this English description:

AliasResult TARAAResult::alias(const MemoryLocation &LocA,
                               const MemoryLocation &LocB) {
  auto AsA = LocA.Ptr->getType()->getPointerAddressSpace();
  auto AsB = LocB.Ptr->getType()->getPointerAddressSpace();

  if (AsA != AsB) {
    return NoAlias;
  }

  // Forward the query to the next analysis.
  return AAResultBase::alias(LocA, LocB);
}

Alongside this you need a bunch of boilerplate for creating a pass out of this analysis (I’ll link to a full example at the end), but after that’s done you just register the pass and ensure that the results of it are tracked:

void TARPassConfig::addIRPasses() {
  addPass(createTARAAWrapperPass());
  auto AnalysisCallback = [](Pass &P, Function &, AAResults &AAR) {
    if (auto *WrapperPass = P.getAnalysisIfAvailable<TARAAWrapper>()) {
      AAR.addAAResult(WrapperPass->getResult());
    }
  }; 
  addPass(createExternalAAWrapperPass(AliasCallback));
  TargetPassConfig::addIRPasses();
}

We also want to ensure that there is an optimisation pass which will remove unnecessary loads and stores which our new analysis will find. One example is Global Value Numbering.

  addPass(createNewGVNPass());

After all this is done, running the optimiser on the LLVM IR from the start of the post will eliminate the unnecessary load, and the generated code will be faster as a result.

You can see an example with all of the necessary boilerplate in LLVM’s test suite. The AMDGPU target has a full implementation which is essentially what I presented above extended for more address spaces.

Hopefully you have got a taste of what alias analysis does and the kinds of work involved in writing compilers, even if I have not gone into a whole lot of detail. I may write some more short posts on other areas of compiler development if readers find this interesting. Let me know on Twitter or in the comments!


  1. This pattern of inheriting from a class and passing the derived class as a template argument is known as the Curiously Recurring Template Pattern

A Python Data Science hackathon

Derek Jones from The Shape of Code

I was at the Man AHL Hackathon this weekend. The theme was improving the Python Data Science ecosystem. Around 15, or so, project titles had been distributed around the tables in the Man AHL cafeteria and the lead person for each project gave a brief presentation. Stable laws in SciPy sounded interesting to me and their room location included comfy seating (avoiding a numb bum is an under appreciated aspect of choosing a hackathon team and wooden bench seating is not numbing after a while).

Team Stable laws consisted of Andrea, Rishabh, Toby and yours truly. Our aim was to implement the Stable distribution as a Python module, to be included in the next release of SciPy (the availability had been announced a while back and there has been one attempt at an implementation {which seems to contain a few mistakes}).

We were well-fed and watered by Man AHL, including fancy cream buns and late night sushi.

A probability distribution is stable if the result of linear combinations of the distribution has the same distribution; the Gaussian, or Normal, distribution is the most well-known stable distribution and the central limit theorem leads many to think, that is that.

Two other, named, stable distributions are the Cauchy distribution and most interestingly (from my perspective this weekend) the Lévy distribution. Both distributions have very fat tails; the mean and variance of the Cauchy distribution is undefined (i.e., the values jump around as the sample size increases, never converging to a fixed value), while they are both infinite for the Lévy distribution.

Analytic expressions exist for various characteristics of the Stable distribution (e.g., probability distribution function), with the Gaussian, Cauchy and Lévy distributions as special cases. While solutions for implementing these expressions have been proposed, care is required; the expressions are ill-behaved in different ways over some intervals of their parameter values.

Andrea has spent several years studying the Stable distribution analytically and was keen to create an implementation. My approach for complicated stuff is to find an existing implementation and adopt it. While everybody else worked their way through the copious papers that Andrea had brought along, I searched for existing implementations.

I found several implementations, but they all suffered from using approaches that delivered discontinuities and poor accuracies over some range of parameter values.

Eventually I got lucky and found a paper by Royuela-del-Val, Simmross-Wattenberg and Alberola-López, which described their implementation in C: Libstable (licensed under the GPL, perfect for SciPy); they also provided lots of replication material from their evaluation. An R package was available, but no Python support.

No other implementations were found. Team Stable laws decided to create a new implementation in Python and to create a Python module to interface to the C code in libstable (the bit I got to do). Two implementations would allow performance and accuracy to be compared (accuracy checks really need three implementations to get some idea of which might be incorrect, when output differs).

One small fix was needed to build libstable under OS X (change Makefile to link against .so library, rather than .a) and a different fix was needed to install the R package under OS X (R patch; Windows and generic Unix were fine).

Python’s ctypes module looked after loading the C shared library I built, along with converting the NumPy arrays. My PyStable module will not win any Python beauty contest, it is a means of supporting the comparison of multiple implementations.

Some progress was made towards creating a new implementation, more than 24 hours is obviously needed (libstable contains over 4,000 lines of code). I had my own problems with an exception being raised in calls to stable_pdf; libstable used the GNU Scientific Library and I tracked the problem down to a call into GSL, but did not get any further.

We all worked overnight, my first 24-hour hack in a very long time (I got about 4-hours sleep).

After Sunday lunch around 10 teams presented and after a quick deliberation, Team Stable laws were announced as the winners; yea!

Hopefully, over the coming weeks a usable implementation will come into being.

On Quaker’s Dozen – student

student from thus spake a.k.

The Baron's latest wager set Sir R----- the task of rolling a higher score with two dice than the Baron should with one twelve sided die, giving him a prize of the difference between them should he have done so. Sir R-----'s first roll of the dice would cost him two coins and twelve cents and he could elect to roll them again as many times as he desired for a further cost of one coin and twelve cents each time, after which the Baron would roll his.
The simplest way to reckon the fairness of this wager is to re-frame its terms; to wit, that Sir R----- should pay the Baron one coin to play and thereafter one coin and twelve cents for each roll of his dice, including the first. The consequence of this is that before each roll of the dice Sir R----- could have expected to receive the same bounty, provided that he wrote off any losses that he had made beforehand.

Emacs 26.1-RC1 on the Windows Subsystem for Linux

Timo Geusch from The Lone C++ Coder&#039;s Blog

As posted in a few places, Emacs 26.1-RC1 has been released. Following up my previous experiments with running Emacs on the Windows Subsystem for Linux, I naturally had to see how the latest version would work out. For that, I built the RC1 on an up-to-date Ubuntu WSL. I actually built it twice – once […]

The post Emacs 26.1-RC1 on the Windows Subsystem for Linux appeared first on The Lone C++ Coder's Blog.

Influential philosophers of source code

Derek Jones from The Shape of Code

Who is the most important/influential philosopher of source code? Source code, as far as I know, is not a subject that philosophers claim to be studying; but, the study of logic, language and the mind is the study of source code.

For many, Ludwig Wittgenstein would probably be the philosopher that springs to mind. Wittgenstein became famous as the world’s first Perl programmer, with statements such as: “If a lion could talk, we could not understand him.” and “Whereof one cannot speak, thereof one must be silent.”

Noam Chomsky, a linguist, might be another choice, based on his specification of the Chomsky hierarchy (which neatly categorizes grammars). But generative grammars (for which he is famous in linguistics) is about generating language, not understanding what has been said/written.

My choice for the most important/influential philosopher of source code is Paul Grice. A name, I suspect, that is new to most readers. The book to quote (and to read if you enjoy the kind of books philosophers write) is “Studies in the Way of Words”.

Grice’s maxims, provide a powerful model for human communication; the tldr:

  • Maxim of quality: Try to make your contribution one that is true.
  • Maxim of quantity: Make your contribution as informative as is required.
  • Maxim of relation: Be relevant.

But source code is about human/computer communication, you say. Yes, but so many developers seem to behave as-if they were involved in human/human communication.

Source code rarely expresses what the developer means; source code is evidence of what the developer means.

The source code chapter of my empirical software engineering book is Gricean, with a Relevance theory accent.

More easily digestible books on Grice’s work (for me at least) are: “Relevance: Communication and Cognition” by Sperber and Wilson, and the more recent “Meaning and Relevance” by Wilson and Sperber.

What Product Owners should not do

Allan Kelly from Allan Kelly Associates

Noproductowners-2018-04-18-11-27.jpg

Last time I set out some of the things a Product Owner should be doing – or at least considering doing. Even a quick look at that list will tell you the Product Owner is going to be a busy person.

So in this post I’d like to suggest some thinigs Product Owners should NOT be doing.

Product Owners Cutting code should NOT be cutting code

Having a former coder in the Product Owner role can be a great boom. Not only do they know how to talk with the technical team and (hopefully) can command their respect but they can also see how technology can apply.

But to be an effective Product Owner they need to step away from the keyboard and stop writing code.

Two reasons.

One: time.
Product Owners add value by ensuring that the code which is written addresses the most valuable opportunities with the smallest, most elegant, delightful way possible.

Every minute spent coding is a minute not doing that.

Second: Product Owners need to empathise with the customer, with the business users, they need to eat-sleep-and-breath customers.

Being a good coder – let alone someone called an architect – is to empathise with code, the system, the mechanics of how a system works.

Importantly both requirements and code need to be able to come together and discuss what they see and find a way to bring the two – sometimes opposing – views together. It is a lot easier to have that discussion if the two sides are represented by different people.

Asking one person to divide their brain in two and discuss opposing views with themselves is unlikely to bring about the best result and is probably a recipe for confusion and stress.

Thats not to say both sides shouldn’t appreciate the other. I said before, former coders have a great advantage in being a Product Owner. And I want the technical team to meet customers. But I want discussions to be between two (or more) people.

(I might allow an exception here for Minimally Viable Teams but once the team moves beyond the MVT stage the PO should stop coding.)

Product Owners should NOT be line managers

OK, senior Product Owners should might line manage junior product owners but they certainly should not be line managing anyone else. Most certainly they should not be line managing the technical team.

Product Owner authority comes not from a line on an organization chart, or the ability to award (or deny) a pay rise or bonus. Product Owner authority stems from their specialist knowledge of what customers want from a product and what the organization considers valuable.

If the Product Owner cannot demonstrate their specialist knowledge in this way then either they should learn fast or they should consider if they are in the right role.

Product Owners need to trust the technical team and the technical team need to trust the Product Owner. Authority complicates this relationship because one side is allowed to issue orders when trust is absent and the other side has to obey.

And again, Product Owner simply don’t have the time to line manage anyone.

Being a good line manager requires empathy with employees and time to spend observing and talking to employees, helping them develop themselves, helping them with problems and so on.

Product Owners should not Make Promises for Other People to keep

Specifically that means they should not issue “Roadmaps” which list features with delivery dates based on effort estimates. The whole issue of estimation is a minefield, very few teams are in a position to estimate accurately and most humans are atrocious at time estimation anyway. So any plans based on effort estimation are a fantasy anyway. But even putting that to one side…

Issuing such plans commits other people to keep promises. That is just unfair.

Product Owners can create and share scenario plans about how the product – and world – might unfold in the future.

Product Owners can co-create and share capacity plans which should how an organization intends (strategically) to allocate resources. And Product Owners can work with teams in executing against those capacity plans in order to deliver functionality the Product Owner thinks should be delivered by a date the Product Owner thinks is necessary.

In other words: provided a Product Owner is making the promise that they intend to keep themselves (i.e. they have skin in the game) then they might issue some kind of forward plan.

Product Owners should dump outbound marketing at the first opportunity

Outbound marketing, e.g. advertising, press releases, public relations and product evangelism, often ends up on the Product Owner plate – particularly when the Product Owner is a Product Manager. And in a small company (think early stage start-up) this just needs to be accepted.

However, in a larger organization, or a growing start-up, Product Owners should seek to pass this work to a dedicated Product Marketing specialist as soon as possible. Both roles deserve enough time to do the job properly.

The Marketing Specialist and Product Owner will work closely together – they are after all two sides of the same coin, the Marketing coin. The Marketing Specialist handles outbound marketing (telling people about the product) and the Product Owner handles inbound marketing (what do people want from the product?). (Again, in organizations with established Product Management this is usually easier to see.)

Product Owners should dump pre-sales at the first opportunity

As with outbound marketing Product Owners often get dragged in as pre-sales support to account managers. And again this is more common in small companies and early stage start-ups.

There are some advantages to playing second fiddle to a sales person. The Product Owner might get actual customer contact (sales people too often block Product people from meeting customers.) And Product Owners should be exposed to some of the commercial pressures that sales people – and customers – encounter.

But doing pre-sales is very time consuming. And being wheeled in to help deliver a sales will distort the Product Owner’s view of the market – just ‘cos this customer wants the product in Orange doesn’t mean other customers want Orange.

And again, pre-sales is more effectively done by specialist staff as soon as the company can afford them.

Want to receive these posts by e-mail? – join the newsletter today and receive a free eBook: Xanpan: Team Centric Agile Software Development

The post What Product Owners should not do appeared first on Allan Kelly Associates.

The C++ committee has taken off its ball and chain

Derek Jones from The Shape of Code

A step change in the approach to updates and additions to the C++ Standard occurred at the recent WG21 meeting, or rather a change that has been kind of going on for a few meetings has been documented and discussed. Two bullet points at the start of “C++ Stability, Velocity, and Deployment Plans [R2]”, grab reader’s attention:

● Is C++ a language of exciting new features?
● Is C++ a language known for great stability over a long period?

followed by the proposal (which was agreed at the meeting): “The Committee should be willing to consider the design / quality of proposals even if they may cause a change in behavior or failure to compile for existing code.”

We have had 30 years of C++/C compatibility (ok, there have been some nibbling around the edges over the last 15 years). A remarkable achievement, thanks to Bjarne Stroustrup over 30+ years and 64 full-week standards’ meetings (also, Tom Plum and Bill Plauger were engaged in shuttle diplomacy between WG14 and WG21).

The C/C++ superset/different issue has a long history.

In the late 1980s SC22 (the top-level ISO committee for programming languages) asked WG14 (the C committee) whether a standard should be created for C++, and if so did WG14 want to create it. WG14 considered the matter at its April 1989 meeting, and replied that in its view a standard for C++ was worth considering, but that the C committee were not the people to do it.

In 1990, SC22 started a study group to look into whether a working group for C++ should be created and in the U.S. X3 (the ANSI committee responsible for Information processing systems) set up X3J16. The showdown meeting of what would become WG21, was held in London, March 1992 (the only ISO C++ meeting I have attended).

The X3J16 people were in London for the ISO meeting, which was heated at times. The two public positions were: 1) work should start on a standard for C++, 2) C++ was not yet mature enough for work to start on a standard.

The, not so public, reason given for wanting to start work on a standard was to stop, or at least slow down, changes to the language. New releases, rumored and/or actual, of Cfront were frequent (in a pre-Internet time sense). Writing large applications in a version of C++ that was replaced with something sightly different six months later was has developers in large companies pulling their hair out.

You might have thought that compiler vendors would be happy for the language to be changing on a regular basis; changes provide an incentive for users to pay for compiler upgrades. In practice the changes were so significant that major rework was needed by somebody who knew what they were doing, i.e., expensive people had to be paid; vendors were more used to putting effort into marketing minor updates. It was claimed that implementing a C++ compiler required seven times the effort of implementing a C compiler. I have no idea how true this claim might have been (it might have been one vendor’s approximate experience). In the 1980s everybody and his dog had their own C compiler and most of those who had tried, had run into a brick wall trying to implement a C++ compiler.

The stop/slow down changing C++ vs. let C++ “fulfill its destiny” (a rallying call from the AT&T rep, which the whole room cheered) finally got voted on; the study group became a WG (I cannot tell you the numbers; the meeting minutes are not online and I cannot find a paper copy {we had those until the mid/late-90s}).

The creation of WG21 did not have the intended effect (slowing down changes to the language); Stroustrup joined the committee and C++ evolution continued apace. However, from the developers’ perspective language change did slow down; Cfront changes stopped because its code was collapsing under its own evolutionary weight and usable C++ compilers became available from other vendors (in the early days, Zortech C++ was a major boost to the spread of usage).

The last WG21 meeting had 140 people on the attendance list; they were not all bored consultants looking for a creative outlet (i.e., exciting new features), but I’m sure many would be happy to drop the ball-and-chain (otherwise known as C compatibility).

I think there will be lots of proposals that will break C compatibility in one way or another and some will it into a published standard. The claim will be that the changes will make life easier for future C++ developers (a claim made by proponents of every language, for which there is zero empirical evidence). The only way of finding out whether a change has long term benefit is to wait a long time and see what happens.

The interesting question is how C++ compiler vendors will react to breaking changes in the language standard. There are not many production compilers out there these days, i.e., not a lot of competition. What incentive does a compiler vendor have to release a version of their compiler that will likely break existing code? Compiler validation, against a standard, is now history.

If WG21 make too many breaking changes, they could find C++ vendors ignoring them and developers asking whether the ISO C++ standards’ committee is past its sell by date.