East End Functions

Phil Nash from level of indirection

There has been a recent stirring of attention, in the C++ community, for the practice of always placing the const modifier to the right of the thing it modifies. The practice has even been gifted a catchy name: East Const (which, I think, is what has stirred up the interest).

As purely a matter of style it's fascinating that it seems to have split the community so strongly! There are cases for and against, but both sides seem to revolve around the idea of "consistency". For the East Const believers the consistency is in the sense that you can always apply one, simple, rule about what const means and where it goes. For the West Consters the consistency is with the majority of existing code out there - as well as the Core Guidelines recommendation!

Personally I've been an East Const advocate for many years (although not by that name, of course) - and converted the entire Catch codebase over to East Const quite early on.

But there's another style choice that I've not seen discussed quite as much, yet it has a number of parallels.

As with East vs West Const this is purely a matter of style (it doesn't change what the compiler generates), and one of the arguments in favour is consistency of application (there are some cases where you must do it this way) - but the main argument against is also consistency - with most existing code. Sound familiar? But what is it?

The issue is about where to specify return types on function signatures. For most of C++'s history the only choice has been to write the type before the name of the function (along with any modifiers). But since C++11 we've been able to write the type at the end of the function signature (after a ->, and the function must be prefixed with the keyword auto).

auto someFunc( int i ) -> std::string;
// instead of
std::string someFunc( int i );

So why would you prefer this style? Well, first there's that consistency argument. It's the only way to specify return types for lambdas. You're also required to use trailing return types if the type is a decltype that is dependent on the name of one of the function's arguments. Indeed, that's the motivating case for adding the syntax in the first place, e.g.:

template <typename Lhs, typename Rhs>
auto add( Lhs const& lhs, Rhs const& rhs ) -> decltype( lhs + rhs ) {
    return lhs + rhs;
}
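
For the lambda case mentioned above there's no choice in the matter - the trailing syntax is the only way to give a lambda an explicit return type. A minimal illustration (of my own, not from the original post):

auto isEven = []( int i ) -> bool {
    return i % 2 == 0;
};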

A Foolish Consistency?

Given those cases where it is required, using the same syntax in all other cases would seem to be more consistent.

I'm not sure the consistency argument is as strong here as it is with East Const - there was never much confusion over what the return type applied to, after all. But I think it's worth keeping in mind.

The next argument in favour is consistency with other languages. Many languages, especially functional programming languages, exclusively use the trailing syntax for return types. Quite a few, e.g. Swift, use the same -> syntax.

It's not a strong reason on its own, but combined with the internal consistency argument I think there's something there.

However, for me at least, the most compelling rationale is readability. Why do I think it's more readable? There are actually two parts to this:

  1. Function declarations tend to line up. Certain qualifiers might spoil this effect, although one approach might be to group similarly qualified functions (e.g. all virtuals) together. This makes glancing through the list of function names much easier.

  2. The name of the function is usually the most important thing when you're browsing the code. If you're more interested in the return type it's usually because you already know which function you're interested in. So making the name the first thing you read (after the auto introducer) seems fitting.

auto doesItBlend() -> bool;
auto whatsYourFavouriteNumber() -> int;
auto add( double a, double b ) -> double;
void setTheControls();

(note that many who prefer this form, including myself, tend to still put void first)

For me the arguments for are compelling. The arguments against really boil down to the same argument against East Const - inconsistency with older code. As Jon Kalb deliberated on in A Foolish Consistency, this sort of thinking can hold us back.

I've been favouring this style for more than a couple of years now. In fact I tracked down a post to the ACCU mailing list (linked here, but I believe you have to be a subscriber to read it) where I talked about it - and made all the same points I'm making here. My opinion since then has not changed much. Other than feeling more confident that it's The Right Thing.

So I think it's time we gave it a catchy name. Unlike East Const it already has a name, "trailing return types". It's not especially galvanising, though. Given the parallels to East vs West Const - and the fact that it, also, relates to the thing in question being placed to the left or the right, I propose East End Functions (vs West End Functions).

What about the redundant auto keyword?

Think of auto, here, as the "function introducer". In other languages it might be spelt fun or func. If it makes you feel better you could always:

#define func auto

... actually don't. The point is, in languages that introduce a function with func, then have a trailing return type, nobody gives it a second thought. auto is the same number of characters as func. It's a shame it's not quite as expressive - but that's the price of legacy. It shouldn't mean we "can't have nice things".

GDPR has a huge impact on empirical software engineering research

Derek Jones from The Shape of Code

The EU’s General Data Protection Regulation (GDPR) is going to have a huge impact on empirical software engineering research. After 25 May 2018, analyzing source code will never be the same again.

I am not a lawyer and nothing qualifies me to talk about the GDPR.

People put their name in source code, bug tracking databases and discussion forums; this is personal identifying information.

Researchers use personal names to obtain information about a wide variety of activities, e.g., how much code did individuals write, how many bug reports did they process, contributions in discussions of one sort or another.

Open source licenses give others all kinds of rights (e.g., ability to use and modify source code), but they do not contain any provisions for processing personal data.

Adding an “I hereby give permission for anybody to process information about my name in any way they see fit.” clause to licenses is not going to help.

The GDPR requires (article 5: Principles relating to processing of personal data):

“Personal data shall be: … collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes;”

That is, personal data can only be processed for the specific reason it was collected, i.e., if you come up with another bright idea for analysis of data that has just been collected, it may be necessary to obtain consent, from those whose personal data it is, before trying out the bright idea.

It is not possible to obtain blanket permission (article 6, Lawfulness of processing):

“…the data subject has given consent to the processing of his or her personal data for one or more specific purposes;”, i.e., consent has to be obtained from the data subject for each specific purpose.

Github’s Global Privacy Practices show that Github is intent on meeting the GDPR requirements; they include: “GitHub provides clear methods of unambiguous, informed consent at the time of data collection, when we do collect your personal data.” Processing personal information, about an EU citizen, contained in source code appears to be a violation of Github’s terms of service.

The GDPR has many other requirements, e.g., right to obtain information on what information is held and right to be forgotten. But, the upfront killer is not being able to cheaply collect lots of code and then use personal information to help with the analysis.

There are exceptions for processing for archiving, scientific or historical research, or statistical purposes. Can somebody who blogs and is writing a book claim to be doing scientific research? People who know more about these exceptions than me tell me that there could be a fair amount of paperwork involved when making use of the exception, i.e., being able to show that privacy safeguards are in place.

Then, there is the issue of what constitutes personal information. Git’s hashing algorithm makes use of the committer’s name and/or email address. Is a git hash personal identifying information?
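
To make that concrete, here is a rough sketch (my own illustration, not from the article) of the bytes git hashes to produce a commit id: a small header followed by the commit body, which embeds the author and committer name/email lines verbatim. All names, object ids and timestamps below are made up.

#include <iostream>
#include <string>

int main() {
    std::string body =
        "tree 9d2e5a0c...\n"                                    // id of the tree object
        "parent 3fa1b2c4...\n"                                  // id of the parent commit
        "author Jane Doe <jane@example.com> 1524000000 +0100\n"
        "committer Jane Doe <jane@example.com> 1524000000 +0100\n"
        "\n"
        "Fix off-by-one in parser\n";
    // git feeds exactly these bytes to SHA-1: "commit <length>\0" followed by the body,
    // so changing the committer's name or email changes the resulting hash
    std::string hashed = "commit " + std::to_string( body.size() ) + '\0' + body;
    std::cout << hashed.size() << " bytes would be hashed\n";
}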

A good introduction to the GDPR for developers.

Can you get a deadlock with a single lock and an IO operation?

Timo Geusch from The Lone C++ Coder's Blog

Quite a while ago, I answered a question about the basic deadlock scenario on Stack Overflow. More recently, I got an interesting comment on it. The poster asked if it was possible to get a deadlock with a single lock and an I/O operation. My first gut reaction was “no, not really”, but it got me thinking. So let’s try to unroll the scenario and see if we can reason at least about my gut feeling.
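
As a taster of why the question is more interesting than it first appears, here is a minimal sketch (my own, not from the post, and assuming POSIX pipes) of one way a single lock plus blocking I/O can wedge a program: the writer holds the only mutex while writing more than the pipe buffer can hold, while the reader insists on taking that same mutex before it will drain the pipe.

#include <mutex>
#include <thread>
#include <vector>
#include <unistd.h>

std::mutex m;
int fds[2]; // fds[0] = read end, fds[1] = write end

void writer() {
    std::lock_guard<std::mutex> lock( m );    // take the only lock...
    std::vector<char> big( 1 << 20, 'x' );    // ...then write more than the pipe buffer holds
    write( fds[1], big.data(), big.size() );  // blocks once the kernel buffer is full
}                                             // the lock is never released

void reader() {
    std::lock_guard<std::mutex> lock( m );    // waits forever: the writer still holds it
    char buf[4096];
    read( fds[0], buf, sizeof( buf ) );       // never reached, so the pipe never drains
}

int main() {
    pipe( fds );
    std::thread a( writer ), b( reader );
    a.join();                                 // neither thread can ever make progress
    b.join();
}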

Reliability chapter added to “Empirical software engineering using R”

Derek Jones from The Shape of Code

The Reliability chapter of my Empirical software engineering book has been added to the draft pdf (download here).

I have been working on this draft for four months and it still needs lots of work; time to move on and let it stew for a while. Part of the problem is lack of public data; cost and schedule overruns can be rather public (projects chapter), but reliability problems are easier to keep quiet.

Originally there was a chapter covering reliability and another one covering faults. As time passed, these merged into one. The material kept evaporating in front of my eyes (around a third of the initial draft, collected over the years, was deleted); I have already written about why most fault prediction research is a waste of time. If it had not been for Rome I would not have had much to write about.

Perhaps what will jump out at people most is that I distinguish between mistakes in code and what I call a fault experience: fault_experience = mistake_in_code + particular_input. Most fault researchers have been completely ignoring half of what goes into every fault experience, the input profile (if the user does not notice a fault, I do not consider it experienced). It’s incredibly difficult to figure out anything about the input profile, so it has been quietly ignored (one of the reasons why research papers on reported faults are such a waste of time).

I’m also missing an ‘interesting’ figure on the opening page of the chapter. Suggestions welcome.

I have not said much about source code characteristics. There is a chapter covering source code, perhaps some of this material will migrate to reliability.

All sorts of interesting bits and pieces have been added to earlier chapters. Ecosystems keeps growing and in years to come somebody will write a multi-volume tome on software ecosystems.

I have been promised all sorts of data. Hopefully some of it will arrive.

As always, if you know of any interesting software engineering data, please tell me.

Source code chapter next.

No more pointers

Simon Brand from Simon Brand

One of the major changes at the most recent C++ standards meeting in Jacksonville was the decision to deprecate raw pointers in C++20, moving to remove them completely in C++23. This came as a surprise to many, with a lot of discussion as to how we’ll get by without this fundamental utility available any more. In this post I’ll look at how we can replace some of the main use-cases of raw pointers in C++20.

Three of the main reasons people use raw pointers are:

  • Dynamic allocation & runtime polymorphism
  • Nullable references
  • Avoiding copies

I’ll deal with these points in turn, but first, an answer to the main question people ask about this change.

The elephant in the room

What about legacy code? Don’t worry, the committee have come up with a way to move the language boldly forward without breaking all the millions of lines of C++ which people have written over the years: opt-in extensions.

If you want to opt-in to C++20’s no-pointers feature, you use #feature.

#feature <no_pointers> //opt-in to no pointers
#feature <cpp20>       //opt-in to all C++20 features

This is a really cool new direction for the language. Hopefully with this we can slowly remove features like std::initializer_list so that new code isn’t bogged down with legacy as much as it is today.

Dynamic allocation & runtime polymorphism

I’m sure most of you already know the answer to this one: smart pointers. If you need to dynamically allocate some resource, that resource’s lifetime should be managed by a smart pointer, such as std::unique_ptr or std::shared_ptr. These types are now special compiler-built-in types rather than normal standard library types. In fact, std::is_fundamental<std::unique_ptr<int>>::value now evaluates to true!
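
As a reminder of what that looks like in practice, a minimal sketch (the class names are my own) of dynamic allocation with runtime polymorphism and no raw pointer in sight:

#include <iostream>
#include <memory>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Circle : Shape {
    explicit Circle( double r ) : radius( r ) {}
    double area() const override { return 3.14159265 * radius * radius; }
    double radius;
};

int main() {
    // ownership and lifetime are handled by unique_ptr; the virtual call still dispatches
    std::unique_ptr<Shape> shape = std::make_unique<Circle>( 2.0 );
    std::cout << shape->area() << '\n';
}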

Nullable references

Since references cannot be rebound and cannot be null, pointers are often used to fulfil this purpose. However, with C++20, we have a new type for the job: std::optional<T&>. std::optional was first introduced in C++17, but was plagued with no support for references and no monadic interface. C++20 has fixed both of these, so now we have a much more usable std::optional type which can fill the gap that raw pointers have left behind.

Avoiding copies

Some people like to use raw pointers to avoid copies at interface boundaries, such as returning some resource from a function. Fortunately, we have much better options, such as (Named) Return Value Optimization. C++17 made some forms of copy elision mandatory, which gives us even more guarantees for the performance of our code.
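
A small sketch of my own to illustrate: returning a named local relies on NRVO (near-universal in practice, though not mandated), while initialising the caller's variable from the returned value is one of the elisions C++17 made mandatory.

#include <string>
#include <vector>

std::vector<std::string> makeLines() {
    std::vector<std::string> lines;              // built locally...
    lines.push_back( "no pointers" );
    lines.push_back( "no copies either" );
    return lines;                                // ...NRVO usually constructs this directly
}                                                // in the caller's storage

int main() {
    auto lines = makeLines();                    // since C++17, initialising lines from the
    return lines.empty() ? 1 : 0;                // returned prvalue is guaranteed not to copy
}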

Wrapping up

Of course there are more use-cases for raw pointers, but this covers three of the most common ones. Personally, I think this is a great direction to see the language going in, and I look forward to seeing other ways we can slowly make C++ into a simpler, better language.