Guaranteed Copy Elision Does Not Elide Copies

Simon Brand from Simon Brand

This post is also available at the Microsoft Visual C++ Team Blog

C++17 merged in a paper called Guaranteed copy elision through simplified value categories. The changes mandate that no copies or moves take place in some situations where they were previously allowed, e.g.:

struct non_moveable {
    non_moveable() = default;
    non_moveable(non_moveable&&) = delete;
};
non_moveable make() { return {}; }
non_moveable x = make(); //compiles in C++17, error in C++11/14

You can see this behavior in compiler versions Visual Studio 2017 15.6, Clang 4, GCC 7, and above.

Despite the name of the paper and what you might read on the Internet, the new rules do not guarantee copy elision. Instead, the new value category rules are defined such that no copy exists in the first place. Understanding this nuance gives a deeper understanding of the current C++ object model, so I will explain the pre-C++17 rules, what changes were made, and how they solve real-world problems.

Value Categories

To understand the before-and-after, we first need to understand what value categories are (I’ll explain copy elision in the next section). Continuing the theme of C++ misnomers, value categories are not categories of values. They are characteristics of expressions. Every expression in C++ has one of three value categories: lvalue, prvalue (pure rvalue), or xvalue (eXpring value). There are then two parent categories: all lvalues and xvalues are glvalues, and all prvalues and xvalues are rvalues.

diagram expressing the taxonomy described above

For an explanation of what these are, we can look at the standard (C++17 [basic.lval]/1):

  • A glvalue ([generalised lvalue]) is an expression whose evaluation determines the identity of an object, bit-field, or function.
  • A prvalue is an expression whose evaluation initializes an object or a bit-field, or computes the value of an operand of an operator, as specified by the context in which it appears.
  • An xvalue is a glvalue that denotes an object or bit-field whose resources can be reused (usually because it is near the end of its lifetime).
  • An lvalue is a glvalue that is not an xvalue.
  • An rvalue is a prvalue or an xvalue.

Some examples:

std::string s; 
s //lvalue: identity of an object 
s + " cake" //prvalue: could perform initialization/compute a value 

std::string f(); 
std::string& g(); 
std::string&& h(); 

f() //prvalue: could perform initialization/compute a value 
g() //lvalue: identity of an object 
h() //xvalue: denotes an object whose resources can be reused 

struct foo { 
    std::string s; 
}; 

foo{}.s //xvalue: denotes an object whose resources can be reused

C++11

What are the properties of the expression std::string{"a pony"}?

It’s a prvalue. Its type is std::string. It has the value "a pony". It names a temporary.

That last one is the key point I want to talk about, and it’s the real difference between the C++11 rules and C++17. In C++11, std::string{"a pony"} does indeed name a temporary. From C++11 [class.temporary]/1:

Temporaries of class type are created in various contexts: binding a reference to a prvalue, returning a prvalue, a conversion that creates a prvalue, throwing an exception, entering a handler, and in some initializations.

Let’s look at how this interacts with this code:

struct copyable {
    copyable() = default;
    copyable(copyable const&) { /*...*/ }
};
copyable make() { return {}; }
copyable x = make();

make() results in a temporary. This temporary will be moved into x. Since copyable has no move constructor, this calls the copy constructor. However, this copy is unnecessary since the object constructed on the way out of make will never be used for anything else. The standard allows this copy to be elided by constructing the return value at the call-site rather than in make (C++11 [class.copy]). This is called copy elision.

The unfortunate part is this: even if all copies of the type are elided, the constructor still must exist.

This means that if we instead have:

struct non_moveable {
    non_moveable() = default;
    non_moveable(non_moveable&&) = delete;
};
non_moveable make() { return {}; }
auto x = make();

then we get a compiler error:

<source>(7): error C2280: 'non_moveable::non_moveable(non_moveable &&)': attempting to reference a deleted function
<source>(3): note: see declaration of 'non_moveable::non_moveable'
<source>(3): note: 'non_moveable::non_moveable(non_moveable &&)': function was explicitly deleted

Aside from returning non-moveable types by value, this presents other issues:

auto x = non_moveable{}; //compiler error
  • The language makes no guarantees that the constructors won’t be called (in practice this isn’t too much of a worry, but guarantees are more convincing than optional optimizations).
  • If we want to support some of these use-cases, we need to write copy/move constructors for types which they don’t make sense for (and do what? Throw? Abort? Linker error?)
  • You can’t pass non-moveable types to functions by value, in case you have some use-case which that would help with.

What’s the solution? Should the standard just say “oh, if you elide all copies, you don’t need those constructors”? Maybe, but then all this language about constructing temporaries is really a lie and building an intuition about the object model becomes even harder.

C++17

C++17 takes a different approach. Instead of guaranteeing that copies will be elided in these cases, it changes the rules such that the copies were never there in the first place. This is achieved through redefining when temporaries are created.

As noted in the value category descriptions earlier, prvalues exist for purposes of initialization. C++11 creates temporaries eagerly, eventually using them in an initialization and cleaning up copies after the fact. In C++17, the materialization of temporaries is deferred until the initialization is performed.

That’s a better name for this feature. Not guaranteed copy elision. Deferred temporary materialization.

Temporary materialization creates a temporary object from a prvalue, resulting in an xvalue. The most common places it occurs are when binding a reference to or performing member access on a prvalue. If a reference is bound to the prvalue, the materialized temporary’s lifetime is extended to that of the reference (this is unchanged from C++11, but worth repeating). If a prvalue initializes a class type of the same type as the prvalue, then the destination object is initialized directly; no temporary required.

Some examples:

struct foo {
    int i;
};

foo make();
auto& a = make();  //temporary materialized and lifetime-extended
auto&& b = make(); //ditto

foo{}.i //temporary materialized

auto c = make(); //no temporary materialized

That covers the most important points of the new rules. Now on to why this is actually useful past terminology bikeshedding and trivia to impress your friends.

Who cares?

I said at the start that understanding the new rules would grant a deeper understanding of the C++17 object model. I’d like to expand on that a bit.

The key point is that in C++11, prvalues are not “pure” in a sense. That is, the expression std::string{"a pony"} names some temporary std::string object with the contents "a pony". It’s not the pure notion of the list of characters “a pony”. It’s not the Platonic ideal of “a pony”.

In C++17, however, std::string{"a pony"} is the Platonic ideal of “a pony”. It’s not a real object in C++’s object model, it’s some elusive, amorphous idea which can be passed around your program, only being given form when initializing some result object, or materializing a temporary. C++17’s prvalues are purer prvalues.

If this all sounds a bit abstract, that’s okay, but internalising this idea will make it easier to reason about aspects of your program. Consider a simple example:

struct foo {};
auto x = foo{};

In the C++11 model, the prvalue foo{} creates a temporary which is used to move-construct x, but the move is likely elided by the compiler.

In the C++17 model, the prvalue foo{} initializes x.

A more complex example:

std::string a() {
    return "a pony";
}

std::string b() {
    return a();
}

int main() {
    auto x = b();
}

In the C++11 model, return "a pony"; initializes the temporary return object of a(), which move-constructs the temporary return object of b(), which move-constructs x. All the moves are likely elided by the compiler.

In the C++17 model, return "a pony"; initializes the result object of a(), which is the result object of b(), which is x.

In essence, rather than an initializer creating a series of temporaries which in theory move-construct a chain of return objects, the initializer is teleported to the eventual result object. In C++17, the code:

T a() { return /* expression */ ; }
auto x = a();

is identical to auto x = /* expression */;. For any T.

Closing

The “guaranteed copy elision” rules do not guarantee copy elision; instead they purify prvalues such that the copy doesn’t exist in the first place. Next time you hear or read about “guaranteed copy elision”, think instead about deferred temporary materialization.

std::accumulate vs. std::reduce

Simon Brand from Simon Brand

std::accumulate has been a part of the standard library since C++98. It provides a way to fold a binary operation (such as addition) over an iterator range, resulting in a single value. std::reduce was added in C++17 and looks remarkably similar. This post will explain the difference between the two and when to use one or the other.

Let’s start by looking at their interfaces, beginning with std::accumulate.

template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );

template< class InputIt, class T, class BinaryOperation >
T accumulate( InputIt first, InputIt last, T init,
              BinaryOperation op );

std::accumulate takes an iterator range and an initial value for the accumulation. You can optionally give it a binary operation to do the reduction, which will default to addition. It will call this operation on the initial value and the first element of the range, then on the result and the second element of the range, etc. Here are two equivalent calls:

auto sum1 = std::accumulate(begin(vec), end(vec), 0);
auto sum2 = std::accumulate(begin(vec), end(vec), 0, std::plus<>{});

std::reduce has a fair few more overloads to get your head round, but has a very similar interface once you understand them:

template<class InputIt>
typename std::iterator_traits<InputIt>::value_type reduce(
    InputIt first, InputIt last);

template<class ExecutionPolicy, class ForwardIt>
typename std::iterator_traits<ForwardIt>::value_type reduce(
    ExecutionPolicy&& policy,
    ForwardIt first, ForwardIt last);

template<class InputIt, class T>
T reduce(InputIt first, InputIt last, T init);

template<class ExecutionPolicy, class ForwardIt, class T>
T reduce(ExecutionPolicy&& policy,
         ForwardIt first, ForwardIt last, T init);

template<class InputIt, class T, class BinaryOp>
T reduce(InputIt first, InputIt last, T init, BinaryOp binary_op);

template<class ExecutionPolicy, class ForwardIt, class T, class BinaryOp>
T reduce(ExecutionPolicy&& policy,
         ForwardIt first, ForwardIt last, T init, BinaryOp binary_op);

The differences here are:

I’ll talk about these points in turn.

Execution policies

Execution policies are a C++17 feature which allows programmers to ask for algorithms to be parallelised. There are three execution policies in C++17:

  • std::execution::seq – do not parallelise
  • std::execution::par – parallelise
  • std::execution::par_unseq – parallelise and vectorise (requires that the operation can be interleaved, so no acquiring mutexes and such)

The idea behind execution policies is that you can change a serial algorithm to a parallel algorithm simply by passing an additional argument to the function:

auto sum1 = std::reduce(begin(vec), end(vec));                      //sequential
auto sum2 = std::reduce(std::execution::seq, begin(vec), end(vec)); //sequential
auto sum3 = std::reduce(std::execution::par, begin(vec), end(vec)); //parallel

Allowing parallelisation is the main reason for the addition of std::reduce. Let’s look at an example where we want to sum up all the elements in an array. With std::accumulate it looks like this:

accumulate plus

Note that each step of the computation relies on the previous computation, i.e. this algorithm will execute serially and we make no use of hardware parallel processing capabilities. If we use std::reduce with the std::execution::par policy then it could look like this:

reduce plus

This is a trivial amount of data for processing in parallel, but the benefit gained when the data size is scaled up should be clear: some of the operations can be executed independently of others, so they can be done in parallel.

A common question is: why do we need an entirely new algorithm for this? Why can’t we just overload std::accumulate? For an example of why we can’t do this, let’s use std::minus<> instead of std::plus<> as our reduction operation. With std::accumulate we get this:

accumulate minus

However, if we try to use std::reduce, we could get something like:

reduce minus

Uh oh. We just broke our code.

We got the wrong answer because of the mathematical properties of subtraction. You can’t arbitrarily reorder the operands, or compute the operations out of order when doing subtraction. This is formalised in the properties of commutativity and associativity.

A binary operation ∗ on a set S is associative if the following equation holds for all x, y, and z in S:

(x ∗ y) ∗ z = x ∗ (y ∗ z)

An operation is commutative if:

x ∗ y = y ∗ x

Unfortunately there’s no way for the compiler to reliably check that some operation is associative and commutative, so we’re stuck with having std::reduce as a separate algorithm. In concepts speak, this is called an axiom: a requirement imposed on semantics which cannot generally be statically verified.

Input vs. Forward Iterators

I couldn’t find any reasoning in the original standards papers for the separate overloads for input and forward iterators. However, my educated guess is that this is to allow library implementors to chunk up the data in order to distribute it to a thread pool. Indeed, this appears to be what MSVC’s implementation does.

The idea is that input iterators are single-pass, whereas forward iterators can be iterated through multiple times. An algorithm couldn’t chunk up data indexed by input iterators because by the time it had gone through the range to work out the sub-range boundaries, the data from the iterator could have already been consumed.

Optional Initial Element

std::reduce lets you not bother passing an initial element, in which case it will default-construct one using typename std::iterator_traits<InputIt>::value_type{}. I think this was a mistake. A default-constructed value is not always the identity element (such as in multiplication), and it’s very easy to for a programmer to miss out the initial element. The code will still compile, it will just give the wrong answer. I suspect that this choice will result in some hard-to-find bugs when this interface comes into heavier use.

Finishing Up

That covers the differences between std::reduce and std::accumulate. My three point guide to std::reduce is:

  • Use std::reduce when you want your accumulation to run in parallel
  • Ensure that the operation you want to use is both associative and commutative
  • Remember that the default initial value is produced by default construction, and that this may not be correct for your operation

C++17 – Why it’s better than you might think

Phil Nash from level of indirection

C++20 Horizon

From Mark Isaacson's Meeting C++ talk, "Exploring C++ and beyond"

I was recently interviewed for CppCast and one the news items that came up was a trip report from a recent C++ standards meeting (Issaquah, Nov 2016). This was one of the final meetings before the C++17 standard is wrapped up, so things are looking pretty set at this point. During the discussion I made the point that, despite initially being disappointed that so many headline features were not making it in (Concepts, Modules, Coroutines and Ranges - as well as dot operator and uniform call syntax), I'm actually very happy with how C++17 is shaping up. There are some very nice refinements and features (const expr if is looking quite big on its own) - and including a few surprise ones (structured bindings being the main one for me).

But the part of what I said that surprised even me (because I hadn't really thought of it until a couple of hours before we recorded) was that perhaps it is for the best that we don't get those bigger features just yet! The thinking was that if you take them all together - or even just two or three of them - they have the potential to change the language - and the way we write "modern C++" perhaps even more so than C++11 did - and that's really saying something! Now that's a good thing, in my opinion, but I do wonder if it would be too soon for such large scale changes just yet.

After the 98 standard C++ went into a thirteen year period in the wilderness (there was C++03, which fixed a couple of problems with the 98 standard - but didn't actually add any new features - except value initialisation). As this period coincided with the rise of other mainstream languages - Java and C# in particular - it seemed that C++ was a dying language - destined for a drawn out, Cobolesque, old-age at best.

But C++11 changed all that and injected a vitality and enthusiasm into the community not seen since the late 90s - if ever! Again the timing was a factor - with Moore's Law no longer influencing single-core performance there was a resurgence of interest in low/ zero overhead systems languages - and C++11 was getting modern enough to be palatable again. "There's no such thing as a free lunch" turns out to be true if you wait long enough.

So the seismic changes in C++11 were overdue, welcome and much needed at that time. Since then the standardisation process has moved to the "train model", which has settled on a new standard every three years. Whatever is ready (and fits) makes it in. If it's not baked it's dropped - or is moved into a TS that can be given more real-world testing before being reconsidered. This has allowed momentum to be maintained and reassures us that we won't be stuck without an update to the standard for too long again.

On the other hand many code-bases are still catching up to C++11. There are not many breaking changes - and you can introduce newer features incrementally and to only parts of the code-base - but this can lead to some odd looking code and once you start converting things you tend to want to go all in. Even if that's not true for your own codebase it may be true of libraries and frameworks you depend on! Those features we wanted in C++17 could have a similar - maybe even greater - effect and my feeling is that, while they would certainly be welcomed by many (me included) - there would also be many more that might start to see the churn on the language as a sign of instability. "What? We've only just moved on to C++11 and you want us to adopt these features too?". Sometimes it can be nice to just know where you are with a language - especially after a large set of changes. 2011 might seem like a long time ago but there's a long lag in compiler conformance, then compiler adoption, then understanding and usage of newer features. Those just starting to experiment with C++11 language features are still very common.

I could be wrong about this, but it feels like there's something in it based on my experience. And I think the long gap between C++98 and C++11 is responsible for at least amplifying the effect. People got used to C++ being defined a single way and now we have three standards already in use, with another one almost ready. It's a lot to keep up with - even for those of us that enjoy that sort of thing!

So I'm really looking forward to those bigger features that we'll hopefully get in C++20 (and don't forget you can even use the TS's now if your compiler supports them - and the Ranges library is available on GitHub) - but I'm also looking forward to updating the language with C++17 and the community gaining a little more experience with the new, rapidly evolving, model of C++ before the next big push.