Modeling visual studio C++ compile times

Derek Jones from The Shape of Code

Last week I spotted an interesting article on the compile-time performance of C++ compilers running under Microsoft Windows. The author had obviously put a lot of work into gathering the data, and had taken care to have multiple runs to reduce the impact of random effects (128 runs to be exact); but, as if often the case, the analysis of the data was lackluster. I posted a comment asking for the data, and a link was posted the next day :-)

The compilers benchmarked were: Visual Studio 2015, Visual Studio 2017 and clang 7.0.1; the compilers were configured to target: C++20, C++17, C++14, C++11, C++03, or C++98. The source code used was 100 system headers.

If we are interested in understanding the contribution of each component to overall compile-time, the obvious fist regression model to build is:

compile_time = header_x+compiler_y+language_z

where: header_x are the different headers, compiler_y the different compilers and language_z the different target languages. There might be some interaction between variables, so something more complicated was tried first; the final fitted model was (code+data):

compile_time = k+header_x+compiler_y+language_z+compiler_y*language_z

where k is a constant (the Intercept in R’s summary output). The following is a list of normalised numbers to plug into the equation (clang is the default compiler and C++03 the default language, and so do not appear in the list, the : symbol represents the multiplication; only a few of the 100 headers are listed, details are available):

                             Estimate Std. Error  t value Pr(>|t|)    
               (Intercept)                  headerany 
               1.000000000                0.051100398 
               headerarray             headerassert.h 
               0.522336397               -0.654056185 
...
            headerwctype.h            headerwindows.h 
              -0.648095154                1.304270250 
              compilerVS15               compilerVS17 
              -0.185795534               -0.114590143 
             languagec++11              languagec++14 
               0.032930014                0.156363433 
             languagec++17              languagec++20 
               0.192301727                0.184274629 
             languagec++98 compilerVS15:languagec++11 
               0.001149643               -0.058735591 
compilerVS17:languagec++11 compilerVS15:languagec++14 
              -0.038582437               -0.183708714 
compilerVS17:languagec++14 compilerVS15:languagec++17 
              -0.164031495                         NA 
compilerVS17:languagec++17 compilerVS15:languagec++20 
              -0.181591418                         NA 
compilerVS17:languagec++20 compilerVS15:languagec++98 
              -0.193587045                0.062414667 
compilerVS17:languagec++98 
               0.014558295 

As an example, the (normalised) time to compile wchar.h using VS15 with languagec++11 is:
1-0.514807638-0.183862162+0.033951731-0.059720131

Each component adds/substracts to/from the normalised mean.

Building this model didn’t take long. While waiting for the kettle to boil, I suddenly realised that an additive model was probably inappropriate for this problem; oops. Surely the contribution of each component was multiplicative, i.e., components have a percentage impact to performance.

A quick change to the form of the fitted model:

log(compile_time) = k+header_x+compiler_y+language_z+compiler_y*language_z

Taking the exponential of both side, the fitted equation becomes:

compile_time = e^{k}e^{header_x}e^{compiler_y}e^{language_z}e^{compiler_y*language_z}

The numbers, after taking the exponent, are:

               (Intercept)                  headerany 
              9.724619e+08               1.051756e+00 
...
            headerwctype.h            headerwindows.h 
              3.138361e-01               2.288970e+00 
              compilerVS15               compilerVS17 
              7.286951e-01               7.772886e-01 
             languagec++11              languagec++14 
              1.011743e+00               1.049049e+00 
             languagec++17              languagec++20 
              1.067557e+00               1.056677e+00 
             languagec++98 compilerVS15:languagec++11 
              1.003249e+00               9.735327e-01 
compilerVS17:languagec++11 compilerVS15:languagec++14 
              9.880285e-01               9.351416e-01 
compilerVS17:languagec++14 compilerVS15:languagec++17 
              9.501834e-01                         NA 
compilerVS17:languagec++17 compilerVS15:languagec++20 
              9.480678e-01                         NA 
compilerVS17:languagec++20 compilerVS15:languagec++98 
              9.402461e-01               1.058305e+00 
compilerVS17:languagec++98 
              1.001267e+00 

Taking the same example as above: wchar.h using VS15 with c++11. The compile-time (in cpu clock cycles) is:
9.724619e+08*3.138361e-01*7.286951e-01*1.011743e+00*9.735327e-01

Now each component causes a percentage change in the (mean) base value.

Both of these model explain over 90% of the variance in the data, but this is hardly surprising given they include so much detail.

In reality compile-time is driven by some combination of additive and multiplicative factors. Building a combined additive and multiplicative model is going to be like wrestling an octopus, and is left as an exercise for the reader :-)

Given a choice between these two models, I think the multiplicative model is probably closest to reality.

Guaranteed Copy Elision Does Not Elide Copies

Simon Brand from Simon Brand

This post is also available at the Microsoft Visual C++ Team Blog

C++17 merged in a paper called Guaranteed copy elision through simplified value categories. The changes mandate that no copies or moves take place in some situations where they were previously allowed, e.g.:

struct non_moveable {
    non_moveable() = default;
    non_moveable(non_moveable&&) = delete;
};
non_moveable make() { return {}; }
non_moveable x = make(); //compiles in C++17, error in C++11/14

You can see this behavior in compiler versions Visual Studio 2017 15.6, Clang 4, GCC 7, and above.

Despite the name of the paper and what you might read on the Internet, the new rules do not guarantee copy elision. Instead, the new value category rules are defined such that no copy exists in the first place. Understanding this nuance gives a deeper understanding of the current C++ object model, so I will explain the pre-C++17 rules, what changes were made, and how they solve real-world problems.

Value Categories

To understand the before-and-after, we first need to understand what value categories are (I’ll explain copy elision in the next section). Continuing the theme of C++ misnomers, value categories are not categories of values. They are characteristics of expressions. Every expression in C++ has one of three value categories: lvalue, prvalue (pure rvalue), or xvalue (eXpring value). There are then two parent categories: all lvalues and xvalues are glvalues, and all prvalues and xvalues are rvalues.

diagram expressing the taxonomy described above

For an explanation of what these are, we can look at the standard (C++17 [basic.lval]/1):

  • A glvalue ([generalised lvalue]) is an expression whose evaluation determines the identity of an object, bit-field, or function.
  • A prvalue is an expression whose evaluation initializes an object or a bit-field, or computes the value of an operand of an operator, as specified by the context in which it appears.
  • An xvalue is a glvalue that denotes an object or bit-field whose resources can be reused (usually because it is near the end of its lifetime).
  • An lvalue is a glvalue that is not an xvalue.
  • An rvalue is a prvalue or an xvalue.

Some examples:

std::string s; 
s //lvalue: identity of an object 
s + " cake" //prvalue: could perform initialization/compute a value 

std::string f(); 
std::string& g(); 
std::string&& h(); 

f() //prvalue: could perform initialization/compute a value 
g() //lvalue: identity of an object 
h() //xvalue: denotes an object whose resources can be reused 

struct foo { 
    std::string s; 
}; 

foo{}.s //xvalue: denotes an object whose resources can be reused

C++11

What are the properties of the expression std::string{"a pony"}?

It’s a prvalue. Its type is std::string. It has the value "a pony". It names a temporary.

That last one is the key point I want to talk about, and it’s the real difference between the C++11 rules and C++17. In C++11, std::string{"a pony"} does indeed name a temporary. From C++11 [class.temporary]/1:

Temporaries of class type are created in various contexts: binding a reference to a prvalue, returning a prvalue, a conversion that creates a prvalue, throwing an exception, entering a handler, and in some initializations.

Let’s look at how this interacts with this code:

struct copyable {
    copyable() = default;
    copyable(copyable const&) { /*...*/ }
};
copyable make() { return {}; }
copyable x = make();

make() results in a temporary. This temporary will be moved into x. Since copyable has no move constructor, this calls the copy constructor. However, this copy is unnecessary since the object constructed on the way out of make will never be used for anything else. The standard allows this copy to be elided by constructing the return value at the call-site rather than in make (C++11 [class.copy]). This is called copy elision.

The unfortunate part is this: even if all copies of the type are elided, the constructor still must exist.

This means that if we instead have:

struct non_moveable {
    non_moveable() = default;
    non_moveable(non_moveable&&) = delete;
};
non_moveable make() { return {}; }
auto x = make();

then we get a compiler error:

<source>(7): error C2280: 'non_moveable::non_moveable(non_moveable &&)': attempting to reference a deleted function
<source>(3): note: see declaration of 'non_moveable::non_moveable'
<source>(3): note: 'non_moveable::non_moveable(non_moveable &&)': function was explicitly deleted

Aside from returning non-moveable types by value, this presents other issues:

auto x = non_moveable{}; //compiler error
  • The language makes no guarantees that the constructors won’t be called (in practice this isn’t too much of a worry, but guarantees are more convincing than optional optimizations).
  • If we want to support some of these use-cases, we need to write copy/move constructors for types which they don’t make sense for (and do what? Throw? Abort? Linker error?)
  • You can’t pass non-moveable types to functions by value, in case you have some use-case which that would help with.

What’s the solution? Should the standard just say “oh, if you elide all copies, you don’t need those constructors”? Maybe, but then all this language about constructing temporaries is really a lie and building an intuition about the object model becomes even harder.

C++17

C++17 takes a different approach. Instead of guaranteeing that copies will be elided in these cases, it changes the rules such that the copies were never there in the first place. This is achieved through redefining when temporaries are created.

As noted in the value category descriptions earlier, prvalues exist for purposes of initialization. C++11 creates temporaries eagerly, eventually using them in an initialization and cleaning up copies after the fact. In C++17, the materialization of temporaries is deferred until the initialization is performed.

That’s a better name for this feature. Not guaranteed copy elision. Deferred temporary materialization.

Temporary materialization creates a temporary object from a prvalue, resulting in an xvalue. The most common places it occurs are when binding a reference to or performing member access on a prvalue. If a reference is bound to the prvalue, the materialized temporary’s lifetime is extended to that of the reference (this is unchanged from C++11, but worth repeating). If a prvalue initializes a class type of the same type as the prvalue, then the destination object is initialized directly; no temporary required.

Some examples:

struct foo {
    int i;
};

foo make();
auto& a = make();  //temporary materialized and lifetime-extended
auto&& b = make(); //ditto

foo{}.i //temporary materialized

auto c = make(); //no temporary materialized

That covers the most important points of the new rules. Now on to why this is actually useful past terminology bikeshedding and trivia to impress your friends.

Who cares?

I said at the start that understanding the new rules would grant a deeper understanding of the C++17 object model. I’d like to expand on that a bit.

The key point is that in C++11, prvalues are not “pure” in a sense. That is, the expression std::string{"a pony"} names some temporary std::string object with the contents "a pony". It’s not the pure notion of the list of characters “a pony”. It’s not the Platonic ideal of “a pony”.

In C++17, however, std::string{"a pony"} is the Platonic ideal of “a pony”. It’s not a real object in C++’s object model, it’s some elusive, amorphous idea which can be passed around your program, only being given form when initializing some result object, or materializing a temporary. C++17’s prvalues are purer prvalues.

If this all sounds a bit abstract, that’s okay, but internalising this idea will make it easier to reason about aspects of your program. Consider a simple example:

struct foo {};
auto x = foo{};

In the C++11 model, the prvalue foo{} creates a temporary which is used to move-construct x, but the move is likely elided by the compiler.

In the C++17 model, the prvalue foo{} initializes x.

A more complex example:

std::string a() {
    return "a pony";
}

std::string b() {
    return a();
}

int main() {
    auto x = b();
}

In the C++11 model, return "a pony"; initializes the temporary return object of a(), which move-constructs the temporary return object of b(), which move-constructs x. All the moves are likely elided by the compiler.

In the C++17 model, return "a pony"; initializes the result object of a(), which is the result object of b(), which is x.

In essence, rather than an initializer creating a series of temporaries which in theory move-construct a chain of return objects, the initializer is teleported to the eventual result object. In C++17, the code:

T a() { return /* expression */ ; }
auto x = a();

is identical to auto x = /* expression */;. For any T.

Closing

The “guaranteed copy elision” rules do not guarantee copy elision; instead they purify prvalues such that the copy doesn’t exist in the first place. Next time you hear or read about “guaranteed copy elision”, think instead about deferred temporary materialization.

The first commercially available (claimed) verified compiler

Derek Jones from The Shape of Code

Yesterday, I read a paper containing a new claim by some of those involved with CompCert (yes, they of soap powder advertising fame): “CompCert is the first commercially available optimizing compiler that is formally verified, using machine assisted mathematical proofs, to be exempt from miscompilation”.

First commercially available; really? Surely there are earlier claims of verified compilers being commercial availability. Note, I’m saying claims; bits of the CompCert compiler have involved mathematical proofs (i.e., code generation), so I’m considering earlier claims having at least the level of intellectual honesty used in some CompCert papers (a very low bar).

What does commercially available mean? The CompCert system is open source, so I guess it’s commercially available via free downloading (the paper does not define the term).

Computational Logic, Inc is the name that springs to mind, when thinking of commercial and formal verification. They were active from 1983 to 1997, and published some very interesting technical reports about their work (sadly there are gaps in the archive). One project was A Mechanically Verified Code Generator (in 1989) and their Gypsy system (a Pascal-like language+IDE) provided an environment for doing proofs of programs (I cannot find any reports online). Piton was a high-level assembler and there was a mechanically verified implementation (in 1988).

There is the Danish work on the formal specification of the code generators for their Ada compiler (while there was a formal specification of the Ada semantics in VDM, code generators tend to be much simpler beasts, i.e., a lot less work is needed in formal verification). The paper I have is: “Retargeting and rehosting the DDC Ada compiler system: A case study – the Honeywell DPS 6″ by Clemmensen, from 1986 (cannot find an online copy). This Ada compiler was used by various hardware manufacturers, so it was definitely commercially available for (lots of) money.

Are then there any earlier verified compilers with a commercial connection? There is A PRACTICAL FORMAL SEMANTIC DEFINITION AND VERIFICATION SYSTEM FOR TYPED LISP, from 1976, which has “… has proved a number of interesting, non-trivial theorems including the total correctness of an algorithm which sorts by successive merging, the total correctness of the McCarthy-Painter compiler for expressions, …” (which sounds like a code generator, or part of one, to me).

Francis Morris’s thesis, from 1972, proves the correctness of compilers for three languages (each language contained a single feature) and discusses how these features may be combined into a more “realistic” language. No mention of commercial availability, but I cannot see the demand being that great.

The definition of PL/1 was written in VDM, a formal language. PL/1 is a huge language and there were lots of subsets. Were there any claims of formal verification of a subset compiler for PL/1? I have had little contact with the PL/1 world, so am not in a good position to know. Anybody?

Over to you dear reader. Are there any earlier claims of verified compilers and commercial availability?

Adding a new scalar type to C

Derek Jones from The Shape of Code

I think the time has arrived for a new scalar type in C, which for want of a better name I shall call the compendium type.

On today’s processors a compendium type behaves a lot like an integer type, except that nobody really wants to include it in the list of supported integer types, e.g., 128-bit scalars.

Why is a new scalar type needed? The Standard supports extended integer types, why not treat a scalar object that supports integer arithmetic as an integer type?

The C Standard says (section 6.2.5 Types):
“There are five standard signed integer types, designated as signed char, short int, int, long int, and long long int. (These and other types may be designated in several additional ways, as described in 6.7.2.) There may also be implementation-defined extended signed integer types.38) The standard and extended signed integer types are collectively called signed integer types.39)”

There is corresponding wording for unsigned integer types.

The standard header allows implementations to define a whole menagerie of integer types: section 7.20.1.1 Exact-width integer types
“The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.”

This all sounds very feasible, but there is a catch. The Standard defines a greatest-width integer type, section 7.20.1.5 Greatest-width integer types
“The following type designates a signed integer type capable of representing any value of any signed integer type:
intmax_t

and various library functions have an argument type intmax_t (there is also an uintmax_t).

An ‘extra-large’ integer type is not something that can just sit there, in the list of available integer types, waiting to be used. Preprocessor arithmetic and a variety of library are based around the type intmax_t. An extra-large integer type would have a very visible impact on all developers, many of whom would want to ignore it.

GCC supports 128-bit integers, e.g., __int128. But some magic pixie dust is involved, this type has no connection with intmax_t.

What do developers do with these 128- and 256-bit scalar objects? Evaluating graphics algorithms, hashes and cryptographic calculations are obvious candidates; yes, perhaps even calculations involving integers that require this many bits. I have not seen any analysis of the uses of this kind of wide-integer-like type.

Extra-wide scalar types have a variety of uses and the term compendium type, captures this. Hardware support for such extra-width types is growing, with vendors looking to fill major niches.

Contorting existing wording, in the Standard, so accommodate these extra-wide types within the existing integer type machinery is a short term solution. Work on the upcoming revision of the C Standard should either do nothing and allow vendors to take the approach currently used by GCC, or create a new scalar type (perhaps using a TR).