strong_typedef – Create distinct types for distinct purposes

Anthony Williams from Just Software Solutions Blog

One common problem in C++ code is the use of simple types for many things: a std::string might be a filename, a person's name, a SQL query string or a piece of JSON; an int could be a count, an index, an ID number, or even a file handle. In his 1999 book "Refactoring" (which has a second edition as of January 2019), Martin Fowler called this phenomenon "Primitive Obsession", and recommended that we use dedicated classes for each purpose rather than built-in or library types.

The difficulty with doing so is that built-in types and library types have predefined sets of operations that can be done with them from simple operations like incrementing/decrementing and comparing, to more complex ones such as replacing substrings. Creating a new class each time means that we have to write implementations for all these functions every time. This duplication of effort raises the barrier to doing this, and means that we often decide that it isn't worthwhile.

However, by sticking to the built-in and library types, we can end up in a scenario where a function takes multiple parameters of the same type, with distinct meanings, and no clear reason for any specific ordering. In such a scenario, it is easy to get the parameters in the wrong order and not notice until something breaks. By wrapping the primitive type in a unique type for each usage we can eliminate this class of problem.

My strong_typedef class template aims to make this easier. It wraps an existing type, and associates it with a tag type to define the purpose, and which can therefore be used to make it unique. Crucially, it then allows you to specify which sets of operations you want to enable: it might not make sense to add ID numbers, but it might make perfect sense to add counters, even if both are represented by integers. You might therefore using jss::strong_typedef<struct IdTag,unsigned,jss::strong_typedef_properties::equality_comparable> for an ID number, but jss::strong_typedef<struct IndexTag,unsigned,jss::strong_typedef_properties::comparable | jss::strong_typedef_properties::incrementable | jss::strong_typedef_properties::decrementable> for an index type.

I've implemented something similar to this class for various clients over the years, so I decided it was about time to make it publicly available. The implementation on github condenses all of the solutions to this problem that I've written over the years to provide a generic implementation.

Basic Usage

jss::strong_typedef takes three template parameters: Tag, ValueType and Properties.

The first (Tag) is a tag type. This is not used for anything other than to make the type unique, and can be incomplete. Most commonly, this is a class or struct declared directly in the template parameter, and nowhere else, as in the examples struct IdTag and struct IndexTag above.

The second (ValueType) is the underlying type of the strong typedef. This is the basic type that you would otherwise be using.

The third (Properties) is an optional parameter that specifies the operations you wish the strong typedef to support. By default it is jss::strong_typedef_properties::none — no operations are supported. See below for a full list.

Declaring Types

You create a typedef by specifying these parameters:

using type1=jss::strong_typedef<struct type1_tag,int>;
using type2=jss::strong_typedef<struct type2_tag,int>;
using type3=jss::strong_typedef<struct type3_tag,std::string,
    jss::strong_typedef_properties::comparable>;

type1, type2 and type3 are now separate types. They cannot be implicitly converted to or from each other or anything else.

Creating Values

If the underlying type is default-constructible, then so is the new type. You can also construct the objects from an object of the wrapped type:

type1 t1;
type2 t2(42);
// type2 e2(t1); // error, type1 cannot be converted to type2

Accessing the Value

strong_typedef can wrap built-in or class type, but that's only useful if you can access the value. There are two ways to access the value:

  • Cast to the stored type: static_cast<unsigned>(my_channel_index)
  • Use the underlying_value member function: my_channel_index.underlying_value()

Using the underlying_value member function returns a reference to the stored value, which can thus be used to modify non-const values, or to call member functions on the stored value without taking a copy. This makes it particularly useful for class types such as std::string.

using transaction_id=jss::strong_typedef<struct transaction_tag,std::string>;

bool is_a_foo(transaction_id id){
    auto& s=id.underlying_value();
    return s.find("foo")!=s.end();
}

Other Operations

Depending on the properties you've assigned to your type you may be able to do other operations on that type, such as compare a == b or a < b, increment with ++a, or add two values with a + b. You might also be able to hash the values with std::hash<my_typedef>, or write them to a std::ostream with os << a. Only the behaviours enabled by the Properties template parameter will be available on any given type. For anything else, you need to extract the wrapped value and use that.

Examples

IDs

An ID of some description might essentially be a number, but it makes no sense to perform much in the way of operations on it. You probably want to be able to compare IDs, possibly with an ordering so you can use them as keys in a std::map, or with hashing so you can use them as keys in std::unordered_map, and maybe you want to be able to write them to a stream. Such an ID type might be declared as follows:

using widget_id=jss::strong_typedef<struct widget_id_tag,unsigned long long,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::hashable |
    jss::strong_typedef_properties::streamable>;

using froob_id=jss::strong_typedef<struct froob_id_tag,unsigned long long,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::hashable |
    jss::strong_typedef_properties::streamable>;

Note that froob_id and widget_id are now different types due to the different tags used, even though they are both based on unsigned long long. Therefore any attempt to use a widget_id as a froob_id or vice-versa will lead to a compiler error. It also means you can overload on them:

void do_stuff(widget_id my_widget);
void do_stuff(froob_id my_froob);

widget_id some_widget(421982);
do_stuff(some_widget);

Alternatively, an ID might be a string, such as a purchase order number of transaction ID:

using transaction_id=jss::strong_typedef<struct transaction_id_tag,std::string,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::hashable |
    jss::strong_typedef_properties::streamable>;

transaction_id some_transaction("GBA283-HT9X");

That works too, since strong_typedef can wrap any built-in or class type.

Indexes

Suppose you have a device that supports a number of channels, so you want to be able to retrieve the data for a given channel. Each channel yields a number of data items, so you also want to access the data items by index. You could use strong_typedef to wrap the channel index and the data item index, so they can't be confused. You can also make the index types incrementable and decrementable so they can be used in a for loop:

using channel_index=jss::strong_typedef<struct channel_index_tag,unsigned,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::incrementable |
    jss::strong_typedef_properties::decrementable>;

using data_index=jss::strong_typedef<struct data_index_tag,unsigned,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::incrementable |
    jss::strong_typedef_properties::decrementable>;

Data get_data_item(channel_index channel,data_index item);
data_index get_num_items(channel_index channel);
void process_data(Data data);

void foo(){
    channel_index const num_channels(99);
    for(channel_index channel(0);channel<num_channels;++channel){
        data_index const num_data_items(get_num_items(channel));
        for(data_index item(0);item<num_data_items;++item){
            process_data(get_data_item(channel,item));
        }
    }
}

The compiler will complain if you pass the wrong parameters, or compare the channel against the item.

Behaviour Properties

The Properties parameter specifies behavioural properties for the new type. It must be one of the values of jss::strong_typedef_properties, or a value obtained by or-ing them together (e.g. jss::strong_typedef_properties::hashable | jss::strong_typedef_properties::streamable | jss::strong_typedef_properties::comparable). Each property adds some behaviour. The available properties are:

  • equality_comparable => Can be compared for equality (st==st2) and inequality (st!=st2)
  • pre_incrementable => Supports preincrement (++st)
  • post_incrementable => Supports postincrement (st++)
  • pre_decrementable => Supports predecrement (--st)
  • post_decrementable => Supports postdecrement (st--)
  • addable => Supports addition (st+value, value+st, st+st2) where the result is convertible to the underlying type. The result is a new instance of the strong typedef.
  • subtractable => Supports subtraction (st-value, value-st, st-st2) where the result is convertible to the underlying type. The result is a new instance of the strong typedef.
  • ordered => Supports ordering comparisons (st<st2, st>st2, st<=st2, st>=st2)
  • mixed_ordered => Supports ordering comparisons where only one of the values is a strong typedef
  • hashable => Supports hashing with std::hash
  • streamable => Can be written to a std::ostream with operator<<
  • incrementable => pre_incrementable | post_incrementable
  • decrementable => pre_decrementable | post_decrementable
  • comparable => ordered | equality_comparable

Guideline and Implementation

I strongly recommend using strong_typedef or an equivalent implementation anywhere you would otherwise reach for a built-in or library type such as int or std::string when designing an interface.

My strong_typedef implementation is available on github under the Boost Software License.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Evidence on the distribution and diversity of Christianity: 1900-2000

Derek Jones from The Shape of Code

I recently read an article saying that Christianity had 33,830 denominations, with 150 having more than 1 million followers. Checking the references, World Christian Encyclopedia was cited as the source; David Barrett had spent 12 years traveling the world, talking to people to collect the data. An evidence-based man, after my own heart.

Checking the second-hand book sites, I found a copy of the 1982 edition available for a few pounds, and placed an order (this edition lists 20,800 denominations; how many more are there to be ‘discovered’).

The book that arrived was a bit larger than I had anticipated. This photograph shows just how large this book is, compared to other dead-tree data sources in my collection:

World Christian Encyclopedia.

My interest in a data-driven discussion of the spread and diversity of religions, was driven by wanting ideas for measuring the spread and diversity of programming languages. Bill Kinnersley’s language list contains information on 2,500 programming languages, and there are probably an order of magnitude more languages waiting to be written about.

The data is available to researchers, but is not public :-(

The World Christian Encyclopedia is way too detailed for my needs. I usually leave unwanted books on the book table of my local train station’s Coffee shop. I have left some unusual books there in the past, but this one feels like it needs a careful owner; I will see whether the local charity shop will take it in.

Identity over roles and responsibilities

Allan Kelly from Allan Kelly Associates

iStock-899732676-2019-05-23-15-36.jpg

When was the last time you read your “roles and responsibilities” description?

My guess is, it was about the time you applied for your current position. Of course, if your HR department, or agile coaches, have decided to change your description you might have read the new document but even then, did you?

My guess is most people remember little more then the job title.

Like so many documents, it goes in one eye and out the other. And the longer it is, the more detail the less you are likely to remember.

So it won’t surprise you when I say: I don’t think roles and responsibilities documents have much use.

And it might not surprise you when I say roles are pretty pointless too.

To my mind your personal sense of identity, your own idea of who you are and what you do, plays a much bigger role in the actions you take in work and the responsibilities you accept – and those you ignore.

If, for example, your business card says “Business Analyst” it is not because someone defines your work as a “Business Analyst” it is because you see yourself as a business analysts and your sought out a business analyst role. You are a business analyst because you see yourself as a business analyst. You are not a programmer because you didn’t apply for a programmers job because you are not a programmer.

Let me suggest that what you actually do has little to do with what it says in some document, rather what you do is determined by your sense of identity, your identity may well be entwined with a job title. Identity is a powerful motivator.

If you consider yourself to be a programmer, a software engineer, software developer or whatever, then you probably shun business cards altogether but the same idea applies: you are a programmer not because you read a job description and say “I like the sound of that”. You are a programmer because you like doing the things you think a programmer does.

Of course this can cause problems because different people see things differently. Things fall between the gaps – your manager expects you to update the user manual and you see that as someone else job. Or things get difficult because two people try to do the same thing: the programmers try to design the software and so too do the architects.

Things get more complicated when people – us consultants! – start trying to change things. Maybe we say “programmers should test the code they write” but programmers say “testing is a Testers job.” Its even more difficult when you try and remove responsibility for someone: “programmers are no longer responsible for updating the system documentation.”

This is why I despair when people tell me problems can be fixed by updating the “RACI matrix” – don’t ask me what R A C I stands for, it seems to be a tool where responsibilities are listed against roles. RACI doesn’t change identity.

But you know what? – I like the way the world is. I do not want a world were people do what is on their job description and don’t do what is not.

People who are doing what they think needs doing, and who are doing it with the aim of producing a better outcome are motivated.

People who are doing something because it is in a job description, or because they’ve been told it is now their responsibility are not so motivated. Job description documents are pretty useless.

Whether you agree with me or think that people “should” do what is in some document: simply stop expecting them to do what it says. Instead look at what they do.

Go further, if something is missing ask “Who would like to do this?”. Work with people who are motivated by asking them what they want to do.

Few, very few, people are truly disruptive and working against you – unless of course your final outcome is something evil. Work with them.

Delete the roles and responsibilities documents.


Like this post? – Like to receive these posts by e-mail?

Subscribe to my newsletter & receive a free eBook “Xanpan: Team Centric Agile Software Development”

Check out my latest books – Continuous Digital and Project Myopia – and now Project Myopia audio edition

The post Identity over roles and responsibilities appeared first on Allan Kelly Associates.

The Hydra Of Argos – baron m.

baron m. from thus spake a.k.

Ho there Sir R-----! Will you join me for a cold tankard of ale to refresh yourself on this warm spring evening?

And, might I hope, for a little sport?

I should not have doubted it for a moment sir!

This fine weather reminds me of the time I spent as the Empress's trade envoy to the market city of Argos, famed almost as much for the remarkable, if somewhat fragile, mechanical contraptions made by its artificers and the most reasonably priced jewellery sold by its goldsmiths as for its fashion for tiny writing implements.

Background checks on pointer values being considered for C

Derek Jones from The Shape of Code

DR 260 is a defect report submitted to WG14, the C Standards’ committee, in 2001 that was never resolved, then generally ignored for 10-years, then caught the attention of a research group a few years ago, and is now back on WG14’s agenda. The following discussion covers two of the three questions raised in the DR.

Consider the following fragment of code:

int *p, *q;

    p = malloc (sizeof (int)); assert (p != NULL);  // Line A
    (free)(p);                                      // Line B
    // more code
    q = malloc (sizeof (int)); assert (q != NULL);  // Line C
    if (memcmp (&p, &q, sizeof p) == 0)             // Line D
       {*p = 42;                                    // Line E
        *q = 43;}                                   // Line F

Section 6.2.4p2 of the C Standard says:
“The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.”

The call to free, on line B, ends the lifetime of the storage (allocated on line A) pointed to by p.

There are two proposed interpretations of the sentence, in 6.2.4p2.

  1. “becomes indeterminate” is treated as effectively storing a value in the pointer, i.e., some bit pattern denoting an indeterminate value. This interpretation requires that any other variables that had been assigned p‘s value, prior to the free, also have an indeterminate value stored into them,
  2. the value held in the pointer is to be treated as an indeterminate value (for instance, a memory management unit may prevent any access to the corresponding storage).

What are the practical implications of the two options?

The call to malloc, on line C, could return a pointer to a location that is identical to the pointer returned by the first call to malloc, i.e., the second call might immediately reuse the free‘ed storage.

Effectively storing a value in the pointer, in response to the call to free means the subsequent call to memcmp would always return a non-zero value, and the questions raised below do not apply; it would be a nightmare to implement, especially in a multi-process environment.

If the sentence in section 6.2.4p2 is interpreted as treating the pointer value as indeterminate, then the definition of malloc needs to be updated to specify that all returned values are determinate, i.e., any indeterminacy that may exist gets removed before a value is returned (the memory management unit must allow read/write access to the storage).

The memcmp, on line D, does a byte-wise compare of the pointer values (a byte-wise compare side-steps indeterminate value issues). If the comparison is exact, an assignment is made via p, line E, and via q, line F.

Does the assignment via p result in undefined behavior, or is the conformance status of the code unaffected by its presence?

Nobody is impuning the conformance status of the assignment via q, on line F.

There are people who think that the assignment via p, on line E, should be treated as undefined behavior, despite the fact that the values of p and q are byte-wise identical. When this issue was first raised (by those trouble makers in the UK ;-), yours truly was less than enthusiastic, but there were enough knowledgeable people in the opposing camp to keep the ball rolling for a while.

The underlying issue some people have with some subsequent uses of p is its provenance, the activities it has previously been associated with.

Provenance can be included in the analysis process by associating a unique number with the address of every object, at the start of its lifetime; these p-numbers are not reused.

The value returned by the call to malloc, on line A, would include a pointer to the allocated storage, plus an associated p-number; the call on line C could return a pointer having the same value, but its p-number is required to be different. Implementations are not required to allocate any storage for p-numbers, treating them purely as conceptual quantities. Your author knows of two implementations that do allocate storage for p-numbers (in a private area), and track usage of p-numbers; the Model Implementation C Checker was validated as handling all of C90, and Cerberus which handles a substantial subset of C11, and I don’t believe that the other tools that check array bounds and use after free are based on provenance (corrections welcome).

If provenance is included as part of a pointer’s value, the behavior of operators needs to be expanded to handle the p-number (conceptual or not) component of a pointer.

The rules might specify that p-numbers are conceptually compared by the call to memcmp, on line C; hence p and q are considered to never compare equal. There is an existing practice of regarding byte compares as just that, i.e., no magic ever occurs when comparing bytes (otherwise known as objects having type unsigned char).

Having p-numbers be invisible to memcmp would be consistent with existing practice. The pointer indirection operation on line E (generating undefined behavior) is where p-numbers get involved and cause the undefined behavior to occur.

There are other situations where pointer values, that were once indeterminate, can appear to become ‘respectable’.

For a variable, defined in a function, “… its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way.”; section 6.2.4p3.

In the following code:

int x;
static int *p=&x;

void f(int n)
{
   int *q = &n;
   if (memcmp (&p, &q, sizeof p) == 0)
      *p = 0;
   p = &n; // assign an address that will soon cease to exist.
} // Lifetime of pointed to object, n, terminates here

int main(void)
{
   f(1); // after this call, p has an indeterminate value
   f(2);
}

the pointer p has an indeterminate value after any call to f returns.

In many implementations, the second call to f will result in n having the same address it had on the first call, and memcmp will return zero.

Again, there are people who have an issue with the assignment involving p, because of its provenance.

One proposal to include provenance contains substantial changes to existing word in the C Standard. The rationale for is proposals looks more like a desire to change wording to make things clearer for those making the change, than a desire to address DR 260. Everybody thinks their proposed changes make the wording clearer (including yours truly), such claims are just marketing puff (and self-delusion); confirmation from the results of an A/B test would add substance to such claims.

It is probably possible to explicitly include support for provenance by making a small number of changes to existing wording.

Is the cost of supporting provenance (i.e., changing existing wording may introduce defects into the standard, the greater the amount of change the greater the likelihood of introducing defects), worth the benefits?

What are the benefits of introducing provenance?

Provenance makes it possible to easily specify that the uses of p, in the two previous examples (and a third given in DR 260), are undefined behavior (if that is WG14’s final decision).

Provenance also provides a model that might make it easier to reason about programs; it’s difficult to say one way or the other, without knowing what the model is.

Supporters claim that provenance would enable tool vendors to flag various snippets of code as suspicious. Tool vendors can already do this, they don’t need permission from the C Standard to flag anything they fancy.

The C Standard requires a conforming implementation to diagnose certain constructs. A conforming implementation can issue as many messages as it likes, for any other construct, e.g., for line A in the first example, a compiler might print “This is the 1,000,000’th call to malloc I have translated, ring this number to claim your prize!

Before any changes are made to wording in the C Standard, WG14 needs to decide what the behavior should be for these examples; it could decide to continue ignoring them for another 20-years.

Once a decision is made, the next question is how to update wording in the standard to specify the behavior that has been decided on.

While provenance is an interesting idea, the benefits it provides appear to be not worth the cost of changing the C Standard.

A prisoner’s dilemma when agreeing to a management schedule

Derek Jones from The Shape of Code

Two software developers, both looking for promotion/pay-rise by gaining favorable management reviews, are regularly given projects to complete by a date specified by management; the project schedules are sometimes unachievable, with probability p.

Let’s assume that both developers are simultaneously given a project, and the corresponding schedule. If the specified schedule is unachievable, High quality work can be only be performed by asking for more time, otherwise performing Low quality work is the only way of meeting the schedule.

If either developer faces an unachievable deadline, they have to immediately decide whether to produce High or Low quality work. A High quality decision requires that they ask management for more time, and incur a penalty they perceive to be C (saying they cannot meet the specified schedule makes them feel less worthy of a promotion/pay-rise); a Low quality decision is perceived to be likely to incur a penalty of Q_1 (because of its possible downstream impact on project completion), if one developer chooses Low, and Q_2, if both developers choose Low. It is assumed that: Q_1 < Q_2 < C.

This is a prisoner’s dilemma problem. The following mathematical results are taken from: “The Effects of Time Pressure on Quality in Software Development: An Agency Model”, by Robert D. Austin (cannot find a downloadable pdf).

There are two Nash equilibriums, for the decision made by the two developers: Low-Low and High-High (i.e., both perform Low quality work, or both perform High quality work). Low-High is not a stable equilibrium, in that on the next iteration the two developers may switch their decisions.

High-High is a pure strategy (i.e., always use it), when: 1-{Q_1}/C <= p

High-High is Pareto superior to Low-Low when: 1-{Q_2}/{C-Q_1+Q_2} < p < 1-{Q_1}/C

How might management use this analysis to increase the likelihood that a High-High quality decision is made?

Evidence shows that 50% of developer estimates, of task effort, underestimate the actual effort; there is sufficient uncertainty in software development that the likelihood of consistently produce accurate estimates is low (i.e., p is a very fuzzy quantity). Managers wanting to increase the likelihood of a High-High decision could be generous when setting deadlines (e.g., multiple developer estimates by 200%, when setting the deadline for delivery), but managers are often under pressure from customers, to specify aggressively short deadlines.

The penalty for a developer admitting that they cannot deliver by the specified schedule, C, could be set very low, by management. But this might encourage developers to always give this response. If all developers cooperated to always give this response, none of them would lose relative to the others; but there is an incentive for the more capable developers to defect, and the less capable developers to want to use this strategy.

Regular code reviews are a possible technique for motivating High-High, by increasing the likelihood of any lone Low decision being detected. A Low-Low decision may go unreported by those involved.

To summarise: an interesting analysis that appears to have no practical use, because reasonable estimates of the values of the variables involved are unavailable.

Installing leiningen on Manjaro Linux

Timo Geusch from The Lone C++ Coder&#039;s Blog

I like Lispy languages. One I’ve been playing with – and occasionally been using for smaller projects – is Clojure. Clojure projects usually use Leiningen for their build system. There are generally two ways to install leiningen – just download the script as per the Leiningen web site, or use the OS package manager. I […]

The post Installing leiningen on Manjaro Linux appeared first on The Lone C++ Coder's Blog.

C Standard meeting, April-May 2019

Derek Jones from The Shape of Code

I was at the ISO C language committee meeting, WG14, in London this week (apart from the few hours on Friday morning, which was scheduled to be only slightly longer than my commute to the meeting would have been).

It has been three years since the committee last met in London (the meeting was planned for Germany, but there was a hosting issue, and Germany are hosting next year), and around 20 people attended, plus 2-5 people dialing in. Some regular attendees were not in the room because of schedule conflicts; nine of those present were in London three years ago, and I had met three of those present (this week) at WG14 meetings prior to the last London meeting. I had thought that Fred Tydeman was the longest serving member in the room, but talking to Fred I found out that I was involved a few years earlier than him (our convenor is also a long-time member); Fred has attended more meeting than me, since I stopped being a regular attender 10 years ago. Tom Plum, who dialed in, has been a member from the beginning, and Larry Jones, who dialed in, predates me. There are still original committee members active on the WG14 mailing list.

Having so many relatively new meeting attendees is a good thing, in that they are likely to be keen and willing to do things; it’s also a bad thing for exactly the same reason (i.e., if it not really broken, don’t fix it).

The bulk of committee time was spent discussing the proposals contains in papers that have been submitted (listed in the agenda). The C Standard is currently being revised, WG14 are working to produce C2X. If a person wants the next version of the C Standard to support particular functionality, then they have to submit a paper specifying the desired functionality; for any proposal to have any chance of success, the interested parties need to turn up at multiple meetings, and argue for it.

There were three common patterns in the proposals discussed (none of these patterns are unique to the London meeting):

  • change existing wording, based on the idea that the change will stop compilers generating code that the person making the proposal considers to be undesirable behavior. Some proposals fitting this pattern were for niche uses, with alternative solutions available. If developers don’t have the funding needed to influence the behavior of open source compilers, submitting a proposal to WG14 offers a low cost route. Unless the proposal is a compelling use case, affecting lots of developers, WG14’s incentive is to not adopt the proposal (accepting too many proposals will only encourage trolls),
  • change/add wording to be compatible with C++. There are cost advantages, for vendors who have to support C and C++ products, to having the two language be as mutually consistent as possible. Embedded systems are a major market for C, but this market is not nearly as large for C++ (because of the much larger overhead required to support C++). I pointed out that WG14 needs to be careful about alienating a significant user base, by slavishly following C++; the C language needs to maintain a separate identity, for long term survival,
  • add a new function to the C library, based on its existence in another standard. Why add new functions to the C library? In the case of math functions, it’s to increase the likelihood that the implementation will be correct (maths functions often have dark corners that are difficult to get right), and for string functions it’s the hope that compilers will do magic to turn a function call directly into inline code. The alternative argument is not to add any new functions, because the common cases are already covered, and everything else is niche usage.

At the 2016 London meeting Peter Sewell gave a presentation on the Cerberus group’s work on a formal definition of C; this work has resulted in various papers questioning the interpretation of wording in the standard, i.e., possible ambiguities or inconsistencies. At this meeting the submitted papers focused on pointer provenance, and I was expecting to hear about the fancy optimizations this work would enable (which would be a major selling point of any proposal). No such luck, the aim of the work was stated as clearly specifying the behavior (a worthwhile aim), with no major new optimizations being claimed (formal methods researchers often oversell their claims, Peter is at the opposite end of the spectrum and could do with an injection of some positive advertising). Clarifying behavior is a worthwhile aim, but not at the cost of major changes to existing wording. I have had plenty of experience of asking WG14 for clarification of existing (what I thought to be ambiguous) wording, only to be told that the existing wording was clear and not ambiguous (to those reviewing my proposed defect). I wonder how many of the wording ambiguities that the Cerberus group claim to have found would be accepted by WG14 as a defect that required a wording change?

Winner of the best pub quiz question: Does the C Standard require an implementation to be able to exactly represent floating-point zero? No, but it is now required in C2X. Do any existing conforming implementations not support an exact representation for floating-point zero? There are processors that use a logarithmic representation for floating-point, but I don’t know if any conforming implementation exists for such systems; all implementations I know of support an exact representation for floating-point zero. Logarithmic representation could handle zero using a special bit pattern, with cpu instructions doing the right thing when operating on this bit pattern, e.g., 0.0+X == X, (I wonder how much code would break, if the compiler mapped the literal 0.0 to the representable value nearest to zero).

Winner of the best good intentions corrupted by the real world: intmax_t, an integer type capable of representing any value of any signed integer type (i.e., a largest representable integer type). The concept of a unique largest has issues in a world that embraces diversity.

Today’s C development environment is very different from 25 years ago, let alone 40 years ago. The number of compilers in active use has decreased by almost two orders of magnitude, the number of commonly encountered distinct processors has shrunk, the number of very distinct operating systems has shrunk. While it is not a monoculture, things appear to be heading in that direction.

The relevance of WG14 decreases, as the number of independent C compilers, in widespread use, decreases.

What is the purpose of a C Standard in today’s world? If it were not already a standard, I don’t think a committee would be set up to standardize the language today.

Is the role of WG14 now, the arbiter of useful common practice across widely used compilers? Documenting decisions in revisions of the C Standard.

Work on the Cobol Standard ran for almost 60-years; WG14 has to be active for another 20-years to equal this.

Nearest And Dearest – a.k.

a.k. from thus spake a.k.

Last time we saw how we could use the list of the nearest neighbours of a datum in a set of clustered data to calculate its strengths of association with their clusters. Specifically, we used the k nearest neighbours algorithm to define those strengths as the proportions of its k nearest neighbours that were members of each cluster or with a generalisation of it that assigned weights to the neighbours according to their positions in the list.
This time we shall take a look at a clustering algorithm that uses nearest neighbours to identify clusters, contrasting it with the k means clustering algorithm that we covered about four years ago.

Log driven development

Fran from BuontempoConsulting

Everyone knows attempting to figure out what's happening by resorting to print statements is desperation. Everyone knows you should use TDD, BDD or some xDD to write code well, don't they? I am desperate. I suggest PDD, print driven development is slow, error prone but sometimes helpful.

I recently shoe-horned a new feature into a large existing code base. It's hard to make it run a small set of work and I wanted to check a config change create my new class via the dependency injection framework. Now, I could put in a break point and wait a while for it to get hit, or not. Fortuenately, the code writes logs. By adding a log line to the constructor, I could tell much more quickly that I'd got the config correct. Write the code, leave it running, have a look later. I am tempted to name this Log Driven Development.

On one hand, adding the moral equivalents of prints is a bad thing. Teedy Deigh mused on logging in Overload 126, saying

It is generally arbitrary and unsustainable and few people are ever clear what the exact requirements are, so it spreads like a cat meme.

And yet, if I add informative logging as things happen, I can see how the whole system hangs together. Jonathan Boccara mentioned finding logs when trying to understand an unfamiliar code base, in his talk at the ACCU Conference. Existing logs can help you understand some code. Looking at this from a different direction begs the question are my logs useful? As a first step seeing a log entry when I wired in my class felt like a win. It told me what I wanted to know. I must make sure the logs I write out as my class starts doing something useful are themselves useful.

This is a step in the direction of a walking skeleton style approach - get things running end to end, outside-in, then fill in the details. I'm much more comfortable with a TDD approach, but keeping an eye on the whole system and how it hangs together is important. Building up the new feature by writing useful logs means the logs will be useful when it's released. LDD - log driven development. Why not?