std::accumulate vs. std::reduce

Simon Brand from Simon Brand

std::accumulate has been a part of the standard library since C++98. It provides a way to fold a binary operation (such as addition) over an iterator range, resulting in a single value. std::reduce was added in C++17 and looks remarkably similar. This post will explain the difference between the two and when to use one or the other.

Let’s start by looking at their interfaces, beginning with std::accumulate.

template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );

template< class InputIt, class T, class BinaryOperation >
T accumulate( InputIt first, InputIt last, T init,
              BinaryOperation op );

std::accumulate takes an iterator range and an initial value for the accumulation. You can optionally give it a binary operation to do the reduction, which will default to addition. It will call this operation on the initial value and the first element of the range, then on the result and the second element of the range, etc. Here are two equivalent calls:

auto sum1 = std::accumulate(begin(vec), end(vec), 0);
auto sum2 = std::accumulate(begin(vec), end(vec), 0, std::plus<>{});

std::reduce has a fair few more overloads to get your head round, but has a very similar interface once you understand them:

template<class InputIt>
typename std::iterator_traits<InputIt>::value_type reduce(
    InputIt first, InputIt last);

template<class ExecutionPolicy, class ForwardIt>
typename std::iterator_traits<ForwardIt>::value_type reduce(
    ExecutionPolicy&& policy,
    ForwardIt first, ForwardIt last);

template<class InputIt, class T>
T reduce(InputIt first, InputIt last, T init);

template<class ExecutionPolicy, class ForwardIt, class T>
T reduce(ExecutionPolicy&& policy,
         ForwardIt first, ForwardIt last, T init);

template<class InputIt, class T, class BinaryOp>
T reduce(InputIt first, InputIt last, T init, BinaryOp binary_op);

template<class ExecutionPolicy, class ForwardIt, class T, class BinaryOp>
T reduce(ExecutionPolicy&& policy,
         ForwardIt first, ForwardIt last, T init, BinaryOp binary_op);

The differences here are:

  • std::reduce has additional overloads which take an execution policy.
  • The overloads which take an execution policy require forward iterators, whereas the others accept input iterators.
  • The initial value is optional, defaulting to a default-constructed value of the range’s value type.

I’ll talk about these points in turn.

Execution policies

Execution policies are a C++17 feature which allows programmers to ask for algorithms to be parallelised. There are three execution policies in C++17:

  • std::execution::seq – do not parallelise
  • std::execution::par – parallelise
  • std::execution::par_unseq – parallelise and vectorise (requires that the operation can be interleaved, so no acquiring mutexes and such)

The idea behind execution policies is that you can change a serial algorithm to a parallel algorithm simply by passing an additional argument to the function:

auto sum1 = std::reduce(begin(vec), end(vec));                      //sequential
auto sum2 = std::reduce(std::execution::seq, begin(vec), end(vec)); //sequential
auto sum3 = std::reduce(std::execution::par, begin(vec), end(vec)); //parallel

Allowing parallelisation is the main reason for the addition of std::reduce. Let’s look at an example where we want to sum up all the elements in an array. With std::accumulate it looks like this:

[Figure: std::accumulate with std::plus]

Note that each step of the computation relies on the previous computation, i.e. this algorithm will execute serially and we make no use of hardware parallel processing capabilities. If we use std::reduce with the std::execution::par policy then it could look like this:

[Figure: std::reduce with std::plus]

This is a trivial amount of data for processing in parallel, but the benefit gained when the data size is scaled up should be clear: some of the operations can be executed independently of others, so they can be done in parallel.
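
As a rough sketch of what this looks like in practice (assuming a standard library that actually implements the parallel algorithms; with libstdc++, for example, this may mean linking against TBB), the serial and parallel versions differ only in the arguments passed:

#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<long long> vec(10'000'000, 1LL);

    // Serial left fold: one addition at a time, strictly in order.
    auto serial = std::accumulate(begin(vec), end(vec), 0LL);

    // Parallel reduction: the implementation may partition the range, reduce
    // the chunks on several threads, then combine the partial results.
    auto parallel = std::reduce(std::execution::par, begin(vec), end(vec), 0LL);

    return serial == parallel ? 0 : 1; // equal, since + on integers is associative and commutative
}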

A common question is: why do we need an entirely new algorithm for this? Why can’t we just overload std::accumulate? For an example of why we can’t do this, let’s use std::minus<> instead of std::plus<> as our reduction operation. With std::accumulate we get this:

[Figure: std::accumulate with std::minus]

However, if we try to use std::reduce, we could get something like:

[Figure: std::reduce with std::minus]

Uh oh. We just broke our code.

We got the wrong answer because of the mathematical properties of subtraction. You can’t arbitrarily reorder the operands, or compute the operations out of order when doing subtraction. This is formalised in the properties of commutativity and associativity.

A binary operation ∗ on a set S is associative if the following equation holds for all x, y, and z in S:

(x ∗ y) ∗ z = x ∗ (y ∗ z)

An operation is commutative if:

x ∗ y = y ∗ x

Unfortunately there’s no way for the compiler to reliably check that some operation is associative and commutative, so we’re stuck with having std::reduce as a separate algorithm. In concepts speak, this is called an axiom: a requirement imposed on semantics which cannot generally be statically verified.
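
A small sketch of the problem in code: the value computed by std::reduce below is unspecified, because the implementation is free to regroup and reorder the subtractions:

#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> vec{1, 2, 3, 4, 5};

    // Well-defined left fold: ((((0 - 1) - 2) - 3) - 4) - 5 == -15
    auto a = std::accumulate(begin(vec), end(vec), 0, std::minus<>{});

    // Unspecified result: std::reduce may regroup and reorder the operands,
    // which changes the answer for a non-associative, non-commutative
    // operation such as subtraction.
    auto r = std::reduce(begin(vec), end(vec), 0, std::minus<>{});

    std::cout << a << ' ' << r << '\n';
}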

Input vs. Forward Iterators

I couldn’t find any reasoning in the original standards papers for the separate overloads for input and forward iterators. However, my educated guess is that this is to allow library implementors to chunk up the data in order to distribute it to a thread pool. Indeed, this appears to be what MSVC’s implementation does.

The idea is that input iterators are single-pass, whereas forward iterators can be iterated through multiple times. An algorithm couldn’t chunk up data indexed by input iterators because by the time it had gone through the range to work out the sub-range boundaries, the data from the iterator could have already been consumed.
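
As a purely illustrative sketch, and not what any standard library actually does, here is a chunked reduction that needs multi-pass iterators: the range is traversed once to measure it and then again for each chunk, which an input iterator cannot support because the first traversal would consume it (chunked_sum and its parameters are invented for the example):

#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Illustrative only: split the range into n_chunks pieces and reduce each one.
// The first traversal (std::distance) measures the range; later traversals
// visit it again, which only a multi-pass (forward) iterator guarantees.
template <class ForwardIt>
long long chunked_sum(ForwardIt first, ForwardIt last, std::size_t n_chunks) {
    auto const size  = static_cast<std::size_t>(std::distance(first, last));
    auto const chunk = size / n_chunks;
    long long total  = 0;
    auto it = first;
    for (std::size_t i = 0; i < n_chunks; ++i) {
        auto chunk_end = it;
        std::advance(chunk_end, (i + 1 == n_chunks) ? size - i * chunk : chunk);
        total += std::accumulate(it, chunk_end, 0LL); // each chunk could be handed to a thread
        it = chunk_end;
    }
    return total;
}

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6, 7, 8};
    return chunked_sum(begin(v), end(v), 3) == 36 ? 0 : 1;
}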

Optional Initial Element

std::reduce lets you omit the initial element, in which case it default-constructs one as typename std::iterator_traits<InputIt>::value_type{}. I think this was a mistake. A default-constructed value is not always the identity element for the operation (multiplication, for example), and it’s very easy for a programmer to leave out the initial element by accident. The code will still compile; it will just give the wrong answer. I suspect that this choice will result in some hard-to-find bugs when this interface comes into heavier use.
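
For example, a small sketch where the value type is int, so the defaulted initial value is 0: fine as the identity for addition, but the identity for multiplication has to be supplied by hand:

#include <functional>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> vec{1, 2, 3, 4};

    // No initial value given: std::reduce uses int{} (i.e. 0) and operator+,
    // which happens to be harmless for a sum.
    auto sum = std::reduce(begin(vec), end(vec));                             // 10

    // 0 is not the identity for multiplication, so the identity (1) has to be
    // supplied explicitly; defaulting or passing 0 would silently give 0.
    auto product = std::reduce(begin(vec), end(vec), 1, std::multiplies<>{}); // 24

    return (sum == 10 && product == 24) ? 0 : 1;
}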

Finishing Up

That covers the differences between std::reduce and std::accumulate. My three point guide to std::reduce is:

  • Use std::reduce when you want your accumulation to run in parallel
  • Ensure that the operation you want to use is both associative and commutative
  • Remember that the default initial value is produced by default construction, and that this may not be correct for your operation

Replicating results using research software

Derek Jones from The Shape of Code

The reproducibility of results from scientific studies has always been an important issue. Over the last few years software has become a hot topic in reproducibility circles; many researchers have an expectation that if they run the original researcher’s software, they will replicate the results. Reality has not lived up to their expectations, and there is lots of flapping around looking for a solution. There is a solution, but first, why does the problem exist?

I have spent a lot of time porting software to different compilers (when I was in the compiler business, I wanted everybody to port their applications to the compiler I was working on), different hardware (oh, the days when every major vendor had at least one distinct cpu; not like today where it’s x86, ARM, or embedded), different operating systems (umpteen flavors of Unix, all with slightly different header file contents and library behavior; the Unix wars were good for those in the porting business) and every now and again different languages (by translating).

The Wintel alliance wiped out variation in cpus and operating systems (they can still be found lurking in dark corners) and open source compilers created a near monoculture of compilers for the major languages.

The major software portability problems of 30 years ago have become rather minor. But software portability problems that once tended to be minor (at least for scientific software), have grown to become a major headache. Today’s major portability problems center around evolution of the libraries/packages being used, and longer term the evolution of the language(s) used.

Evolution has created development ecosystems with rampant dependencies on specific versions of libraries/packages, or on anything earlier or later than a given version. I have been out of the porting business for several decades, but talking to those doing it today, the story is the same: experience in porting from A to B is everything, second best is talking to somebody else who has gone in that direction, and third best are the online forums such as stackoverflow.

Researchers are doing research on who-knows-what and probably have need-to-know knowledge of the software and the libraries they are using; the researchers receiving a copy of the original software might know even less. What is the probability that the originating and receiving researchers have exactly the same versions of the libraries installed? The receiving researcher may not have any of the needed libraries installed, and will promptly install the latest version (which may well be more recent than the one used by the original researcher).

A solution is available: distribute a duplicate of the researcher’s complete system as a container, e.g., a Docker image.

Containers solve the replication problem. But these days people want more; they actually think it should be possible to take research software and modify it to suit their own needs. Good luck with that.

Research software is written to solve a problem, often by people writing their first non-trivial programs (i.e., they are novices), with no incentive to produce something that is easy for others to use. When software is written by experienced developers, who have an incentive to build something that is easy for others to work with, multiple reimplementations are often still required to achieve something of decent quality. Creating robust software, that others can use, is very hard.

The problem with software is its invisibility; the difficulties are not visible. When the internal operations are visible, the difficulties of making changes are easier to see.

[Figure: James Albert Bonsack’s cigarette rolling machine (from Wikipedia)]

The Perils of DateTime.Parse()

Chris Oldwood from The OldWood Thing

The error message was somewhat flummoxing, largely because it was so generic, but also because the data all came from a database extract rather than manual input:

Input string was not in a correct format.

Naturally I looked carefully at all the various decimal and date values, as I knew this was the kind of message you get when parsing those kinds of values if they’re incorrectly formed, but none of them appeared to be at fault. The DateTime error message is actually slightly different [1], but I’d forgotten that at the time and so I eyeballed the dates as well as the decimal values just in case.

Then I remembered that empty string values also caused this error, but lo-and-behold I was not missing any optional decimals or dates in my table either. Time to hit the debugger and see what was going on here [2].

The Plot Thickens

I changed the settings for the FormatException error type to break on throw, sent in my data to the service, and waited for it to trip. It didn’t take long before the debugger fired into life and I could see that the code was trying to parse a decimal value as a double but the string value was “0100/04/01”, i.e. the 1st April in the year 100. WTF!

I immediately went back to my table and checked my data again, aware that a date like this would have stood out a mile first time around, but I was happy to assume that I could have missed it. This time I used some regular expressions just to be sure my eyes were not deceiving me.

The thing was I knew what column the parser thought the value was in but I didn’t entirely trust that I hadn’t mucked up the file structure and added or removed an errant comma in the CSV input file. I didn’t appear to have done that and so the value that appeared to be causing this problem was the decimal number “100.04”, but how?

None of this made any sense and so I decided to debug the client code, right from reading in the CSV data file through to sending it across the wire to the service, to see what was happening. The service was invoked via a fairly simple WCF client assembly and as I stepped into that code I came across a method called NormaliseDate()...

The Mist Clears

What this method did was to attempt to parse the input string value as a date and if it was successful it would rewrite it in an unusual (to me) “universal” format – YYYY/MM/DD [3].

The first two parsing attempts it did were very specific, i.e. it used DateTime.ParseExact() to match the intended output format and the “sane” local time format of DD/MM/YYYY. So far, so good.

However the third and last attempt, for whatever reason, just used DateTime.Parse() in its no-frills form and that was happy to take a decimal number like “100.04” and treat it as a date in the format YYY.MM! At first I wondered if it was treating it as a serial or OLE date of some kind but I think it’s just more liberal in its choice of separators than the author of our method intended [4].

Naturally there are no unit tests for this code, or any kind of regression test suite that shows what scenarios this method was intended to support. Due to a lack of knowledge about how the client library is deployed and used in the wild, I was forced to pad the values in the input file with trailing zeroes as a short-term way to work around the issue, yuck! [5]

JSON Parsers

This isn’t the first time I’ve had a run-in with a date parser. When I was working on REST APIs I always got frustrated by how permissive the JSON parser would be in attempting to coerce a string value into a date (and time). All we ever wanted was to keep it simple and only allow ISO-8601 format timestamps in UTC unless there was a genuine need to support other formats.

Every time I started writing the acceptance tests for timestamp validation, though, I’d find that I could never quite configure the JSON parser to reject everything but the desired format. In the earlier days of my time with ASP.Net even getting it to stop accepting local times was a struggle, and it even caused us a problem when we discovered a US/UK date format confusion error which the parser was hiding from us.

In the end we resorted to creating our own Iso8601DateTime type which used the .Net DateTimeOffset type under the covers but effectively allowed us to use our own custom JSON serializer methods to only support the exact format we wanted.

More recently JSON.Net has gotten better at letting you control the format and parsing of dates but it’s still not perfect and there are unit tests in past codebases that show variants that would unexpectedly pass, despite using the strictest settings. I wouldn’t be surprised if our Iso8601DateTime type was still in use as I can only assume everyone else is far less pedantic about the validation of datetimes and those that are have taken a similar route to ensure they control parsing.

A Dangerous Game

One should not lose sight, though, of the real issue here, which is the attempt to classify string values by trying to parse them. Even if you limit yourself to a single locale you might get away with it, but when you try to do that across arbitrary locales you’re just asking for trouble.

 

[1] “String was not recognized as a valid DateTime.”

[2] This whole fiasco falls squarely in the territory I’ve covered before in my Overload article “Terse Exception Messages”. Fixing this went to the top of my backlog, especially after I discovered it was a problem for our users too.

[3] Why they didn’t just pick THE universal format of ISO-8601 is anyone’s guess.

[4] I still need to go back and read the documentation for this method because it clearly caters for scenarios I just don’t see in my normal locale or user base.

[5] That’s what happens with tactical solutions, no one ever quite gets around to documenting anything because they never think it’ll survive for very long...

Closing the Product Owner mini-series: they are all different!

Allan Kelly from Allan Kelly Associates

With some final words I’d like to draw this mini-series on the Product Owner to a close and open a new chapter with a new book. I’ve written six blog posts in the last two months and I have drafts for more but there are other things I want to blog about.

I have drafts for more posts and ideas for even more. So it’s time to make this into another book: Product Ownership. This is on the LeanPub site now and you can buy it. So far it just contains a new prologue story, but I’ll soon add these posts as the first chapters.

Ever since I wrote Little Book of User Stories I’ve thought there should be a companion volume: “Little Book of Product Ownership”. The intention is for the first part of the new book to discuss the product owner role – and whether it should even exist – and then quickly get into the tools of Product Ownership.

Now some closing words…

While I’ve suggested a lot of things that a Product Owner should do, and a few that they should not do, there are really no hard and fast rules about what a Product Owner should or should not do.

In the language of business schools: there is no contingent way of being a product owner, every product owner and organization is different and they need to find their own path. I cannot give you a flow chart for what a product owner does or should do, nor can I give you a set of rules to say “When the customer says Foo the Product Owner should do Bar.”

Every Product Owner has to work out what is right for them because every organization is different. And every organization will – rightly or wrongly – expect different things from the people it christens Product Owner.

Additionally every team is different and contains different skills and experience. As a result every team will differ in what it needs from the Product Owner(s) and how the team members can support the Product Owner and share the work.

And every Product Owner is themselves different and brings different skills, experience and insights to the role.

Job #1 for a newly appointed Product Owner is to sit down and decide what type of Product Owner they are expected to be and what type of Product Owner they want to be:

  • They may be a Backlog Administrator taking instructions from others.
  • They may be a Subject Matter Expert using their expert knowledge of the domain to decide what the right product to build is and help other team members understand the details of what is being built.
  • They may need to analyse internal processes and business lines using the skills of Business Analysis.
  • They may need to get out on the road to meet customers – and potential customers – to understand the market and where the opportunities are using the skills of Product Management.
  • They may need to call on skills from other fields too: Project Management, Consulting and Entrepreneurship, to name a few.

But a Product Owner is not some other things:

  • If they were a developer they need to accept they will not be coding any more. There simply isn’t time and anyway, they need to trust the team.
  • If they were a Project Manager, Development or Line Manager they need to resist any urge to tell people what to do or look too far into the future. They need to re-focus on value not time, and recognise that their authority comes from their competence not from a position on a chart.
  • Product Owners from a Business Analysis background need to look beyond Business Analysis, specifically they need to immerse themselves in the world of Product Management.
  • While Product Owners who were Product Managers probably have the easiest ride, they too need to change: they need to think more about internal stakeholders, processes and delivery.

Every Product Owner and everyone working with Product Owners needs to read and reflect on the role. Hopefully some of the words in my recent posts – and the new book – will help with that – and hopefully some of you might like to hire me for advice or a training course – just call 🙂

Finally, I sincerely believe there are better Product Owners and not-so-good Product Owners, and that some organizations (teams, companies, enterprises) offer a better environment for Product Ownership, while others are downright hostile to it.

Want to receive these posts by e-mail? – join the newsletter today and receive a free eBook: Xanpan: Team Centric Agile Software Development


Do you still get the buzz? I do!

Paul Grenyer from Paul Grenyer


Whatever else I do to earn a living, I am a software engineer at the core. Outside of work other things give me a reason to smile - heavy metal bands, science fiction books or my family - but when it comes to work, writing software is what gives me the biggest buzz. Even after 33 years!

Recently I spent the weekend writing some software for a client. They have an app, which we built, that allows them to take photos and complete a questionnaire for installations so that they can record compliance. The software I wrote receives the photos and questionnaire responses from the app, generates a PDF document detailing the responses and attaches it to an email, along with the photos, to send to the client. Not a particularly exciting process most would agree.

It’s a straightforward piece of software (despite the security concerns and image processing which took a little while to get just right) which delivers exactly what the client needs, but we wanted to be doubly sure. So, in the early days of the software running for real (i.e. the client is using it, not just us testing it) we got copies of the emails the app generated so we could check everything was working as it should. And that’s where the buzz of being a software developer begins.

As developers, we’re not always able to monitor in what way or how frequently the software we write is being used by our clients. There are confidentiality issues to consider, as well as the practical aspects and cost concerns of implementing a suitable monitoring process. This means a lot of the time we rely on anecdotal responses from our clients, and of course feedback when something goes wrong (which thankfully, isn’t too often).

With this particular client we knew each and every time they used the software as an email would appear and we could see how the app was working until we, and they, were satisfied with the process. Even though it was such a simple thing, every time an email pinged through from the app I got a twinge of excitement and a flush of pride. To see something I’d created from scratch work successfully and be used by someone was a small but genuine reward for me and reminded me why I love doing what I do. The buzz of seeing software work.

What gives you that buzz every day and keeps you doing what you’re doing?

Type compatibility: name vs structural equivalence

Derek Jones from The Shape of Code

What are the rules for deciding when two types are the same, or compatible?

This question needs to be answered to decide whether an object of type T1 can be assigned to an object of type T2, whether they can be compared, added together, etc.

A wide collection of rules has been combined together, by various languages, for the type compatibility of scalar types (e.g., integer, character, etc.), but for aggregate types two rules dominate: name equivalence and structural equivalence, or some combination of the two.

With name equivalence, two types are the same if they are declared using the same name (e.g., the name of the tag for a union type, in C).

With structural equivalence, two types are compatible if they have a compatible structure, i.e., their internal contents are type compatible (this requires walking over each field/member checking that it is compatible). For instance, an object declared to have an aggregate type containing three integers is compatible with another aggregate type containing three integers (assuming any type modifiers, such as const’ness or mutability, are the same).
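
As a hypothetical illustration (the tags point and coord are invented for the example; the code is valid as both C and C++), two aggregate types with identical structure are still distinct types under name equivalence, even though structural equivalence would treat them as compatible:

struct point { int x; int y; };   /* named by its tag: point */
struct coord { int x; int y; };   /* same structure, different tag */

int main(void) {
    struct point p = {1, 2};
    struct coord c = {0, 0};
    /* c = p; */      /* rejected under name equivalence: distinct types,  */
                      /* even though structural equivalence would pass it  */
    c.x = p.x;        /* member-by-member copying is always fine           */
    c.y = p.y;
    return (c.x + c.y) - (p.x + p.y);   /* 0 */
}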

Structural compatibility becomes interesting when pointer types are involved; the pointed to types need to be checked and loops can occur, e.g., type S1 contains a field that has a type pointer to S2, which contains a field that has type pointer to S1.
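
Sketching the cyclic case just described (tags S1 and S2 follow the names used above):

struct S2;                          /* forward declaration */
struct S1 { struct S2 *to_s2; };    /* S1 has a field of type pointer to S2   */
struct S2 { struct S1 *to_s1; };    /* S2 points back to S1, closing the loop */

int main(void) {
    struct S1 a;
    struct S2 b;
    a.to_s2 = &b;   /* checking whether two such S1/S2 pairs are structurally */
    b.to_s1 = &a;   /* compatible means remembering which pairs of types are  */
    return 0;       /* already being compared, to avoid recursing forever     */
}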

While most types are easily checked for structural compatibility, every now and again aggregate types connect together in a way that makes it non-trivial to figure out which types are structurally compatible (dot file; needs graphviz):

[Figure: C struct types in a complicated cyclic relationship]

Handling the edge cases requires maintaining a stack of information about which pairs of types are currently being compared.

In C, type compatibility is a combination of name equivalence (for aggregate types in the same translation unit) and structural equivalence (for function types and aggregate types across translation units).

Function types have to use structural equivalence because the type in a function definition is anonymous (the function name that appears in the definition has this anonymous type); there is no name to compare.

Cycles cannot appear in function types (in C), because the identifier being defined in a typedef is not in scope until just after the completion of its declarator. It is not possible to refer to the identifier being defined inside its own definition (e.g., it is not possible to define a function that takes its own type as a parameter; in typedef int (*f)(f); the second f is a redundant parameter name, and the scope of the type denoted by the first f begins just before the semicolon).

Structural equivalence across translation units is a hangover from the early days of C, when developers were sloppy about using (or not using) tag names (with different people having different rules for upper/lower case tag names); developers knew what the layout in memory was and created the necessary types for their use of this data.

Type compatibility via name equivalence is easy to explain and makes it explicit when developers are bending the rules (i.e., pointer to struct casts appear in the code).

Type compatibility via structural equivalence is the wild west, which still exists in some development environments.

Enjoy a summer feast with the hottest tech community in The East

Paul Grenyer from Paul Grenyer


Join the Norwich tech community on Friday 27 July 2018 for a tasty BBQ to celebrate #NorfolkDay.

SyncNorwich, Norfolk Developers and Hot Source have got together to organise a delicious BBQ at the lovely Unthank Arms pub. (Don’t worry, we’re leaving it to the professionals to do the cooking.) You will be able to enjoy your meal in the garden – or the covered courtyard (if you need shelter or shade).

We look forward to seeing you for what is sure to be a fun evening from 18:00 to 22:00.

There will be a paying bar.

If your business would like to sponsor this event (particularly the drinks), please get in touch with the organisers.

Tickets cost just £18.92 per person – including transaction fees – but excluding drinks. In return, you’ll be able to choose from the following freshly cooked BBQ food.

Choose a main from: Juicy homemade burger, Salmon parcels with lemon and herb butter, Archer’s award-winning sausages, Lemon and thyme marinated chicken fillet, or Halloumi and BBQ vegetables.

Choose a side from: Homemade coleslaw, New potato and spring onion salad, Tomato and red onion salad, or Cucumber and yoghurt raita.

Book now: https://summer-bbq-2018.eventbrite.co.uk

Breakfast with Peter Brady – CEO and Founder of Orbital Media

Paul Grenyer from Paul Grenyer



What: Breakfast with Peter Brady – CEO and Founder of Orbital Media
When: Tuesday 5th June
Where: The Maids Head Hotel, Tombland, Norwich, NR3 1LB
How much: £13.95
RSVP: https://www.meetup.com/Norfolk-Developers-NorDev/events/qqwhznyxjbhb/

Peter Brady is CEO and founder of Orbital Media, one of the UK’s most innovative full service Digital Agencies. Founded in 2003, Orbital Media has worked with some of the world’s biggest organisations on national and global accounts (including Aviva, NHS, Sanofi, Nestle and Mitsubishi).

Although Orbital Media’s traditional focus was on projecting brand engagement and awareness through social and digital channels, over the last 10 years it has developed a strong innovation and technology focus, becoming the UK’s leading supplier of gamification apps to the healthcare industry.

Peter will talk about Orbital Media’s journey and how it has moved into large scale tech projects (including a collaborative Artificial Intelligence project with an NHS body and the University of Essex, to reduce the burden of minor ailments in primary care). Peter will also tell us about how Orbital Media is developing and exporting virtual / augmented / mixed reality projects in healthcare, education and pain therapy sectors into global markets.

A Measure Of Borel Weight – a.k.

a.k. from thus spake a.k.

In the last few posts we have implemented a type to represent Borel sets of the real numbers, which are the subsets of them that can be created with countable unions of intervals with closed or open lower and upper bounds. Whilst I would argue that doing so was a worthwhile exercise in its own right, you may be forgiven for wondering what Borel sets are actually for and so in this post I shall try to justify the effort that we have spent on them.