Using atomics for thread synchronization in C++

Anthony Williams from Just Software Solutions Blog

In my previous blog post I wrote about spin locks, and how compilers must not move the locking loop above a prior unlock.

After thinking about this done more, I realised that is not something specific to locks — the same issue arises with any two step synchronization between threads.

Consider the following code

std::atomic<bool> ready1{false};
std::atomic<bool> ready2{false};

void thread1(){
  ready1.store(true, std::memory_order_release);
  while(!ready2.load(std::memory_order_acquire)){}
}

void thread2() {
  while(!ready1.load(std::memory_order_acquire)) {}
  ready2.store(true, std::memory_order_release);
}

thread1 sets ready1 to true, then waits for thread2 to set ready2 to true. Meanwhile, thread2 waits for ready1 to be true, then sets ready2 to true.

This is almost identical to the unlock/lock case from the previous blog post, except the waiting thread is just using plain load rather than exchange.

If the compiler moves the wait loop in thread1 above the store then both threads will hang forever. However it cannot do this for the same reason the spinlocks can't deadlock in the previous post: the store has to be visible to the other thread in a finite period if time, so must be issued before the wait loop. https://eel.is/c++draft/intro.multithread#intro.progress-18

An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

If the optimizer moved the store across the loop in thread1, then it could not guarantee that the value became visible to the other thread in a finite period of time. Therefore such an optimization is forbidden.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Another nail for the coffin of past effort estimation research

Derek Jones from The Shape of Code

Programs are built from lines of code written by programmers. Lines of code played a starring role in many early effort estimation techniques (section 5.3.1 of my book). Why would anybody think that it was even possible to accurately estimate the number of lines of code needed to implement a library/program, let alone use it for estimating effort?

Until recently, say up to the early 1990s, there were lots of different computer systems, some with multiple (incompatible’ish) operating systems, almost non-existent selection of non-vendor supplied libraries/packages, and programs providing more-or-less the same functionality were written more-or-less from scratch by different people/teams. People knew people who had done it before, or even done it before themselves, so information on lines of code was available.

The numeric values for the parameters appearing in models were obtained by fitting data on recorded effort and lines needed to implement various programs (63 sets of values, one for each of the 63 programs in the case of COCOMO).

How accurate is estimated lines of code likely to be (this estimate will be plugged into a model fitted using actual lines of code)?

I’m not asking about the accuracy of effort estimates calculated using techniques based on lines of code; studies repeatedly show very poor accuracy.

There is data showing that different people implement the same functionality with programs containing a wide range of number of lines of code, e.g., the 3n+1 problem.

I recently discovered, tucked away in a dataset I had previously analyzed, developer estimates of the number of lines of code they expected to add/modify/delete to implement some functionality, along with the actuals.

The following plot shows estimated added+modified lines of code against actual, for 2,692 tasks. The fitted regression line, in red, is: Actual = 5.9Estimated^{0.72} (the standard error on the exponent is pm 0.02), the green line shows Actual==Estimated (code+data):

Estimated and actual lines of code added+modified to implement a task.

The fitted red line, for lines of code, shows the pattern commonly seen with effort estimation, i.e., underestimating small values and over estimating large values; but there is a much wider spread of actuals, and the cross-over point is much further up (if estimates below 50-lines are excluded, the exponent increases to 0.92, and the intercept decreases to 2, and the line shifts a bit.). The vertical river of actuals either side of the 10-LOC estimate looks very odd (estimating such small values happen when people estimate everything).

My article pointing out that software effort estimation is mostly fake research has been widely read (it appears in the first three results returned by a Google search on software fake research). The early researchers did some real research to build these models, but later researchers have been blindly following the early ‘prophets’ (i.e., later research is fake).

Lines of code probably does have an impact on effort, but estimating lines of code is a fool’s errand, and plugging estimates into models built from actuals is just crazy.

On Twenty-Niner – student

student from thus spake a.k.

The Baron's most recent wager set Sir R----- the task of placing tokens upon spaces numbered from zero to nine according to the outcome of a twenty sided die upon which was inscribed two of each of those numbers. At a cost of one coin per roll of the die, Sir R-----'s goal was to place a token upon every space for which he should receive twenty nine coins and twenty nine cents from the Baron.

Can non-overlapping spinlocks deadlock in C++?

Anthony Williams from Just Software Solutions Blog

There has been discussion on Twitter recently about whether or not the C++ memory model allows spinlocks to deadlock if they just use memory_order_acquire in lock and memory_order_release in unlock, due to compiler optimizations. The case in question is where a thread locks one mutex, unlocks it, and locks a second: can the compiler reorder the second lock above the first unlock? If it does, and another thread does the same in the reverse order, with the same optimization, then sequential locks could deadlock.

Here is the code in question, with all the lock/unlock code inlined.

std::atomic<bool> mutex1{false};
std::atomic<bool> mutex2{false};

int x=0;
int y=0;

void thread1(){
  while(mutex1.exchange(true,std::memory_order_acquire)){}  // #1
  x=1;
  mutex1.store(false,std::memory_order_release); // #2

  while(mutex2.exchange(true,std::memory_order_acquire)){} // #3
  y=1;
  mutex2.store(false,std::memory_order_release); // #4
}

void thread2(){
  while(mutex2.exchange(true,std::memory_order_acquire)){} // #5
  x=2;
  mutex2.store(false,std::memory_order_release); // #6

  while(mutex1.exchange(true,std::memory_order_acquire)){} // #7
  y=2;
  mutex1.store(false,std::memory_order_release); // #8
}

For there to even be the possibility of deadlock, thread1 must successfully execute line #1 before thread2 successfully executes line #7, and thread2 must successfully execute line #5 before thread1 successfully executes line #3. Because these are RMW operations, the threads must agree on the ordering.

The modification order of mutex1 must thus be #1(success), #2, #7(success), #8. Similarly, the modification order of mutex2 must be #5(success), #6, #3(success), #4.

All threads must agree on these modification orders. https://eel.is/c++draft/intro.multithread#intro.races-4

From the point of view of thread1, everything must run in program order: compilers can only optimize things as long as they run "as if" in program order.

The store to mutex1 at #2 is guaranteed to be visible to thread2 in "a finite period of time". https://eel.is/c++draft/intro.multithread#intro.progress-18

Consequently, thread2 must eventually see that store at line #7, even if it executes line #7 a large number of times first.

Therefore, the compiler cannot move line #3 completely above line #2, since doing so would not guarantee the visibility of #2 to other threads in a finite period of time. It can move an arbitrary number of executions of line #3 above line #2 (all of which will see that mutex2 is still true), but not all the executions of line #3.

Given that thread2 eventually sees the store from #2 at line #7, the exchange at line #7 will eventually succeed, and thread2 will eventually complete.

Likewise, the store at #6 must become visible to thread1 in a finite period of time. Therefore the exchange at line #3 will eventually see the value stored by

6, the exchange will succeed, and thread1 will complete, and the compiler is

not allowed to move all the executions of line #7 above #6.

No amount of compiler optimization is allowed to break this, so no: spinlocks cannot deadlock if they don't overlap.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Pricing by quantity of source code

Derek Jones from The Shape of Code

Software tool vendors have traditionally licensed their software on a per-seat basis, e.g., the cost increases with the number of concurrent users. Per-seat licensing works well when there is substantial user interaction, because the usage time is long enough for concurrent usage to build up. When a tool can be run non-interactively in the cloud, its use is effectively instantaneous. For instance, a tool that checks source code for suspicious constructs. Charging by lines of code processed is a pricing model used by some tool vendors.

Charging by lines of code processed creates an incentive to reduce the number of lines. This incentive was once very common, when screens supporting 24 lines of 80 characters were considered a luxury, or the BASIC interpreter limited programs to 1023 lines, or a hobby computer used a TV for its screen (a ‘tiny’ CRT screen, not a big flat one).

It’s easy enough to splice adjacent lines together, and halve the cost. Well, ease of splicing depends on programming language; various edge cases have to be handled (somebody is bound to write a tool that does a good job).

How does the tool vendor respond to a (potential) halving of their revenue?

Blindly splicing pairs of lines creates some easily detectable patterns in the generated source. In fact, some of these patterns are likely to be flagged as suspicious, e.g., if (x) a=1;b=2; (did the developer forget to bracket the two statements with { }).

The plot below shows the number of lines in gcc 2.95 containing a given number of characters (left, including indentation), and the same count after even-numbered lines (with leading whitespace removed) have been appended to odd-numbered lines (code+data, this version of gcc was using in my C book):

North Star Horizon with cover removed.

The obvious change is the introduction of a third straight’ish line segment (the increase in the offset of the sharp decline might be explained away as a consequence of developers using wider windows). By only slicing the ‘right’ pairs of lines together, the obvious patterns won’t be present.

Using lines of codes for pricing has the advantage of being easy to explain to management, the people who sign off the expense, who might not know much about source code. There are other metrics that are much harder for developers to game. Counting tokens is the obvious one, but has developer perception issues: Brackets, both round and curly. In the grand scheme of things, the use/non-use of brackets where they are optional has a minor impact on the token count, but brackets have an oversized presence in developer’s psyche.

Counting identifiers avoids the brackets issue, along with other developer perceptions associated with punctuation tokens, e.g., a null statement in an else arm.

If the amount charged is low enough, social pressure comes into play. Would you want to work for a company that penny pinches to save such a small amount of money?

As a former tool vendor, I’m strongly in favour of tool vendors making a healthy profit.

Creating an effective static analysis requires paying lots of attention to lots of details, which is very time-consuming. There are lots of not particularly good Open source tools out there; the implementers did all the interesting stuff, and then moved on. I know of several groups who got together to build tools for Java when it started to take-off in the mid-90s. When they went to market, they quickly found out that Java developers expected their tools to be free, and would not pay for claimed better versions. By making good enough Java tools freely available, Sun killed the commercial market for sales of Java tools (some companies used their own tools as a unique component of their consulting or service offerings).

Could vendors charge by the number of problems found in the code? This would create an incentive for them to report trivial issues, or be overly pessimistic about flagging issues that could occur (rather than will occur).

Why try selling a tool, why not offer a service selling issues found in code?

Back in the day a living could be made by offering a go-faster service, i.e., turn up at a company and reduce the usage cost of a company’s applications, or reducing the turn-around time (e.g., getting the daily management numbers to appear in less than 24-hours). This was back when mainframes ruled the computing world, and usage costs could be eye-watering.

Some companies offer bug-bounties to the first person reporting a serious vulnerability. These public offers are only viable when the source is publicly available.

There are companies who offer a code review service. Having people review code is very expensive; tools are good at finding certain kinds of problem, and investing in tools makes sense for companies looking to reduce review turn-around time, along with checking for more issues.

OKRs in Agile infrographic

Allan Kelly from Allan Kelly Associates

I am indebted to Yoan Thirion for created this infographic to illustrate Succeeding with OKRs in Agile. He’s done a brilliant job on both the graphics and the summary – undoubtedly better graphics than I could have done and probably a better summary than I would too, sometimes one can be too close to a thing.


Subscribe to my blog newsletter and download Continuous Digital for free – normal price $9.99/£9.95/€9.95

The post OKRs in Agile infrographic appeared first on Allan Kelly Associates.

Reading ‘Let over Lambda’ – thoughts on fundamentals and why I keep reading classic CS texts

Timo Geusch from The Lone C++ Coder&#039;s Blog

Another one for my computer science reading list for this year. I do try to work my way through at least one classic computer science book annually and picked up Let Over Lambda a few weeks ago. Colour one of the cats not impressed, but then again she’s got more free time than I do and probably already read it. Not massively impressed by yet another old book

The first computer I owned

Derek Jones from The Shape of Code

The first computer I owned was a North Star Horizon. I bought it in kit form, which meant bags of capacitors, resistors, transistors, chips, printed circuit boards, etc, along with the circuit diagrams for each board. These all had to be soldered in the right holes, the chips socketed (no surface mount soldering for such a low volume system), and wires connected. I was amazed when the system booted the first time I powered it up; debugging with the very basic equipment I had would have been a nightmare. The only missing component was the power supply transformer, and a trip to the London-based supplier sorted that out. I saved a months’ salary by building the kit (which cost me 4-months salary, and I was one of the highest paid people in my circle).

North Star Horizon with cover removed.

The few individuals who bought a computer in the late 1970s bought either a Horizon or a Commodore Pet (which was more expensive, but came with an integrated monitor and keyboard). Computer ownership really started to take off when the BBC micro came along at the end of 1981, and could be bought for less than a months’ salary (at least for a white-collar worker).

My Horizon contained a Z80A clocking at 4MHz, 32K of RAM, and two 5 1/4-inch floppy drives (each holding 360K; the Wikipedia article says the drives held 90K, mine {according to the labels on the floppies, MD 525-10} are 40-track, 10-sector, double density). I later bought another 32K of memory; the system ROM was at 56K, and contained 4K of code, various tricks allowed the 4K above 60K to be used (the consistent quality of the soldering on one of the boards below identifies the non-hand built board).

North Star Horizon underside of boards.

The OS that came with the system was CP/M, renamed to CP/M-80 when the Intel 8086 came along, and will be familiar to anybody used to working with early versions of MS-DOS.

As a fan of Pascal, my development environment of choice was UCSD Pascal. The C compiler of choice was BDS C.

Horizon owners are total computer people :-) An emulator, running under Linux and capable of running Horizon disk images, is available for those wanting a taste of being a Horizon owner. I didn’t see any mention of audio emulation in the documentation; clicks and whirls from the floppy drive were a good way of monitoring compile progress without needing to look at the screen (not content with using our Horizon’s at home, another Horizon owner and I implemented a Horizon emulator in Fortran, running on the University’s Prime computers). I wonder how many Nobel-prize winners did their calculations on a Horizon?

The Horizon spec needs to be appreciated in the context of its time. When I worked in application support at the University of Surrey, users had a default file allocation of around 100K’ish (memory is foggy). So being able to store stuff on a 360K floppy, which could be purchased in boxes of 10, was a big deal. The mainframe/minicomputers of the day were available with single-digit megabytes, but many previous generation systems had under 100K of RAM. There were lots of programs out there still running in 64K. In terms of cpu power, nearly all existing systems were multi-user, and a less powerful, single-user, cpu beats sharing a more powerful cpu with 10-100 people.

In terms of sheer weight, visual appearance and electrical clout, the Horizon power supply far exceeds those seen in today’s computers, which look tame by comparison (two of those capacitors are 4-inches tall):

North Star Horizon power supply.

My Horizon has been sitting in the garage for 32-years, and tucked away in unused rooms for years before that. The main problem with finding out whether it still works is finding a device to connect to the 25-pin serial port. I have an old PC with a 9-pin serial port, but I have spent enough of my life fiddling around with serial-port cables and Kermit to be content trying a simpler approach. I connect the power supply and switched it on. There was a loud crack and a flash on the disk-controller board; probably a tantalum capacitor giving up the ghost (easy enough to replace). The primary floppy drive did spin up and shutdown after some seconds (as expected), but the internal floppy engagement arm (probably not its real name) does not swing free when I open the bay door (so I cannot insert a floppy).

I am hoping to find a home for it in a computer museum, and have emailed the two closest museums. If these museums are not interested, the first person to knock on my door can take it away, along with manuals and floppies.

Big Friendly GiantS – a.k.

a.k. from thus spake a.k.

In the previous post we saw how we could perform a univariate line search for a point that satisfies the Wolfe conditions meaning that it is reasonably close to a minimum and takes a lot less work to find than the minimum itself. Line searches are used in a class of multivariate minimisation algorithms which iteratively choose directions in which to proceed, in particular those that use approximations of the Hessian matrix of second partial derivatives of a function to do so, similarly to how the Levenberg-Marquardt multivariate inversion algorithm uses a diagonal matrix in place of the sum of the products of its Hessian matrices for each element and the error in that element's current value, and in this post we shall take a look at one of them.