Online Concurrency Workshop at C++ on Sea 2021

Anthony Williams from Just Software Solutions Blog

The restrictions brought upon us by COVID-19 are not over yet, and C++ on Sea is the latest conference that will be running as an online-only conference.

I will be running my More Concurrent Thinking class as an online workshop for C++ on Sea on 30th June and 1st July 2021.

The workshop will run from 09:30 UTC to 18:15 UTC each day. For attendees from North and South America, this is likely quite an early morning, and may be a late night for attendees from the far East, so please check the times in your local timezone.

Tickets include the full day of "normal" conference presentations on 2nd July 2021. Get yours from the C++ On Sea tickets page.

I hope to see you there!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Impact of native language on variable naming

Derek Jones from The Shape of Code

When creating a variable name, to what extent are developers influenced by their native human language?

There is lots of evidence that variable names are either English words, abbreviations of English words, or some combination of these two. Source code containing a large percentage of identifiers using words from other languages does exist, but it requires effort to find; there is a widely expressed view that source should be English based (based on my experience of talking to non-native English speakers, and even the odd paper discussing the issue, e.g., Language matters).

Given that variable names can prove information that reduces the effort needed to understand code, and that most code is only ever read by the person who wrote it, developers should make the most of their expertise in using their native language.

To what extent do non-native English-speaking developers make use of their non-English native language?

I have found it very difficult to even have a discussion around this question. When I broach the subject with non-native English speakers, the response is often along the lines of “our develo0pers speak good English.” I am careful to set the scene by telling them of my interest in naming, and that I think there are benefits for developers to make use of their native language. The use of non-English languages in software development is not yet a subject that is open for discussion.

I knew that sooner or later somebody would run an experiment…

How Developers Choose Names is another interesting experiment involving Dror Feitelson (the paper rather confusingly refers to it as a survey, a post on an earlier experiment).

What makes this experiment interesting is that bilingual subjects (English and Hebrew) were used, and the questions were in English or Hebrew. The 230 subjects (some professional, some student) were given a short description and asked to provide an appropriate variable/function/data-structure name; English was used for 26 of the question, and Hebrew for the other 21 questions, and subjects answered a random subset.

What patterns of Hebrew usage are present in the variable names?

Out of 2017 answers, 14 contained Hebrew characters, i.e., not enough for statistical analysis. This does not mean that all the other variable names were only derived from English words, in some cases Hebrew words appeared via transcription using the 26 English letters. For instance, using “pinuk” for the Hebrew word that means “benefit” in English. Some variables were created from a mixture of Hebrew and English words, e.g., deservedPinuks and pinuksUsed.

Analysing this data requires someone who is fluent in Hebrew and English. I am not a fluent, or even non-fluent, Hebrew speaker. My role in this debate is encouraging others, and at last I have some interesting data to show people.

The paper spends time showing how for personal preferences result in a wide selection of names being chosen by different people for the same quantity. I cannot think of any software engineering papers that have addressed this issue for variable names, but there is lots of evidence from other fields; also see figure 7.33.

Those interested in searching source code for the impact of native-language might like to look at the names of variables appearing as operands of the bitwise and logical operators. Some English words occur much more frequently in the names of these variable, compared to variables that are operands of arithmetic operators, e.g., flag, status, and signal. I predict that non-native English-speaking developers will make use of corresponding non-English words.

A PR Exercise – a.k.

a.k. from thus spake a.k.

In the last few posts we've been looking at the BFGS quasi-Newton algorithm for minimising multivariate functions. This uses iteratively updated approximations of the Hessian matrix of second partial derivatives in order to choose directions in which to search for univariate minima, saving the expense of calculating it explicitly. A particularly useful property of the algorithm is that if the line search satisfies the Wolfe conditions then the positive definiteness of the Hessian is preserved, meaning that the implied locally quadratic approximation of the function must have a minimum.
Unfortunately for large numbers of dimension the calculation of the approximation will still be relatively expensive and will require a significant amount of memory to store and so in this post we shall take a look at an algorithm that only uses the vector of first partial derivatives.

Pomodoros worked during a day: an analysis of Alex’s data

Derek Jones from The Shape of Code

Regular readers know that I am always on the lookout for software engineering data. One search technique is to feed a ‘magic’ phrase into a search engine, this can locate data hiding in plain sight. This week the magic phrase: “record of pomodoros” returned pages discussing two collections of daily Pomodoros worked, each over a year, plus several possible collections, i.e., not explicitly stated. My email requests for data have so far returned one of the collections, kindly made available by Alex Altair, and this post discusses Alex’s data (I have not discussed the data with Alex, who I’m hoping won’t laugh too loud at the conclusions I have reached).

Before analyzing data I always make predictions about what I expect to see. I know from the email containing the data that it consisted of two columns: date and Pomodoro’s worked, i.e., no record of task names. The first two predictions for this data were the two most common patterns seen in estimation data, i.e., use of round numbers, and a weekend-effect (most people don’t work during the weekend and the autocorrelation of the daily counts contain peaks at lags of 6 and 7). The third prediction was that over time the daily total Pomodoro counts would refine into counts for each of the daily tasks (I had looked at the first few lines of the data and seen totals for the daily Pomodoros worked.

The Renzo Pomodoro dataset is my only previous experience analysing Pomodoro data. Renzo created a list of tasks for the day, estimated the number of Pomodoros for each task would take, and recorded how many it actually took. For comparison, the SiP effort estimation dataset estimates software engineering tasks in hours.

Alex uses Pomodoros as a means of focusing his attention on the work to be done, and the recorded data is a measure of daily Pomodoro work done.

I quickly discovered that all my predictions were wrong, i.e., no obvious peaks showing use of round numbers, no weekend effect, and always daily totals. Ho-hum.

The round number effect is very prominent in estimates, but is not always visible in actuals; unless people are aiming to meet targets or following Parkinson’s law.

How many days had one Pomodoro worked, how many two Pomodoro, etc? The plot below shows the number of days for which a given number of Pomodoros were worked (the number of days with zero Pomodoros is not shown); note the axis are log scaled. The blue points are for all days in 2020, and the green points are all days in 2020+178 days of 2021. The red lines are two sets of two fitted power laws (code+data):

Number of days on which a given number of Pomodoros were worked, with fitted power laws.

Why the sudden change of behavior after seven Pomodoro? Given a Pomodoro of 25-minutes (Alex says he often used this), seven of them is just under 3-hours, say half a day. Perhaps Alex works half a day, for every day of the week.

Why the change of behavior since the end of 2020 (i.e., exponent of left line changes from 0.3 to -0.1, exponent of right line is -3.0 in both cases)? Perhaps Alex is trying out another technique. The initial upward trend is consistent with the Renzo Pomodoro dataset.

The daily average Pomodoros worked is unchanged at around 5.6. The following plot shows daily Pomodoros worked over the 543 days, red line is a fitted loess model.

Daily Pomodoros worked over 543 days.

The weekend effect might not be present, but there is a strong correlation between adjacent days (code+data). The best fitting ARIMA model gives the equation: P_t=0.37+0.93*P_{t-1}+w_t-0.74*w_{t-1}, where: P_t is the Pomodoros worked on day t, P_{t-1} Pomodoros worked on the previous day, w_t is white noise (e.g., a Normal distribution) with a zero mean and a standard deviation of 4 (in this case) on day t, and w_{t-1} the previous day’s noise (see section 11.10 of my book for technical time series details).

This model is saying that the number of Pomodoros worked today is strongly influenced by yesterday’s Pomodoro worked, modulated by a large random component that could be large enough to wipe out the previous days influence. Is this likely to be news to Alex, or to anybody looking at the plot of Pomodoros over time? Probably not.

For me, the purpose of data analysis is to find patterns of behavior that are of use to those involved in the processes that generated the data (for many academics, at least in software engineering, the purpose appears to be to find patterns that can be used to publish papers, and given enough searching, it is always possible to find patterns in data). What patterns of behavior might Alex be interested in?

Does more Pomodoro work get done at the start of the week, compared to the end of the week? The following heatmap is based on the number of week days on which a given number of Pomodoros were worked. The redder the region, the more likely that value is likely to occur (code+data):

Heatmap of number of days on which a given number of Pomodoros were worked on a given day of the week.

There are certainly more days near the end of the week having little or no Pomodoro work, and the high Pomodoro work days appear to be nearer the start of the week. I need to find a statistical technique that quantifies these observations.

I think that the middle plot is the most generally useful, it illustrates how variable the work done during a day can be.

Is Alex’s Pomodoro work typical or unusual? We will have to wait for a lot more data before that question can be addressed.

If you are a Pomodoro user, and have ideas for possible patterns in the data, please let me know.

As always, pointers to more data, Pomodoro or otherwise, most welcome.

Matrix is the only (chat) game in town

Andy Balaam from Andy Balaam's Blog

On my phone and computer I use WhatsApp, Signal, Slack, Keybase, Discord, IRC, XMPP/Jabber and Element/Matrix. In addition, I occasionally use the messaging features of Mastodon, Twitter and even LinkedIn. I’ve never used Telegram, Line, WeChat, Session, Wire or Status chat, but they exist too, along with many others.

It would be better if I could chat with people using the app I prefer, rather than the one I am forced to use.

Of course, the only useful chat app is the one your friends and family are on, so it’s pointless to debate the finer points in each service’s favour, but here I go anyway.

Only Matrix is:

The importance of decentralisation has been re-emphasised for me this week after the freenode IRC debacle. A single controlling entity, even when it is currently benign (as some people believe Signal is) is not a guarantee that things will stay this way. Thank goodness you can connect your usual IRC network to imagine what would happen to Signal users if they realised someone unscrupulous had acquired control.

Matrix does not solve all our problems. Notably:

  • Its security is probably not good enough for people threatened by powerful interests – at the moment it’s quite easy to see who’s talking to whom, and when.
  • Not all clients support end-to-end encryption, and not all turn it on by default (but the most-used ones do).

Despite these limitations, Matrix is the only chat network that is even attempting to provide what users need, and it seems to be doing a pretty good job of it.

I think we should work together to address its weaknesses, and adopt it wherever we can.

So, I recommend Matrix (specifically for group and individual chat.

Cascading OKRs and White Space OKRs

Allan Kelly from Allan Kelly Associates

A couple of weeks ago I blogged about the top-down or bottom-up question – “OKRs top-down? bottom-up? or ripples on a pond?”

The idea of top-down OKRs keeps cropping up: it needs a name. So please let me introduce Cascading OKRs, or C-OKRs for short.

I only just invented the term so I don’t use it in Succeeding with OKRs in Agile – although I do warn against the idea. The meme is in books and blogs I’ve read, in podcasts I’ve heard and it comes up again and again in Q&A sessions when I do presentations.

The Cascading OKRs idea goes like this: the people at the top of the organisation set OKRs. These are shared with people and teams “below” them. Those teams then write OKRs to support the delivery of the those above them. Their OKRs are in turn shared with “lower” individuals and teams who repeat the processes.

I’ve even heard it suggested that teams take the OKRs from above and use the key results as their objective(s). The key results they create around these objectives can then be used by “lower” teams as their objectives. Hence OKRs cascade down the organisation. (And we all know what Cascades look like don’t we?)

Undoubtedly this interpretation has its own logic – both in the top setting the master OKRs and the lower levels implementing them. It is after all functional decomposition. And I must believe from what I hear that some companies do it this way even if I have never seen it myself. One hopes that it works for these companies, I think it can be better.

C-OKRs are incompatible with the agile mindset because it deprives teams of autonomy. Each team must implement the objectives given to them regardless of what the team believes, regardless of what the team’s customers are asking for, irrespective of the research the product owner/manager has done.

In reducing, even eliminating, autonomy motivation is going to fall too, teams are no longer their own masters.

Nor will this way increase agility because each team must move in lockstep – or perhaps one step behind – the team above them. The cascading hierarchy injects delay.

Cascading OKRs may be easy to grasp, they may be easy to sell, they may follow the logic of hierarchy and management-by-objective but that also means they represent a lost opportunity to integrate OKRs and agile.

Having named Cascading OKRs I need to name the alternative: broadly the approach I advocate in Succeeding with OKRs.

I name this approach White Space OKRs, WS-OKRs.

Organisational leaders should set the vision, the big-hairy-audacious-goal, the ultimate objective, the massively transformative purpose. They should name the mission, they should set the culture and talk about the purpose of the organisation.

And they should leave copious amounts of white space – space for teams to fill.

Those visions should be light on how; they should be light on orders, instructions and mandates. That may seem odd but only by leaving these things out – by leaving white space – can individuals and teams, at all levels, decide how best they can support that mission, goal, purpose or whatever you call it. Planning is disabling.

Because teams decide how to support those goals – while supporting existing customers, legacy business and technology, plus other (potentially completing) demands – team retain autonomy, and autonomy creates motivation and flexibility.

There is one more assumption underlying this which deserves mentioning.

White Space OKRs assume that the teams already exist. With WS-OKRs leaders don’t need to create new teams to deliver their goals because those teams already exist. In other words, the organisation is operating a post-projects model, e.g. product teams, continuous digital, Spotify, or maybe SAFe. That raises an issue of gaps and I’ll return to this another day.

Subscribe to my blog newsletter and download Continuous Digital for free – normal price $9.99/£9.95/€9.95

White space photo from Katie Doherty on Unsplash.

The post Cascading OKRs and White Space OKRs appeared first on Allan Kelly Associates.

Sorry about the OKRs

Allan Kelly from Allan Kelly Associates

Quick word of appoogy to regular readers who may not be interested in OKRs: sorry for all the posts about OKRs.

Its probably a natural effect of pubishing a book on the subject and then doing a bunch of presentations, webinars, workshops and Q&A sessions. Give me a few weeks and this will pass. Normal service will be resumed, with thoughts on agile, technology and such – I just don’t know when!

The post Sorry about the OKRs appeared first on Allan Kelly Associates.

Where are the industrial strength R compilers?

Derek Jones from The Shape of Code

Why don’t compiler projects for the R language make it into production use? The few that have been written have remained individual experimental products, e.g., RLLVMCompile.

Most popular languages attract many compiler implementations. I’m not saying that any of these implementations have more than a handful of users, that they implement the full language (a full implementation is not common), or that they fulfil any need other than their implementers desire to build something.

A commonly heard reason for the lack of production R compilers is that it is not worth the time and effort, because most of an R program’s time is spent in the library code which is written in a compiled language (e.g., C or Fortran). The fact that it is probably not worth the time and effort has not stopped people writing compilers for other languages, but then I think that the kind of people who use R tend not to be the kind of people who want to spend their time writing compilers. On the whole, they are the kind of people who are into statistics and data analysis.

Is it true that that most R programs spend most of their time executing library code? It’s certainly true for me. But I have noticed that a lot of the library functions executed by my code are written in R. Also, if somebody uses R for all their programming needs (it might be the only language they know), then their code might not be heavily library dependent.

I was surprised to read about Tierney’s byte code compiler, because his implementation is how I thought the R-core’s existing implementation worked (it does now). The internals of R is based on 1980s textbook functional techniques, and like many book implementations of the day, performance is dependent on the escape hatch of compiled code. R’s implementers wisely spent their time addressing user concerns, which revolved around statistics and visual presentation, i.e., not internal implementation technicalities.

Building an R compiler is easy, the much harder and time-consuming part is the runtime system.

Threaded code is a quick and simple approach to compiler implementation. R source gets mapped to a sequence of C function calls, with these functions proving a wrapper to library functions implementing the appropriate basic functionality, e.g., add two vectors. This approach has been the subject of at least one Master’s thesis. Thesis implementations rarely reach production use because those involved significantly underestimate the work that remains to be done, which is usually a lot more than the original implementation.

A simple threaded code approach provides a base for subsequent optimization, with the base having a similar performance to an interpreter. Optimizing requires figuring out details of the operations performed and replacing generic function calls with ones designed to be fast for specific cases, or even better replacing calls with inline code, e.g., adding short vectors of integers. There is a lot of existing work for scripting languages and a few PhD thesis researching R (e.g., Wang). The key technique is static analysis of R source.

Jan Vitek is running what appears to be the most active R compiler research group, at the moment e.g., the Ř project. Research can be good for uncovering language usage and trying out different techniques, but it is not intended to produce industry strength code. Lots of the fancy optimizations in early versions of the gcc C compiler started life as a PhD thesis, with the respective individual sometimes going on to spend a few years creating a production quality version for the released compiler.

The essential ingredient for building a production compiler is persistence. There are an awful lot of details that need to be sorted out (this is why research project code does not directly translate to production code, they ignore ‘minor’ details in order to concentrate on the ‘interesting’ research problem). Is there a small group of people currently beavering away on a production quality compiler for R? If there is, I can understand being discrete, on long-term projects it can be very annoying to have people regularly asking when the software is going to be released.

To have a life, once released, a production compiler needs to attract users, who are often loyal to their current compiler (because they know that their code works for this compiler); there needs to be a substantial benefit to entice people to switch. The benefit of compiling R to machine code, rather than interpreting, is performance. What performance improvement is needed to attract a viable community of users (there is always a tiny subset of users who will pay lots for even small performance improvements)?

My R code is rarely cpu bound, so I am not in the target audience, no matter what the speed-up. I don’t have any insight in the performance problems experienced by the R community, and have no idea whether a factor of two, five, ten or more would be enough.

PG Webhooks Tool (beta) – I want your feedback!

Paul Grenyer from Paul Grenyer

PG Webhooks Tool

PG Webhooks is a tool designed to help with the testing of webhooks. It allows you to receive and view messages from a webhook until you're ready to setup the server which will receive the real webhooks for your application.

Register the URL uniquely generated for you by PG Webhooks with your webhook. Fire the webhook and see the messages appear. Select each message to see the details, including the body and header.

PG Webhooks sits on a free Heroku instance so may take a few moments to start the first time you use it. It is backed by a non-persistent Redis database, so your messages are only stored temporarily.

PG Webhooks beta is very much a prototype and has plenty of rough edges. I would appreciate any feedback in the comments below.