Career progression: an invisible issue in software development

Derek Jones from The Shape of Code

Career progression is an important issue in the development of some software systems, but its impact is rarely discussed, let alone researched. A common consequence of career progression is that a project loses a member of staff, e.g., they move to work on a different project, or leave the company. Hiring staff and promoting staff are related, equally neglected, research areas.

Understanding the initial and ongoing development of non-trivial software systems requires an understanding of the career progression, and expectations of progression, of the people working on the system.

Effectively working on a software system requires some amount of knowledge of how it operates, or is intended to operate. The loss of a person with working knowledge of a system reduces the rate at which a project can be further developed. It takes time to find a suitable replacement, and for that person’s knowledge of the behavior of the existing system to reach a workable level.

We know that most software is short-lived, but know almost nothing about the involvement-lifetime of those who work on software systems.

There has been some research studying the durations over which people have been involved with individual Open source projects. However, I don’t believe the findings from this research, because I think that non-paid involvement on an Open source project has very different duration and motivation characteristics than a paying job (there are also data cleaning issues around the same person using multiple email addresses, and people working in small groups with one person submitting code).

Detailed employment data, in bulk, has commercial value, and so is rarely freely available. It is possible to scrape data from the adverts of job websites, but this only provides information about the kinds of jobs available, not the people employed.

LinkedIn contains lots of detailed employment history, and the US courts have ruled that it is not illegal to scrape this data. It’s on my list of things to do, and I keep an eye out for others making such data available.

The National Longitudinal Survey of Youth has followed the lives of 10k+ people since 1979 (people were asked to detail their lives in periodic surveys). Using this data, Joseph, Boh, Ang, and Slaughter investigated the job categories within the career paths of 500 people who had worked in a technical IT role. The plot below shows the career paths of people who had spent at least five years working in an IT role (code+data):

The job categories contained within the seven career paths in which people spent at least five years working in a technical IT role.

Employment history provides an upper bound for the time that a person is likely to have worked on a project (being employed to work on an Open source project while, over time, working at multiple companies is an edge case).

A company may have employees simultaneously working on multiple projects, spending a percentage of their time on each. How big a resource impact is the loss of such a person? Were they simply the same kind of cog in multiple projects, or did they play an important synchronization role across projects? Details on all the projects a person worked on would help answer some questions.

Building a software system involves a lot more than writing the code. Technical managers work on high-level, broad brush issues. The project knowledge that technical managers have contributes to ongoing work, and the impact of losing a technical manager is probably more of a longer-term issue than losing a coding-developer.

There are systems that are developed and maintained by essentially one person over many years. These get written about and celebrated, but are comparatively rare.

One of the more reliable ways of estimating developer productivity is to measure the impact of their leaving a project.

Glory! Hammer!

Paul Grenyer from Paul Grenyer

Glory! Hammer! Were fantastic!

They played well and were lots of fun, as you’d expect. I mean, who doesn’t like a gig to start with a cardboard cutout of Tom Jones and Delilah playing on the PA? The band were all dressed up - it must have been very hot - and playing their parts.

I did find that some of the gaps between songs and the interplay with the audience felt a little too Steel Panther. It was too frequent, superfluous, and added time to a set which could have been shorter.

Sozos Michael is a phenomenal singer and makes it seem effortless and perfect. I’m a big fan of widdly guitar, and it doesn’t stand out as much on record as it did live, which was a really nice surprise.

It’ll be great to see them again when the promised new album is out and they tour again.




Programming Languages: History and Fundamentals

Derek Jones from The Shape of Code

Programming Languages: History and Fundamentals by Jean E. Sammet is often cited in discussions of language history, but very rarely read (I appreciate that many oft cited books have not been read by those citing them, but age further reduces the likelihood that anybody has read this book; it was published in 1969). I read this book as an undergraduate, but did not think much of it. For around five years it has been on my list of books to buy, should a second-hand copy become available below £10 (I buy anything vaguely interesting below this price, with most ending up left on trains or the book table of coffee shops).

Thanks to Adam Gashlin the Internet Archive now contains a downloadable copy.

The list of 120 languages covered contains a handful of the 28 languages covered in an article from 1957. Sammet says that of the 120, 20 are already dead or on obsolete computers (i.e., it is unlikely that another compiler will be written), and that about 15 are widely used/implemented.

Today, the book is no longer a discussion of the recent past, but a window into the Cambrian explosion of programming languages that happened in the 1960s (almost everything since then has been a variation on a theme); languages from the 1950s are also included.

How does the material appear to me from a 2022 vantage-point?

The organization of the book reminded me that programming languages were once categorized by application domain, i.e., scientific/engineering users, business users, and string & list processing (i.e., academic users). This division reflected the market segmentation for computer hardware (back then, personal computers were still in the realm of science fiction). Modern programming language books (e.g., Scott’s “Programming Language Pragmatics”) often organize material based on implementation details, e.g., lexical analysis, and scoping rules.

The overview of programming languages given in the first three chapters covers nearly all the basic issues that beginners are taught today, but the emphasis is different (plus typographical differences, such as keyword spelt ‘key word’).

Two major language constructs are missing: Dynamic storage allocation is not discussed: Wirth’s book Algorithms + Data Structures = Programs is seven years in the future, and Kernighan and Ritchie’s The C Programming Language nine years; Simula gets a paragraph, but no mention of the object-oriented concepts it introduced.

What is a programming language, and what are the distinguishing features that make some of them high-level programming languages?

These questions may sound pointless or obvious today, but people used to spend lots of time arguing over what was, or was not, a high-level language.

Sammet says: “… the first characteristic of a programming language is that the user can write a program without knowing much—if anything—about the physical characteristics of the machine on which the program is to be run.”, and goes on to infer: “… a major characteristic of a programming language is that there must be a reasonable potential of having a source program written in that language run on two computers with different machine codes without rewriting the source program. … In most programming languages, some—but often very little—rewriting of the source program is necessary.”

The reason that some rewriting of the source was likely to be needed is that there were often a lot of small variations between compilers for the same language. Compilers tended to be bespoke, i.e., the Fortran compiler for the X cpu running OS Y was written specifically for that combination. Retargeting an existing compiler to a new cpu or OS was much talked about, but it was more fun to write a new compiler (and anyway, support for new features was needed, and it was simpler to start from scratch; page 149 lists differences in Fortran compilers across IBM machines). It didn’t help that there was also a lot of variation in fundamental quantities such as word length, e.g., 16, 18, 20, 24, 32, 36, 40, 48, 60 bit words; see page 18 of Dictionary of Computer Languages.

Sammet makes the distinction: “One of the prime differences between assembly and higher level languages is that to date the latter do not have the capability of modifying themselves at execution time.”

Sammet then goes on to list the advantages and disadvantages of what she calls higher level languages. Most of the claimed advantages will be familiar to readers: “Ease of Learning”, “Ease of Coding and Understanding”, “Ease of Debugging”, and “Ease of Maintaining and Documenting”. The disadvantages included: “Time Required for Compiling” (the issue here is that converting assembler source to object code is much faster than compiling a high-level language), “Inefficient Object Code” (the translation process was often a one-to-one mapping of what was written, e.g., little reuse of register contents), “Difficulties in Debugging Without Learning Machine Language” (symbolic debuggers are still in the future).

Sammet’s observation: “In spite of the fact that higher level languages have been with us for over 10 years, there has been relatively little quantitative or qualitative analysis of their advantages and disadvantages.” is still true 50 years later.

If you enjoy learning about lots of different languages, you will like this book. The discussion of specific languages contains copious examples, which for me brought things to life.

Sites such as the Internet Archive and Bitsavers make the book’s references accessible (there are a few I had not seen before), and offer readers a path to pre-Cambrian times.

Saul Rosen’s 1967 book “Programming Systems and Languages” is sometimes cited in discussions of programming language history. This book is a collection of papers that discuss a variety of languages and the operating systems that support them. Fewer languages are covered, but in more depth, along with lots of implementation details. Again, lots of interesting references.

A review of React Cookbook: Recipes for Mastering the React Framework

Paul Grenyer from Paul Grenyer

React Cookbook: Recipes for Mastering the React Framework

by David Griffiths and Dawn Griffiths
ISBN: 978-1492085843

This is a book of about 100 recipes across 11 sections. The sections range from the basics, such as creating React apps, routing and managing state to the more involved topics such as security, accessibility and performance.

I was especially pleased to see that the section on creating apps looked at create-react-app, nextjs and a number of other getting started tools and libraries, rather than just sticking with create-react-app.

I instantly liked the way each recipe laid out the problem it was solving, the solution and then had a discussion on different aspects of the solution. It immediately felt a bit like a patterns book. For example, after describing how to use create-react-app, the discussion section explains in more depth what it really is, how it works, how to use it to maintain your app and how to get rid of it.

As with a lot of React developers, the vast majority of the work I do is maintaining existing applications, rather than creating new ones from scratch. I frequently forget how to set up things like routing from scratch and would usually reach for Google. However, with a book like this I can see myself reaching for the easy-to-find recipes again and again.

Task backlog waiting times are power laws

Derek Jones from The Shape of Code

Once it has been agreed to implement new functionality, how long do the associated tasks have to wait in the to-do queue?

An analysis of the SiP task data finds that waiting time has a power law distribution, i.e., numTasks ≈ waitingTime^{-1}, where numTasks is the number of tasks waiting a given amount of time; the LSST:DM Sprint/Story-point/Story data has the same distribution. Is this a coincidence, or does task waiting time always have this form?

Queueing theory analyses the properties of systems involving the arrival of tasks, one or more queues, and limited implementation resources.

A basic result of queueing theory is that task waiting time has an exponential distribution, i.e., not a power law. What software task implementation behavior is sufficiently different from basic queueing theory to cause its waiting time to have a power law?

As always, my first line of attack was to find data from other domains, hopefully with an accompanying analysis modelling the behavior. It’s possible that my two samples are just way outside the norm.

Eventually I found an analysis of the letter writing response time of Darwin, Einstein and Freud (my email asking for the data has not yet received a reply). Somebody writes to a famous scientist (the scientist has to be famous enough for people to want to create a collection of their papers and letters), the scientist decides to add this letter to the pile (i.e., queue) of letters to reply to, eventually a reply is written. What is the distribution of waiting times for replies? Yes, it’s a power law, but with an exponent of -1.5, rather than -1.

The change made to the basic queueing model is to assign priorities to tasks, and then choose the task with the highest priority (rather than a random task, or the one that has been waiting the longest). Provided the queue never becomes empty (i.e., there are always waiting tasks), the waiting time is a power law with exponent -1.5; this behavior is independent of queue length and distribution of priorities (simulations confirm this behavior).

However, the exponent for my software data, and other data, is not -1.5, it is -1. A 2008 paper by Albert-László Barabási (detailed analysis) showed how a modification to the task selection process produces the desired exponent of -1. Each of the tasks currently in the queue is assigned a probability of selection; this probability is proportional to the priority of the corresponding task (i.e., the sum of the priorities/probabilities of all the tasks in the queue is assumed to be constant), and task selection is weighted by this probability.
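
For anyone who wants to experiment, below is a minimal simulation sketch of the model just described (my own illustration, not code from the paper): a fixed-length queue in which the next task executed is chosen with probability proportional to its priority, and each completed task is replaced by a fresh one. Plotting the printed waiting-time histogram on log-log axes lets you compare the slope against the claimed exponent.

```cpp
// Illustrative sketch of the priority-weighted queueing model described above.
// The queue always holds queue_len tasks (it never becomes empty); the task
// executed at each step is chosen with probability proportional to its
// priority, and is then replaced by a new task with a random priority.
#include <iostream>
#include <map>
#include <random>
#include <vector>

int main() {
    constexpr int queue_len = 100;
    constexpr long steps = 1'000'000;   // number of tasks executed
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> rand_priority(0.0, 1.0);

    std::vector<double> priority(queue_len);
    std::vector<long> arrival(queue_len, 0);   // step at which each task joined
    for (auto &p : priority) p = rand_priority(rng);

    std::map<long, long> wait_count;           // waiting time -> number of tasks
    for (long t = 1; t <= steps; ++t) {
        // Selection is weighted by priority, as in the model described above.
        std::discrete_distribution<int> pick(priority.begin(), priority.end());
        int i = pick(rng);
        ++wait_count[t - arrival[i]];
        priority[i] = rand_priority(rng);      // replace the completed task
        arrival[i] = t;
    }

    // Two columns: waiting time and task count; plot on log-log axes.
    for (auto [wait, count] : wait_count)
        std::cout << wait << ' ' << count << '\n';
}
```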

So we have a queueing model whose task waiting time is a power law with an exponent of -1. How well does this model map to software task selection behavior?

One apparent difference between the queueing model and waiting software tasks is that software tasks are assigned to a small number of priorities (e.g., Critical, Major, Minor), while each task in the model queue has a unique priority (otherwise a tie-break rule would have to be specified). In practice, I think that the developers involved do assign unique priorities to tasks.

Why wouldn’t a developer simply select what they consider to be the highest priority task to work on next?

Perhaps each developer does select what they consider to be the highest priority task, but different developers have different opinions about which task has the highest priority. The priority assigned to a task by different developers will have some probability distribution. If task priority assignment by developers is correlated, then the behavior is effectively the same as the queueing model, i.e., the probability component is supplied by different developers having different opinions and the correlation provides a clustering of priorities assigned to each task (i.e., not a uniform distribution).

If this mapping is correct, the task waiting time for a system implemented by one developer should have a power law exponent of -1.5, just like letter writing data.

The number of sprints that a story is assigned to, before being completely implemented, is a power law whose exponent varies around -3. An explanation of this behavior based on priority queues looks possible; we shall see…

The queueing models discussed above are a subset of the field known as bursty dynamics; see the review paper Bursty Human Dynamics for human behavior related aspects.

2-day More Concurrent Thinking class at CppCon 2022

Anthony Williams from Just Software Solutions Blog

I am excited to be going to CppCon again this year, where I will be running a 2-day class: More Concurrent Thinking in C++: Beyond the Basics.

The class is onsite at the conference venue in Aurora, Colorado, USA, on Saturday 10th September 2022 and Sunday 11th September 2022, immediately before the main conference.

I'll be teaching you to think about the synchronization properties of the constructs you use, and how to work with the low level facilities provided by the C++20 standard to build high level abstractions. We'll look at actors, thread pools and executors, and how to design code that works with those abstractions.

We'll look at how to use low-level atomic operations to build lock-free data structures, and the issues that can arise when doing so. We'll also look at how scalability concerns can impact your design choices.

Finally, we'll look at the important issue of testing multithreaded code, and the tools we can use to help us find the causes of problems.

If this sounds interesting, sign up for the class when you register for the main conference.

I'll also be doing a presentation on Tuesday: An Introduction to Multithreading in C++20. This is a whistle-stop tour of the multithreading features of C++20, briefly covering each feature and what it is used for, and which you should choose by default.

Whether you come to my class and presentation or not, I hope to see you there!


Outreachy August 2022 update

Andy Balaam from Andy Balaam's Blog

I had the pleasure of being a mentor this summer for an Outreachy internship for the Matrix organisation. Outreachy provides internships to people subject to systemic bias and impacted by underrepresentation in the technical industry where they are living.

Outreachy is a fantastic organisation doing a brilliant job to try and make our sometimes terrible industry a little bit better.


Mentoring was great fun, mainly because it was such a pleasure working with my awesome intern Usman. There is lots of support available for interns and mentors through Outreachy’s Zulip chat (when will we persuade them to use Matrix? ;-) ), so you always have somewhere to turn if you have questions.

If you want to read more about the internship from Usman’s point of view, check out his blog posts:

  • Outreachy Blog – Introducing Myself
  • Wrap-up: Summary of my journey to being an outreachy intern at Element

We talked every day on video calls, and really enjoyed working together. Some days we would just chat, sometimes I would give pointers for things to try in the code, or people to talk to. Some days we worked through some code together, and that was the most fun. Usman is incredibly enthusiastic and bright, so it was very satisfying making suggestions and seeing him put them into practice.

Success!

The work went very well, and Usman succeeded in creating a prototype that will help us design the Favourite Messages feature:

Note: the feature isn’t ready to be fully released because it needs to be implemented on mobile platforms as well as changing where it stores its information: currently we use the browser’s local storage, but we plan to store things in Matrix, meaning it is automatically synchronised between devices.

Things that went well

  • Meeting every day: we talked on a short video call every morning. This meant if we misunderstood each other it was quickly resolved, without lots of time being wasted.
  • Having a clear list of tasks: we kept a tracking issue on Github. This meant we were clear what Usman was supposed to be doing now, and what was coming next.
  • Being flexible: we worked together to change the list of tasks every week or so. This meant we were being realistic about what could be achieved, and able to change in response to things we found out, or feedback from others.
  • Getting design input: we talked to Element’s designers several times during the project, showing them prototypes and early implementations. This meant we didn’t waste time implementing things that would need redesign later.
  • Support for me: I was able to work with Thib, who is our Outreachy Matrix community organiser, especially during the selection process. This meant I was not making decisions in isolation, and had support if anything tricky came up.
  • The Element Web community: Usman got loads of support from our community. Special thanks to Šimon, Olivier, Shay and t3chguy for your help!
  • Element the company: Element paid for this internship, and gave great support to Usman, integrating him into all our systems, inviting him to introductory meetings etc. He had every opportunity to see what working at Element is like, and to make an impression on everyone here. Element did a great job here.

Things I would do differently

  • Managing the contribution period: before the project began, applicants are invited to contribute to the projects, allowing us to choose an intern based on those contributions. I felt slightly disorganised at this stage, and there was a lot of activity in issues and pull requests in the project from applicants. I think I should have warned our community and explained what was going to happen up-front, and maybe enlisted help from people willing to triage the contributions a little. Contributions varied in quality and understanding level, so having some volunteers who were primed to spend a little more time explaining and helping contributors get started would have prevented this impinging on the time of the team as a whole. Nevertheless, our team responded really well, and we got some useful contributions, and I hope the contributors had a good experience too.
  • Integrating Usman into the team: we chose a project that was independent from what other team members were doing, meaning he mostly interacted with others when he needed help. While it is sensible to make sure interns are decoupled from the main work (because it’s hard to predict how much progress they will make, and they are going to stop after their internship), I do also wish we could have found a project that gave more opportunity to work with other people, not just “stealing” their time to help out, but actually working together on shared pieces of work. This is a tricky one to figure out, but food for thought.

Conclusions

The experience of being a mentor was really fun, and I would recommend it to anyone working on an open source project.

I would emphasise, though, that you need to put aside enough time: the internship will not be successful if you don’t make time to work with your intern, get to know them, and introduce them to your community. Since interns may be new to the world of work, or shy about taking your time, as a mentor, you need to take responsibility for giving them enough support.

Final note: as a mentor, you are NOT responsible for the work going well! Your responsibility is to help and support your intern, and give them everything they need to be successful (including feedback about things that are not working well), but it is up to the intern themself to do the work, and how much work gets done is going to be the combination of a number of factors, including the intern’s experience and abilities. Don’t worry if you don’t get as far as you expected – after all, that happens in nearly all software projects…

Microsoft C++ versions explained

Anders Schau Knatten from C++ on a Friday

Microsoft has five different version numbers to think about when it comes to C++. Here’s an attempt to explain what they all mean.

  • Visual Studio release year (the “marketing version number”), e.g. Visual Studio 2022
  • Visual Studio actual version number, e.g. Visual Studio 17.0
  • Visual C++ (MSVC) version, e.g. MSVC 14.30
  • Toolset version, e.g. toolset 143
  • Compiler version, e.g. cl.exe 19.30

Visual Studio versions

What most people will see first is the Visual Studio release year. You’ll download Visual Studio 2022, Visual Studio 2019 etc. These however also have a more normal major.minor versioning scheme, and they bump the major version for every release year. So for instance VS 2017 is version 15, VS 2019 is version 16, and VS 2022 is version 17. Note that the year and the major version are not correlated in any way, except that Visual Studio 2010 just happened to also be version 10.

Visual Studio also has minor releases of each major version. Some examples (there are more minor releases per major than shown here):

Year                  Version
Visual Studio 2017    15.0
                      15.3
Visual Studio 2019    16.0
                      16.1
Visual Studio 2022    17.0
                      17.1

source: Wikipedia

Visual C++ versions

Microsoft Visual C++, aka MSVC, ships as a part of Visual Studio, but has its own versioning scheme. Importantly, the major number signifies ABI compatibility, so something compiled with MSVC at one major version number can be linked against something compiled with any other MSVC at the same major version. (Some restrictions apply.) The MSVC major version number luckily gets bumped a lot less often than the Visual Studio version itself. As of Visual Studio 2015, they have kept the MSVC major version at 14. The first digit of the minor version seems to be bumped for each major version of Visual Studio itself. The Visual C++ version number is also used for the Visual C++ Redistributable.

Some examples:

VS Year               VS version   MSVC version
Visual Studio 2017    15.0         14.1
                      15.3         14.11
Visual Studio 2019    16.0         14.20
                      16.1         14.21
Visual Studio 2022    17.0         14.30
                      17.1         14.31

source: Wikipedia

C++ toolset versions

Closely related to the MSVC version number is the C++ toolset version number. I can’t find a good source for it, but from Microsoft’s article it seems that the toolset version is made up of the MSVC major version and the first digit of the MSVC minor version. Some examples:

VS Year               VS version   MSVC version   Toolset version
Visual Studio 2017    15.0         14.1           141
                      15.3         14.11          141
Visual Studio 2019    16.0         14.20          142
                      16.1         14.21          142
Visual Studio 2022    17.0         14.30          143
                      17.1         14.31          143

Source: Microsoft
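
If that derivation rule is right (it matches every row above, but treat it as an inference rather than something Microsoft documents), computing the toolset number from an MSVC version string is a one-liner. Below is an illustrative sketch; toolset_from_msvc is a made-up helper name, not an existing API:

```cpp
// Illustrative only: build the toolset version from an MSVC version string by
// concatenating the major version with the first digit of the minor version,
// e.g. "14.29" -> "142".
#include <iostream>
#include <string>

std::string toolset_from_msvc(const std::string& msvc_version) {
    auto dot = msvc_version.find('.');
    return msvc_version.substr(0, dot) + msvc_version.substr(dot + 1, 1);
}

int main() {
    for (const char* v : {"14.1", "14.11", "14.20", "14.29", "14.30"})
        std::cout << v << " -> toolset " << toolset_from_msvc(v) << '\n';
}
```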

The linker (link.exe) also uses the C++ toolset version number as its version number, so e.g. for toolset 14.32 I might see link.exe version 14.32.31332.0.

Compiler versions

Finally, there’s the compiler version, which is what cl.exe reports. E.g. 19.16.27048. The major.minor version scheme correlates with the _MSC_VER macro which you can check in your source code (godbolt). So e.g. cl.exe version 19.21 has _MSC_VER 1921. (I’ll be nice and count those as one version number.)

VS Year               VS version   MSVC version   Toolset version   Compiler version
Visual Studio 2017    15.0         14.1           141               19.10
                      15.3         14.11          141               19.11
Visual Studio 2019    16.0         14.20          142               19.20
                      16.1         14.21          142               19.21
Visual Studio 2022    17.0         14.30          143               19.30
                      17.1         14.31          143               19.31

The _MSC_VER version number is incremented monotonically at each Visual C++ toolset update, so if you want to only compile some stuff if the compiler is new enough, you can do e.g. #if _MSC_VER >= 1930.
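
As a minimal, self-contained illustration of that kind of version gate (the 1930 threshold is the Visual Studio 2022 17.0 compiler from the table above):

```cpp
// Gate code on the compiler version using the predefined _MSC_VER macro.
// 1930 corresponds to cl.exe 19.30, the first Visual Studio 2022 compiler.
#include <iostream>

int main() {
#if defined(_MSC_VER) && _MSC_VER >= 1930
    std::cout << "VS 2022 toolset or newer, _MSC_VER = " << _MSC_VER << '\n';
#elif defined(_MSC_VER)
    std::cout << "Older MSVC, _MSC_VER = " << _MSC_VER << '\n';
#else
    std::cout << "Not compiled with MSVC\n";
#endif
}
```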

Appendix: Running out of version numbers

Interestingly, the scheme where they bump the first digit of the Visual C++ minor version for each major release of Visual Studio means that they can only have nine minor versions of MSVC per Visual Studio major version! And looking at Wikipedia, it seems they actually ran out of toolset versions at the end of Visual Studio 2019 and reused 14.28 and 14.29 for the final four Visual Studio 2019 releases (Visual Studio 16.8 and 16.9 had MSVC 14.28, Visual Studio 16.10 and 16.11 had MSVC 14.29).

Patterns in the LSST:DM Sprint/Story-point/Story ‘done’ issues

Derek Jones from The Shape of Code

Projects that use Scrum as their project management framework estimate tasks (known as a user story, or just story) in units of Story-points. A collection of User stories is grouped together to be implemented during a Sprint (a time-boxed interval, often lasting two weeks).

What are Story-points, and how do they map to time (in hours and minutes)? For this post, let’s ignore these questions, simply assuming that the people who assign a story-point value to a story have some mapping in their head.

What is the average number of story-points in a story, and how does this average vary across teams? What is the distribution of number of stories estimated per sprint, how many are actually implemented, and how does this vary across teams?

The data required to answer these questions has not been publicly available, or rather public data is not known to me. Until this week, I had only known of a few public Jira repos where story-points were given for at most a few hundred stories.

The LSST Corporation, a not-for-profit involved in astronomy and physics research, has a Data Management (DM) project. The Jira repo for this project contains 26,671 ‘Done’ issues (as of Aug 2022), of which 11,082 (41.5%) have assigned story-points; there have been 469 sprints, which involved 33% of the issues. The start/end implementation date/time for stories is mostly rather granular, and not fine enough to be used to attempt to correlate individual stories with hours. I found this repo, and a couple of others, via the paper Story points changes in agile iterative development, and downloaded all available issues.

What patterns are present in the story-point and sprint data?

Story points are commonly thought of as being integer valued, but 28% of the values are non-integer. If any developers are using the Fibonacci scale, there are not enough to have a noticeable impact. The plot below shows the number of stories estimated to involve a given number of story-points (black pluses are non-integer values, which have been rounded to fit the regression model). The green curved line is a fitted biexponential (sum of two exponentials), with the two straight lines being the two component exponentials (code+data):

Number of stories estimated to involve a given number of story-points.

One exponential is dominant for stories assigned up to 10 story-points, and the second exponential for higher story-point values.

The development team decides to implement a story and allocates it to a sprint. A story may be reallocated to another sprint before the start of the original sprint, or after the sprint is finished when its implementation is incomplete or not yet started (the data does not allow for these cases to be distinguished). How many sprints is a story allocated to, before the story implementation is complete?

The plot below shows the number of stories allocated to a given number of sprints, with a fitted regression line of the form Stories ≈ Sprints^{-2.8} (code+data):

Number of stories assigned to a given number of distinct sprints.

So around 14% of stories are allocated to two sprints, 5% to three and 2% to four.

How many stories are assigned to a sprint? The plot below shows the number of sprints having a given number of stories assigned to them, and the number of sprints implementing a given number of stories; lines are fitted loess models (code+data):

Number of sprints assigned a given number of stories, and implementing a given number of stories.

Are the Story/Story-point/Sprint patterns found in the DM project likely to occur in other projects using Scrum?

I don’t know, but I hope so. Developing theories of software development processes requires that there be consistent patterns of behavior.

Not knowing which stories were assigned to a sprint at the start of that sprint, rather than being assigned earlier and then moved to another sprint, potentially undermines the sprint patterns. We will have to wait and see.

If anybody knows of any public Jira repos where a high percentage (say 40%) of the issues have been assigned story-points, please let me know (all the ones I know of on the Atlassian site contain a tiny percentage of story-points).

Impact of number of files on number of review comments

Derek Jones from The Shape of Code

Code review is often discussed from the perspective of changes to a single file. In practice, code review often involves multiple files (or at least pull-based reviews do), which raises the question: Do people invest less effort reviewing files appearing later?

TLDR: The number of review comments decreases for successive files in the pull request, by around 16% per file.

The paper First Come First Served: The Impact of File Position on Code Review extracted and analysed 219,476 pull requests from 138 Java projects on Github. They also ran an experiment which asked subjects to review two files, each containing a seeded coding mistake. The paper is relatively short and omits a lot of details; I’m guessing this is due to the page limit of a conference paper.

The plot below shows the number of pull requests containing a given number of files. The colored lines indicate the total number of code review comments associated with a given pull request, with the red dots showing the 69% of pull requests that did not receive any review comments (code+data):

Number of pull requests containing a given number of files, for all pull requests, and those receiving at least 1, 2, 5, and 10 comments.

Many factors could influence the number of comments associated with a pull request; for instance, the number of people commenting, the amount of changed code, whether the code is a test case, and the number of files already reviewed (all items which happen to be present in the available data).

One factor for which information is not present in the data is social loafing, where people exert less effort when they are part of a larger group; or at least I did not find a way of easily estimating this factor.

The best model I could fit to all pull requests containing fewer than 10 files, and having a total of at least one comment, explained 36% of the variance present, which is not great, but something to talk about. There was a 16% decline in comments for successive files reviewed, test cases had 50% fewer comments, and there was some percentage increase with lines added; the number of comments increased by a factor of 2.4 per additional commenter (is this due to the importance of the file being reviewed, with importance being a metric not present in the data?).

The model does not include information available in the data, such as file contents (e.g., Java, C++, configuration file, etc), and there may be correlated effects I have not taken into account. Consequently, I view the model as a rough guide.

Is the impact of file order on number of comments a side effect of some unrelated process? One way of showing a causal connection is to run an experiment.

The experiment run by the authors involved two files, each containing one seeded coding mistake. The 102 subjects were asked to review the two files, with file order randomly selected. The experiment looks well-structured and thought through (many are not), but the analysis of the results is confused.

The good news is that the seeded coding mistake in the first file was much more likely to be detected than the mistake in the second file, and years of Java programming experience also had an impact (appearing first had the same impact as three years of Java experience). The bad news is that the model (a random effect model using a logistic equation) explains almost none of the variance in the data, i.e., these effects are tiny compared to whatever other factors are involved; see code+data.

What other factors might be involved?

Most experiments show a learning effect, in that subject performance improves as they perform more tasks. Having subjects review many pairs of files would enable this effect to be taken into account. Also, reviewing multiple pairs would reduce the impact of random goings-on during the review process.

The identity of the seeded mistake did not have a significant impact on the model.

Review comments are an important issue which is amenable to practical experimental investigation. I hope that the researchers run more experiments on this issue.