Overview of broad US data on IT job hiring/firing and quitting

Derek Jones from The Shape of Code

Software developers are employed by organizations and people change jobs, either voluntarily or not; every year a new batch of people join the workforce, e.g., new graduates. Governments track employment activities for a variety of reasons, e.g., tax collection, and monitoring labour supply and demand (for the purposes of planning).

The US Bureau of Labor Statistics’ publishes a monthly summary of their Job Openings and Labor Turnover Survey. What can be learned about software development employment from this data (description)?

The data starts in December 2000, with each row contains a monthly count of Job Openings, Hires, Quits, Layoffs and Discharges, and totals, along with one of 21 major non-farm industry codes or one of the 5 government codes (the counts are broken out by State). I’m guessing that software developers are assigned the Information code (i.e., 510000), but who is to say that some have not been classified with the code for, say, Construction or Education and health services. The Information code will cover a lot more than just software developers; I’m trading off broad IT coverage for monthly details on employment turnover (software developer specific information is available, but it comes without the turnover information). The Bureau of Labor Statistics make available a huge quantity of information, and understanding how it all fits together would probably require me to spend several months learning my way around (I have already spent a week or two over the years), so I’m sticking with a prebuilt dataset.

The plot below shows the aggregated monthly counts (i.e., all states) of Job Openings, Hires, Quits, Layoffs and Discharges for the Information industry code (code+data):

Aggregated monthly counts of Job Openings, Hires, Quits, Layoffs and Discharges for the Information industry code, from 2000 to 2022.

The general trend follows the ups and downs of the economy, there is a huge spike in layoffs in early 2020 (the start of COVID), and Job Openings often exceeding Hires (which I did not expect).

These counts have the form of a time-series, which leads to the questions about repeating patterns in the sequence of values? The plot below shows the autocorrelation of the four employment counts (code+data):

Autocorrelation of Job Openings, Hires, Quits, Layoffs time series for the Information code.

The spike in Hires at 12-months is too large to be just be new graduates entering the workforce; perhaps large IT employers have annual reviews for all employees at the same time every year, causing some people to quit and obtain new jobs (Quits has a slightly larger spike at 12-months). Why is there a regular 3-month cycle for Job Openings? The negative correlation in Layoffs at one & two months is explained by companies laying off a batch of workers one month, followed by layoffs in the following two months being lower than usual.

I don’t know much about employment practices, so I won’t speculate any more. Comments welcome.

Are there any interest cross-correlations between the pairs of time-series?

The plot below shows four pairs of cross correlations (code+data):

Cross correlation between the pairs of time series Hires/Layoffs, Quits/Layoffs, Job Openings/Hires, and Hires/Quits.

Hires & Layoffs shows a scattered pattern of Hires preceding Layoffs (to be expected), and the bottom left shows there is a pattern of Quits preceding Layoffs (are people searching for steadier employment when layoffs loom?). Top right shows a pattern of Job Openings following Hires (I’m clutching at straws for this; is Hires a proxy for Quits, the cross correlation of Job Openings & Quits does have Job Openings leading), the bottom right shows the pattern of Hires leading Quits.

Nothing in this analysis surprised me, but then it is rather basic and broad brush. These results are the start of an analysis of the IT employment ecosystem; one that probably won’t progress far because of a lack of data and interest on my part.

ARD & Winterfyleth at the Bread Shed

Paul Grenyer from Paul Grenyer

ARD

Following a nightmare which is parking in Manchester and getting a meal on a Saturday night without booking, we walked into the Bread Shed just as ARD were getting going, minus my vinyl and CDs for signing!. The masterpiece which is Take Up My Bones was instantly recognisable, as were composer and multi instrumentalist Mark Deeks and fellow Winterfyleth band mate Chris Naughton, both on guitar. The latter was centre stage, where surely Deeks should have been?

From the off the band, who were put together to perform an album which was never intended to be performed live, were a little loose with the drums too prominent and the guitars not clear enough. There appeared to be a lot retuning necessary, especially from Chris and the lead guitarist who appeared hidden away a lot of the time. This didn’t really detract from enjoyment of the incredible compositions from the album.  By the time the final 10 minutes, consisting of Only Three Shall Know, came along something had changed, the band was as tight as anything and I wished they could have started again from the beginning. 45 minutes had flown by and I’ll certainly go and see them again.

Winterfyleth


I think I’ve seen Winterfyleth four times now, including the set which became their live album recorded at Bloodstock and earlier this year supporting Emperor at Incineration Fest. They never disappoint.

Winterfyleth are one of those bands that are so consistent with their music, without being boring or repetitive, that it doesn’t matter what they play or how familiar I am with the songs, it’s just incredible to listen to. Having said that, disappointingly, they didn’t play A Valley Thick With Oaks, which is my favourite. Who can resist singing along “In the heart of every Englishman…”? However, I did come away with a new favourite in Green Cathedral!

We only got an hour, but at least they didn’t bugger about going off and coming back for an encore. There were old songs, new songs and never before played live songs. Loved every second of it and, for the first time for me, the final song wasn’t preceded with “Sadly time is short and our songs are long, so this is our last one.” Until next time!

 

Optimal sizing of a product backlog

Derek Jones from The Shape of Code

Developers working on the implementation of a software system will have a list of work that needs to be done, a to-do list, known as the product backlog in Agile.

The Agile development process differs from the Waterfall process in that the list of work items is intentionally incomplete when coding starts (discovery of new work items is an integral part of the Agile process). In a Waterfall process, it is intended that all work items are known before coding starts (as work progresses, new items are invariably discovered).

Complaints are sometimes expressed about the size of a team’s backlog, measured in number of items waiting to be implemented. Are these complaints just grumblings about the amount of work outstanding, or is there an economic cost that increases with the size of the backlog?

If the number of items in the backlog is too low, developers may be left twiddling their expensive thumbs because they have run out of work items to implement.

A parallel is sometimes drawn between items waiting to be implemented in a product backlog and hardware items in a manufacturer’s store waiting to be checked-out for the production line. Hardware occupies space on a shelf, a cost in that the manufacturer has to pay for the building to hold it; another cost is the interest on the money spent to purchase the items sitting in the store.

For over 100 years, people have been analyzing the problem of the optimum number of stock items to order, and at what stock level to place an order. The economic order quantity gives the optimum number of items to reorder, Q (the derivation assumes that the average quantity in stock is Q/2), it is given by:

Q=sqrt{{2DK}/h}, where D is the quantity consumed per year, K is the fixed cost per order (e.g., cost of ordering, shipping and handling; not the actual cost of the goods), h is the annual holding cost per item.

What is the likely range of these values for software?

  • D is around 1,000 per year for a team of ten’ish people working on multiple (related) projects; based on one dataset,
  • K is the cost associated with the time taken to gather the requirements, i.e., the items to add to the backlog. If we assume that the time taken to gather an item is less than the time taken to implement it (the estimated time taken to implement varies from hours to days), then the average should be less than an hour or two,
  • h: While the cost of a post-it note on a board, or an entry in an online issue tracking system, is effectively zero, there is the time cost of deciding which backlog items should be implemented next, or added to the next Sprint.

    If the backlog starts with n items, and it takes t seconds to decide whether a given item should be implemented next, and f is the fraction of items scanned before one is selected: the average decision time per item is: avDecideTime={f*n*(f*n+1)/2}*t seconds. For example, if n=50, pulling some numbers out of the air, f=0.5, and t=10, then avDecideTime=325, or 5.4 minutes.

    The Scrum approach of selecting a subset of backlog items to completely implement in a Sprint has a much lower overhead than the one-at-a-time approach.

If we assume that K/h==1, then Q=sqrt{2*1000}=44.7.

An ‘order’ for 45 work items might make sense when dealing with clients who have formal processes in place and are not able to be as proactive as an Agile developer might like, e.g., meetings have to be scheduled in advance, with minutes circulated for agreement.

In a more informal environment, with close client contacts, work items are more likely to trickle in or appear in small batches. The SiP dataset came from such an environment. The plot below shows the number of tasks in the backlog of the SiP dataset, for each day (blue/green) and seven-day rolling average (red) (code+data):

Tasks waiting to be implemented, per day, over duration of SiP projects.

Visual Lint 8.0.10.355 has been released

Products, the Universe and Everything from Products, the Universe and Everything

Visual Lint 8.0.10.355 has now been released.

This is a maintenance update for Visual Lint 8.0, and includes the following changes:

  • The VisualLintConsole command line parser now accepts spaces between the name and value of a command line switch. As a result, switches of the form. /switchname = <value> are now accepted, whereas previously only /switchname=<value> would work.

  • Entries in the file list within manually created custom project (.vlproj) files can now use wildcards. Note however that if the project file is subsequently edited by VisualLintGui, the file list will be expanded to explicitly reflect the files actually found. As such, this feature is intended for use with hand-created .vlproj files (which are really just .ini files, and as such lend themselves nicely to hand-editing).

  • Fixed a bug opening analysis tool manual PDFs located on UNC drives.

  • Fixed a bug in VisualLintGui which could cause some file save operations to fail.

  • Updated the PC-lint Plus compiler indirect file co-rb-vs2022.lnt to support Visual Studio 2022 v17.2.6.

  • Updated the PC-lint Plus compiler indirect fileco-rb-vs2019.lnt to support Visual Studio 2019 v16.11.17.

  • Updated the PC-lint Plus Boost library indirect file lib-rb-boost.lnt.

  • Removed nonfunctional "Print" menu commands from VisualLintGui.

  • Updates to various help topics.

Download Visual Lint 8.0.10.355

Tips for contenteditables

Andy Balaam from Andy Balaam&#039;s Blog

I’ve been working a bit with contenteditable tags in my HTML, and learnt a couple of things, so here they are.

Why can’t I see the cursor inside an empty contenteditable?

If you make an editable div like this:

<div contenteditable="true">
</div>

and then try to focus it, then sometimes, in some browsers, you won’t see a cursor.

You can fix it by adding a <br /> tag:

<div contenteditable="true">
<br />
</div>

Now you should get a cursor and be able to edit text inside.

Programmatically selecting text inside a contenteditable

It’s quite tricky to get the browser to select anything. Here’s a quick recipe for that:

<div id="ce" contenteditable="true">
Some text here
</div>
<script>
const ce = document.getElementById("ce");
const range = document.createRange();
range.setStart(ce.firstChild, 6);
range.setEnd(ce.lastChild, 10);
const sel = document.getSelection();
sel.removeAllRanges();
sel.addRange(range);
</script>

This selects characters 6 to before-10, i.e. the word “text”. To select more complicated stuff inside tags etc. you need to find the actual DOM nodes to pass in to setStart and setEnd, which is quite tricky.

Whenever you setHTML on a contenteditable, add a BR tag

If you use setHTML on a contenteditable you should always append a <br /> on the end. It doesn’t appear in any way, and it prevents weird problems.

Most notably, if you want to have an empty line at the end of your text, you need two <br /> tags, like this:

<div id="ce" contenteditable="true">
Some text here
</div>
<script>
const ce = document.getElementById("ce");
ce.innerHTML = "a<br /><br />"
</script>

If you only include one br tag, there will be no empty line at the end.

Selecting the end of a contenteditable

It’s surprisingly tricky to put the cursor at the end of a contenteditable div but here is a recipe that works:

const range = document.createRange();
range.selectNodeContents(ce);
range.collapse();
const sel = document.getSelection();
sel.removeAllRanges();
sel.addRange(range);

(Where ce is our contenteditable div.)

More tips?

Any more tips? Drop them in the comments and I’ll include them.

Career progression: an invisible issue in software development

Derek Jones from The Shape of Code

Career progression is an important issue in the development of some software systems, but its impact is rarely discussed, let along researched. A common consequence of career progression is that a project looses a member of staff, e.g., they move to work on a different project, or leave the company. Hiring staff and promoting staff are related neglected research areas.

Understanding the initial and ongoing development of non-trivial software systems requires an understanding of the career progression, and expectations of progression, of the people working on the system.

Effectively working on a software system requires some amount of knowledge of how it operates, or is intended to operate. The loss of a person with working knowledge of a system reduces the rate at which a project can be further developed. It takes time to find a suitable replacement, and for that person’s knowledge of the behavior of the existing system to reach a workable level.

We know that most software is short-lived, but know almost nothing about the involvement-lifetime of those who work on software systems.

There has been some research studying the durations over which people have been involved with individual Open source projects. However, I don’t believe the findings from this research, because I think that non-paid involvement on an Open source project has very different duration and motivation characteristics than a paying job (there are also data cleaning issues around the same person using multiple email addresses, and people working in small groups with one person submitting code).

Detailed employment data, in bulk, has commercial value, and so is rarely freely available. It is possible to scrape data from the adverts of job websites, but this only provides information about the kinds of jobs available, not the people employed.

LinkedIn contains lots of detailed employment history, and the US courts have ruled that it is not illegal to scrape this data. It’s on my list of things to do, and I keep an eye out for others making such data available.

The National Longitudinal Survey of Youth has followed the lives of 10k+ people since 1979 (people were asked to detail their lives in periodic surveys). Using this data, Joseph, Boh, Ang, and Slaughter investigated the job categories within the career paths of 500 people who had worked in a technical IT role. The plot below shows the career paths of people who had spent at least five years working in an IT role (code+data):

The job categories contained within the seven career paths in which people spent at least five years working in a technical IT role.

Employment history provides an upper bound for the time that a person is likely to have worked on a project (being employed to work on an Open source project while, over time, working at multiple companies is an edge case).

A company may have employees simultaneously working on multiple projects, spending a percentage of their time on each. How big a resource impact is the loss of such a person? Were they simply the same kind of cog in multiple projects, or did they play an important synchronization role across projects? Details on all the projects a person worked on would help answer some questions.

Building a software system involves a lot more than writing the code. Technical managers working on high level, broad brush, issues. The project knowledge that technical managers have contributes to ongoing work, and the impact of loosing a technical manager is probably more of a longer term issue than loosing a coding-developer.

There are systems that are developed and maintained by essentially one person over many years. These get written about and celebrated, but are comparatively rare.

One of the more reliable ways of estimating developer productivity is to measure the impact of them leaving a project.

Glory! Hammer!

Paul Grenyer from Paul Grenyer

Glory! Hammer! Were fantastic!

They played well and were lots of fun as you’d expect.  I mean who doesn’t like a gig to start with a cardboard cutout of Tom Jones and Delilah playing on the PA. The band were all dressed up - it must have been very hot - and playing their parts.

I did find some of the gaps between songs and the interplay with the audience felt a little too Steel Panther. It was too frequent, superfluous and added time to a set which could have been shorter.

Sozos Michael is a phenomenal singer and makes it seem effortless and perfect. I’m a big fan of widdly guitar and it doesn’t stand out as much on record as it did live which was a really nice surprise.

It’ll be great to see them again when the promised new album is out and they tour again.




Programming Languages: History and Fundamentals

Derek Jones from The Shape of Code

Programming Languages: History and Fundamentals by Jean E. Sammet is often cited in discussions of language history, but very rarely read (I appreciate that many oft cited books have not been read by those citing them, but age further reduces the likelihood that anybody has read this book; it was published in 1969). I read this book as an undergraduate, but did not think much of it. For around five years it has been on my list of books to buy, should a second-hand copy become available below £10 (I buy anything vaguely interesting below this price, with most ending up left on trains or the book table of coffee shops).

Thanks to Adam Gashlin the Internet Archive now contains a downloadable copy.

The list of 120 languages covered contains a handful of the 28 languages covered in an article from 1957. Sammet says that of the 120, 20 are already dead or on obsolete computers (i.e., it is unlikely that another compiler will be written), and that about 15 are widely used/implemented).

Today, the book is no longer a discussion of the recent past, but a window in to the Cambrian explosion of programming languages that happened in the 1960s (almost everything since then has been a variation on a theme); languages from the 1950s are also included.

How does the material appear to me from a 2022 vantage-point?

The organization of the book reminded me that programming languages were once categorized by application domain, i.e., scientific/engineering users, business users, and string & list processing (i.e., academic users). This division reflected the market segmentation for computer hardware (back then, personal computers were still in the realm of science fiction). Modern programming language books (e.g., Scott’s “Programming Language Pragmatics”) often organize material based on implementation details, e.g., lexical analysis, and scoping rules.

The overview of programming languages given in the first three chapters covers nearly all the basic issues that beginners are taught today, but the emphasis is different (plus typographical differences, such as keyword spelt ‘key word’).

Two major language constructs are missing: Dynamic storage allocation is not discussed: Wirth’s book Algorithms + Data Structures = Programs is seven years in the future, and Kernighan and Ritchie’s The C Programming Language nine years; Simula gets a paragraph, but no mention of the object-oriented concepts it introduced.

What is a programming language, and what are the distinguishing features that make some of them high-level programming languages?

These questions may sound pointless or obvious today, but people used to spend lots of time arguing over what was, or was not, a high-level language.

Sammet says: “… the first characteristic of a programming language is that the user can write a program without knowing much—if anything—about the physical characteristics of the machine on which the program is to be run.”, and goes on to infer: “… a major characteristic of a programming language is that there must be a reasonable potential of having a source program written in that language run on two computers with different machine codes without rewriting the source program. … In most programming languages, some—but often very little—rewriting of the source program is necessary.”

The reason that some rewriting of the source was likely to be needed is that there were often a lot of small variations between compilers for the same language. Compilers tended to be bespoke, i.e., the Fortran compiler for the X cpu running OS Y was written specifically for that combination. Retargetting an existing compiler to a new cpu or OS was much talked about, but it was more fun to write a new compiler (and anyway, support for new features was needed, and it was simpler to start from scratch; page 149 lists differences in Fortran compilers across IBM machines). It didn’t help that there was also a lot of variation in fundamental quantities such as word length, e.g., 16, 18, 20, 24, 32, 36, 40, 48, 60 bit words; see page 18 of Dictionary of Computer Languages.

Sammet makes the distinction: “One of the prime differences between assembly and higher level languages is that to date the latter do not have the capability of modifying themselves at execution time.”

Sammet then goes on to list the advantages and disadvantages of what she calls higher level languages. Most of the claimed advantages will be familiar to readers: “Ease of Learning”, “Ease of Coding and Understanding”, “Ease of Debugging”, and “Ease of Maintaining and Documenting”. The disadvantages included: “Time Required for Compiling” (the issue here is converting assembler source to object code is much faster than compiling a high-level language), “Inefficient Object Code” (the translation process was often a one-to-one mapping of what was written, e.g., little reuse of register contents), “Difficulties in Debugging Without Learning Machine Language” (symbolic debuggers are still in the future).

Sammet’s observation: “In spite of the fact that higher level languages have been with us for over 10 years, there has been relatively little quantitative or qualitative analysis of their advantages and disadvantages.” is still true 50 years later.

If you enjoy learning about lots of different languages, you will like this book. The discussion of specific languages contains copious examples, which for me brought things to life.

Sites such as the Internet Archive and Bitsavers make the book’s references accessible (there are a few I had not seen before), and offer readers a path to pre-Cambrian times.

Saul Rosen’s 1967 book “Programming Systems and Languages” is sometimes cited in discussions of programming language history. This book is a collection of papers that discuss a variety of languages and the operating systems that support them. Fewer languages are covered, but in more depth, along with lots of implementation details. Again, lots of interesting references.

A review of React Cookbook: Recipes for Mastering the React Framework

Paul Grenyer from Paul Grenyer

React Cookbook: Recipes for Mastering the React Framework

by David Griffiths and Dawn Griffiths
ISBN: 978-1492085843

This is a book of about 100 recipes across 11 sections. The sections range from the basics, such as creating React apps, routing and managing state to the more involved topics such as security, accessibility and performance.

I was especially pleased to see that the section on creating apps looked at create-react-app, nextjs and a number of other getting started tools and libraries, rather than just sticking with create-react-app.

I instantly liked the way each recipe laid out the problem it was solving, the solution and then had a discussion on different aspects of the solution. It immediately felt a bit like a patterns book. For example, after describing how to use create-react-app, the discussion section explains in more depth what it really is, how it works, how to use it to maintain your app and how to get rid of it.

As with a lot of React developers, the vast majority of the work I do is maintaining existing applications, rather than creating new ones from scratch. I frequently forget about how to setup things like routing scratch and would usually reach for Google. However, with a book like this I can see myself reaching for the easy to find recipes again and again.