September 2021 – ACCU World of Code

[I’m sure there is something else going on here but on the off-chance someone else is also observing this and also lost at least they’ll know they’re not alone.]

We have a GitLab project pipeline that started out as a monolithic job but over the last 9 months has slowly been parallelized and now runs as over 150 jobs spread out across a cluster of 4 fairly decent [1] machines with 8 to 10 concurrent jobs per host. More recently we’ve started seeing the PowerShell Expand-Archive cmdlet failing randomly up to 5% of the time with the following error:

Remove-Item : Cannot find path {...} because it does not exist.

The line of code highlighted in the error is:

$expandedItems | % { Remove-Item $_ -Force -Recurse }

If you google this message it suggests this probably isn’t the real error but a problem with the cmdlet trying to clean-up after failing to extract the contents of the .zip file. Sadly the reason why the extraction might have failed in the first place is now lost.

Investigation

While investigating this error message I ran across two main hits – one from Stack Overflow and the other on the PowerShell GitHub project – both about hitting the classic long path problem in Windows. In our case the extracted paths, even including the build agent root, is still only 100 characters so well within the limit as the archive only has one subfolder and the filenames are short.

Also the archive is built with it’s companion cmdlet Compress-Archive so I doubt it’s an impedance mismatch in our choice of tools.

My gut reaction to anything spurious like this is that it’s the virus scanner (AV) [2]. Sadly I have no direct control over the virus scanner product choice or its configuration. In this instance the machines have Trend Micro whereas the other build agents I’ve built are VMs and have Windows Defender [3], but their load is also much lower. I managed to get the build folder excluded temporarily but that appears to have had no effect and nothing was logged in the AV to say it had blocked anything. (The “behaviour monitoring” in modern AV products often gets triggered by build tools which is annoying.)

After discounting the obvious and checking that memory exhaustion also wasn’t a factor as the memory load for the jobs is variable and the worst case loading can cause the page-file to be used, I wondered if there the problem lay with the GitLab runner cache somehow.

Corrupt Runner Cache?

To avoid downloading the .zip file artefact for every job run we utilise the GitLab runner local cache. This is effectively a .zip file of a packages folder in the project working copy that gets packed up and re-used in the other jobs on the same machine which, given our level of concurrency, means it’s constantly in use. Hence I wondered if our archive was being corrupted when the cache was being unpacked as I’ve seen embedded .zip files cause problems in the past for AV tools (even though it supposedly shouldn’t have been touching the folder). So I added a step to test our archive’s integrity before unpacking it by using 7-Zip as there doesn’t appear to be a companion cmdlet Test-Archive. I immediately saw the integrity test pass but the Expand-Archive step fail a few times so I’m pretty sure the problem is not archive corruption.

Workaround

The workaround which I’ve employed is to use 7-Zip for the unpacking step too and so far we’ve seen no errors at all but I’m left wondering why Expand-Archive was intermittently failing. Taking an extra dependency on a popular tool like 7-Zip is hardly onerous but it bumps the complexity up very slightly and needs to be accounted for in the docs / scripts.

In my 2017 post Fallibility I mentioned how I once worked with someone who was more content to accept they’d found an undocumented bug in the Windows CopyFile() function than believe there was a flaw in their code or analysis [4]. Hence I feel something as ubiquitous as Expand-Archive is unlikely to have a decompression bug and that there is some piece of the puzzle here that I’m missing. Maybe the AV is still interfering in some way that isn’t triggered by 7-Zip or the transient memory pressure caused by the heavier jobs is having an impact?

Given the low cost of the workaround (use 7-Zip instead) the time, effort and disruption needed to run further experiments to explore this problem further is sadly too high. For the time being annecdata is the best I can do.

[1] 8 /16 cores, 64 / 128 GB RAM, and NVMe based disks.

[2] I once did some Windows kernel debugging to help prove an anti-virus product update was the reason our engine processes where not terminating correctly under low memory conditions.

[3] Ideally servers shouldn’t need anti-virus tools at all but the principle of Defence in Depth suggests the minor performance impact is worth it to potentially help slow lateral movement.

[4] TL;DR: I quickly showed it was the latter at fault not the Windows API.

September 28, 2021

7 habits of highly effective software development?

Allan Kelly from Allan Kelly

“Most of us think we don’t have enough time to exercise. What a distorted paradigm! We don’t have not to.” Stephen R. Covey

Try reading that quote again and substituting the word “refactor” for “exercise.” Or try substitute the words “test first”, or “technical excellence” for “exercise.”

It was Craig Girvan of Head Forwards at Agile on the Beach this month who pointed out to me that one of the most famous management books of all time actually contains an edict to pursue technical excellence and refactoring.

Read this snippet form The 7 Habits of Highly Effective People and as you do so think about software development, and specifically the technical quality of the code being cut:

“We are instruments of our own performance, and to be effective, we need to recognize the importance of taking time to sharpen the saw”

Whether you think of the skills of the engineers building the system, the system itself, the technology which powers the system or the process that build the code into an executable there is a resonance.

On of the things Covey emphasis in the book is that the be effective one needs not just productive capacity (“PC”, the capability to produce something) but to maintain that ability and enhance that capacity. Hence his advice to: sharpen the saw. That means using some of your PC to create more PC in future, grow the pie if you like. This is a theme he returns too many times in the book.

And that is exactly the thinking we need to put behind our software development teams: it is not just about producing something for today, it is about increasing our capability for tomorrow. Of course there is a balance, one needs to both produce today and enhance for tomorrow, find the sweat spot is hard.

In the race to deliver value today we sometimes loose that. We forget that enhancing our capability creates value because it helps us create more value tomorrow – that capability is itself valuable. In part the problem is because investing in capability enhancements has a longer payback period, the return on those investments will not be seen immediately while directing our efforts to today will deliver returns real soon.

The old “jam today or more jam tomorrow” problem. It is a balance, and getting that balance right is incredibly difficult.

But it is exactly that philosophy that lead Microsoft and Amazon to reinvest all their profits for many years. Rather than pay shareholders dividends they prefer to invest in themselves so their future capacity is greater. And because of that promise their share rise in value and shareholders benefit more than if they had paid dividends.

It’s nearly 20 years since I read 7 Habits but after Craig’s observation I fished out my copy. What struck me was how the 7 Habits themselves could be seen as a software development method in their own right:

Be proactive: teams are experts in delivering useful digital products, they should be finding what is needed and working to deliver it. Simply doing what you are told is not enough. The days of sitting around waiting for a requirements document or specification are over.
Begin with the end in mind: what could be more outcome focused than that? It’s not about the myriad of stories and features you will develop, its the ultimate goals that is important
Put first things first: some will see this as a mandate to design the thing up front, I don’t, I see this as an instruction to start delivering value and testing hypothesis immediately. For Covey it is simply about prioritisation, “organize and execute around priorities” – simply decide what are the priorities today and get on with them.
Think Win/Win: too often in development we frame decisions in confrontational terms, “Users v. developers”, “The business v. coders”, “Programmers v. testers”. One side “wins” at the others expense. We need to stop that, and we need to stop seeing the divisions. (Similar to David Cote’s argument in Winning now, Winning later.)
Seek to understand first, then be understood: what a brilliant way of saying “Listen to customers” and then frame technical discussions in terms the audience will understand.
Synergize: this overlaps with #4 in my mind. Covey says it is about recognising that the whole is greater than the sum of its parts. What better description of a software system could you want? All those little parts, functions, classes, modules, all working together to produce a useful whole. Yet this is one of the most difficult problems in software engineering, approaching it from either the parts or the whole creates problems. Instead we need to build something small that works, some parts that work together, and then grow it, organically.
Sharpen the saw: work to have more capability tomorrow than you have today, which is where we came in

So maybe 7 Habits is a development method in disguise, or maybe its a way of thinking that should inform our approach. In fact, as I said, I read the book nearly 20 years ago, I suspect much of this has seeped into my general thinking and, without me knowing it, informed my approach.

The book may have become a cliche itself but I would still recommend reading 7 Habits.

Saw images from Luke Milburn on Wikicommons, Creative Commons License.

Subscribe to my blog newsletter and download Continuous Digital for free – normal price $9.99/£9.95/€9.95

The post 7 habits of highly effective software development? appeared first on Allan Kelly.

September 28, 2021

7 habits of highly effective software development?

Allan Kelly from Allan Kelly, Software Strategy

“Most of us think we don’t have enough time to exercise. What a distorted paradigm! We don’t have not to.” Stephen R. Covey

Try reading that quote again and substituting the word “refactor” for “exercise.” Or try substitute the words “test first”, or “technical excellence” for “exercise.”

Read this snippet form The 7 Habits of Highly Effective People and as you do so think about software development, and specifically the technical quality of the code being cut:

“We are instruments of our own performance, and to be effective, we need to recognize the importance of taking time to sharpen the saw”

The old “jam today or more jam tomorrow” problem. It is a balance, and getting that balance right is incredibly difficult.

Be proactive: teams are experts in delivering useful digital products, they should be finding what is needed and working to deliver it. Simply doing what you are told is not enough. The days of sitting around waiting for a requirements document or specification are over.
Begin with the end in mind: what could be more outcome focused than that? It’s not about the myriad of stories and features you will develop, its the ultimate goals that is important
Put first things first: some will see this as a mandate to design the thing up front, I don’t, I see this as an instruction to start delivering value and testing hypothesis immediately. For Covey it is simply about prioritisation, “organize and execute around priorities” – simply decide what are the priorities today and get on with them.
Think Win/Win: too often in development we frame decisions in confrontational terms, “Users v. developers”, “The business v. coders”, “Programmers v. testers”. One side “wins” at the others expense. We need to stop that, and we need to stop seeing the divisions. (Similar to David Cote’s argument in Winning now, Winning later.)
Seek to understand first, then be understood: what a brilliant way of saying “Listen to customers” and then frame technical discussions in terms the audience will understand.
Synergize: this overlaps with #4 in my mind. Covey says it is about recognising that the whole is greater than the sum of its parts. What better description of a software system could you want? All those little parts, functions, classes, modules, all working together to produce a useful whole. Yet this is one of the most difficult problems in software engineering, approaching it from either the parts or the whole creates problems. Instead we need to build something small that works, some parts that work together, and then grow it, organically.
Sharpen the saw: work to have more capability tomorrow than you have today, which is where we came in

The book may have become a cliche itself but I would still recommend reading 7 Habits.

Saw images from Luke Milburn on Wikicommons, Creative Commons License.

Subscribe to my blog newsletter and download Continuous Digital for free – normal price $9.99/£9.95/€9.95

The post 7 habits of highly effective software development? appeared first on Allan Kelly, Software Strategy.

September 27, 2021

New Job at Element (Matrix)

Andy Balaam from Andy Balaam's Blog

I started a new job today at Element!

It has been a long-standing ambition of mine to work in Free and Open Source software, and I am very excited to work for a company that is the main developer of a really important project: the Matrix communication network.

I don’t know much about what I’ll be doing yet, but finding an open source company with a decent business model that is prepared to pay me is very exciting. The fact that they have offices that are close enough for me to go for is another huge bonus.

Wish me luck, and I’ll let you know what I’m working on when it becomes more clear.

September 27, 2021September 27, 2021

Lose the Source Luke?

Chris Oldwood from The OldWood Thing

We were writing a new service to distribute financial pricing data around the trading floor as a companion to our new desktop pricing tool. The plugin architecture allowed us to write modular components that could tap into the event streams for various reasons, e.g. provide gateways to 3rd party data streams.

Linking New to Old

One of the first plugins we wrote allowed us to publish pricing data to a much older in-house data service which had been sat running in the server room for some years as part of the contributions system. This meant we could eventually phase that out and switch over to the new platform once we had parity with it.

The plugin was a doddle to write and we quickly had pricing data flowing from the new service out to a test instance of the old service which we intended to leave running in the background for soak testing. As it was an in-house tool there was no installer and my colleague had a copy of the binaries lying around on his machine [1]. Also he was one of the original developers so knew exactly what he was doing to set it up.

A Curious Error Message

Everything seemed to be working fine at first but as the data volumes grew we suddenly noticed that the data feed would eventually hang after a few days. In the beginning we were developing the core of the new service so quickly it was constantly being upgraded but now the pace was slowing down the new service was alive for much longer. Given how mature the old service was we assumed the issue was with the new one. Also there was a curious message in the log for the old service about “an invalid transaction ID” before the feed stopped.

While debugging the new plugin code my colleague remembered that the Transaction ID meant the message sequence number that goes in every message to allow for ordering and re-transmission when running over UDP. The data type for that was a 16-bit unsigned integer so it dawned on us that we had probably messed up handling the wrapping of the Transaction ID.

Use the Source Luke

Given how long ago he last worked on the old service he couldn’t quite remember what the protocol was for resetting the Transaction ID so we decided to go and look at the old service source code to see how it handled it. Despite being at the company for a few years myself this all pre-dated me so I left my colleague to do the rummaging.

Not long after my colleague came back over to my desk and asked if I might know where the source code was. Like so many programmers in a small company I was a part-time sysadmin and generally looked after some of servers we used for development duties, such as the one where our Visual SourceSafe repository lived that contained all the projects we’d ever worked on since I joined.

The VCS Upgrade

When I first started at the company there were only a couple of programmers not working on the mainframe and they wrote their own version control system. It was very Heath Robinson and used exclusive file locks to side-step the problem of concurrent changes. Having been used to a few VCS tools by then such as PVCS, Star Versions, and Visual SourceSafe I suggested that we move to a 3rd party VCS product as we needed more optimistic concurrency controls as more people were going to join the team. Given the MSDN licenses we already had along with my own experience Visual SourceSafe (VSS) seemed like a natural choice back then [2].

Around the same time the existing development server was getting a bit long in the tooth so the company forked out for a brand new server and so I set-up the new VSS repository on that and all my code went in there along with all the subsequent projects we started. None of the people that joined after me ever touched any of the old codebase or VCS as it was so mature it hadn’t needed changing in some time and anyway the two original devs where still there to look after it.

The Office Move

A couple of years after I joined, the owners of the lovely building the company had been renting for the last few decades decided they wanted to gut and renovate it as the area in London where we were based was getting a big makeover. Hence we were forced to move to new premises about half a mile away. The new premises were nice and modern and I no longer had the vent from the portable air-conditioning machine from one of the small server rooms pumping out hot air right behind my desk [3].

When moving day came I made sure the new server with all our stuff on got safely transported to the new office’s server room so that we ready to go again on Monday morning. As we stood staring around the empty office floor my colleague pointed to the old development server which had lay dormant in the corner and asked me (rhetorically) whether we should even bother taking it with us. As far as I was concerned everything I’d ever needed had always been on the new server and so I didn’t know what was left that we’d still need.

My colleague agreed and so we left the server to be chucked in the skip when the bulldozers came.

Dormant, But Not Redundant

It turned out their original home-grown version control system had a few projects in it, including the old data service. Luckily one of the original developers who worked on the contributions side still had an up-to-date copy of that and my colleague found a local copy of the code for one of the other services but had no idea how up-to-date it was. Sadly nobody had even a partial copy of the source to the data service we were interested in but we were going to replace that anyway so in the end the loss was far less significant than we originally feared.

In retrospect I can’t believe we didn’t even just take the hard disk with us. The server was a classic tower so took up a far bit of room which was still somewhat at a premium in the new office whereas the disk could probably have sit in a desk drawer or even been fitted as an extra drive in the new midi sized development server.

[1] +1 for xcopy deployment which made setting up development and test instances a piece of cake.

[2] There are a lot of stories of file corruption issues with VSS but in the 7 years I’d used it with small teams, even over a VPN, we only had one file corruption issue that we quickly restored from a backup.

[3] We were on the opposite side from the windows too so didn’t even get a cool breeze from those either.

September 26, 2021

The Approximate Number System and software estimating

Derek Jones from The Shape of Code

The ability to perform simple numeric operations can improve the fitness of a creature (e.g., being able to select which branch contains the most fruit), increasing the likelihood of it having offspring. Studies have found that a wide variety of creatures have a brain subsystem known as the Approximate Number System (ANS).

A study by Mechner rewarded rats with food, if they pressed a lever N times (with N taking one of the values 4, 8, 12 or 16), followed by pressing a second lever. The plot below shows the number of lever presses made before pressing the second lever, for a given required N; it suggests that the subject rat is making use of an approximate number system (code+data):

Daily article counts for blog.

Humans have a second system for representing numbers, which is capable of exact representation, it is language. The Number Sense by Stanislas Dehaene was on my list of Christmas books for 2011.

One method used to study the interface between the two language systems, available to humans, involves subjects estimating the number of dots in a briefly presented image. While reading about one such study, I noticed that some of the plots showed patterns similar to the patterns seen in plots of software estimate/actual data. I emailed the lead author, Véronique Izard, who kindly sent me a copy of the experimental data.

The patterns I was hoping to see are those invariably seen in software effort estimation data, e.g., a power law relationship between actual/estimate, consistent over/under estimation by individuals, and frequent use of round numbers.

Psychologists reading this post may be under the impression that estimating the time taken to implement some functionality, in software, is a relatively accurate process. In practice, for short tasks (i.e., under a day or two) the time needed to form a more accurate estimate makes a good-enough estimate a cost-effective option.

This Izard and Dehaene study involved two experiments. In the first experiment, an image containing between 1 and 100 dots was flashed on the screen for 100ms, and subjects then had to type the estimated number of dots. Each of the six subjects participated in five sessions of 600 trials, with each session lasting about one hour; every number of dots between 1 and 100 was seen 30 times by each subject (for one subject the data contains 1,783 responses, other subjects gave 3,000 responses). Subjects were free to type any value as their estimate.

These kinds of studies have consistently found that subject accuracy is very poor (hardly surprising, given that subjects are not provided with any feedback to help calibrate their estimates). But since researchers are interested in patterns that might be present in the errors, very low accuracy is not an issue.

The plot below shows stimulus (number of dots shown) against subject response, with green line showing Response==Stimulus , and red line a fitted regression model having the form $Response=1.7*Stimulus^{0.7}$ (which explains just over 70% of the variance; code+data):

Response given for given number of stimulus dots, with fitted regression model.

Just like software estimates, there is a good fit to a power law, and the only difference in accuracy performance is that software estimates tend not to be so skewed towards underestimating (i.e., there are a lot more low accuracy overestimates).

Adding subjectID to the model gives: $Response=1.8*Stimulus^{0.7}*SubjectID$ , with varying between 0.65 and 1.57; more than a factor of two difference between subjects (this model explains just under 90% of the variance). This is a smaller range than the software estimation data, but with only six subjects there was less chance of a wider variation (code+data).

The software estimation data finds shows that accuracy does not improve with practice. The experimental subjects were not given any feedback, and would not be expected to improve, but does the strain of answering so many questions cause them to get worse? Adding trial number to the model suggests a 12% increase in underestimation, over 600 trials. However, adding an interaction with SubjectID shows that the performance of two subjects remains unchanged, while two subjects experience a 23% increase in underestimation.

The plot below shows the number of times each response was given, combining all subjects, with commonly given responses in red (code+data):

Number of occurrences of response values, over all subjects.

The commonly occurring values that appear in software estimation data are structured as fractions of units of time, e.g., 0.5 hours, or 1 hour or 1 day (appearing in the data as 7 hours). The only structure available to experimental subjects was subdivisions of powers of 10 (i.e., 10 and 100).

Analysing the responses by subject shows that each subject had their own set of preferred round numbers.

To summarize: The results from an experiment investigating the interface between the two human number systems contains three patterns seen in software estimation data, i.e., power law relationship between actual and estimate, individual differences in over/underestimating, and extensive use of round numbers.

Izard’s second experiment limited response values to prespecified values (i.e., one to 10 and multiples of 10), and gave a calibration example after each block of 46 trials. The calibration example improved performance, and the use of round numbers as prespecified response values had the effect of removing spikes from the response counts (which were relatively smooth; code+data)).

We now have circumstantial evidence that software developers are using the Approximate Number System when making software estimates. We will have to wait for brain images from a developer in an MRI scanner, while estimating a software task, to obtain more concrete proof that the ANS is involved in the process. That is, are the areas of the brain thought to be involved in the ANS (e.g., the intraparietal sulcus) active during software estimation?

September 24, 2021

Questioning the great work from home experiment

AllanAdmin from Allan Kelly

18 months ago everyone who worked in an office was sent home, and told “work from home.” Suddenly even the most anti-work from home companies and bosses had to accept it. Even the slowest, bureaucratic, IT departments had to support remote working.

In many ways it has been a great experiment – an experiment that is judged a success because people have been more productive than expected. And look… the western economies are still here. Indeed, some businesses have boomed.

But, I don’t think that was the great experiment. To my mind the great experiment starts now – now that people are getting the option to work in the office or work from home, now that big-bad-managers can lean on people to go into the office and start using “presenteeism” again.

There are a few of points about the great work from home which I think are generally missed. It is because these points no longer apply which makes me thing now is the true experiment. I’ve also got a bunch of concerns, I’m not convinced that surviving the last 18 months by working-from-home is the way we should carry on.

First, the great work-from-home was egalitarian: it wasn’t a privilege or a punishment. Nor was it self-selecting: whether you wanted to or not you did it.

Second, everyone had to do it so everything was online. There were no meetings with some people in one room and others hanging on the end of a telephone, the kitchen was no longer a place for side chats and the smokers couldn’t have their own meetings outside the back door.

Third there was no end date: there was no option to say “we’ll decide it when we are all in the office.”

Finally, last but not least: at least initially it was existing people and teams so relationships and social-capital already existed People were used to working together already.

Bringing in new people, “onboarding” and “enculturing” is always hard, its a lot harder online – and being honest, one of the things I find hardest is finding my way into a new team when everything is online.

As work-from-home has gone on teams have had to learn to recruit and incorporate new people online.

Now, as people drift back to the office – at different paces in different places – these points don’t hold. Now being in the office or not is often a choice, it is no longer whole teams so those side conversations are back.

To my mind this is an even bigger experiment than the great work-from-home; and I think it is going to prove more difficult for companies and teams to navigate.

Personally, I’m really lucky, I’ve had my office in my home for years so I’m all set up for it. But I’m sick of working in the same place day-after-day for months on end. While I’m sure many people never want to see an office again I think there will be many others who are keen to go back to the office.

Next, I have a few fears about the extended remote working model. First off is mental health: without an office, without the journey to and from the office, without the change of clothes that all involves there is less separation between work and home. Without dividing lines that means work impinges on home time and it is difficult to escape work stress.

Second, online working is far more dependent on the written form. That means that those of us who struggle with the written form are at a disadvantage.

Yes I know I’m a bad example, I write too much. But look at all my grammatical and spelling foibles – I’m dyslexic. I’ve learned to be good at writing but really I’d rather be talking.

As e-mail replaces the telephone, Slack replaces the talking across the desk, WhatsApp replaces coffee conversations those of us who struggle with writing are disadvantaged.

Next, I’m concerned for younger people, those who only entered the work force recently. How are they to learn about their job? How are they to learn work culture, let along specific company culture if there is no office?

And most those same people are often in a more difficult position. They are stilling with their parents in their childhood bedroom, or they are living in a house share with problematic internet. In short, younger people need the office more and they need older people as role models and teachers.

Finally, I feel a lot of technology people – programmers specifically – are very keen to push back on any suggestion that face-to-face communication, and co-location, has any advantages. It seems very acceptable to say “We can do everything online, there is no difference.”

I beg to differ. If only because without body language, without facial expressions, without seeing how people react then communication is less – the old “80% of communication is non-verbal” idea. You can do a lot online, but you can also do a lot in person.

Unfortunately I don’t see this debate. I don’t see people discussing “what is best done online” and “what is best done in person.” I see my fellow digital workers being very quick to push back on anyone that questions online working.

Perhaps it’s me, but I feel the technology community is so anti-office work that I have hesitated even to voice these concerns. Anyone else share these views? Or am I persona non-grata?

Subscribe to my blog and download Continuous Digital for free – normal price $9.99/£9.95/€9.95

The post Questioning the great work from home experiment appeared first on Allan Kelly.

September 24, 2021

Questioning the great work from home experiment

AllanAdmin from Allan Kelly Associates