## Reidolized

So Blackie Lawless decided to re-record my favorite album of all time, and my initial thought was: why did he bother? I avoided buying it for months, until last Friday, when I was trying out some new headphones with the re-recorded guitar solo from The Idol and quite enjoyed the new layers and mix of the new version of the song. Although I’m not a musician, it feels like I know every note, and I could really hear the difference. It’s not the original guitarist, and consequently the solo wasn’t the same or as good.

Having now listened to the first disk of the re-record, I can only say there is one improvement: the extra songs, at least one of which was taken from WASP’s latest studio album and is magnificent. Removing the swearing from the songs and intros feels wrong and a cop-out, and the lead guitar work just isn’t up to Bob Kulick’s incredible standard.

The second disk is starting now….

## Cuboid Space Division – a.k.

Over the last few months we have been taking a look at algorithms for interpolating over a set of points (x_i, y_i) in order to approximate values of y between the nodes x_i. We began with linear interpolation which connects the points with straight lines and is perhaps the simplest interpolation algorithm. Then we moved on to cubic spline interpolation which yields a smooth curve by specifying gradients at the nodes and fitting cubic polynomials between them that match both their values and their gradients. Next we saw how this could result in curves that change from increasing to decreasing, or vice versa, between the nodes and how we could fix this problem by adjusting those gradients.
I concluded by noting that, even with this improvement, the shape of a cubic spline interpolation is governed by choices that are not uniquely determined by the points themselves and that linear interpolation is consequently a more mathematically appropriate scheme, which is why I chose to generalise it to other arithmetic types for y, like complex numbers or matrices, but not to similarly generalise cubic spline interpolation.
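As a reminder of what linear interpolation between two neighbouring nodes involves, here is a minimal sketch in shell, using awk for the arithmetic. The function name `lerp` and its argument order are made up for this illustration; it is not the implementation from the earlier posts.

```shell
# lerp X0 Y0 X1 Y1 X
# Linearly interpolate the value of y at X on the straight line joining
# the nodes (X0, Y0) and (X1, Y1). Assumes X0 and X1 are different.
lerp() {
    awk -v x0="$1" -v y0="$2" -v x1="$3" -v y1="$4" -v x="$5" \
        'BEGIN { print y0 + (y1 - y0) * (x - x0) / (x1 - x0) }'
}

lerp 0 0 2 4 1     # prints 2
lerp 1 10 3 20 2   # prints 15
```

Nothing in the formula depends on y being a plain number beyond addition, subtraction and scaling, which is why it carries over to other arithmetic types for y.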

The obvious next question is whether or not we can also generalise the nodes to other arithmetic types; in particular to vectors so that we can interpolate between nodes in more than one dimension.

## Bulk adding items to Wunderlist using wunderline on Ubuntu MATE

If you use Wunderlist and want to be able to bulk-add tasks from a text file, first install and set up wunderline.

Now, to be able to right-click a text file containing one task per line on Ubuntu MATE, create a file called “wunderlist-bulk-add” in ~/.config/caja/scripts/ and make it executable. Paste the code below into that file.

(Note: it’s possible that this would work in GNOME if you replaced “caja” with “nautilus” in that path – let me know in the comments.)
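The file creation step above could be done from a terminal roughly like this. `SCRIPTS_DIR` is just a variable introduced for this sketch, defaulting to the Caja scripts directory mentioned above.

```shell
# Directory where Caja looks for right-click scripts (as described above)
SCRIPTS_DIR="${SCRIPTS_DIR:-$HOME/.config/caja/scripts}"

# Create the directory if needed, then an empty, executable script file
mkdir -p "$SCRIPTS_DIR"
touch "$SCRIPTS_DIR/wunderlist-bulk-add"
chmod +x "$SCRIPTS_DIR/wunderlist-bulk-add"
```

After that, open the new file in an editor and paste in the script.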

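```
#!/usr/bin/env bash

set -e
set -u
set -o pipefail

# Print only the non-blank lines of the given file(s)
function nonblank()
{
    grep -v -e '^[[:space:]]*$' "$@"
}

for F in "$@"; do
{
    if [ ! -f "$F" ]; then
    {
        zenity --info --no-wrap --text="File $F does not exist"
    }
    else
    {
        # grep exits non-zero when every line is blank, so make sure
        # that does not abort the script via set -e and pipefail
        COUNT=$(nonblank "$F" | wc -l | awk '{print $1}' || true)

        # Capture the exit status of the dialog: 0 means "Yes"
        set +e
        zenity --question --no-wrap \
            --text="Add $COUNT items from $F to Wunderlist?"
        ANSWER=$?
        set -e

        if [ "$ANSWER" = "0" ]; then
        {
            # Feed one task per line to wunderline (this assumes its
            # "add --stdin" mode) while showing a progress dialog
            set +e
            nonblank "$F" | \
                wunderline add --stdin | \
                zenity --progress --text "Adding $COUNT items to Wunderlist"
            SUCCEEDED=$?
            set -e
            if [ "$SUCCEEDED" = "0" ]; then
            {
                zenity --info --no-wrap --text "Added $COUNT items."
            }
            else
            {
                zenity --error --no-wrap \
                    --text "Failed to add some or all of these items."
            }; fi
        }; fi
    }; fi
}; done
```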
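If you want to check how many items the script will offer to add before trying it on a real list, the `nonblank` function can be run on its own. This is a small sketch using a throwaway file; the task names are invented for the example.

```shell
# Same filter as in the script: drop empty and whitespace-only lines
nonblank() {
    grep -v -e '^[[:space:]]*$' "$@"
}

# A sample task file: two tasks separated by an empty line and a
# whitespace-only line
TASKS_FILE=$(mktemp)
printf 'Buy milk\n\n   \nCall the bank\n' > "$TASKS_FILE"

COUNT=$(nonblank "$TASKS_FILE" | wc -l | awk '{print $1}')
echo "$COUNT"   # prints 2

rm -f "$TASKS_FILE"
```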

## Release or be damned

Back when I was still paid to code I had a simple question I posed to troubled development efforts:

“Why can’t we release tomorrow?”

This short, simple question turns out to be amazingly powerful. I remember one effort I was involved with in California where a new CEO took over and started cutting jobs. I posed this question to the team and in a week or two we did a “beta release” – we did that sort of thing back then. Asking this question was the key that allowed us to question everything, to cut the feature list – or rather to push work back: it stayed on the to-do list, but we didn’t let it stop us from pushing to release.

We rethought what we were trying to achieve: we didn’t need the whole product, we just needed enough of the product to work to show to one specific target customer. Even if they signed there and then, we had weeks before they used it in anger. But until we released something, until we had something “done”, our team and our product looked like just another “maybe.” We had to draw a line under it so the new CEO wouldn’t draw a line under us.

Saying “only do the essential” is easy and comes up again and again, whether it is Minimal Viable Product, Minimal Subset or Must haves in MoSCoW rules, but it is far easier said than done. One person’s “essential” is so often another person’s “optional extra.” In this context, when I say “essential” I mean “the parts needed to make the system work end to end” – I’m far closer to the old walking skeleton idea.

I was reminded of this question by a couple of endeavours that came to my attention during the summer. Well, I say came to my attention, I feel a bit responsible. Both endeavours are happening at clients; clients who I had fallen out of touch with. My style of working is to help clients who want help, I don’t like selling myself. These clients didn’t ask for more help so I didn’t jam my foot in the door, in retrospect maybe I should have.

In one case the team were doing very well. They were iterating, they were TDD/BDD’ing, they were demoing, they were working with the client, they were doing everything … except releasing. Then one day the client asked “when will it be done?”

Now think for a moment: What if you could release your product tomorrow?

The thing is, without actual products those around the team look for signs that the team can be trusted, that the team will deliver, that the team are thinking about what is to be done. People ask for proxy-products: plans, schedules, risk-logs, budget forecasts and so on. When stakeholders can’t see progress they look for things to assure them that there is (or will be) progress (soon).

Who needs plans and predictions about the future when the future is here tomorrow?

Actual releases are the key to reaching the new world; they change everything.

So I feel guilty: I should have inflicted myself on these teams, I should have been there again and again bugging them “Go to release”, “Remove that barrier”, “Force it through”.

Being able to ship an update of your product has a transformative effect.

• It demonstrates the team have the ability to do the job in hand.
• It demonstrates you have quality. It obliterates the need for a test-fix-test-fix (aka stabilisation, aka hardening) phase.
• It blows away sunk costs because something has been delivered.
• It removes “maybe” and “ready but…”
• It is probably the greatest risk mitigation strategy possible.
• It creates trust and provides a platform for solid conversations.

Most of all, a released product is a far better statement of progress than any number of plans or forecasts.

This does not mean everything is done. Sure there are things left undone but there will be things left undone when I’m on my deathbed, that is the nature of life. As much as we (especially men) love to collect entire sets there are few prizes in life for completing everything on your bucket list.

Having a released product utterly changes the nature of the conversation. Conversations are no longer full of “ifs”, “maybes”, “shoulds”, “how long will it take?” and “what are the quick wins?”. Those questions can go away. In their place you can have serious conversations about prioritisation and “what do you want tomorrow?”

This is all part of the reason I love continuous delivery. Teams can focus on real priorities and stop wasting time on conjecture.

In my book if you don’t have a releasable product at least every two weeks – say every second Thursday – you are not Agile. And if you haven’t released a product to live in the last two weeks you are probably not Agile.

I don’t care how close you get to a releasable product: it isn’t a release if it isn’t released to a live environment – close but no cigar as they say. (OK, I’ll accept the live environment may not be publicly known, or might be called a beta, but it has to be the real thing.)

Nor should you rest on your laurels once you have regular releases (to live) every second week. That is but first base. You have opened the door, now go further. There are at least 13 opportunities to improve.

If you cannot do that now then ask yourself: Why can’t we release tomorrow?

And start working to remove those obstacles:

• Reduce the number of work items you are aiming to put in the release.
• Fix show-stopper defects now.
• Run tests now.
• Get those people who need to sign-off to sign-off.

Software development has diseconomies of scale: many small is cheaper than few large.

And once you have your release you can turn your attention to making sure these things don’t happen again:

• Reduce the amount of work you accept into development at one time.
• Fix every defect as soon as it is found.
• Automate tests so they can run more often. (Automate anything that moves, and if it doesn’t move, automate it in case.)
• Find a way to reduce the time it takes to get sign-offs: remove the sign-off, make sure the signer prioritises signing or delegate someone else to sign (or automate the signature.)

If there are essential processes, activities or third parties (or anything else) with limited bandwidth that must be completed before release but inject delay, then re-orientate your process around that bottleneck. For example, if your code needs to pass a security audit before release (an audit you can’t automate, that is) then downsize all the other activities so that the audit process is 100% utilised. (OK, 100% is wrong, 76% might be better, but that’s a long conversation about queuing theory.)

Again and again I seem condemned to learn the lesson: nothing counts but working software which is used.

As for my team, and my job in California, it didn’t save me. I regret not asking the question sooner.

The post Release or be damned appeared first on Allan Kelly Associates.

## Writing a new Flarum extension on Ubuntu

In a previous post I described how to install Flarum locally on Ubuntu.

Here is how I set up my development environment on top of that setup so I was able to write a new Flarum extension and test it on my local machine.

Recap: I installed Apache and PHP and enabled mod_rewrite, installed MariaDB and made a database, installed Composer, and used Composer to install Flarum at /var/www/html/flarum.

I decided to call my extension “rabbitescape/flarum-ext-rabbitescape-leveleditor”, so that is the name you will see below.

Here’s what I did:

```
cd /var/www/html/flarum
```

Edit /var/www/html/flarum/composer.json (see the workbench explanation for examples of the full file).

At the end of the “require” section I added a line like this:

"rabbitescape/flarum-ext-rabbitescape-leveleditor": "*@dev"

(Remembering to add a comma at the end of the previous line so it remained valid JSON.)

After the “require” section, I added this:

```
    "repositories": [
        {
            "type": "path",
            "url": "workbench/*/"
        }
    ],
```

Next, I made the place where my extension would live:

```
mkdir workbench
cd workbench
mkdir flarum-ext-rabbitescape-leveleditor
cd flarum-ext-rabbitescape-leveleditor
```

Now I created a file inside flarum-ext-rabbitescape-leveleditor called composer.json like this:

```
{
    "name": "rabbitescape/flarum-ext-rabbitescape-leveleditor",
    "description": "Allow viewing and editing Rabbit Escape levels in posts",
    "type": "flarum-extension",
    "keywords": ["rabbitescape"],
    "authors": [
        {
            "name": "Andy Balaam",
            "email": "andybalaam@artificialworlds.net"
        }
    ],
    "support": {
        "issues": "https://github.com/andybalaam/flarum-ext-rabbitescape-leveleditor/issues",
        "source": "https://github.com/andybalaam/flarum-ext-rabbitescape-leveleditor"
    },
    "require": {
        "flarum/core": "^0.1.0-beta.5"
    },
    "extra": {
        "flarum-extension": {
            "title": "Rabbit Escape Level Editor"
        }
    }
}
```

In the same directory I created a file bootstrap.php like this:

```
<?php

return function () {
    echo 'Hello, world!';
};
```

Then I told Composer to refresh the project like this:

```
cd ../..   # Back up to the main directory
composer update
```

Among the output I saw a line like this which convinced me it had worked:

```
  - Installing rabbitescape/flarum-ext-rabbitescape-leveleditor (dev-master): Symlinking from workbench/flarum-ext-rabbitescape-leveleditor
```

Now I went to http://localhost/flarum, signed in as admin, adminpassword (as I set up in the previous post), clicked my username in the top right and chose Administration, then clicked Extensions on the left.

I saw my extension in the list, and turned it on by clicking the check box below it. This made the whole web site disappear, and instead just the text “Hello, world!” appeared. This showed my extension was being loaded.

Finally, I edited bootstrap.php and commented out the line starting with “echo”. I refreshed the page in my browser and saw the Flarum site reappear.

Now, my extension is installed and ready to be developed! See flarum.org/docs/extend/start for how to get started making it do cool stuff.

## Moving to the 12th circle in fault prediction modeling

Most software fault prediction papers are based on a false assumption, i.e., that a list of the dates on which a program first experienced each fault contains enough information to build a model that has a connection to reality. A count of faults that have been experienced twice is required to fit a basic model that has some mathematical connection to reality.

I had thought that people had moved on from writing papers that fitted yet more complicated equations to one of the regularly used data sets. No, it seems they have just switched to publishing someplace they have not been seen before.

Table 1 lists the ever increasing number of circles within circles; the new model is proposed as the 12th refinement (the table is a summary, lots of forks have been proposed over the years). I have this sinking feeling there is another paper in the works, one that ‘benchmarks’ the new equation using a collection of the other data sets that regularly appear in papers of this kind.

Fitting an equation to data of first experience of a fault is little better than fitting noise.

As Planck famously said, science advances one funeral at a time.

## Ubuntu “compose” key for easy unicode character input

I found out on Mastodon recently that the Compose key exists, allowing you to enter special characters using easy-to-remember key sequences (e.g. “<compose>/=” gives you “≠”).

To do this on Ubuntu (and probably many other systems), hold SHIFT, then press AltGr, release them both, and type something like “/=”. You should see the ≠ symbol appear.

In Ubuntu MATE, you can customise what key is the compose key by opening “Keyboard”, clicking “Layouts”, then “Options…” and checking a box under “Position of compose key”. I checked “Right Alt”, meaning my “AltGr” key is now the compose key.
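For anyone who prefers the terminal, a rough command-line equivalent of that setting is sketched below. It assumes an X11 session with `setxkbmap` installed, and uses the XKB option `compose:ralt` (Right Alt as Compose). It only affects the current session, and desktop keyboard settings may override it, so treat it as an experiment rather than a replacement for the dialog. `COMPOSE_OPTION` is just a variable name chosen for this example.

```shell
# XKB option that makes Right Alt (AltGr) the Compose key
COMPOSE_OPTION="compose:ralt"

# Only try to apply it when we are in an X session with setxkbmap available
if command -v setxkbmap >/dev/null 2>&1 && [ -n "${DISPLAY:-}" ]; then
    setxkbmap -option "$COMPOSE_OPTION"
else
    echo "Skipped; would have run: setxkbmap -option $COMPOSE_OPTION"
fi
```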

Want to explore? See the list of the default sequences you can type.

## Redirecting all requests to https and www using .htaccess in Apache

I want all requests to artificialworlds.net/rabbit-escape/levels/ to get redirected to use the https protocol, and to include “www.” at the beginning of the URL, and I found lots of Stack Overflow articles, but nothing that worked perfectly for me. Here is how I managed it.

I edited the .htaccess file in the directory where I want this to apply, and added this at the top:

```
<IfModule mod_rewrite.c>
RewriteEngine On

# Requests for the bare domain (no "www.") go to https://www.
RewriteCond %{HTTP_HOST} ^artificialworlds\.net$ [NC]
RewriteRule ^ https://www.artificialworlds.net%{REQUEST_URI} [R=301,L]

# Any remaining plain-http request goes to https://www.
RewriteCond %{HTTPS} off
RewriteRule ^ https://www.artificialworlds.net%{REQUEST_URI} [R=301,L]
</IfModule>
```
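To make it clearer which requests those two rules catch, here is a small shell sketch that mirrors their logic offline. The function `redirect_target` is purely illustrative and is not part of Apache: it takes a host name, the HTTPS state ("on" or "off") and the request path, and prints where the rules would send the request, or nothing if the request would be left alone.

```shell
# redirect_target HOST HTTPS_STATE REQUEST_URI
# Mirrors the two rewrite rules: a bare-domain host (compared
# case-insensitively, like [NC]) or a plain-http request is redirected.
redirect_target() {
    local host="$1" https="$2" uri="$3"
    local lower_host
    lower_host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')
    if [ "$lower_host" = "artificialworlds.net" ] || [ "$https" = "off" ]; then
        printf 'https://www.artificialworlds.net%s\n' "$uri"
    fi
}

redirect_target artificialworlds.net     on  /rabbit-escape/levels/  # redirected
redirect_target www.artificialworlds.net off /rabbit-escape/levels/  # redirected
redirect_target www.artificialworlds.net on  /rabbit-escape/levels/  # prints nothing
```

Against the live site, something like `curl -sI http://artificialworlds.net/rabbit-escape/levels/` should show a 301 status and a Location header if the rules are active.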

## Surgical Support Needs Surgical Tools

In the world of IT support there is the universal solution to every problem – turn it off and on again. You would hope that this kind of drastic action is only drawn upon when all other options have been explored or that the problem is known a priori to require such a response and is the least disruptive course of action. Sadly what was often just a joke in the past has become everyday life for many systems as rebooting or restarting is woven into the daily support routine.

In my professional career I have worked on a number of systems where there are daily scheduled jobs to reboot machines or restart services. I’m not talking about the modern, proactive kind like the Chaos Monkey which is used to probe for weaknesses, or when you force cluster failovers to check everything’s healthy; I’m talking about jobs where the restart is required to ensure the correct functioning of the system – disabling them would cripple it.

### Sledgehammers

The need for the restart is usually to overcome some kind of corrupt or immutable internal state, or to compensate for a resource leak, such as memory, which degrades the service to an unacceptable level. An example of the former I’ve seen is to flush a poisoned cache or pick up the change in date, while for the latter it might be unclosed database connections or file handles. The notion of “recycling” processes to overcome residual effects has become so prominent that services like IIS provide explicit support for it [1].

Depending on where the service sits in the stack the restart could be somewhat disruptive, if it’s located on the edge, or largely benign if it’s purely internal, say, a background message queue listener. For example I once worked on a compute farm where one of the front-end services was restarted every night and that caused all clients to drop their connection resulting in a support email being sent due to the “unhandled exception”. Needless to say everyone just ignored all the emails as they only added to the background noise making genuine failures harder to spot.

These kinds of draconian measures to try and attain some system stability actually make matters worse, as the restarts begin to hide genuine stability issues. Eventually those issues start happening during business hours as well, causing grief for customers as unplanned downtime starts occurring. The impetus for one of my first ACCU articles, “Utilising More Than 4GB of Memory in a 32-bit Windows Process”, came from just such an issue, where a service suddenly started failing with out-of-memory errors, even after a restart, if the load was awkwardly skewed. It took almost four weeks to diagnose and properly fix the issue, during which there were no acceptable workarounds – just constant manual intervention from the support team.

I also lost quite a few hours on the system I mentioned earlier debugging a problem in the caching mechanism which was masked by a restart and only surfaced because the restart failed to occur. No one had remembered about this failure mode because everyone was so used to the restart hiding it. Having additional complexity in the code for a feature that will never be used in practice is essentially waste.

### Cracking Nuts

Although it’s not true in all cases (the memory problem just described being a good example) the restart option may be avoidable if the process exposed additional behaviours that allowed for a more surgical approach to recovery to take place. Do you really need to restart the entire process just to flush some internal cache, or maybe just a small part of it? Similarly if you need to bump the business date via an external stimulus can that not be done through a “discoverable” API instead of hidden as part of a service restart [2]?

In some of my previous posts and articles, e.g. “From Test Harness To Support Tool”, “Home-Grown Tools” and more recently “Libraries, Console Apps, and GUIs”, I have described how useful I have found writing simple custom tools to be for development and deployment but also, more importantly, for support. What I think was missing from those posts, and what I have tried to capture in this one, most notably through its title, is a focus on resolving system problems with the minimal amount of disruption. Assuming that you cannot actually fix the underlying design issue without a serious amount of time and effort, can you at least alleviate the problem in the shorter term by adding simple endpoints and tools that can be used to make surgical-like changes inside the critical parts of the system?

For example, imagine that you’re working on a grid computing system where tasks are dished out to different processes and the results are aggregated. Ideally you would assume that failures are going to happen and that some kind of timeout and retry mechanism would be in place so that when a process dies the system recovers automatically [3]. However, if for some reason that automatic mechanism does not exist how are you going to recover? Given that someone (or something) somewhere is waiting for their results how are you going to “unblock the system” and allow them to make progress, without also disturbing all your other users who are unaffected?

You can either try and re-submit the failed task and allow the entire job to complete or kill the entire job and get the client to re-submit its job. As we’ve seen one way to achieve the latter would be to restart parts of the system thereby flushing the job from it. When this is a once in a million event it might make sense [4] but once the failures start racking up throwing away every in-flight request just to fix the odd broken one becomes more and more disruptive. Instead you need a way to identify the failed task (perhaps using the logs) and then instruct the system to recover such as by killing just that job or by asking it to resubmit it to another node.

Hence, ideally you’d just like to hit one admin endpoint and say something like this:

> admin-cli kill-job --server prod --job-id 12345

If that’s not easily achievable and there is distributed state to clear up you might need a few separate little tools instead that can target each part of the system, for example:

> admin-cli remove-node -s broker-prod --node NODE99
> admin-cli remove-results -s results-prod --job 12345
> admin-cli reset-client -s ui-prod --client CLT42

This probably seems like a lot of work to write all these little tools but what I’ve found in practice is that usually most of the tricky logic in the services already exists – you just need to find a way to invoke it externally with the correct arguments.

These days it’s far easier to graft a simple administration endpoint onto an existing service. There are plenty of HTTP libraries available that will allow you to expose a very basic API which you could even poke with CURL. If you’re already using something more meaty like WCF then adding another interface should be trivial too.
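As a rough illustration of how thin such a tool can be, here is a sketch of a shell function that maps the `kill-job` command from earlier onto a single HTTP call made with curl. Everything specific in it is an assumption for the example: the `admin_cli` function name, the `/admin/jobs/<id>` URL scheme, the `prod.example.com` host and the `ADMIN_DRY_RUN` switch (which defaults to printing the request instead of sending it).

```shell
# admin_cli COMMAND SERVER JOB_ID
# Hypothetical wrapper: translates a support command into one HTTP request
# against an (assumed) admin endpoint exposed by the service.
admin_cli() {
    local command="$1" server="$2" job_id="$3"
    case "$command" in
        kill-job)
            local url="https://$server/admin/jobs/$job_id"
            if [ "${ADMIN_DRY_RUN:-1}" = "1" ]; then
                # Dry run: show what would be sent
                echo "curl -X DELETE $url"
            else
                curl --fail --silent --show-error -X DELETE "$url"
            fi
            ;;
        *)
            echo "unknown command: $command" >&2
            return 2
            ;;
    esac
}

admin_cli kill-job prod.example.com 12345
# prints: curl -X DELETE https://prod.example.com/admin/jobs/12345
```

The interesting work still happens inside the service; the tool only has to find the right endpoint and pass the right arguments.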

Modern systems are becoming more and more decoupled through the use of message queues which also adds a natural extension point as they typically already do all the heavy lifting and you just need to add new message handlers for the extra behaviours. One of the earliest distributed systems I worked on used pub/sub on a system-wide message bus both for functional and administrative use. Instead of using lots of different tools we had a single admin command line tool that the run playbook generally used (even for some classic sysadmin stuff like service restarts) as it made the whole support experience simpler.

Once you have these basic tools it then becomes easy to compose and automate them. For example on a similar system I started by adding a simple support stored procedure to find failed tasks in a batch processing system. This was soon augmented by another one to resubmit a failed task, which was then automated by a simple script. Eventually it got “productionised” and became a formal part of the system providing the “slow retry” path [3] for automatic recovery from less transient outages.
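The composition step might look something like the sketch below. The two functions are stand-ins invented for the example: in the system described, `find_failed_tasks` would call the support stored procedure and `resubmit_task` would call the resubmission one, whereas here they just print fixed values so that the loop can be run anywhere.

```shell
# Stand-in for the "find failed tasks" stored procedure: one task id per line
find_failed_tasks() {
    printf '%s\n' 101 102 103
}

# Stand-in for the "resubmit a failed task" stored procedure
resubmit_task() {
    echo "resubmitted task $1"
}

# The automation: feed every failed task id to the resubmit tool
retry_failed_tasks() {
    local task_id
    find_failed_tasks | while read -r task_id; do
        resubmit_task "$task_id"
    done
}

retry_failed_tasks
```

Run on a schedule, a loop like this is essentially the “slow retry” path in miniature.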

### Design for Supportability

One of the design concepts I’ve personally found really helpful is Design for Testability; something which came out of the hardware world and pushes us to design our systems in a way that makes them much easier to test. A knock-on effect of this is that you can reduce your end-to-end testing burden and deliver quicker.

A by-product of Design for Testability is that it causes you to design your system in a way that allows internal behaviours to be observed and controlled in isolation. These are often the same behaviours that supporting a system in a more fine-grained manner will also require. Hence by thinking about how you test your system components you are almost certainly thinking about how they would need to be supported too.

Ultimately of course those same thoughts should also be making you think about how the system will likely fail and therefore what needs to be put in place beforehand to help it recover automatically. In the unfortunate event that you can’t recover automatically in the short term you should still have some tools handy that should facilitate a swift and far less disruptive manual recovery.

[1] Note that this is different from a process restarting itself because it has detected that it might have become unstable, perhaps through the use of the Circuit Breaker pattern.

[2] Aside from the benefits of comprehension this makes the system far more testable as it means you can control the date and therefore write deterministic tests.

[3] See “When Does a Transient Failure Stop Being Transient” for a tangent about fast and slow retries.

[4] Designing any distributed system that does not tolerate network failures is asking for trouble in my book but the enterprise has a habit of believing the network is “reliable enough”.