Most of my professional life has been spent analyzing other people’s code, for one reason or another (mostly Fortran, then Pascal, and then C). I had heard that the Imperial software was written in C, but the released code is written in R (as of six hours ago there is the start of a Python version). Ok, I can work with R, but my comments will be general, since I don’t have lots of in-depth experience reading R code.
The code comes from a research context, and is evolving, i.e., some amount of messiness is to be expected.
There is not a lot of code to talk about (248 lines setting things up, 111 lines for a Stan model, 371 lines of plotting code, and 85 lines of utility code). The analysis is performed by creating a model using the Stan statistical inference language (in which the high level structure of the problem is specified, compiled to a lower level form and then run; the Stan language is very similar to R). These days lots of problems are coded using a relatively small number of lines that call fancy libraries to do the heavy lifting. It is becoming rare to have to write tens of thousands of lines of code to solve a problem.
I have two points to make about the code, both designed to reduce the likelihood of mistakes being made by the person working on the source. These points mainly apply to the Stan code, because that is where the important stuff happens, but are equally applicable to all code.
Numeric literals are embedded in the code; values include: 2.4, 1.0, 0.5, 0.03, 1e-5, and 1e-9. These values obviously mean something to the person who wrote the code, and they can probably be interpreted by experts in the spread of virus infections. But why are they scattered about the code, rather than appearing together (as a sequence of assignments to variables with meaningful names)? Having all the constants in one place makes it easier to spot when a mistake has been made, e.g., one value has been changed without a corresponding change in another value; it also makes it easier for people new to the code to figure out what is going on.
When commenting out code, make it very obvious, e.g., have /********************** on its own line, and *****************************/ on its own line. Using just /* and */ makes it easy to miss that code has been commented out.
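The first point, gathering literals in one place under meaningful names, can be sketched as follows. This is written in Python rather than Stan, and every name here is hypothetical: the released code does not say what each literal means, so the names only show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConstants:
    """Every tunable literal lives here, under a descriptive name.

    The names and meanings are assumptions made for illustration only.
    """
    prior_mean: float = 2.4               # hypothetical meaning
    prior_scale: float = 0.5              # hypothetical meaning
    smallest_allowed_rate: float = 1e-9   # hypothetical meaning


CONSTANTS = ModelConstants()


def clamp_rate(rate, constants=CONSTANTS):
    """Keep a rate away from zero using the named floor, not a bare 1e-9."""
    return max(rate, constants.smallest_allowed_rate)
```

Because the dataclass is frozen, an accidental assignment to one of the constants raises an error instead of silently changing the model.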
Why have they started a Python implementation? Perhaps somebody on the team is more comfortable working with Python (when deadlines loom, it is always best to go with what you know).
Having both an R and Python version is good, in that coding mistakes are likely to show up as inconsistencies in the results produced. It’s always good to have the output of two independently written programs to compare (apart from the fact it may cost twice as much).
The README mentions performance issues. I imagine that most of the execution time is spent in the Stan code, so R vs. Python is not a performance issue.
Any reader with expertise tuning Stan models for performance might like to check out the code. I’m sure the Imperial folk would be happy to hear about worthwhile speed-ups.
I was recording some podcast audio tonight and wanted to be able to press a single key when I reached a significant moment, so I could add the times to the show notes.
I couldn’t find anything that already did this, so I wrote a tiny bash script. I ran this script and pressed Enter whenever I wanted a time recorded:
A few years ago we saw how we could approximate a function f between pairs of points (x_i, f(x_i)) and (x_{i+1}, f(x_{i+1})) by linear and cubic spline interpolation which connect them with straight lines and cubic polynomials respectively, the latter of which yield smooth curves at the cost of somewhat arbitrary choices about their exact shapes.
An alternative approach is to construct a single function that passes through all of the points and, given that nth order polynomials are uniquely defined by n+1 values at distinct x_i, it's tempting to use them.
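As a minimal sketch of that idea, the Lagrange form is one standard way to evaluate the unique polynomial through n+1 points. It is an assumption that this form is useful here; the text above does not say which construction is used, and the function name is made up for illustration.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of order len(xs)-1 through (xs[i], ys[i])."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(set(xs)) != len(xs):
        raise ValueError("the x values must be distinct")

    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial: equals 1 at xi and 0 at every other node.
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Three samples of f(x) = x*x define a second order polynomial exactly.
value = lagrange_interpolate([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 3.0)
```

With three samples of a quadratic the interpolating polynomial is the quadratic itself, so evaluating it outside the sampled range still recovers x*x.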
Last year I was coaching a team. One of the big problems the team faced was excessive work in progress – and a tendency for developers to start new work when they hit a blockage. Eventually, with the help of the Product Owner who saw the problem too, we starved the work pipeline. The team actually ran out of work. We saw this as a great success. It had never happened before and meant we could really focus and prioritise work.
Unfortunately, this happened when the two of us were not instantly available. You can argue that we should have been instantly available. Or that people should have made more of an effort to contact us. Or that we should have left a secret stash of work to do. Or that the team should have self-organized to fix the problem. That is easy in retrospect but really, I still don’t see it as a problem.
A few hours without work? I see it as a momentous moment, the start of something great.
But that is not how others saw it. The team, the Product Manager, another Agile Coach on site, and anyone these people could tell were quick to tell us how awful it was: “the team ran out of work.” Word spread quickly that the team had run out of work. My name was dirt.
Doing good by one group isn’t always seen as good by others. When you work as an agile coach, conflicts occur daily. But some are bigger and more persistent than others.
For most of the last 10 years I’ve mainly worked as a “drop in” coach (“light touch” I like to call it). I visit clients for a short period of time, talk about improvements, problems, solutions, give directive or non-directive coaching and don’t return for days or weeks. I’m not with the same clients every week, maybe I drop in a few days a month and talk to people.
Last year was different: I spent almost all of it with one company, mostly embedded in one team as an official “Agile Coach.”
Comparing these two approaches to coaching has left me with a lot to reflect on. In particular I found a number of conflicts which troubled me. I’m not sure I ever worked these out, and I’m sure others find the same things. So I’d like to share…
Responsible to the team, accountable to the organization
I was lucky, it was one of the team members who called me in but it was the bigger organization that was paying my fees. While I felt responsible to the team I was accountable to the organization.
That organization wanted things and expected me to deliver: they wanted a team which performed better (delivered more stuff and more value!). They wanted the team to share common practices and ceremonies with other teams.
For the organization I was the bringer of change to the team. But I could only do that if I was accepted by the team. That limited my ability to push through changes. Even if nobody else saw that conflict I felt it every day.
At one level the team did want to change but they also wanted the organization to change and I had very little power there. Both sides, the team and the organization, had no-go areas. More conflict.
The organization restricted my ability to do things I thought would improve the team. Things the team accepted would help: like spending money on training. So I was the bearer of bad news to both sides: one side saw me asking, then arguing for money while the other saw me failing to deliver.
That organization also expected me to operate within the organization: join coaches meetings, sign up to shared coaching goals, complete team assessments, etc. None of these things were necessarily bad in their own right but it meant I had two masters: the team and the organization.
When push came to shove I prioritised the team but I know some coaches who prioritised the organization. I know some team members mistrusted their coaches because they believed their coach would put the organization first.
Honouring self-organization but creating change
So an agile team is self-organizing. That gives them the right to self-organise to work exactly as they do today. Self-organization gives them the right to not change anything – something I wrote about way back in Changing Software Development.
But, almost by definition, the (agile) coach’s role is to bring about change, to help the team do better. Conflict is inevitable.
Sure you say “it’s a question of motivation … the coach needs to create the motivation to change and do better” and I would agree, but, even in creating that motivation one is creating change, one is intervening. Which brings us nicely to….
Leading without authority
Agile coaches lack authority – if they had authority they would be managers, I’ll blog about that in future. In a way not having authority is liberating, one can’t use the whip no matter how much one wants to! But it is also difficult.
The organization, and the coach, want to create change but without authority even the smallest changes can become massive efforts. When the team itself is divided, or even when one team member objects, implementing a change becomes like wading through treacle. That can be demoralising for the coach.
Yet a little authority can go a long way in pushing through change and overriding objections.
And on occasions I did reach for authority, but that creates a conflict within oneself as a coach: was I right to do it? am I honouring the team? the team members? am I creating a learned dependency?
Accepted while pushing the unpopular
Nowhere is that conflict clearer than when pushing through an unpopular change in the face of opposition – even minority opposition. As a coach one risks losing future changes: most change the coach “initiates” is done with the acceptance of the team, so pushing through an unpopular change – even with a majority, even with leadership support – risks future acceptance.
One is constantly asking: how far can I take this team right now?
And: if I take them too far will they trust me tomorrow?
And, most of all: am I right to do this?
Hardly a day passed last year when I didn’t agonise over these questions. And as I write this I imagine one of those team members reading this and saying “Huh, and you got it wrong.”
Who gets the credit?
As a coach your job is to make others perform better, but really, only they can perform better. You can’t make them, you can only help them. The final decisions rest with them.
So who should get the credit? – surely it is them, they made the change, they did something different.
That creates an inner conflict. It also creates a conflict with the organization: why should they keep me employed? After all I didn’t make any difference, they did it.
We know the value of positive praise and acknowledgement, but when there is nobody to praise you, when the team don’t recognise the coach’s role (which can be hard if the coach is doing a good job) then one becomes demoralised and that saps one’s strength to carry on.
As people we need acknowledgement; as humans we all have needs. But the coach’s role so often demands that we forego acknowledgement, praise and recognition.
Conflicts exist
This isn’t an exhaustive list of the conflicts I’ve encountered and hopefully as you read this you can see solutions – I can myself! But what I want to say is: these conflicts exist, I’m sure other coaches have them and even when there are solutions those solutions need to be applied.
Living with these conflicts can be hard, mentally and emotionally. Burnout happens to coaches.
And organizations get fed up with coaches who don’t deliver change, don’t turn up to non-team meetings, keep asking for money, don’t crack the whip or exceed their (non-existent) authority.
Which programming languages have been the most influential?
Let’s define an influential language as one that has had an impact on lots of developers. What impact might a programming language have on developers?
To have an impact a language needs to be used by lots of people, or at least have a big impact on a language that is used by lots of people.
Figuring out the possible impacts a language might have had is very difficult, requiring knowledge of different application domains, software history, and implementation techniques. The following discussion of specific languages illustrates some of the issues.
Simula is an example of a language used by a handful of people, but a few of the people under its influence went on to create Smalltalk and C++. Some people reacted against the complexity of Algol 68, creating much simpler languages (e.g., Pascal), while others thought some of its features were neat and reused them (e.g., Bourne shell).
Cobol has been very influential, at least within business computing (those who have not worked in business computing complain about constructs handling uses that it was not really designed to handle, rather than appreciating its strengths in doing what it was designed to do, e.g., reading/writing and converting a wide range of different numeric data formats). RPG may have been even more influential in this usage domain (all businesses have specific requirements on formatting reports).
I suspect that most people could not point to the major influence C has had on almost every language since. No, not the use of { and }; if a single character is going to be used as a compound statement bracketing token, this pair are the only available choice. Almost every language now essentially uses C’s operator precedence (rather than Fortran’s, which is slightly different; R follows Fortran).
Algol 60 has been very influential: until C came along it was the base template for many languages.
Fortran is still widely used in scientific and engineering software. Its impact on other languages may be unknown to those involved. The intricacies of floating-point arithmetic are difficult to get right, and WG5 (the ISO language committee, although the original work was done by the ANSI committee, J3) put a lot of effort into doing so. Fortran code is often computationally intensive, and many optimization techniques started out optimizing Fortran (see “Optimizing Compilers for Modern Architectures” by Allen and Kennedy).
BASIC showed how it was possible to create a usable interactive language system. Its many, and varied, implementations were successful because they were compact: they did not take up much storage and were immediately usable.
Forth has been influential in the embedded systems domain, and people also fall in love with threaded code as an implementation technique (see “Threaded Interpretive Languages” by Loeliger).
During the mid-1990s the growth of the Internet enabled a few new languages to become widely used, e.g., PHP and Javascript. It’s difficult to say whether these were more influenced by what their creators ate the night before or earlier languages. PHP and Javascript are widely used, and they have influenced the creation of many languages designed to fix their myriad of issues.
The Restaurant at the End of the Universe is an excellent book by Douglas Adams (isbn 978-0-330-49121-1).
As usual
I'm going to quote from a few pages.
There is another theory which states that this has already happened.
The story so far: In the beginning the Universe was created.
This had made a lot of people very angry and been widely regarded
as a bad move.
The motto stands - or rather stood - in three-mile high illuminated
letters near the Complaints Department spaceport on Eadrax.
Unfortunately its weight was such that shortly after it was erected,
the ground beneath the letters caved in and they dropped for nearly
half their length through the offices of many talented young
complaints executives - now deceased.
Quite how Zaphod Beeblebrox arrived at the idea of holding a
seance at this point is something he was never quite clear on.
'Listen, three eyes,' he said, 'don't you try to outweird me,
I get stranger things than you free with my breakfast cereal.'
Marvin was forced to say something which came very hard to him.
'I don't know,' he said.
'I go up,' said the elevator, 'or down.'
'Good,' said Zaphod. 'We're going up.'
'Or down,' the elevator reminded him.
And the worse they were to wear, the more people had to buy to
keep themselves shod, and the more the shops proliferated,
until the whole economy of the place passed what I believe is
termed the Shoe Event Horizon, and it became no longer economically
possible to build anything other than shoe shops.
'Transtellar Cruise Lines would like to apologize to passengers
for the continuing delay to this flight. We are currently
awaiting the loading of our complement of small lemon-soaked
napkins for your comfort, refreshment and hygiene during the
journey. Meanwhile we thank you for your patience.'
In it, guests take (willan on-take) their places at table
and eat (willan on-eat) sumptuous meals whilst watching
(willing watchen) the whole of creation explode around them.
'You've never heard of Disaster Area?'
'It says "Golgafrinchan Ark Fleet, Ship B, Hold Seven,
Telephone Sanitizer Second Class" - and a serial number.'
To summarize the summary: anyone who is capable of getting
themselves made President should on no account be allowed
to do the job.
Their track suits were now all dirty and even torn, but they
all had immaculately styled hair.
This time, let's build a decision tree with some data. There are many freely available data sets used to explore machine learning, such as the Iris dataset, in the UCI repository.
So let's try another one. The so-called wine dataset. This has three types of wine, with 13 attributes. Though many blogs list the attributes, I have been unable to find out what these three mystery types of wine are. They are three different Italian cultivars, but I have no idea what.
Rather than concentrating on building a decision tree to accurately categorise the wine, giving us a way to predict the type of another wine based on some or all of the 13 attributes, let's build a tree and see what it says.
These data sets are so common, they can be loaded directly from many machine learning packages, such as the Python module sklearn. This also has a DecisionTreeClassifier.
So,
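from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

data = load_wine()  # 178 wines, 13 attributes, 3 classes
X = data.data
y = data.target
estimator = DecisionTreeClassifier(max_depth=2)
estimator.fit(X, y)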
We asked for a maximum depth of 2, otherwise it makes a tree as deep (or high) as required to end up with leaves that are "pure" (or as pure as possible). In this case each is the same category of wine. Limiting the depth means it won't get as deep, or wide. But the first few layers will still show us which attributes are used to split up the data.
I say, "show", but we need to see the tree it's made. There are various ways to do this, but I'll use this:
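from sklearn import tree
from IPython.display import SVG, display
from graphviz import Source

# Assumed rendering step: export the fitted tree as Graphviz source and display it as SVG
graph = Source(tree.export_graphviz(estimator, out_file=None, feature_names=data.feature_names, class_names=['0', '1', '2'], filled=True))
display(SVG(graph.pipe(format='svg')))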
Unfortunately, I've had to stick with class names, i.e. wine categories, of 0, 1 and 2, because I have no idea what they really are.
This generates the following picture:
The first line tells you the attribute and the cut-off point chosen. For example, any wine with proline less than or equal to 755 goes down the left branch. The gini index is the measure used to decide which attribute or feature to split on. If you look up the decision tree classifier, you'll find other measures to try. The samples tell you how many wines are at that node. We start with 178 wines, with 71 in class 1, with fewer in the other classes, so it reports class 1 at the first node.
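To make the gini index a little less mysterious, here is a minimal sketch of how it can be computed for one node from the class counts alone. The counts 59, 71 and 48 are an assumption: they are the class sizes usually quoted for this dataset, and only the 71 and the total of 178 appear in the text above.

```python
def gini_impurity(class_counts):
    """Return the Gini impurity, 1 - sum(p_k ** 2), for per-class sample counts."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("need at least one sample")
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Assumed class sizes for the 178 wines at the root node.
root_counts = [59, 71, 48]
root_gini = gini_impurity(root_counts)
print(round(root_gini, 3))
```

A node holding wines from only one class has an impurity of 0, which is what "pure" means for a leaf; the more evenly the classes are mixed, the closer the value gets to its maximum.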
For proline less than or equal to 755, we have 111 samples, still mostly in class 1. For proline greater than 755 we have 67 samples, mostly in class 0. These 67 samples can then be split on flavanoids. Anything less than 2.165 is class 2, according to this tree. Anything greater is class 0. We do have some class 2 wine on the left-most branch as well. However, I had a brief wander round the internet to read about flavanoids in wine. Wikipedia says
In white wines the number of flavonoids is reduced due to the lesser contact with the skins that they receive during winemaking.
Is class 2 white wine? Who knows. It could be. The decision tree made this stand out far more clearly than looking directly at the input data.
I was planning on giving this as a lightning talk at the ACCU conference, but since it was cancelled this year, because of COVID-19, I wrote this short blog instead. If you can figure out what the types of wine are, get in touch.
Consider this a gift, it’s also an experiment. Numbers are limited so if you would like to join please e-mail me today – if it goes well I’ll repeat, although I might ask for money next time.
I’m going to run an online workshop entitled: Stories and Value.
Participation is limited to 16 and it’s going to be first come first served – blog/newsletter readers are getting the first chance to sign-up.
This is based on my existing “Requirements, Backlogs and User Stories” workshop which has itself mutated into a discussion of stories and value. The workshop will run as a series of 90 minute sessions, one a week for four weeks online.
I want the workshop sessions to remain interactive, I’m sure I will use some slides at some point but I want to keep it interactive. So I’m going to limit participation to 12.
The draft schedule is:
Workshop 1: How value influences our thinking
Workshop 2: Good and Bad User Stories
Workshop 3: Estimating story value
Workshop 4: Time value profiles and closing discussion
I plan on using exercises throughout and I think I know how to run them online. And I want discussion! – I may even set a little homework between sessions.
But in all honesty, it’s an experiment. So, I’m not planning on charging for this – it is Free!
If you find it valuable you can make a payment – like those “pay what you like” restaurants. That will itself be feedback.
I’m thinking Wednesday, 3pm UK time so those in mainland Europe could join too (sorry US and Asia, maybe next time); on a Zoom conference. Start next week, April 1? – once I know who’s in we might debate this between ourselves.
My thinking is still developing on this so let me know if you have any ideas to contribute. (And if you can’t join but want to let me know, feedback is valuable too! Likewise, if you are tempted but want to see something different please tell me and I’ll see what I can do.)
The Hitchhiker's Guide to the Galaxy is an excellent book by Douglas Adams (isbn 978-0-330-49119-8).
As usual
I'm going to quote from a few pages.
Far out in the uncharted backwaters of the unfashionable
end of the Western Spiral Arm of the Galaxy lies a small
unregarded yellow sun.
The Guide also tells you on which planets the best Pan
Galactic Gargle Blasters are mixed, how much you can
expect to pay for one and what voluntary organizations
exist to help you rehabilitate afterwards.
People of Earth, your attention please.
This is Prostetnic Vogon Jeltz of the Galactic Hyperspace
Planning Council...
The practical upshot of all this is that if you stick a Babel fish in
your ear you can instantly understand anything said to you in any
form of language.
The prisoners sat in the Poetry Appreciation chairs - strapped in.
'Space', it says, 'is big. Really big. You just won't believe how
vastly hugely mindbogglingly big it is. I mean you may think it's
a long way down the road to the chemist, but that's just peanuts to
space. Listen...' and so on.
The principle of generating small amounts of finite improbability
by simply hooking the logic circuits of a Bambleweeny 57
Sub-Meson Brain to an atomic vector plotter suspended in a strong
Brownian Motion producer (say a nice hot cup of tea) were of
course well understood.
Here I am, brain the size of a planet and they ask me to take you
down to the bridge. Call that job satisfaction? Cos I don't.
For years radios had been operated by means of pressing buttons
and turning dials; then as technology became more sophisticated
the controls were made touch sensitive - you merely had to brush
the panels with your fingers; now all you had to do was wave your
hand in the general direction of the components and hope. It saved
a lot of muscular expenditure of course, but meant that you had to sit
infuriatingly still if you wanted to keep listening to the same
programme.
'Oh God,' said Zaphod. He hadn't worked with this computer for
long but had already learned to loathe it.
'Computer!' shouted Zaphod. 'Rotate angle of vision through
one-eighty degrees and don't talk about it!'
Many men of course became extremely rich, but this was perfectly
natural and nothing to be ashamed of because no one was
really poor - at least no one worth speaking of.
'Hi there! This is Eddie your shipboard computer, and I'm feeling
just great, guys, and I know I'm just going to get a bundle of kicks
out of any program you care to run through me.'
'You just let the machines get on with the adding up,' warned
Majikthise, 'and we'll take care of the eternal verities, thank you very much.
You want to check your legal position you do, mate. Under the law the
Quest for the Ultimate Truth is quite clearly the inalienable prerogative
of your working thinkers.'
'I think the problem, to be quite honest with you, is that you've
never actually known what the question is.'
R is a velocity measure, defined as a reasonable speed of travel
that is consistent with health, mental wellbeing and not being more than
say five minutes late.