Hello there Sir R-----! Come join me by the hearth for a dram of warming spirits! I trust that this cold spell has not chilled your desire for a wager?
Good man! Good man!
I must say that the contrast between the warmth of this fire and the frost outside brings most vividly to my mind an occasion during my tenure as the Empress's ambassador to the land of Oz; specifically the time that I attended King Quadling Rex's winter masked ball during which his southern palace was overrun by an infestation of Snobbles!
The change was reasonably simple: we had to denormalise some postcode data, currently held in a centralised relational database, into new fields in every client’s database to remove some cross-database joins that would be unsupported on the new SQL platform we were migrating to.
As you might imagine the database schema changes were fairly simple – we just needed to add the new columns as nullable strings into every database. The next step was to update the service code to start populating these new fields as addresses were added or edited, using data from the centralised postcode database.
At this point any new data, or data that changed going forward, would have the correctly denormalised state. However, we still needed to fix up any existing data and that’s the focus of this post.
To fix up all the existing client data we needed to write a tool which would load each client’s address data that was missing its new postcode data, look it up against the centralised list, and then write back any changes. Given we were still using the cross-database joins in live for the time being to satisfy the existing reports, we could roll this out in the background and avoid putting any unnecessary load on the database cluster.
The tool wasn’t throw-away because the postcode dataset gets updated regularly and so the denormalised client data needs to be refreshed whenever the master list is updated. (This would not be that often but enough to make it worth spending a little extra time writing a reusable tool for the job for ops to run.)
Clearly this isn’t rocket science, it just requires loading the centralised data into a map, fetching the client’s addresses, looking them up, and writing back the relevant fields. The tool only took a few hours to write and test and so it was ready to run for the next release during a quiet period.
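In outline the tool amounted to little more than the following Python sketch. The field names ("postcode", "county") and record shapes are hypothetical stand-ins; the real tool read from and wrote back to each client’s database.

```python
# Sketch of the fix-up tool's core logic. The field names ("postcode",
# "county") and record shapes are hypothetical; the real tool read from
# and wrote back to each client's database.

def build_postcode_map(master_rows):
    """Load the centralised postcode data into a map keyed on postcode."""
    return {row["postcode"]: row for row in master_rows}

def fix_up_addresses(addresses, postcode_map):
    """Populate the new denormalised fields on any address missing them."""
    fixed = []
    for address in addresses:
        if address.get("county") is not None:
            continue  # already has its denormalised data
        match = postcode_map.get(address["postcode"])  # exact-match lookup
        if match is not None:
            address["county"] = match["county"]
            fixed.append(address)  # queue for writing back
    return fixed
```

In practice the writes were batched up and the whole run scheduled for a quiet period.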
When that moment arrived the tool was run across the hundreds of client databases and plenty of data was fixed up in the process, so the task appeared completed.
With all the existing postcode data now correctly populated too we should have been in a position to switch the report generation feature toggle on so that it used the new denormalised data instead of doing a cross-database join to the existing centralised store.
While the team were generally confident in the changes to date, I suggested we should do a sanity check and make sure that everything was working as intended, as I felt this was a reasonably simple check to run.
An initial SQL query someone knocked up just checked how many of the new fields had been populated, and the numbers seemed about right, i.e. very high (we’d expect some addresses to be missing data due to missing postcodes, typos and stale postcode data). However I still felt that we should be able to get a definitive answer with very little effort by leveraging the existing SQL we were about to discard, i.e. use the cross-database join one last time to verify the data population more precisely.
Close, but No Cigar
I massaged the existing report query to show where data from the dynamic join was different to that in the new columns that had been added (again, not rocket science). To our surprise there were quite a significant number of discrepancies.
Fortunately it didn’t take long to work out that those addresses which were missing postcode data all had postcodes which were at least partially written in lowercase whereas the ones that had worked were entirely written in uppercase.
Hence the bug was fairly simple to track down. The tool loaded the postcode data into a dictionary (map) keyed on the string postcode and did a straight lookup which is case-sensitive by default. A quick change to use a case-insensitive comparison and the tool was fixed. The data was corrected soon after and the migration verified.
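A Python sketch of the same bug and its fix (the postcode data here is made up; in a .NET dictionary the equivalent fix is to construct it with a case-insensitive string comparer such as StringComparer.OrdinalIgnoreCase):

```python
# The bug: dictionary lookups are case-sensitive by default, so postcodes
# stored with any lowercase letters miss the (uppercase-keyed) master list.
postcodes = {"SW1A 1AA": "London"}
assert postcodes.get("sw1a 1aa") is None  # lookup silently fails

# The fix: normalise the case of both keys and lookups, making the
# comparison effectively case-insensitive.
postcodes_ci = {k.upper(): v for k, v in postcodes.items()}

def lookup(postcode):
    return postcodes_ci.get(postcode.upper())
```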
Why didn’t this show up in the initial testing? Well, it turned out the tools used to generate the test data sets, and also to anonymize real client databases, were somewhat simplistic, and this helped to provide a false level of confidence in the new tool.
Testing in Production
Whenever we make a change to our system it’s important that we verify we’ve delivered what we intended. Oftentimes the addition of a feature has some impact on the front-end and the customer and therefore it’s fairly easy to see if it’s working or not. (The customer usually has something to say about it.)
However back-end changes can be harder to verify thoroughly, but it’s still important that we do the best we can to ensure they have the expected effect. In this instance we could easily check every migrated address within a reasonable time frame and know for sure, but on large data sets this might be infeasible, so you might have to settle for less. Also the use of feature switches and incremental delivery meant that even though there was a bug it did not affect the customers and we were always making forward progress.
Testing does not end with a successful run of the build pipeline or a sign-off from a QA team – it needs to work in real life too. Ideally the work we put in up-front will make that more likely, but for some classes of change, most notably where actual customer data is involved, we need to follow through and ensure that practice and theory tie up.
 Storage limitations and other factors precluded simply moving the entire postcode database into each customer DB before moving platforms. The cost was worth it to de-risk the overall migration.
 There was no problem with the web service having two connections to two different databases, we just needed to stop writing SQL queries that did cross-database joins.
The use of adjectives to analyse source code is something of a specialist topic. This post can only increase the number of people using adjectives for this purpose (because I don’t know anybody else who does).
Until recently the only adjective-related property I used to help analyse source was relative order. When using multiple adjectives, people have a preferred order, e.g., in English size comes before color, as in “big red” (“red big” sounds wrong), and adjectives always appear before the noun they modify. Native speakers of different languages have various preferred orders. Source code may appear to only contain English words, but adjective order can provide a clue about the native language of the developer who wrote it, because native ordering leaks into English usage.
Searching for adjective patterns (or any other part-of-speech pattern) in identifiers used to be complicated (identifiers first had to be split into their subcomponents). Now, thanks to Vadim Markovtsev, 49 million token-split identifiers are available. Happy adjective pattern matching (Size Shape Age Color is a common order to start with; adjective pairs are found in around 0.1% of identifiers; some scripts).
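As a toy illustration of the kind of pattern matching involved (the adjective lists below are a tiny hand-rolled stand-in for a real part-of-speech resource):

```python
# Toy sketch: find adjective pairs in token-split identifiers.
# The adjective sets and their order classes are illustrative only,
# not a real part-of-speech resource.

SIZE = {"big", "small", "large", "tiny", "huge"}
COLOR = {"red", "green", "blue", "black", "white"}
ADJECTIVES = SIZE | COLOR

def adjective_pairs(tokens):
    """Return adjacent adjective pairs from a token-split identifier."""
    return [(a, b) for a, b in zip(tokens, tokens[1:])
            if a in ADJECTIVES and b in ADJECTIVES]

def native_english_order(pair):
    """In English, size comes before colour: 'big red', not 'red big'."""
    a, b = pair
    return a in SIZE and b in COLOR
```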
Until recently, gradable adjectives were something that I had been only vaguely aware of; these kinds of adjectives indicate a position on a scale, e.g., hot/warm/cold water. The study Grounding Gradable Adjectives through Crowdsourcing included some interesting data on the perceived change of an attribute produced by the presence of a gradable adjective. The following plot shows perceived change in quantity produced by some quantity adjectives (code+data):
How is information about gradable adjectives useful for analyzing source code?
One pattern that jumps out of the plot is that variability, between people, increases as the magnitude specified by the adjective increases (the x-axis shows standard deviations from the mean response). Perhaps the x-axis should use a log scale; there are lots of human-related response characteristics that are linear on a log scale (I’m using the same scale as the authors of the study; the authors were a lot more aggressive in removing outliers than I have been), e.g., response to loudness of sound and Hick’s law.
At the moment, it looks as if my estimate of the value of a “small x” is going to be relatively closer to another developer’s “small x”, than our relative estimated value for a “huge x”.
In retrospect I think the presentation should have had a big question mark (“?”) in the title. In many ways I’m asking “Is the Product Owner role impossible to fill well?”. I had some really good discussions on this topic after I gave the presentation and I will blog more about the role soon. In the meantime check out my new book if you want more of my thinking, The Art of Agile Product Ownership.
Finally, while I was at Oredev I gave another presentation: Evolution: from #NoProjects to Continuous Digital (also available for download). This presentation itself was an evolution. So I’ve christened this version the “2020 edition” to distinguish it from the earlier version. I am attempting to do two things here:
One, be clear that the #NoProjects argument has itself moved forward. When #NoProjects began in 2013 the argument was very much “The project model is not a good fit for software development.” Now, as we approach 2020, the argument has moved on: business (and just about everything else) is digital, and in a digital world advancement means technology (software) change. Therefore, rather than following a start-stop-start-stop project model, organizations need to structure themselves for continuous digital technology enhancement.
Two, building on that argument I try to talk more about how our companies need to update their thinking. Specifically, what does the new management model need to look like?
More on all these subjects in my usual depth soon.
I was pretty sure I had seen Borknagar support Cradle of Filth at the Astoria 2 in the ‘90s. It turns out that was Opeth and Extreme Noise Terror, so I don’t really remember how I got into them now.
Whatever the reason was, I really got into their 2000 album Quintessence. At the time I didn’t really enjoy their previous album, The Archaic Course, much, so with the exception of the occasional relisten to Quintessence, Borknagar went by the wayside for me. That was until ICS Vortex got himself kicked out of Dimmu Borgir for allegedly poor performances, produced a really rather bland and unlistenable solo album called Storm Seeker, and then got back properly with Borknagar. That’s when things got interesting.
ICS Vortex has an incredible voice. When he joined Dimmu Borgir as bassist and second vocalist in time for Spiritual Black Dimensions, he brought a new dimension (pun intended) to an already amazing band. I’ve played Spiritual Black Dimensions to death since it came out and I think only Death Cult Armageddon is better.
ICS Vortex’s first album back with Borknagar is called Winter Thrice. Loving his voice and being bitterly disappointed with Storm Seeker I bought it desperately hoping for something more and I wasn’t disappointed. It’s an album with a cold feel and lyrical content about winter and the north. I loved it and played it constantly after release and regularly since. It’s progressive black metal which is the musical equivalent to walking through the snow early on a cold crisp morning.
This year Borknagar released a new album called True North. When I’ve loved an album intensely and the band brings out something new I always feel trepidation. Machine Head never bettered Burn My Eyes, WASP never bettered The Crimson Idol. I could go on, but you get the picture. True North is another album about winter and the north, so I ought to have been on safe ground, but then Arch Enemy have pretty much recorded the same album since Doomsday Machine, and never bettered it. They’re all good though.
My first listen to True North was tense, but it didn’t take long for that to dissipate. I had it on daily play for a few weeks, together with the new albums from Winterfylleth and Opeth. True North was so brilliant I thought it might be even better than Winter Thrice. So cautiously I tried Winter Thrice again, but I wasn’t disappointed to find it was the slightly better album. The brilliant thing is that I now have two similar, but different enough albums I can enjoy again and again and other than Enslaved’s In Times, I haven’t found anything else like it.
I hope they do what Evergrey did with Hymns for the Broken, The Storm Within and The Atlantic and make it a set of three. Cross your fingers for me.
It’s that time of year when students are looking for an interesting idea for a project (it might be a bit late for this year’s students, but I have been mulling over these ideas for a while, and might forget them by next year). A few years ago I listed some suggestions for student projects; as far as I know none got used, so let’s try again…
Checking the correctness of Python compilers/interpreters. Lots of work has been done checking C compilers (e.g., Csmith), but I cannot find any serious work that has done the same for Python. There are multiple Python implementations, so it would be possible to do differential testing; another possibility is to fuzz-test one or more compilers/interpreters and see how many crashes occur (the likely number of remaining fault-producing crashes can be estimated from this data).
Talking to the Python people at the Open Source hackathon yesterday, testing of the compiler/interpreter was something they did not spend much time thinking about (yes, they run regression tests, but that seemed to be it).
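The differential-testing idea scales down to a toy sketch: generate random programs (here just arithmetic expressions) and compare two independent evaluators. A real project would compare, say, CPython against PyPy running generated source files; the expression generator and mini-interpreter below are illustrative only.

```python
import ast
import operator
import random

# Toy differential testing: evaluate randomly generated arithmetic
# expressions with two independent implementations and compare results.

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def ast_eval(node):
    """A tiny interpreter, independent of eval(), for arithmetic expressions."""
    if isinstance(node, ast.Expression):
        return ast_eval(node.body)
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](ast_eval(node.left), ast_eval(node.right))
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported node: {node!r}")

def random_expr(depth=3):
    """Generate a random, fully parenthesised arithmetic expression."""
    if depth == 0 or random.random() < 0.3:
        return str(random.randint(0, 9))
    op = random.choice("+-*")
    return f"({random_expr(depth - 1)} {op} {random_expr(depth - 1)})"

def differential_test(trials=100):
    """Count disagreements between the two evaluators over random inputs."""
    mismatches = 0
    for _ in range(trials):
        expr = random_expr()
        if eval(expr) != ast_eval(ast.parse(expr, mode="eval")):
            mismatches += 1
    return mismatches
```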
Finding faults in published papers. There are tools that scan source code for use of suspect constructs, and there are various ways in which the contents of a published paper could be checked.
inconsistent statistics reported (e.g., “8 subjects aged between 18-25, average age 21.3” may be incorrect because 21.3×8 = 170.4; the ages must add to a whole number, and the candidate totals 169, 170 and 171 would not produce this average), and various tools are available (e.g., GRIMMER).
Citation errors are relatively common, but hard to check automatically without a good database (I have found that a failure of a Google search to return any results is a very good indicator that the reference does not exist).
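The average-age example generalises to a GRIM-style check: given the sample size and the precision of the reported mean, brute-force whether any whole-number total could round to it. A minimal sketch (note it uses Python’s round, i.e. banker’s rounding; published papers may round differently, and real tools such as GRIMMER do much more):

```python
def grim_consistent(reported_mean, n, decimals=1):
    """Could any integer total of n whole-number values round to the mean?"""
    step = 10 ** -decimals
    lo = int((reported_mean - step) * n)        # candidate totals to try
    hi = int((reported_mean + step) * n) + 1
    return any(round(total / n, decimals) == reported_mean
               for total in range(lo, hi + 1))
```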
Number extraction. Numbers are some of the most easily checked quantities, and anybody interested in fact checking needs a quick way of extracting numeric values from a document. Sometimes numeric values appear as number words, and dates can appear as a mixture of words and numbers. Extracting numeric values, and their possible types (e.g., date, time, miles, kilograms, lines of code), needs something way more sophisticated than pattern matching on sequences of digit characters.
spaCy is my tool of choice for this sort of text processing task.
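A deliberately naive first pass shows how far bare pattern matching gets you (the number-word list is tiny and illustrative, and it handles none of the hard cases such as “twenty-one”, dates, ranges or units):

```python
import re

# Deliberately naive number extraction: digit patterns plus a small,
# illustrative map of number words. Real documents need far more.

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "ten": 10, "twenty": 20, "hundred": 100,
}

def extract_numbers(text):
    """Extract integers, decimals and a few number words from text."""
    values = []
    for token in re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+", text):
        if token[0].isdigit():
            values.append(float(token) if "." in token else int(token))
        elif token.lower() in NUMBER_WORDS:
            values.append(NUMBER_WORDS[token.lower()])
    return values
```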
Some years ago we saw how we could use the Jacobi algorithm to find the eigensystem of a real valued symmetric matrix M, which is defined as the set of pairs of non-zero vectors vi and scalars λi that satisfy
M × v_i = λ_i × v_i
known as the eigenvectors and the eigenvalues respectively, with the vectors typically restricted to those of unit length in which case we can define its spectral decomposition as the product
M = V × Λ × V^T
where the columns of V are the unit eigenvectors, Λ is a diagonal matrix whose ith diagonal element is the eigenvalue associated with the ith column of V and the T superscript denotes the transpose, in which the rows and columns of the matrix are swapped.
You may recall that this is a particularly convenient representation of the matrix since we can use it to generalise any scalar function to it with
f(M) = V × f(Λ) × V^T
where f(Λ) is the diagonal matrix whose ith diagonal element is the result of applying f to the ith diagonal element of Λ.
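Numerically this is easy to check with a sketch (using numpy’s eigh for the symmetric eigensystem rather than the hand-rolled Jacobi sweeps of the earlier post):

```python
import numpy as np

# A real symmetric matrix and its eigensystem.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
w, V = np.linalg.eigh(M)   # eigenvalues w, unit eigenvectors as columns of V

# Spectral decomposition: M = V × Λ × V^T
M_rebuilt = V @ np.diag(w) @ V.T

# Generalising a scalar function to the matrix: f(M) = V × f(Λ) × V^T.
# With f(x) = x^2 this must agree with the ordinary matrix product M × M.
M_squared = V @ np.diag(w ** 2) @ V.T
```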
You may also recall that I suggested that there's a more efficient way to find eigensystems and I think that it's high time that we took a look at it.
There is enough evidence in this chapter to slice-and-dice much of the nonsense that passes for software project wisdom. The problem is, there is no evidence to suggest what might be useful and effective theories of software development. My experience is that there is no point in debunking folktales unless there is something available to replace them. Nature abhors a vacuum; a debunked theory has to be replaced by something else, otherwise people continue with their existing beliefs.
There is still some polishing to be done, and a few promises of data need to be chased up.
As always, if you know of any interesting software engineering data, please tell me.
Is the product owner role impossible to fill well?
Do we set product owners up to fail?
Have you ever worked with a really excellent product owner? Someone you would be eager to work with again?
The lack of really outstanding product owners isn’t the fault of the individuals. I think product owners are asked to do a difficult job and are not supported the way they should be. Worse still, in many organizations the role of product owners is misunderstood: they are seen as a type of delivery manager when in fact they are a type of product manager.
These questions have been on my mind for a while. Next month I’m giving a new presentation at Oredev in Malmö, which coincides perfectly with the publication of my new book The Art of Agile Product Ownership (funny that). So by way of preview…
1. Skills: the kinds of things a product owner learns on a Certified Scrum Product Owner course are table stakes. Yes, POs need to be able to write user stories, split stories, write acceptance criteria, understand agile and scrum, work with teams, plan a little and so on. While necessary, such skills are not sufficient.
The bigger question is:
How does a product owner know what they need to know in order to do these things?
How do they know what customers want?
How do they know what will make a difference?
Product owners need more skills. Some POs deliver products which must sell in the market to customers who have a choice. Such POs need to be able to identify customers, segment customers and markets, interview customers, analyse data, understand markets, monitor competitors and much more. In short they need the skills of a product manager.
Other POs work with internal customers who don’t have a choice over what product they use; here the PO needs other skills: stakeholder identification and management, business and process analysis, user observation and interviewing. They need to be aware of company politics and able to manage up. In other words, they need the skills of a business analyst.
And all POs need knowledge of their product domain. Many POs are POs because they are in fact subject matter experts.
That is a lot of skills for any one person. How many product owners have the right skills mix? And if they don’t, how many of them get the training they need?
2. Authority: Product owners need at least the authority to walk into a planning meeting and state the work they would like done in the next two weeks. They need the authority to set this work without being contradicted by some other person, and the authority to visit customers and get their expenses paid without having to provide a lengthy explanation every time.
3. Legitimacy: Product owners need to be seen as the right person to set the priorities. The right person to visit customers, the right person to agree plans and write roadmaps. They need to be seen as the right person by the organisation, by peers and, most importantly, by the development team.
Authority and legitimacy are closely related but they are not the same thing. While the product owner needs both, the lack of either results in the same problem: people don’t take their work seriously and other people try to set the agenda on what to build.
Unfortunately Scrum contains a seldom-noticed problem here: product owners are team members, they are peers; the team are self-organising and are responsible for delivering the product. (There is an egalitarian ethos, even if this is only implicit.)
But Scrum sets the PO as the one, and only one, who can tell the team what to do.
There is a contradiction.
4. Time: Product owners need time to do their work – which is a lot, just read that skills list and think about what the PO should be doing. And don’t forget the PO is a human being who needs to sleep for seven or eight hours a night, may well have a family and a home to go to.
When does the product owner get to do all of this?
Leave aside the question of where you find such people, or whether our companies pay them enough, and ask yourself: do product owners get the support they need from their companies and teams?
So often the PO ends up in conflict with the company about what will be built and when it will be delivered, and they end up in conflict with their team about… well much the same issues every planning meeting.
Think about it: do we ask too much from our product owners?
Do we set up product owners to fail?
I’d love to hear your opinions: comment on this post or drop me a note.