Validate in Production

Chris Oldwood from The OldWood Thing

The change was reasonably simple: we had to denormalise some postcode data, currently held in a centralised relational database, into some new fields in every client’s database, to remove some cross-database joins that would be unsupported on the new SQL platform we were migrating to [1].

As you might imagine, the database schema changes were fairly simple – we just needed to add the new columns as nullable strings to every database. The next step was to update the service code to start populating these new fields, as addresses were added or edited, using data from the centralised postcode database [2].

At this point any new data, or data that changed going forward, would have the correctly denormalised state. However, we still needed to fix up any existing data, and that’s the focus of this post.

Migration Plan

To fix up all the existing client data we needed to write a tool which would load each client’s address data that was missing its new postcode data, look it up against the centralised list, and then write back any changes. Given that we were still using the cross-database joins in live for the time being to satisfy the existing reports, we could roll this out in the background and avoid putting any unnecessary load on the database cluster.

The tool wasn’t throw-away because the postcode dataset gets updated regularly and so the denormalised client data needs to be refreshed whenever the master list is updated. (This would not be that often but enough to make it worth spending a little extra time writing a reusable tool for the job for ops to run.)

Clearly this isn’t rocket science, it just requires loading the centralised data into a map, fetching the client’s addresses, looking them up, and writing back the relevant fields. The tool only took a few hours to write and test and so it was ready to run for the next release during a quiet period.

When that moment arrived the tool was run across the hundreds of client databases and plenty of data was fixed up in the process, so the task appeared completed.

Next Steps

With all the existing postcode data now correctly populated too we should have been in a position to switch the report generation feature toggle on so that it used the new denormalised data instead of doing a cross-database join to the existing centralised store.

While the team were generally confident in the changes to date, I suggested we should just do a sanity check and make sure that everything was working as intended, as I felt this was a reasonably simple check to run.

An initial SQL query someone knocked up just checked how many of the new fields had been populated, and the numbers seemed about right, i.e. very high (we’d expect some addresses to be missing data due to missing postcodes, typos and stale postcode data). However, I still felt that we should be able to get a definitive answer with very little effort by leveraging the existing SQL we were about to discard, i.e. use the cross-database join one last time to verify the data population more precisely.

Close, but No Cigar

I massaged the existing report query to show where data from the dynamic join was different to that in the new columns that had been added (again, not rocket science). To our surprise there was quite a significant number of discrepancies.

Fortunately it didn’t take long to work out that the addresses which were missing postcode data all had postcodes which were at least partially written in lowercase, whereas the ones that had worked were entirely written in uppercase.

Hence the bug was fairly simple to track down. The tool loaded the postcode data into a dictionary (map) keyed on the postcode string and did a straight lookup, which is case-sensitive by default. A quick change to use a case-insensitive comparison and the tool was fixed. The data was corrected soon after and the migration verified.
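
The post doesn’t say which language the tool was written in, so here is a minimal, hypothetical Python sketch of the trap and the fix – normalise the key on both sides of the lookup (the postcode data shown is invented):

    # Hypothetical sketch of the lookup bug and its fix; the postcode data is made up.
    centralised_postcodes = {
        "SW1A 1AA": ("London", "Greater London"),
        "NR1 3QU": ("Norwich", "Norfolk"),
    }

    # Buggy version: a straight dictionary lookup, which is case-sensitive.
    def lookup_naive(postcode):
        return centralised_postcodes.get(postcode)

    # Fixed version: build the map with normalised keys and normalise the
    # incoming postcode the same way before looking it up.
    normalised_postcodes = {k.upper(): v for k, v in centralised_postcodes.items()}

    def lookup_fixed(postcode):
        return normalised_postcodes.get(postcode.strip().upper())

    print(lookup_naive("sw1a 1aa"))  # None - the lowercase postcode is missed
    print(lookup_fixed("sw1a 1aa"))  # ('London', 'Greater London')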

Why didn’t this show up in the initial testing? Well, it turned out the tools used to generate the test data sets, and also to anonymise real client databases, were somewhat simplistic, and this helped to provide a false level of confidence in the new tool.

Testing in Production

Whenever we make a change to our system it’s important that we verify we’ve delivered what we intended. Oftentimes the addition of a feature has some impact on the front-end and the customer and therefore it’s fairly easy to see if it’s working or not. (The customer usually has something to say about it.)

Back-end changes, however, can be harder to verify thoroughly, but it’s still important that we do the best we can to ensure they have the expected effect. In this instance we could easily check every migrated address within a reasonable time frame and know for sure, but on large data sets this might be unfeasible, so you might have to settle for less. Also, the use of feature switches and incremental delivery meant that even though there was a bug it did not affect the customers, and we were always making forward progress.

Testing does not end with a successful run of the build pipeline or a sign-off from a QA team – the change needs to work in real life too. Ideally the work we put in up front will make that more likely, but for some classes of change, most notably where actual customer data is involved, we need to follow through and ensure that practice and theory tie up.

 

[1] Storage limitations and other factors precluded simply moving the entire postcode database into each customer DB before moving platforms. The cost was worth it to de-risk the overall migration.

[2] There was no problem with the web service having two connections to two different databases; we just needed to stop writing SQL queries that did cross-database joins.

Adjectives in source code analysis

Derek Jones from The Shape of Code

The use of adjectives to analyse source code is something of a specialist topic. This post can only increase the number of people using adjectives for this purpose (because I don’t know anybody else who does 😉).

Until recently the only adjective-related property I used to help analyse source code was relative order. When using multiple adjectives, people have a preferred order, e.g., in English size comes before color, as in “big red” (“red big” sounds wrong), and adjectives always appear before the noun they modify. Native speakers of different languages have various preferred orders. Source code may appear to only contain English words, but adjective order can provide a clue about the native language of the developer who wrote it, because native ordering leaks into English usage.

Searching for adjective patterns (or any other part-of-speech pattern) in identifiers used to be complicated (identifiers first had to be split into their subcomponents). Now, thanks to Vadim Markovtsev, 49 million token-split identifiers are available. Happy adjective pattern matching (Size Shape Age Color is a common order to start with; adjective pairs are found in around 0.1% of identifiers; some scripts).
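
The linked scripts do the real work; as a flavour of the idea, here is a rough Python sketch, using NLTK’s part-of-speech tagger, that flags adjective pairs in identifiers that have already been split into words (the example identifiers and the choice of tagger are mine, not from the dataset):

    import nltk  # assumes the 'averaged_perceptron_tagger' data package is installed

    # Made-up examples of identifiers already split into their subcomponent words.
    split_identifiers = [
        ["big", "red", "button"],
        ["red", "big", "button"],
        ["new", "small", "buffer"],
        ["buffer", "size"],
    ]

    for words in split_identifiers:
        tags = [tag for _, tag in nltk.pos_tag(words)]
        # The Penn Treebank tagset labels adjectives as JJ; report consecutive pairs.
        pairs = [
            " ".join(words[i:i + 2])
            for i in range(len(tags) - 1)
            if tags[i] == "JJ" and tags[i + 1] == "JJ"
        ]
        if pairs:
            print("_".join(words), "->", pairs)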

Until recently, gradable adjectives were something that I had been vaguely aware of; these kinds of adjectives indicate a position on a scale, e.g., hot/warm/cold water. The study Grounding Gradable Adjectives through Crowdsourcing included some interesting data on the perceived change of an attribute produced by the presence of a gradable adjective. The following plot shows perceived change in quantity produced by some quantity adjectives (code+data):

Gradable adjective ranking.

How is information about gradable adjectives useful for analyzing source code?

One pattern that jumps out of the plot is that variability, between people, increases as the magnitude specified by the adjective increases (the x-axis shows standard deviations from the mean response). Perhaps the x-axis should use a log scale; there are lots of human-related response characteristics that are linear on a log scale, e.g., response to loudness of sound and Hick’s law (I’m using the same scale as the authors of the study, although they were a lot more aggressive in removing outliers than I have been).

At the moment, it looks as if my estimate of the value of a “small x” is going to be relatively closer to another developer’s “small x” than our respective estimates of the value of a “huge x”.

Retrospective cards, product Owners and #NoProjects

Allan Kelly from Allan Kelly Associates

[Image: sample Retrospective cards]

A quick follow-up on my last two blog posts.

First, Team Retrospective cards – above – are now available for sale:

Both sites accept other credit cards, so don’t worry if you have another currency, and we can post anywhere – if you get stuck, get in touch and we’ll find a way that works.

Second, as discussed in my last blog – Mission Impossible: the Product Owner – I delivered a presentation on that subject at the Oredev conference in Malmo last week. The slides are available for download: Mission Impossible: the Product Owner.

In retrospect I think the presentation should have had a big question mark (“?”) in the title. In many ways I’m asking “Is the Product Owner role impossible to fill well?”. I had some really good discussions on this topic after I gave the presentation and I will blog more about the role soon. In the meantime check out my new book if you want more of my thinking, The Art of Agile Product Ownership.

Finally, while I was at Oredev I gave another presentation: Evolution: from #NoProjects to Continuous Digital (also available for download). This presentation itself was an evolution. So I’ve christened this version the “2020 edition” to distinguish it from the earlier version. I am attempting to do two things here:

One, be clear that the #NoProjects argument has itself moved forward. When #NoProjects began in 2013 the argument was very much “The project model is not a good fit for software development.” Now, as we approach 2020, the argument has moved on: business (and just about everything else) is digital, and in a digital world advancement means technology (software) change. Therefore, rather than following a start-stop-start-stop project model, organizations need to structure themselves for continuous digital technology enhancement.

Two, building on that argument, I try to talk more about how our companies need to update their thinking. Specifically, what does the new management model need to look like?

More on all these subjects in my usual depth soon.


Borknagar

Paul Grenyer from Paul Grenyer

I was pretty sure I had seen Borknagar support Cradle of Filth at the Astoria 2 in the ‘90s. It turns out that was Opeth and Extreme Noise Terror, so I don’t really remember how I got into them now.

Whatever the reason was, I really got into their 2000 album Quintessence. At the time I didn’t really enjoy their previous album, The Archaic Course, much, so with the exception of the occasional relisten to Quintessence, Borknagar went by the wayside for me. That was until ICS Vortex got himself kicked out of Dimmu Borgir for allegedly poor performances, produced a really rather bland and unlistenable solo album called Storm Seeker, and then got back together properly with Borknagar. That’s when things got interesting.

ICS Vortex has an incredible voice. When he joined Dimmu Borgir as bassist and second vocalist in time for Spiritual Black Dimensions, he brought a new dimension (pun intended) to an already amazing band. I’ve played Spiritual Black Dimensions to death since it came out and I think only Death Cult Armageddon is better.

ICS Vortex’s first album back with Borknagar is called Winter Thrice. Loving his voice, and being bitterly disappointed with Storm Seeker, I bought it desperately hoping for something more, and I wasn’t disappointed. It’s an album with a cold feel and lyrical content about winter and the north. I loved it and played it constantly after release, and regularly since. It’s progressive black metal which is the musical equivalent of walking through the snow early on a cold, crisp morning.

This year Borknagar released a new album called True North. When I’ve loved an album intensely and the band brings out something new I always feel trepidation. Machine Head never bettered Burn My Eyes, WASP never bettered The Crimson Idol. I could go on, but you get the picture. True North is another album about winter and the north, so I ought to have been on safe ground – but then Arch Enemy have pretty much recorded the same album since Doomsday Machine and never bettered it. They’re all good though.

My first listen to True North was tense, but it didn’t take long for that to dissipate. I had it on daily play for a few weeks, together with the new albums from Winterfylleth and Opeth. True North was so brilliant I thought it might be even better than Winter Thrice. So cautiously I tried Winter Thrice again, but I wasn’t disappointed to find it was the slightly better album. The brilliant thing is that I now have two similar, but different enough, albums I can enjoy again and again, and other than Enslaved’s In Times, I haven’t found anything else like it.

I hope they do what Evergrey did with Hymns for the Broken, The Storm Within and The Atlantic and make it a set of three. Cross your fingers for me.

Student projects for 2019/2020

Derek Jones from The Shape of Code

It’s that time of year when students are looking for an interesting idea for a project (it might be a bit late for this year’s students, but I have been mulling over these ideas for a while, and might forget them by next year). A few years ago I listed some suggestions for student projects, as far as I know none got used, so let’s try again…

Checking the correctness of the Python compilers/interpreters. Lots of work has been done checking C compilers (e.g., Csmith), but I cannot find any serious work that has done the same for Python. There are multiple Python implementations, so it would be possible to do differential testing; another possibility is to fuzz test one or more compilers/interpreters and see how many crashes occur (the likely number of remaining fault-producing crashes can be estimated from this data).
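
As a starting point, a differential test can be as simple as running the same program under two implementations and diffing what comes back; a rough sketch (assuming both python3 and pypy3 are installed and on the PATH, and with a hand-written program standing in for a proper fuzzer):

    import subprocess

    # Hypothetical differential test: run the same small program under two
    # Python implementations and compare return code and output.
    program = "print(sorted({1: 'a', 1.0: 'b'}.items()))"

    results = {}
    for interpreter in ("python3", "pypy3"):
        completed = subprocess.run(
            [interpreter, "-c", program],
            capture_output=True, text=True, timeout=10,
        )
        results[interpreter] = (completed.returncode, completed.stdout, completed.stderr)

    if len(set(results.values())) > 1:
        print("implementations disagree:", results)
    else:
        print("implementations agree:", results["python3"][1].strip())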

Talking to the Python people at the Open Source hackathon yesterday, I got the impression that testing of the compiler/interpreter was not something they spend much time thinking about (yes, they run regression tests, but that seemed to be it).

Finding faults in published papers. There are tools that scan source code for use of suspect constructs, and there are various ways in which the contents of a published paper could be checked.

Possible checks include (apart from grammar checking):

Number extraction. Numbers are some of the most easily checked quantities, and anybody interested in fact checking needs a quick way of extracting numeric values from a document. Sometimes numeric values appear as numeric words, and dates can appear as a mixture of words and numbers. Extracting numeric values, and their possible types (e.g., date, time, miles, kilograms, lines of code), needs something way more sophisticated than pattern matching on sequences of digit characters.

spaCy is my tool of choice for this sort of text processing task.
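
For instance, spaCy’s named-entity recogniser will pull out many numeric and quantity-like spans with very little code. A quick sketch (it assumes the en_core_web_sm model has been downloaded; the example sentence is invented):

    import spacy

    nlp = spacy.load("en_core_web_sm")  # python -m spacy download en_core_web_sm

    text = ("The study examined 49 million identifiers, written over twenty-three "
            "years, at a reported cost of $1.5 million.")

    # Entity labels that usually carry a numeric or quantity-like value.
    numeric_labels = {"CARDINAL", "ORDINAL", "QUANTITY", "MONEY", "PERCENT", "DATE", "TIME"}

    for ent in nlp(text).ents:
        if ent.label_ in numeric_labels:
            print(ent.text, "->", ent.label_)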

FAO The Householder – a.k.

a.k. from thus spake a.k.

Some years ago we saw how we could use the Jacobi algorithm to find the eigensystem of a real valued symmetric matrix M, which is defined as the set of pairs of non-zero vectors v_i and scalars λ_i that satisfy

    M × v_i = λ_i × v_i

known as the eigenvectors and the eigenvalues respectively, with the vectors typically restricted to those of unit length in which case we can define its spectral decomposition as the product

    M = V × Λ × V^T

where the columns of V are the unit eigenvectors, Λ is a diagonal matrix whose ith diagonal element is the eigenvalue associated with the ith column of V and the T superscript denotes the transpose, in which the rows and columns of the matrix are swapped.
You may recall that this is a particularly convenient representation of the matrix since we can use it to generalise any scalar function to it with

    f(M) = V × f(Λ) × V^T

where f(Λ) is the diagonal matrix whose ith diagonal element is the result of applying f to the ith diagonal element of Λ.
You may also recall that I suggested that there's a more efficient way to find eigensystems and I think that it's high time that we took a look at it.
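
In the meantime, purely as a numerical illustration of the spectral decomposition described above (and not of the more efficient method the post goes on to discuss), here is a small numpy sketch that reconstructs M from V and Λ and uses them to compute a matrix square root:

    import numpy as np

    # A small real symmetric (and positive definite) matrix to play with.
    M = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

    # eigh is specialised for symmetric matrices: it returns the eigenvalues
    # (the diagonal of Λ) and the unit eigenvectors (the columns of V).
    eigenvalues, V = np.linalg.eigh(M)

    # Spectral decomposition: M = V × Λ × V^T.
    assert np.allclose(M, V @ np.diag(eigenvalues) @ V.T)

    # Generalising a scalar function to the matrix: f(M) = V × f(Λ) × V^T.
    sqrt_M = V @ np.diag(np.sqrt(eigenvalues)) @ V.T
    assert np.allclose(sqrt_M @ sqrt_M, M)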