Adjectives in source code analysis

Derek Jones from The Shape of Code

The use of adjectives to analyse source code is something of a specialist topic. This post can only increase the number of people using adjectives for this purpose (because I don’t know anybody else who does 😉).

Until recently the only adjective-related property I used to help analyse source was relative order. When using multiple adjectives, people have a preferred order, e.g., in English size comes before color, as in “big red” (“red big” sounds wrong), and adjectives always appear before the noun they modify. Native speakers of different languages have various preferred orders. Source code may appear to only contain English words, but adjective order can provide a clue about the native language of the developer who wrote it, because native ordering leaks into English usage.

Searching for adjective patterns (or any other part-of-speech pattern) in identifiers used to be complicated (identifiers first had to be split into their subcomponents). Now, thanks to Vadim Markovtsev, 49 million token-split identifiers are available. Happy adjective pattern matching (Size Shape Age Color is a common order to start with; adjective pairs are found in around 0.1% of identifiers; some scripts).
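A minimal sketch of this kind of pattern matching, assuming each identifier has already been split into a list of tokens. The tiny adjective lexicon and the function name `adjective_pairs` are hypothetical, chosen for illustration; a real analysis would use a part-of-speech tagger or a full adjective list.

```python
# Hypothetical mini-lexicon mapping adjectives to their semantic class.
ADJECTIVE_CLASS = {
    "big": "size", "small": "size", "large": "size", "tiny": "size",
    "round": "shape", "square": "shape", "flat": "shape",
    "old": "age", "new": "age", "young": "age",
    "red": "color", "green": "color", "blue": "color", "black": "color",
}

# The common English preference mentioned above: Size Shape Age Color.
ENGLISH_ORDER = ["size", "shape", "age", "color"]
RANK = {cls: i for i, cls in enumerate(ENGLISH_ORDER)}


def adjective_pairs(tokens):
    """Yield (first, second, in_english_order) for adjacent adjectives of different classes."""
    for first, second in zip(tokens, tokens[1:]):
        c1 = ADJECTIVE_CLASS.get(first.lower())
        c2 = ADJECTIVE_CLASS.get(second.lower())
        if c1 and c2 and c1 != c2:
            yield first, second, RANK[c1] < RANK[c2]


# Made-up identifiers, already token-split.
identifiers = [
    ["big", "red", "button"],
    ["red", "big", "button"],
    ["new", "file", "name"],
]

for tokens in identifiers:
    for first, second, ok in adjective_pairs(tokens):
        label = "English order" if ok else "non-English order"
        print("_".join(tokens), first, second, label)
```

Identifiers flagged as non-English order are only a hint about the author's native language, and would need to be counted over many identifiers before drawing any conclusion.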

Until recently, gradable adjectives were something that I had been vaguely aware of; these kinds of adjectives indicate a position on a scale, e.g., hot/warm/cold water. The study Grounding Gradable Adjectives through Crowdsourcing included some interesting data on the perceived change of an attribute produced by the presence of a gradable adjective. The following plot shows perceived change in quantity produced by some quantity adjectives (code+data):

Gradable adjective ranking.

How is information about gradable adjectives useful for analyzing source code?

One pattern that jumps out of the plot is that variability, between people, increases as the magnitude specified by the adjective increases (the x-axis shows standard deviations from the mean response). Perhaps the x-axis should use a log scale; there are lots of human-related response characteristics that are linear on a log scale, e.g., response to loudness of sound and Hick’s law (I’m using the same scale as the authors of the study; the authors were a lot more aggressive in removing outliers than I have been).

At the moment, it looks as if my estimate of the value of a “small x” is going to be relatively closer to another developer’s “small x” than our estimated values for a “huge x” will be to each other.

Retrospective cards, product Owners and #NoProjects

Allan Kelly from Allan Kelly Associates


A quick follow up on my last two blog posts.

First, Team Retrospective cards – above – are now available for sale:

Both sites accept other credit cards so don’t worry if you have another currency and we can post anywhere – if you get stuck get in touch and we’ll find a way that works.

Second, as discussed in my last blog – Mission Impossible: the Product Owner – I delivered a presentation on that subject at the Oredev conference in Malmo last week. The slides are available for download: Mission Impossible: the Product Owner.

In retrospect I think the presentation should have had a big question mark (“?”) in the title. In many ways I’m asking “Is the Product Owner role impossible to fill well?”. I had some really good discussions on this topic after I gave the presentation and I will blog more about the role soon. In the meantime check out my new book if you want more of my thinking, The Art of Agile Product Ownership.

Finally, while I was at Oredev I gave another presentation: Evolution: from #NoProjects to Continuous Digital (also available for download). This presentation itself was an evolution. So I’ve christened this version the “2020 edition” to distinguish it from the earlier version. I am attempting to do two things here:

One, be clear that the #NoProjects argument has itself moved forward. When #NoProjects began in 2013 the argument was very much “The project model is not a good fit for software development.” Now, as we approach 2020, the argument has moved on: business (and just about everything else) is digital, and in a digital world advancement means technology (software) change. Therefore, rather than following a start-stop-start-stop project model, organizations need to structure themselves for continuous digital technology enhancement.

Two, building on that argument I try to talk more about how our companies need to update their thinking. Specifically, what does the new management model need to look like?

More on all these subjects in my usual depth soon.



Paul Grenyer from Paul Grenyer

I was pretty sure I had seen Borknagar support Cradle of Filth at the Astoria 2 in the ‘90s. It turns out that was Opeth and Extreme Noise Terror, so I don’t really remember how I got into them now.

Whatever the reason was, I really got into their 2000 album Quintessence. At the time I didn’t really enjoy their previous album, The Archaic Course, much, so with the exception of the occasional relisten to Quintessence, Borknagar went by the wayside for me. That was until ICS Vortex got himself kicked out of Dimmu Borgir for allegedly poor performances, produced a really rather bland and unlistenable solo album called Storm Seeker, and then got back properly with Borknagar. That’s when things got interesting.

ICS Vortex has an incredible voice. When he joined Dimmu Borgir as bassist and second vocalist in time for Spiritual Black Dimensions, he brought a new dimension (pun intended) to an already amazing band. I’ve played Spiritual Black Dimensions to death since it came out and I think only Death Cult Armageddon is better.

ICS Vortex’s first album back with Borknagar is called Winter Thrice. Loving his voice and being bitterly disappointed with Storm Seeker I bought it desperately hoping for something more and I wasn’t disappointed. It’s an album with a cold feel and lyrical content about winter and the north. I loved it and played it constantly after release and regularly since. It’s progressive black metal which is the musical equivalent to walking through the snow early on a cold crisp morning.

This year Borknagar released a new album called True North. When I’ve loved an album intensely and the band brings out something new I always feel trepidation. Machine Head never bettered Burn My Eyes, WASP never bettered The Crimson Idol. I could go on, but you get the picture. True North is another album about winter and the north. So I ought to have been on safe ground, but then Arch Enemy have pretty much recorded the same album since Doomsday Machine, but never bettered it. They’re all good though.

My first listen to True North was tense, but it didn’t take long for that to dissipate. I had it on daily play for a few weeks, together with the new albums from Winterfylleth and Opeth. True North was so brilliant I thought it might be even better than Winter Thrice. So cautiously I tried Winter Thrice again, but I wasn’t disappointed to find it was the slightly better album. The brilliant thing is that I now have two similar, but different enough albums I can enjoy again and again and other than Enslaved’s In Times, I haven’t found anything else like it.

I hope they do what Evergrey did with Hymns for the Broken, The Storm Within and The Atlantic and make it a set of three. Cross your fingers for me.

Student projects for 2019/2020

Derek Jones from The Shape of Code

It’s that time of year when students are looking for an interesting idea for a project (it might be a bit late for this year’s students, but I have been mulling over these ideas for a while, and might forget them by next year). A few years ago I listed some suggestions for student projects; as far as I know none got used, so let’s try again…

Checking the correctness of the Python compilers/interpreters. Lots of work has been done checking C compilers (e.g., Csmith), but I cannot find any serious work that has done the same for Python. There are multiple Python implementations, so it would be possible to do differential testing. Another possibility is to fuzz test one or more compilers/interpreters and see how many crashes occur (the likely number of remaining fault producing crashes can be estimated from this data).

Talking to the Python people at the Open Source hackathon yesterday, testing of the compiler/interpreter was something they did not spend much time thinking about (yes, they run regression tests, but that seemed to be it).
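As a starting point for the differential testing idea, here is a rough sketch that runs one program under several interpreters and reports any disagreement. The helper names `run_under` and `differential_test` are my own, and the interpreter list is an assumption: substitute whichever implementations are installed (e.g., CPython and PyPy). A real project would also need a generator of test programs, which is where most of the work lies.

```python
import subprocess
import sys


def run_under(interpreter, source, timeout=10):
    """Run source under one interpreter; return (exit code, stdout)."""
    result = subprocess.run(
        [interpreter, "-c", source],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode, result.stdout


def differential_test(interpreters, source):
    """Return every interpreter's outcome if any two disagree, else an empty dict."""
    outcomes = {exe: run_under(exe, source) for exe in interpreters}
    if len(set(outcomes.values())) > 1:
        return outcomes
    return {}


# Assumption: only the current interpreter is known to exist here, so no
# disagreement is possible; replace with e.g. ["python3", "pypy3"].
interpreters = [sys.executable]
program = "print(sorted({3, 1, 2}), 7 // -2, divmod(-7, 2))"
print(differential_test(interpreters, program) or "no disagreement")
```

Only the exit code and standard output are compared; error messages on stderr legitimately differ between implementations, so comparing them would produce false alarms.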

Finding faults in published papers. There are tools that scan source code for use of suspect constructs, and there are various ways in which the contents of a published paper could be checked.

Possible checks include (apart from grammar checking):

Number extraction. Numbers are some of the most easily checked quantities, and anybody interested in fact checking needs a quick way of extracting numeric values from a document. Sometimes numeric values appear as numeric words, and dates can appear as a mixture of words and numbers. The task is to extract numeric values, along with their possible types (e.g., date, time, miles, kilograms, lines of code). Something way more sophisticated than pattern matching on sequences of digit characters is needed.

spaCy is my tool of choice for this sort of text processing task.
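To give a feel for the problem, here is a deliberately small sketch using only the Python standard library (not spaCy). It handles digits, simple number words, and treats the following word as a crude hint at the type. All names in it are hypothetical, and it shows how quickly the simple approach runs out: it would happily treat the pronoun “one” as the value 1, and knows nothing about dates.

```python
import re

# Number words up to the millions; enough for a toy example.
SMALL = {w: n for n, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
SMALL.update({w: 10 * n for n, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)})
SCALES = {"hundred": 100, "thousand": 1_000, "million": 1_000_000}
NUMBER_WORDS = set(SMALL) | set(SCALES)

# Digits with thousands separators, plain digits/decimals, or words.
TOKEN = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?|[A-Za-z]+")


def words_to_number(words):
    """Convert number words, e.g. ['two', 'hundred', 'and', 'five'], to 205."""
    total = current = 0
    for w in words:
        if w in SMALL:
            current += SMALL[w]
        elif w == "hundred":
            current = max(current, 1) * 100
        elif w in SCALES:
            total += max(current, 1) * SCALES[w]
            current = 0
        # Anything else (i.e. "and") is ignored.
    return total + current


def extract_numbers(text):
    """Return (value, following_word) pairs found in text."""
    tokens = [t.lower() for t in TOKEN.findall(text)]

    def in_run(j):
        # A number word, or an "and" directly followed by a number word.
        if tokens[j] in NUMBER_WORDS:
            return True
        return tokens[j] == "and" and j + 1 < len(tokens) and tokens[j + 1] in NUMBER_WORDS

    found, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok[0].isdigit():
            raw = tok.replace(",", "")
            value = float(raw) if "." in raw else int(raw)
            i += 1
            if i < len(tokens) and tokens[i] in SCALES:  # e.g. "3.5 million"
                value *= SCALES[tokens[i]]
                i += 1
        elif tok in NUMBER_WORDS:
            j = i
            while j < len(tokens) and in_run(j):
                j += 1
            value = words_to_number(tokens[i:j])
            i = j
        else:
            i += 1
            continue
        found.append((value, tokens[i] if i < len(tokens) else None))
    return found


text = "The study used two hundred and five projects, 1,200 files and 3.5 million lines of code."
print(extract_numbers(text))
```

Working out that “lines” means lines of code, or that a four digit number is a year, needs the surrounding context, which is where a proper NLP toolkit earns its keep.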

FAO The Householder – a.k.

a.k. from thus spake a.k.

Some years ago we saw how we could use the Jacobi algorithm to find the eigensystem of a real valued symmetric matrix M, which is defined as the set of pairs of non-zero vectors v_i and scalars λ_i that satisfy

    M × v_i = λ_i × v_i

known as the eigenvectors and the eigenvalues respectively, with the vectors typically restricted to those of unit length in which case we can define its spectral decomposition as the product

    M = V × Λ × V^T

where the columns of V are the unit eigenvectors, Λ is a diagonal matrix whose ith diagonal element is the eigenvalue associated with the ith column of V and the T superscript denotes the transpose, in which the rows and columns of the matrix are swapped.
You may recall that this is a particularly convenient representation of the matrix since we can use it to generalise any scalar function to it with

    f(M) = V × f(Λ) × V^T

where f(Λ) is the diagonal matrix whose ith diagonal element is the result of applying f to the ith diagonal element of Λ.
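
To make the generalisation of scalar functions concrete, here is a small sketch in plain Python that computes a matrix square root from a known eigensystem. The helper names are my own, and the eigensystem of the 2 by 2 example matrix is written down by hand rather than computed; it is an illustration of the formula, not of any particular library.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Swap the rows and columns of a matrix."""
    return [list(row) for row in zip(*a)]


def apply_function(v, eigenvalues, f):
    """Return V × f(Λ) × (transpose of V) for unit eigenvectors in the columns of v."""
    n = len(eigenvalues)
    f_lambda = [[f(eigenvalues[i]) if i == j else 0.0 for j in range(n)]
                for i in range(n)]
    return matmul(matmul(v, f_lambda), transpose(v))


# M = [[2, 1], [1, 2]] has eigenvalues 3 and 1, with unit eigenvectors
# (1, 1)/sqrt(2) and (1, -1)/sqrt(2), which form the columns of V.
s = 1 / math.sqrt(2)
V = [[s, s], [s, -s]]
eigenvalues = [3.0, 1.0]

root = apply_function(V, eigenvalues, math.sqrt)
print(matmul(root, root))  # approximately [[2, 1], [1, 2]], up to rounding
```

Squaring the result recovers M, up to rounding error, which is what we should expect of a square root.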
You may also recall that I suggested that there's a more efficient way to find eigensystems and I think that it's high time that we took a look at it.