Improving my vimrc live on stream

Andy Balaam from Andy Balaam's Blog

I was becoming increasingly uncomfortable with how crufty my neovim config was getting, and especially how I didn’t understand parts of it, so I decided to wipe it clean and rebuild it from scratch.

I did it live on stream, to make it feel like a worthwhile activity:

Headline features of the new vimrc:

  • A new theme, using the base16 theme framework
  • A file browser (NERDTree)
  • A minimalist status line with vim-airline
  • Search with ripgrep
  • Rust language support with Coc

Note: after the stream I managed to resolve the remaining issues with highlight colours not showing by triggering re-applying them after the theme has been applied:

augroup tabs_in_make
    autocmd!
    autocmd ColorScheme * highlight MatchParen cterm=none ctermbg=none ctermfg=green
augroup END

You can find my current neovim config at gitlab.com/andybalaam/configs/-/tree/main/.config/nvim.

Evidence-based Software Engineering: now in paperback form

Derek Jones from The Shape of Code

I made my Evidence-based Software Engineering book available as a pdf file. While making a printed version available looked possible, I was uncertain that the result would be of acceptable quality; the extensive use of color and an A4 page size restricted the number of available printers who could handle it. Email exchanges with several publishers suggested that the number of likely print edition copies sold would be small (based on experience with other books, under 100). The pdf was made available under a creative commons license.

Around half-million copies of the pdf have been downloaded (some partially).

A few weeks ago, I spotted a print version of this book on Amazon (USA). I have no idea who made this available. Is the quality any good? I was told that it was, so I bought a copy.

The printed version looks great, with vibrant colors, and is reasonably priced. It sits well in the hand, while reading. The links obviously don’t work for the paper version, but I’m well practised at using multiple fingers to record different book locations.

I have one report that the Kindle version doesn’t load on a Kindle or the web app.

If you love printed books, I heartily recommend the paperback version of Evidence-based Software Engineering; it even has a 5-star review on Amazon 😉

Programming language similarity based on their traits

Derek Jones from The Shape of Code

A programming language is sometimes described as being similar to another, more wide known, language.

How might language similarity be measured?

Biologists ask a very similar question, and research goes back several hundred years; phenetics (also known as taximetrics) attempts to classify organisms based on overall similarity of observable traits.

One answer to this question is based on distance matrices.

The process starts by flagging the presence/absence of each observed trait. Taking language keywords (or reserved words) as an example, we have (for a subset of C, Fortran, and OCaml):

            if   then  function   for   do   dimension  object
C            1     1       0       1     1       0        0
Fortran      1     0       1       0     1       1        0
OCaml        1     1       1       1     1       0        1

The distance between these languages is calculated by treating this keyword presence/absence information as an n-dimensional space, with each language occupying a point in this space. The following shows the Euclidean distance between pairs of languages (using the full dataset; code+data):

                C        Fortran      OCaml
C               0        7.615773     8.717798
Fortran      7.615773    0            8.831761
OCaml        8.717798    8.831761     0

Algorithms are available to map these distance pairs into tree form; for biological organisms this is known as a phylogenetic tree. The plot below shows such a tree derived from the keywords supported by 21 languages (numbers explained below, code+data):

Tree showing relative similarity of languages based on their keywords.

How confident should we be that this distance-based technique produced a robust result? For instance, would a small change to the set of keywords used by a particular language cause it to appear in a different branch of the tree?

The impact of small changes on the generated tree can be estimated using a bootstrap technique. The particular small-change algorithm used to estimate confidence levels for phylogenetic trees is not applicable for language keywords; genetic sequences contain multiple instances of four DNA bases, and can be sampled with replacement, while language keywords are a set of distinct items (i.e., cannot be sampled with replacement).

The bootstrap technique I used was: for each of the 21 languages in the data, was: add keywords to one language (the number added was 5% of the number of its existing keywords, randomly chosen from the set of all language keywords), calculate the distance matrix and build the corresponding tree, repeat 100 times. The 2,100 generated trees were then compared against the original tree, counting how many times each branch remained the same.

The numbers in the above plot show the percentage of generated trees where the same branching decision was made using the perturbed keyword data. The branching decisions all look very solid.

Can this keyword approach to language comparison be applied to all languages?

I think that most languages have some form of keywords. A few languages don’t use keywords (or reserved words), and there are some edge cases. Lisp doesn’t have any reserved words (they are functions), nor technically does Pl/1 in that the names of ‘word tokens’ can be defined as variables, and CHILL implementors have to choose between using Cobol or PL/1 syntax (giving CHILL two possible distinct sets of keywords).

To what extent are a language’s keywords representative of the language, compared to other languages?

One way to try and answer this question is to apply the distance/tree approach using other language traits; do the resulting trees have the same form as the keyword tree? The plot below shows the tree derived from the characters used to represent binary operators (code+data):

Tree showing relative similarity of languages based on their binary operator character representation.

A few of the branching decisions look as-if they are likely to change, if there are changes to the keywords used by some languages, e.g., OCaml and Haskell.

Binary operators don’t just have a character representation, they can also have a precedence and associativity (neither are needed in languages whose expressions are written using prefix or postfix notation).

The plot below shows the tree derived from combining binary operator and the corresponding precedence information (the distance pairs for the two characteristics, for each language, were added together, with precedence given a weight of 20%; see code for details).

Tree showing relative similarity of languages based on their binary operator character representation and corresponding precedence.

No bootstrap percentages appear because I could not come up with a simple technique for handling a combination of traits.

Are binary operators more representative of a language than its keywords? Would a combined keyword/binary operator tree would be more representative, or would more traits need to be included?

Does reducing language comparison to a single number produce something useful?

Languages contain a complex collection of interrelated components, and it might be more useful to compare their similarity by discrete components, e.g., expressions, literals, types (and implicit conversions).

What is the purpose of comparing languages?

If it is for promotional purposes, then a measurement based approach is probably out of place.

If the comparison has a source code orientation, weighting items by source code occurrence might produce a more applicable tree.

Sometimes one language is used as a reference, against which others are compared, e.g., C-like. How ‘C-like’ are other languages? Taking keywords as our reference-point, comparing languages based on just the keywords they have in common with C, the plot below is the resulting tree:

Tree showing similarity of languages based on the keywords they share with C.

I had expected less branching, i.e., more languages having the same distance from C.

New languages can be supported by adding a language file containing the appropriate trait information. There is a Github repo, prog-lang-traits, send me a pull request to add your language file.

It’s also possible to add support for more language traits.

On Pitfall – student

student from thus spake a.k.

Recall that in the Baron's latest wager, Sir R-----'s goal was to traverse a three by three checkerboard in steps determined by casts of a four sided die, each at a cost of two coins. Moving from left to right upon the first rank and advancing to the second upon its third file, thereafter from right to left and advancing upon the first file and finally from left to right again, he should have prevailed for a prize of twenty five coins had he landed upon the top right place. Frustrating his progress, however, were the rules that landing upon a black square dropped him back down to the first rank and that overshooting the last file upon the last rank required that he should move in reverse by as many places with which he had done so.

Devin Townsend at the Royal Albert Hall (again)

Paul Grenyer from Paul Grenyer

Leprous

There’s an obvious pull for me towards Leprous due to the association with Ihsahn and prog, but rock bands generally do little for me these days. I listened to a little of Aphelion before the gig, but it didn’t grip me.

They’re an odd live band and some of the time the cello player looked a bit out of place when he was without his cello. The sharing of the keyboards among various band members, often in the same song, was also weird. The singer was wearing a waistcoat and doing some very odd dancing and his voice can grate. For a prog band the lack of any guitar or keyboard lead breaks was also weird.

However, I quite enjoyed Leprous!


Devin Townsend

We’d only seen Devin Townsend a few months ago (in the summer at Bloodstock), but my wife loves him so we went again. We should have gone the night before as he played loads of songs we knew, in contrast to the night we went where he played nothing we knew! Most of it, I am reliably informed, was from the Ocean Machine and Infinity albums.

Devin still plays brilliantly and it was great to see him again with the session musicians he’d teamed up with for his Bloodstock performance. He creates a fantastic wall of sound and engages with the crowd like few others. I’m sure we’ll go and see him again, after all we’ve not heard Hyperdrive live yet!

Agile OKRs extra – yet another book

Allan Kelly from Allan Kelly

I blogged last week that I had begun work on a new book – How I Write Books which is now a work in progress at LeanPub – signup and be the first to know when the draft is published.

Well a funny thing happened while I was setting up my tool chain to write that book: I found another book! Well, perhaps half a book is a better description.

Succeeding with OKRs in Agile Extra is a companion to last year’s best seller, Succeeding with OKRs in Agile. But it isn’t a complete book in its own right, it isn’t really a sequel, it is a companion. It contains a mix of material. Material which didn’t really fit in the first book, material with was’t needed, ideas which didn’t develop far enough and some unfinished chapters.

As such it is like my Xanpan Appendix, unused material which is still interesting and might appear elsewhere in time.

I really want to work on How I write books so I don’t have any immediate plans to progress extra. If you enjoyed Succeeding with OKRs in Agile, if you would like to know more, or if you would like to just see how a writer’s mind works check out Succeeding with OKRs in Agile Extra.

The post Agile OKRs extra – yet another book appeared first on Allan Kelly.

Ecology as a model for the software world

Derek Jones from The Shape of Code

Changing two words in the Wikipedia description of Ecology gives “… the study of the relationships between software systems, including humans, and their physical environment”; where physical environment might be taken to include the hardware on which software runs and the hardware whose behavior it controls.

What do ecologists study? Wikipedia lists the following main areas; everything after the first sentence, in each bullet point, is my wording:

  • Life processes, antifragility, interactions, and adaptations.

    Software system life processes include its initial creation, devops, end-user training, and the sales and marketing process.

    While antifragility is much talked about, it is something of a niche research topic. Those involved in the implementations of safety-critical systems seem to be the only people willing to invest the money needed to attempt to build antifragile software. Is N-version programming the poster child for antifragile system software?

    Interaction with a widely used software system will have an influence on the path taken by cultures within associated microdomains. Users adapt their behavior to the affordance offered by a software system.

    A successful software system (and even unsuccessful ones) will exist in multiple forms, i.e., there will be a product line. Software variability and product lines is an active research area.

  • The movement of materials and energy through living communities.

    Is money the primary unit of energy in software ecosystems? Developer time is needed to create software, which may be paid for or donated for free. Supporting a software system, or rather supporting the needs of the users of the software is often motivated by a salary, although a few do provide limited free support.

    What is the energy that users of software provide? Money sits at the root; user attention sells product.

  • The successional development of ecosystems (“… succession is the process of change in the species structure of an ecological community over time.”)

    Before the Internet, monthly computing magazines used to run features on the changing landscape of the computer world. These days, we have blogs/podcasts telling us about the latest product release/update. The Ecosystems chapter of my software engineering book has sections on evolution and lifespan, but the material is sparse.

    Over the longer term, this issue is the subject studied by historians of computing.

    Moore’s law is probably the most famous computing example of succession.

  • Cooperation, competition, and predation within and between species.

    These issues are primarily discussed by those interested in the business side of software. Developers like to brag about how their language/editor/operating system/etc is better than the rest, but there is no substance to the discussion.

    Governments have an interest in encouraging effective competition, and have enacted various antitrust laws.

  • The abundance, biomass, and distribution of organisms in the context of the environment.

    These are the issues where marketing departments invest in trying to shift the distribution in their company’s favour, and venture capitalists spend their time trying to spot an opportunity (and there is the clickbait of language popularity articles).

    The abundance of tools/products, in an ecosystem, does not appear to deter people creating new variants (suggesting that perhaps ambition or dreams are the unit of energy for software ecosystems).

  • Patterns of biodiversity and its effect on ecosystem processes.

    Various kinds of diversity are important for biological systems, e.g., the mutual dependencies between different species in a food chain, and genetic diversity as a resource that provides a mechanism for species to adapt to changes in their environment.

    It’s currently fashionable to be in favour of diversity. Diversity is so popular in ecology that a 2003 review listed 24 metrics for calculating it. I’m sure there are more now.

    Diversity is not necessarily desired in software systems, e.g., the runtime behavior of source code should not depend on the compiler used (there are invariably edge cases where it does), and users want different editor command to be consistently similar.

    Open source has helped to reduce diversity for some applications (by reducing the sales volume of a myriad of commercial offerings). However, the availability of source code significantly reduces the cost/time needed to create close variants. The 5,000+ different cryptocurrencies suggest that the associated software is diverse, but the rapid evolution of this ecosystem has driven developers to base their code on the source used to implement earlier currencies.

    Governments encourage competitive commercial ecosystems because competition discourages companies charging high prices for their products, just because they can. Being competitive requires having products that differ from other vendors in a desirable way, which generates diversity.

How I write books – A book about books

Allan Kelly from Allan Kelly

In the last 15 years I’ve written and published 3 books with publishers, published 5 books myself, plus edited one conference proceedings and pushed out three “mini books” (one with 3 editions) which I never publicised.

In addition I’ve contributed forwards and chapters to at least six books and had two books translated.

Then there are countless magazine and journal articles but they stretch back further, closer to 25 years – and this blog from 2005.

Not bad for a kid who was thrown out of school after asking a teacher how to spell “at” – age of eight, a diagnosed dyslexic who had to learn to read three times – I can’t read my own handwriting its so bad.

As a result I’ve learned a lot about writing and publishing. In the last few years I’ve spoken to many people who want to know how to write and publish their own book. A couple of years ago Steve Smith suggested I write a book about writing books. I’ve been avoiding that until this month.

Now I’ve started: How I write books, https://leanpub.com/howIwrite – sign-up to be the first to known when the MVP is published. And if there is anything you would like me to write about in the book please let me know.

The post How I write books – A book about books appeared first on Allan Kelly.

NoEstimates panders to mismanagement and developer insecurity

Derek Jones from The Shape of Code

Why do so few software development teams regularly attempt to estimate the duration of the feature/task/functionality they are going to implement?

Developers hate giving estimates; estimating is very hard and estimates are often inaccurate (at a minimum making the estimator feel uncomfortable and worse when management treats an estimate as a quotation). The future is uncertain and estimating provides guidance.

Managers tell me that the fear of losing good developers dissuades them from requiring teams to make estimates. Developers have told them that they would leave a company that required them to regularly make estimates.

For most of the last 70 years, demand for software developers has outstripped supply. Consequently, management has to pay a lot more attention to the views of software developers than the views of those employed in most other roles (at least if they want to keep the good developers, i.e., those who will have no problem finding another job).

It is not difficult for developers to get a general idea of how their salary, working conditions and practices compares with other developers in their field/geographic region. They know that estimating is not a common practice, and unless the economy is in recession, finding a new job that does not require estimation could be straight forward.

Management’s demands for estimates has led to the creation of various methods for calculating proxy estimate values, none of which using time as the unit of measure, e.g., Function points and Story points. These methods break the requirements down into smaller units, and subcomponents from these units are used to calculate a value, e.g., the Function point calculation includes items such as number of user inputs and outputs, and number of files.

How accurate are these proxy values, compared to time estimates?

As always, software engineering data is sparse. One analysis of 149 projects found that Cost approx FunctionPoints^{0.75}, with the variance being similar to that found when time was estimated. An analysis of Function point calculation data found a high degree of consistency in the calculations made by different people (various Function point organizations have certification schemes that require some degree of proficiency to pass).

Managers don’t seem to be interested in comparing estimated Story points against estimated time, preferring instead to track the rate at which Story points are implemented, e.g., velocity, or burndown. There are tiny amounts of data comparing Story points with time and Function points.

The available evidence suggests a relationship connecting Function points to actual time, and that Function points have similar error bounds to time estimates; the lack of data means that Story points are currently just a source of technobabble and number porn for management power-points (send me Story point data to help change this situation).

Testing Our Students – a.k.

a.k. from thus spake a.k.

Last time we saw how we can use the chi-squared distribution to test whether a sample of values is consistent with pre-supposed expectations. A couple of months ago we took a look at Student's t-distribution which we can use to test whether a set of observations of a normally distributed random variable are consistent with its having a given mean when its variance is unknown.