human language – ACCU World of Code

Derek Jones from The Shape of Code

This year’s list of books for Christmas, or Isaac Newton’s birthday (in the Julian calendar in use when he was born), returns to its former length, and even includes a book published this year. My book Evidence-based Software Engineering also became available in paperback form this year, and would look great on somebody’s desk.

The Mars Project by Wernher von Braun, first published in 1953, is a 91-page high-level technical specification for an expedition to Mars (calculated by one man and his slide-rule). The subjects include the orbital mechanics of travelling between Earth and Mars, the complications of using a planet’s atmosphere to slow down the landing craft without burning up, and the design of the spaceships and rockets (the bulk of the material). The one subject not covered is cost; von Braun’s estimated 950 launches of heavy-lift launch vehicles, to send a fleet of ten spacecraft with 70 crew, will not be cheap. I’ve no idea what today’s numbers might be.

The Fabric of Civilization: How textiles made the world by Virginia Postrel is a popular book full of interesting facts about the economic and cultural significance of something we take for granted today (or at least I did). For instance, Viking sails took longer to make than the ships they powered, and spinning the wool for the sails on King Canute’s North Sea fleet required around 10,000 work years.

Wyclif’s Dust: Western Cultures from the Printing Press to the Present by David Hugh-Jones is covered in an earlier post.

The Second World Wars: How the First Global Conflict Was Fought and Won by Victor Davis Hanson approaches the subject from a systems perspective: how the subsystems worked together (e.g., arms manufacturers and their customers, the various arms of the military/politicians/citizens), how manufacturing and fighting equipment evolved to increase production/lethality (the Allies did a great job here, Germany was not very good, and Japan/Italy were terrible), and how activities were prioritized to achieve aims. The 2011 Christmas books listed “Europe at War” by Norman Davies, which approaches the war from a data perspective.

Through the Language Glass: Why the world looks different in other languages by Guy Deutscher is a science-driven discussion (written in a popular style) of the impact of language on the way its speakers interpret their world. While I have read many accounts of the Sapir–Whorf hypothesis, this book was the first to tell me that 70 years earlier, both William Gladstone (yes, that UK prime minister and Homeric scholar) and Lazarus Geiger had proposed theories of color perception based on the color words commonly used by the speakers of a language.

Derek Jones from The Shape of Code

When creating a variable name, to what extent are developers influenced by their native human language?

There is lots of evidence that variable names are either English words, abbreviations of English words, or some combination of these two. Source code containing a large percentage of identifiers using words from other languages does exist, but it requires effort to find; there is a widely expressed view that source should be English based (based on my experience of talking to non-native English speakers, and even the odd paper discussing the issue, e.g., Language matters).

Given that variable names can provide information that reduces the effort needed to understand code, and that most code is only ever read by the person who wrote it, developers should make the most of their expertise in using their native language.

To what extent do non-native English-speaking developers make use of their non-English native language?

I have found it very difficult to even have a discussion around this question. When I broach the subject with non-native English speakers, the response is often along the lines of “our developers speak good English.” I am careful to set the scene by telling them of my interest in naming, and that I think there are benefits for developers to make use of their native language. The use of non-English languages in software development is not yet a subject that is open for discussion.

I knew that sooner or later somebody would run an experiment…

How Developers Choose Names is another interesting experiment involving Dror Feitelson (the paper rather confusingly refers to it as a survey; see a post on an earlier experiment).

What makes this experiment interesting is that bilingual subjects (English and Hebrew) were used, and the questions were in English or Hebrew. The 230 subjects (some professional, some student) were given a short description and asked to provide an appropriate variable/function/data-structure name; English was used for 26 of the questions, and Hebrew for the other 21 questions, and subjects answered a random subset.

What patterns of Hebrew usage are present in the variable names?

Out of 2017 answers, 14 contained Hebrew characters, i.e., not enough for statistical analysis. This does not mean that all the other variable names were only derived from English words; in some cases Hebrew words appeared via transcription using the 26 English letters. For instance, using “pinuk” for the Hebrew word that means “benefit” in English. Some variables were created from a mixture of Hebrew and English words, e.g., deservedPinuks and pinuksUsed.
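As a rough illustration of how such names might be sorted mechanically, here is a minimal Python sketch. It is not the method used in the paper; the function names and the one-entry list of transliterated words are hypothetical, and a real word list would have to come from somebody fluent in Hebrew.

```python
import re

# Characters in the Unicode Hebrew block (U+0590 to U+05FF).
HEBREW_CHARS = re.compile(r"[\u0590-\u05FF]")

# Splits camelCase, snake_case and digit runs into separate words.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Return the lowercase words making up an identifier."""
    return [w.lower() for w in WORD.findall(name)]


def classify(name, transliterated):
    """Classify an identifier by its apparent use of Hebrew.

    `transliterated` is an assumed, hand-built set of Hebrew words
    written in English letters (e.g., {"pinuk"}).
    """
    if HEBREW_CHARS.search(name):
        return "hebrew script"
    words = split_identifier(name)
    hits = 0
    for w in words:
        # Crude handling of an English plural "s" added to a Hebrew word.
        singular = w[:-1] if w.endswith("s") else w
        if w in transliterated or singular in transliterated:
            hits += 1
    if hits == 0:
        return "no Hebrew detected"
    return "transliterated" if hits == len(words) else "mixed"


if __name__ == "__main__":
    for name in ["deservedPinuks", "pinuksUsed", "pinuk", "totalCost"]:
        print(name, "->", classify(name, {"pinuk"}))
```

A lookup like this only finds words already in the list, so it undercounts; deciding whether an unfamiliar letter sequence is a transcribed Hebrew word still needs a bilingual reader.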

Analysing this data requires someone who is fluent in Hebrew and English. I am not a fluent, or even non-fluent, Hebrew speaker. My role in this debate is encouraging others, and at last I have some interesting data to show people.

The paper spends time showing how personal preferences result in a wide selection of names being chosen by different people for the same quantity. I cannot think of any software engineering papers that have addressed this issue for variable names, but there is lots of evidence from other fields; also see figure 7.33.

Those interested in searching source code for the impact of native language might like to look at the names of variables appearing as operands of the bitwise and logical operators. Some English words occur much more frequently in the names of these variables, compared to variables that are operands of arithmetic operators, e.g., flag, status, and signal. I predict that non-native English-speaking developers will make use of corresponding non-English words.
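One possible starting point for such a search is sketched below, assuming the code being searched is Python (other languages would need their own parser). It uses the standard `ast` module to count the variable names that appear as direct operands of bitwise/logical operators, and separately those that are operands of arithmetic operators. The function name `count_operand_names` and the grouping of operators are my own choices for illustration.

```python
import ast
from collections import Counter

BITWISE_OPS = (ast.BitAnd, ast.BitOr, ast.BitXor, ast.LShift, ast.RShift)
ARITHMETIC_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div,
                  ast.FloorDiv, ast.Mod, ast.Pow)


def count_operand_names(source):
    """Count variable names by the kind of operator they are operands of.

    Only plain names that are direct operands are counted; attribute
    accesses, calls and nested expressions are ignored.
    """
    counts = {"bitwise/logical": Counter(), "arithmetic": Counter()}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BoolOp):  # and / or
            kind, operands = "bitwise/logical", node.values
        elif (isinstance(node, ast.UnaryOp)
              and isinstance(node.op, (ast.Not, ast.Invert))):  # not / ~
            kind, operands = "bitwise/logical", [node.operand]
        elif isinstance(node, ast.BinOp):
            if isinstance(node.op, BITWISE_OPS):
                kind = "bitwise/logical"
            elif isinstance(node.op, ARITHMETIC_OPS):
                kind = "arithmetic"
            else:
                continue  # e.g., matrix multiplication
            operands = [node.left, node.right]
        else:
            continue
        for operand in operands:
            if isinstance(operand, ast.Name):
                counts[kind][operand.id] += 1
    return counts


if __name__ == "__main__":
    example = (
        "ok = flag and not status\n"
        "mask = flags & signal\n"
        "total = price * count + tax\n"
    )
    for kind, names in count_operand_names(example).items():
        print(kind, dict(names))
```

Running this over a large body of code written by non-native English speakers, and comparing the two sets of counts, would show whether non-English counterparts of words such as flag and status cluster in the bitwise/logical group.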

Category: human language

Christmas books for 2022

Impact of native language on variable naming