Run bash inside any version of Linux using Docker

Andy Balaam from Andy Balaam's Blog

Docker is useful for some things, and not as useful as you think for others.

Here’s something massively useful: get a throwaway bash prompt inside any version of any Linux distribution in one command:

docker run -i -t --mount "type=bind,src=$HOME/Desktop,dst=/Desktop" ubuntu:18.10 bash

This command downloads a recent Ubuntu 18.10 image, mounts my desktop as /Desktop in the container, and gives me a bash prompt. From here I can install any packages I want and then use them.

For example, today I used it to decrypt a file that was encrypted with a cipher my main OS did not have a package for.

When I exit bash, the container stops and I can find it with docker ps -a then remove it with docker rm. To really clean up I can find the downloaded images with docker image ls and remove them with docker image rm.
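If the container really is throwaway, one way to skip the docker ps -a and docker rm step is Docker's --rm flag, which removes the container as soon as bash exits. Here is a minimal sketch, assuming a POSIX shell: the function name throwaway_shell and the DRY_RUN variable are my own inventions for illustration, not part of Docker, and the mount layout is simply the one from the command above.

```shell
# Hypothetical helper: rebuild the command from above with --rm added,
# so Docker deletes the container automatically when bash exits.
throwaway_shell() {
    image="${1:-ubuntu:18.10}"     # any image:tag available from a registry
    share="${2:-$HOME/Desktop}"    # host directory to expose as /Desktop
    set -- docker run --rm -i -t \
        --mount "type=bind,src=${share},dst=/Desktop" \
        "$image" bash
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Assumption for illustration: print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}
```

Calling throwaway_shell debian:9 /tmp would start bash in a Debian 9 container with /tmp mounted as /Desktop; with DRY_RUN=1 set it only prints the command it would run. The downloaded image still stays behind, so docker image ls and docker image rm are needed as before.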

Changes in the shape of code during the twenties?

Derek Jones from The Shape of Code

At the end of 2009 I made two predictions for the next decade: Chinese and Indian developers having a major impact on the shape of code (ok, still waiting for this to happen), and scripting languages playing a significant role (got that one right, but then they were already playing a large role).

Since this blog has just entered its second decade, I will bring the next decade’s predictions forward a year.

I don’t see any new major customer ecosystems appearing. Ecosystems are the drivers of software development, and the absence of new ecosystems has several consequences, including:

  • No major new languages: Creating a language is a vanity endeavor. Vanity projects can take off if they are in the right place at the right time. New ecosystems provide opportunities for new languages to become widely used by being in at the start and growing with the ecosystem. There is another opportunity locus; it is fashionable for companies that see themselves as thought-leaders to have their own language, e.g., Google, Apple, and Mozilla. Invent your language at the right time, while working for a thought-leader company, and your language could become well-known enough to take off.

    I don’t see any major new ecosystems appearing and all the likely companies already have their own language.

    Any new language also faces the problem of not having a large collection of packages.

  • Software will be more thoroughly tested: When an ecosystem is new, the incentives drive early and frequent releases (to build a customer base); software just has to be good enough. Once a product is established, companies can invest in addressing issues that customers find annoying, like faulty behavior; the incentive change results in more testing.

    There are other forces at work around testing. Companies are experiencing some very expensive faults (testing may be expensive, but not testing may be more expensive) and automatic test generation is becoming commercially usable (i.e., the cost of some kinds of testing is decreasing).

The evolution of widely used languages.

  • I think Fortran and C will have new features added, with relatively little fuss, and will quietly continue to be widely used (to the dismay of the fashionista).
  • There is a strong expectation that C++ and Java should continue to evolve:

    • I expect the ISO C++ work to implode, because there are too many people pulling in too many directions. It makes sense for the gcc and llvm teams to cooperate in taking C++ in a direction that satisfies developers’ needs, rather than the needs of bored consultants. What are Microsoft’s views? They only have their own compiler for strategic reasons (they make little if any profit selling compilers, compilers are an unnecessary drain on management time; who cares what happens to the language).
    • It is going to be interesting watching the impact of Oracle’s move to charging for runtimes. I have no idea what might happen to Java.

In terms of code volume, the future surely has to be scripting languages, and in particular Python, JavaScript and PHP. Ten years from now, will there be a widely used, single language? People have been predicting, for many years, that web languages will take over the world; perhaps there will be a sudden switch and I will see that the choice is obvious.

Moore’s law is now dead, which means researchers are going to have to look for completely new techniques for building logic gates. If photonic computers happen, then ternary notation may reappear (it was used in at least one early Russian computer); I’m not holding my breath for this to occur.

Archimedean Review – a.k.

a.k. from thus spake a.k.

In the last couple of posts we've been taking a look at Archimedean copulas, which define the dependency between the elements of vector values of a multivariate random variable. They do so by applying a generator function φ to the values of the cumulative distribution functions, or CDFs, of the elements' distributions when considered independently, known as their marginal distributions, and then applying the inverse of the generator to the sum of the results to yield the value of the multivariate CDF.
We have seen that the densities of Archimedean copulas are rather trickier to calculate and that making random observations of them is trickier still. Last time we found an algorithm for the latter, albeit with an implementation that had troubling performance and numerical stability issues, and in this post we shall add an improved version to the ak library that addresses those issues.
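For reference, the construction described above can be written compactly. The notation here is an assumption of mine rather than something taken from the posts: $C$ for the copula, $F_i$ for the marginal CDFs, $u_i$ for their values and $n$ for the number of elements. The Clayton generator is included only as a standard textbook example of a valid $\varphi$.

```latex
% Archimedean copula built from a generator \varphi
C(u_1, \dots, u_n) = \varphi^{-1}\!\left( \sum_{i=1}^{n} \varphi(u_i) \right),
\qquad u_i = F_i(x_i)

% Example (assumed, not from the posts): the Clayton generator, \theta > 0
\varphi(u) = \frac{u^{-\theta} - 1}{\theta},
\qquad
\varphi^{-1}(s) = (1 + \theta s)^{-1/\theta}
\quad\Longrightarrow\quad
C(u_1, u_2) = \left( u_1^{-\theta} + u_2^{-\theta} - 1 \right)^{-1/\theta}
```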

Does machine learning really involve data?

Frances Buontempo from BuontempoConsulting

Many definitions of machine learning start by proclaiming it uses data to learn. I want to challenge this, or remind us where the term originally came from and consider why the meaning has shifted.

For a long time machine learning seemed to be a new technology, but I notice we're starting to say AI and machine learning interchangeably. Job postings often sneak the word scientist in there too. What is a data scientist? What do any of these words mean?

Current trends often come with an air of mystery. I suspect a lot of data science roles involve data entry, in order to clean input data. Not as appealing as the headline role suggests. Several day-to-day techniques being described as machine learning could also be described as statistics. In fact, look at the table of contents of a statistics book, such as An Introduction to Statistical Learning. Look at a small selection of the topics:

  • accuracy
  • k-means clustering
  • making predictions
  • cross-validation
  • support vector machines, SVM
  • principal component analysis, PCA


Most, if not all, of these topics are covered in an average machine learning course and included in ML software packages. Yet statistics doesn't sound as exciting as machine learning, to many people.

Wikipedia defines statistics as "a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation." No mention of learning, though each of these activities forms an essential part of data science. The article goes on to discuss descriptive and inferential statistics. Inference involves making predictions: many people use the term machine learning to mean the very same. Can you spot patterns in purchases automatically and suggest other items a customer might be interested in? Can you detect unusual or anomalous behaviour, indicating fraud or similar? Again, these are now labelled as AI or machine learning, but usually rely on well established statistical techniques. Admittedly, today's faster machines mean number crunching can happen quickly. This has contributed to the resurgence of machine learning.

Many problem solving algorithms are not about numbers. Some techniques, such as evolutionary computing, including genetic algorithms, don't fit comfortably into a data-driven view of learning. Do these methods count as machine learning? I'll leave that for you to think about. My book explores genetic algorithms and several other areas that do not need numbers to learn.

Arthur Samuel came out with the phrase "machine learning", by which he meant something along the lines of a "field of study that gives computers the ability to learn without being explicitly programmed." The abstract of his 1959 paper, "Some studies in machine learning using the game of checkers" states,

Two machine-learning procedures have been investigated in some detail using the game of checkers. Enough work has been done to verify the fact that a computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program. Furthermore, it can learn to do this in a remarkably short period of time (8 or 10 hours of machine-playing time) when given only the rules of the game, a sense of direction, and a redundant and incomplete list of parameters which are thought to have something to do with the game, but whose correct signs and relative weights are unknown and unspecified. The principles of machine learning verified by these experiments are, of course, applicable to many other situations.

AI and machine learning are both very old terms. I think they encompass a much broader field than data analysis. As a final thought, Turing designed an algorithm to play chess. In effect, he was trying to make an artificial brain, before the term AI was invented or computers, in their modern sense, existed.

I think machine learning is much broader than investigating data. Its history involves attempting to get computers to learn, and specifically to learn to play games. Let the games continue.


Read my book and see what you think.


Foundations for Evidence-Based Policymaking Act of 2017

Derek Jones from The Shape of Code

The Foundations for Evidence-Based Policymaking Act of 2017 was enacted by the US Congress on 21st December.

A variety of US Federal agencies are responsible for ensuring the safety of US citizens; in some cases this safety is dependent on the behavior of software. The FDA is responsible for medical device safety and the FAA publishes various software safety handbooks relating to aviation (the Department of Transportation has a wider remit).

Where do people go to learn about the evidence for software related issues?

The book: Evidence-based software engineering: based on the publicly available evidence sounds like a good place to start.

Quickly skimming this (currently draft) book shows that no public evidence is available on lots of issues. Oops.

Another issue is the evidence pointing to some suggested practices being at best useless and sometimes fraudulent, e.g., McCabe’s cyclomatic complexity metric.

The initial impact of evidence-based policymaking will be companies pushing back against pointless government requirements, in particular requirements that cost money to implement. In some cases this is good, e.g., no more charades about software being more testable because its code has a low McCabe complexity.

In the slightly longer term, people are going to have to get serious about collecting and analyzing software related evidence.

The Open, Public, Electronic, and Necessary Government Data Act or the OPEN Government Data Act (which is about to become law) will be a big help in obtaining evidence. I think there is a lot of software related data sitting on disks and tapes, waiting to be analysed (NASA appears to have loads of data that they have done almost nothing with, including not making it publicly available).

Interesting times ahead.

Postmortem of the unexpected blog outage

Timo Geusch from The Lone C++ Coder's Blog

Straight from the “make work for yourself because there aren’t enough hours in the day already” files.

I’ve mentioned before that I am self-hosting this blog rather than using a hosted instance. I hosted the WordPress instance on FreeBSD and it’s been running quite well for a while, but a double FreeBSD port upgrade to WordPress 5.0.1 and PHP 7.2 (after the PHP 7.0 port had been discontinued) broke the blog. php-fpm failed regularly with a signal 10, but I wasn’t able to figure out why in a hurry, so I started looking at alternatives.