Analysis of a subset of the Linux Counter data

Derek Jones from The Shape of Code

The Linux Counter project was started in 1993, with the aim of tracking the growth of Linux users (the kernel was first released two years earlier). Anybody could register any of their machines running Linux; a user ran a script that gathered basic information about a machine, and the output was emailed to the project. Once registered, users received an annual reminder to update information in their entry (despite using Linux since before the 1.0 release, user #46406 didn’t register until 2001).

When it closed (reopened/closed/coming back) it had 120K+ registered users. That’s a lot of information about computers, which unfortunately is not publicly available. I have not had any replies to my emails to those involved, asking for a copy that could be released in anonymized form.

This week I found 15,906 rows of what looks like a subset of the Linux counter data, most entries are post-2005. What did I learn from this data?

An obvious use is the pattern to check is changes over time. While the data does not include any explicit date, it does include the Kernel version, from which the earliest date can be inferred.

An earlier post used SPEC data to estimate the growth in installed memory over time; it has been doubling every 840 days, give or take. That data contains one data point per distinct vendor computer; the Linux counter data contains one entry per computer in use. There is around thirty pairs of entries for updated systems, i.e., a user updated the entry for an existing system.

The plot below shows memory installed in each registered computer, over time, for servers, laptops and workstations, with fitted regression lines. The memory size doubling times are: servers 4,000 days, laptops 2,000 days, and workstations 1,300 days (code+data):

Memory reported for system owned by Linux counter users, from 1995 to 2015.

A regression model using dates is a good fit in the statistical sense, but explain very little of the variance in the data. The actual date on which the memory size was selected may have been earlier (because the kernel has been updated to a later release), or later (because memory was added, but the kernel was not updated).

Why is the memory doubling time so long?

Has memory size now reached the big-enough boundary, do Linux counter users keep the same system for many years without upgrading, are Linux counter systems retired Windows boxes that have been repurposed (data on installed memory Windows boxes would answer this point)?

Suggestions welcome.

When memory capacity is limited, it may be useful to swap least recently used memory contents to disc; Linux setup includes the specification of a swap partition. What is the optimal size of the size partition? A common recommendation is: if memory is less than 2G swap size is twice memory; if between 2-8G swap size is the same as memory, and for greater than 8G, half of memory size. The table below shows the percentage of particular system classes having a given swap/memory ratio (rounding the list of ratios to contain one decimal digit produces a list of over 100 ratio values).

swap/memory   Server  Workstation   Laptop 
   1.0         15.2       19.9       25.9
   2.0         10.3        9.6        8.6
   0.5          9.5        7.7        8.4

The plot below shows memory against swap partition size, for the system classes laptop, server and workstation, with fitted regression line (code+data):

Memory and swap size reported for system owned by Linux counter users.

The available disk space also has a (small) impact on swap partition size; the following model explains 46% of the variance in the data: swapSize approx memory^{0.65}diskSpace^{0.08}.

I was hoping to confirm the rate of installed memory growth suggested by the SPEC data, with installed systems lagging a few years behind the latest releases. This Linux counter data tells a very different growth story. Perhaps pre-2005 data will tell another story (I just need to find it).

I’m not sure if the swap/memory ratio analysis is of any use to systems people. It was something of a fishing expedition on my part.

Other counting projects have included the Ubuntu counter project, and Hardware for Linux which is still active and goes back to August 2014.

I’m interested in hearing about the availability of any other Linux counter data, or data from other computer counting projects.

Linux has a sleeper agent working as a core developer

Derek Jones from The Shape of Code

The latest news from Wikileaks, that GCHQ, the UK’s signal intelligence agency, has a sleeper agent working as a trusted member on the Linux kernel core development team should not come as a surprise to anybody.

The Linux kernel is embedded as a core component inside many critical systems; the kind of systems that intelligence agencies and other organizations would like full access.

The open nature of Linux kernel development makes it very difficult to surreptitiously introduce a hidden vulnerability. A friendly gatekeeper on the core developer team is needed.

In the Open source world, trust is built up through years of dedicated work. Funding the right developer to spend many years doing solid work on the Linux kernel is a worthwhile investment. Such a person eventually reaches a position where the updates they claim to have scrutinized are accepted into the codebase without a second look.

The need for the agent to maintain plausible deniability requires an arm’s length approach, and the GCHQ team made a wise choice in targeting device drivers as cost-effective propagators of hidden weaknesses.

Writing a device driver requires the kinds of specific know-how that is not widely available. A device driver written by somebody new to the kernel world is not suspicious. The sleeper agent has deniability in that they did not write the code, they simply ‘failed’ to spot a well hidden vulnerability.

Lack of know-how means that the software for a new device is often created by cutting-and-pasting code from an existing driver for a similar chip set, i.e., once a vulnerability has been inserted it is likely to propagate.

Perhaps it’s my lack of knowledge of clandestine control of third-party computers, but the leak reveals the GCHQ team having an obsession with state machines controlled by pseudo random inputs.

With their background in code breaking I appreciate that GCHQ have lots of expertise to throw at doing clever things with pseudo random numbers (other than introducing subtle flaws in public key encryption).

What about the possibility of introducing non-random patterns in randomised storage layout algorithms (he says waving his clueless arms around)?

Which of the core developers is most likely to be the sleeper agent? His codename, Basil Brush, suggests somebody from the boomer generation, or perhaps reflects some personal characteristic; it might also be intended to distract.

What steps need to be taken to prevent more sleeper agents joining the Linux kernel development team?

Requiring developers to provide a record of their financial history (say, 10-years worth), before being accepted as a core developer, will rule out many capable people. Also, this approach does not filter out ideologically motivated developers.

The world may have to accept that intelligence agencies are the future of major funding for widely used Open source projects.

Toggle window decorations on Linux GTK3 with Python3

Andy Balaam from Andy Balaam's Blog

The Internet is full of outdated Python code for doing things with windows, so here is what I got working today in a Python 3, GTK 3 environment.

This script toggles the window decorations on the active window on and off. I have it bound to Ctrl+NumPadMinus for easy access.

#!/usr/bin/env python3

import gi
gi.require_version('Gdk', '3.0')
gi.require_version('GdkX11', '3.0')
gi.require_version('Wnck', '3.0')
from gi.repository import Gdk
from gi.repository import GdkX11
from gi.repository import Wnck


def active_window(screen):
    for window in screen.get_windows():
       if window.is_active() == True:
            return window


def toggle_decorations(w):
    if w.get_decorations().decorations == 0:
        w.set_decorations(Gdk.WMDecoration.ALL)
    else:
        w.set_decorations(0)


screen = Wnck.Screen.get_default()
screen.force_update()
display = GdkX11.X11Display.get_default()
window = active_window(screen)
window_id = window.get_xid()

w = GdkX11.X11Window.foreign_new_for_display(display, window_id)
toggle_decorations(w)


window = None
screen = None
Wnck.shutdown()

Setting up rdiff-backup on FreeBSD 12.1

Timo Geusch from The Lone C++ Coder's Blog

My main PC workstation (as opposed to my Mac Pro) is a dual-boot Windows and Linux machine. While backing up the Windows portion is relatively easy via some cheap-ish commercial backup software, I ended up backing up my Linux home directories only very occasionally. Clearly, Something Had To Be Done ™. I had a look […]

The post Setting up rdiff-backup on FreeBSD 12.1 appeared first on The Lone C++ Coder's Blog.

Predicting the future with data+logistic regression

Derek Jones from The Shape of Code

Predicting the peak of data fitted by a logistic equation is attracting a lot of attention at the moment. Let’s see how well we can predict the final size of a software system, in lines of code, using logistic regression (code+data).

First up is the size of the GNU C library. This is not really a good test, since the peak (or rather a peak) has been reached.

Growth of glibc, in lines,, with logistic regression fit

We need a system that has not yet reached an easily recognizable peak. The Linux kernel has been under development for many years, and lots of LOC counts are available. The plot below shows a logistic equation fitted to the kernel data, assuming that the only available data was up to day: 2,900, 3,650, 4,200, and 5,000+. Can you tell which fitted line corresponds to which number of days?

Number lines in Linux kernel, on days since release1, and four fitted logistic regression models.

The underlying ‘problem’ is that we are telling the fitting software to fit a particular equation; the software does what it has been told to do, and fits a logistic equation (in this case).

A cubic polynomial is also a great fit to the existing kernel data (red line to the left of the blue line), and this fitted equation can be extended into future (to the right of the blue line); dotted lines are 95% confidence bounds. Do any readers believe the future size of the Linux kernel predicted by this cubic model?

Number of distinct silhouettes for a function containing four statements

Predicting the future requires lots of data on the underlying processes that drive events. Modeling events is an iterative process. Build a model, check against reality, adjust model, rinse and repeat.

If the COVID-19 experience trains people to be suspicious of future predictions made by models, it will have done something positive.

Automating Windows VM Creation on Ubuntu

Chris Oldwood from The OldWood Thing

TL;DR you can find my resulting Oz and Packer configuration files in this Oz gist and this Packer gist on my GitHub account.

As someone who has worked almost exclusively on Windows for the last 25 years I was somewhat surprised to find myself needing to create Windows VMs on Linux. Ultimately these were to be build server agents and therefore I needed to automate everything from creating the VM image, to installing Windows, and eventually the build toolchain. This post looks at the first two aspects of this process.

I did have a little prior experience with Packer, but that was on AWS where the base AMIs you’re provided have already got you over the initial OS install hurdle and you can focus on baking in your chosen toolchain and application. This time I was working on-premise and so needed to unpick the Linux virtualization world too.

In the end I managed to get two approaches working – Oz and Packer – on the Ubuntu 18.04 machine I was using. (You may find these instructions useful for other distributions but I have no idea how portable this information is.)

QEMU/KVM/libvirt

On the Windows-as-host side (until fairly recently) virtualization boiled down to a few classic options, such as Hyper-V and Virtual Box. The addition of Docker-style Windows containers, along with Hyper-V containers has padded things out a bit more but to me it’s still fairly manageable.

In contrast on the Linux front, where this technology has been maturing for much longer, we have far more choice, and ultimately, for a Linux n00b like me [1], this means far more noise to wade through on top of the usual “which distribution are you running” type questions. In particular the fact that any documentation on “virtualization” could be referring to containers or hypervisors (or something in-between), when you’re only concerned with hypervisors for running Windows VMs, doesn’t exactly aid comprehension.

Luckily I was pointed towards KVM as a good starting point on the Linux hypervisor front. QEMU is one of those minor distractions as it can provide full emulation, but it also provides the other bit KVM needs to be useful in practice – device emulation. (If you’re feeling nostalgic you can fire up an MS-DOS recovery boot-disk from “All Boot Disks” under QMEU/KVM with minimal effort which gives you a quick sense of achievement.)

What I also found mentioned in the same breath as these two was a virtualization “add-on layer” called libvirt which provides a layer on top of the underlying technology so that you can use more technology agnostic tools. Confusingly you might notice that Packer doesn’t mention libvirt, presumably because it already has providers that work directly with the lower layer.

In summary, using apt, we can install this lot with:

$ sudo apt install qemu qemu-kvm libvirt-bin  bridge-utils  virt-manager -y

Windows ISO & Product Key

We’re going to need a Windows ISO along with a related product key to make this work. While in the end you’ll need a proper license key I found the Windows 10 Evaluation Edition was perfect for experimentation as the VM only lasts for a few minutes before you bin it and start all over again.

You can download the latest Windows image from the MS downloads page which, if you’ve configured your browser’s User-Agent string to appear to be from a non-Windows OS, will avoid all the sign-up nonsense. Alternatively google for “care.dlservice.microsoft.com” and you’ll find plenty of public build scripts that have direct download URLs which are beneficial for automation.

Although the Windows 10 evaluation edition doesn’t need a specific license key you will need a product key to stick in the autounattend.xml file when we get to that point. Luckily you can easily get that from the MS KMS client keys page.

Windows Answer File

By default Windows presents a GUI to configure the OS installation, but if you give it a special XML file known as autounattend.xml (in a special location, which we’ll get to later) all the configuration settings can go in there and the OS installation will be hands-free.

There is a specific Windows tool you can use to generate this file, but an online version in the guise of the Windows Answer File Generator produced a working file with fairly minimal questions. You can also generate one for different versions of the Windows OS which is important as there are many examples that appear on the Internet but it feels like pot-luck as to whether it would work or not as the format changes slightly between releases and it’s not easy to discover where the impedance mismatch lies.

So, at this point we have our Linux hypervisor installed, and downloaded a Windows installation .iso along with a generated autounattend.xml file to drive the Windows install. Now we can get onto building the VM, which I managed to do with two different tools – Oz and Packer.

Oz

I was flicking through a copy of Mastering KVM Virtualization and it mentioned a tool called Oz which was designed to make it easy to build a VM along with installing an OS. More importantly it listed having support for most Windows editions too! Plus it’s been around for a fairly long time so is relatively mature. You can install it with apt:

$ sudo apt install oz -y

To use it you create a simple configuration file (.tdl) with the basic VM details such as CPU count, memory, disk size, etc. along with the OS details, .iso filename, and product key (for Windows), and then run the tool:

$ oz-install -d2 -p windows.tdl -x windows.libvirt.xml

If everything goes according to plan you end up with a QEMU disk image and an .xml file for the VM (called a “domain”) that you can then register with libvirt:

$ virsh define windows.libvirt.xml

Finally you can start the VM via libvirt with:

$ virsh start windows-vm

I initially tried this with the Windows 8 RTM evaluation .iso and it worked right out of the box with the Oz built-in template! However, when it came to Windows 10 the Windows installer complained about there being no product key, despite the Windows 10 template having a placeholder for it and the key was defined in the .tdl configuration file.

It turns out, as you can see from Issue #268 (which I raised in the Oz GitHub repo) that the Windows 10 template is broken. The autounattend.xml file also wants the key in the <UserData> section too it seems. Luckily for me oz-install can accept a custom autounattend.xml file via the -a option as long as we fill in any details manually, like the <AutoLogin> account username / password, product key, and machine name.

$ oz-install -d2 -p windows.tdl -x windows.libvirt.xml –a autounattend.xml

That Oz GitHub issue only contains my suggestions as to what I think needs fixing in the autounattend.xml file, I also have a personal gist on GitHub that contains both the .tdl and .xml files that I successfully used. (Hopefully I’ll get a chance to submit a formal PR at some point so we can get it properly fixed; it also needs a tweak to the Python code as well I believe.)

Note: while I managed to build the basic VM I didn’t try to do any post-processing, e.g. using WinRM to drive the installation of applications and tools from the outside.

Packer

I had originally put Packer to one side because of difficulties getting anything working under Hyper-V on Windows but with my new found knowledge I decided to try again on Linux. What I hadn’t appreciated was quite how much Oz was actually doing for me under the covers.

If you use the Packer documentation [2] [3] and online examples you should happily get the disk image allocated and the VM to fire up in VNC and sit there waiting for you to configure the Windows install. However, after selecting your locale and keyboard you’ll probably find the disk partitioning step stumps you. Even if you follow some examples and put an autounattend.xml on a floppy drive you’ll still likely hit a <DiskConfiguration> error during set-up. The reason is probably because you don’t have the right Windows driver available for it to talk to the underlying virtual disk device (unless you’re lucky enough to pick an IDE based example).

One of the really cool things Oz appears to do is handle this nonsense along with the autounattend.xml file which it also slips into the .iso that it builds on-the-fly. With Packer you have to be more aware and fetch the drivers yourself (which come as part of another .iso) and then mount that explicitly as another CD-ROM drive by using the qemuargs section of the Packer builder config. (In my example it’s mapped as drive E: inside Windows.)

[ "-drive", "file=./virtio-win.iso,media=cdrom,index=3" ]

Luckily you can download the VirtIO drivers .iso from a Fedora page and stick it alongside the Windows .iso. That’s still not quite enough though, we also need to tell the Windows installer where our drivers are located; we do that with a special section in the autounattend.xml file.

<DriverPaths>
  <PathAndCredentials wcm:action="add" wcm:keyValue="1">
    <Path>E:\NetKVM\w10\amd64\</Path>

Finally, in case you’ve not already discovered it, the autounattend.xml file is presented by Packer to the Windows installer as a file in the root of a floppy drive. (The floppy drive and extra CD-ROM drives both fall away once Windows has bootstrapped itself.)

"floppy_files":
[
  "autounattend.xml",

Once again, as mentioned right at the top, I have a personal gist on GitHub that contains the files I eventually got working.

With the QEMU/KVM image built we can then register it with libvirt by using virt-install. I thought the --import switch would be enough here as we now have a runnable image, but that option appears to be for a different scenario [4], instead we have to take two steps – generate the libvirt XML config file using the --print-xml option, and then apply it:

$ virt-install --vcpus ... --disk ...  --print-xml > windows.libvert.xml
$ virsh define windows.libvert.xml

Once again you can start the finalised VM via libvirt with:

$ virsh start windows-vm

Epilogue

While having lots of documentation is generally A Good Thing™, when it’s spread out over a considerable time period it’s sometimes difficult to know if the information you’re reading still applies today. This is particularly true when looking at other people’s example configuration files alongside reading the docs. The long-winded route might still work but the tool might also do it automatically now if you just let it, which keeps your source files much simpler.

Since getting this working I’ve seen other examples which suggest I may have fallen foul of this myself and what I’ve written up may also still be overly complicated! Please feel free to use the comments section on this blog or my gists to inform any other travellers of your own wisdom in any of this.

 

[1] That’s not entirely true. I ran Linux on an Atari TT and a circa v0.85 Linux kernel on a 386 PC in the early-to-mid ‘90s.

[2] The Packer docs can be misleading. For example it says the disk_size is in bytes and you can use suffixes like M or G to simplify matters. Except they don’t work and the value is actually in megabytes. No wonder a value of 15,000,000,000 didn’t work either :o).

[3] Also be aware that the version of Packer available via apt is only 1.0.x and you need to manually download the latest 1.4.x version and unpack the .zip. (I initially thought the bug in [2] was down to a stale version but it’s not.)

[4] The --import switch still fires up the VM as it appears to assume you’re going to add to the current image, not that it is the final image.


Looks like I get to redo my WireGuard VPN server

Timo Geusch from The Lone C++ Coder&#039;s Blog

I’ve blogged about setting up a WireGuard VPN server earlier this year. It’s been running well since, but I needed to take care of some overdue maintenance tasks. Trying to log into the server this morning and I am greeted with “no route to host”. Eh? A quick check on my Vultr UI showed that […]

The post Looks like I get to redo my WireGuard VPN server appeared first on The Lone C++ Coder's Blog.

2019 in the programming language standards’ world

Derek Jones from The Shape of Code

Last Tuesday I was at the British Standards Institute for a meeting of IST/5, the committee responsible for programming language standards in the UK.

There has been progress on a few issues discussed last year, and one interesting point came up.

It is starting to look as if there might be another iteration of the Cobol Standard. A handful of people, in various countries, have started to nibble around the edges of various new (in the Cobol sense) features. No, the INCITS Cobol committee (the people who used to do all the heavy lifting) has not been reformed; the work now appears to be driven by people who cannot let go of their involvement in Cobol standards.

ISO/IEC 23360-1:2006, the ISO version of the Linux Base Standard, has been updated and we were asked for a UK position on the document being published. Abstain seemed to be the only sensible option.

Our WG20 representative reported that the ongoing debate over pile of poo emoji has crossed the chasm (he did not exactly phrase it like that). Vendors want to have the freedom to specify code-points for use with their own emoji, e.g., pineapple emoji. The heady days, of a few short years ago, when an encoding for all the world’s character symbols seemed possible, have become a distant memory (the number of unhandled logographs on ancient pots and clay tablets was declining rapidly). Who could have predicted that the dream of a complete encoding of the symbols used by all the world’s languages would be dashed by pile of poo emoji?

The interesting news is from WG9. The document intended to become the Ada20 standard was due to enter the voting process in June, i.e., the committee considered it done. At the end of April the main Ada compiler vendor asked for the schedule to be slipped by a year or two, to enable them to get some implementation experience with the new features; oops. I have been predicting that in the future language ‘standards’ will be decided by the main compiler vendors, and the future is finally starting to arrive. What is the incentive for the GNAT compiler people to pay any attention to proposals written by a bunch of non-customers (ok, some of them might work for customers)? One answer is that Ada users tend to be large bureaucratic organizations (e.g., the DOD), who like to follow standards, and might fund GNAT to implement the new document (perhaps this delay by GNAT is all about funding, or lack thereof).

Right on cue, C++ users have started to notice that C++20’s added support for a system header with the name version, which conflicts with much existing practice of using a file called version to contain versioning information; a problem if the header search path used the compiler includes a project’s top-level directory (which is where the versioning file version often sits). So the WG21 committee decides on what it thinks is a good idea, implementors implement it, and users complain; implementors now have a good reason to not follow a requirement in the standard, to keep users happy. Will WG21 be apologetic, or get all high and mighty; we will have to wait and see.

Setting up my own VPN server on Vultr with Centos 7 and WireGuard

Timo Geusch from The Lone C++ Coder&#039;s Blog

As an IT consultant, I travel a lot. I mean, a lot. Part of the pleasure is having to deal with day-to-day online life on open, potentially free-for-all hotel and conference WiFi. In other words, the type of networks you really want to do your online banking, ecommerce and other potentially sensitive operations on. After […]

The post Setting up my own VPN server on Vultr with Centos 7 and WireGuard appeared first on The Lone C++ Coder's Blog.