Microsoft C++ versions explained

Anders Schau Knatten from C++ on a Friday

Microsoft has five different version numbers to think about when it comes to C++. Here’s an attempt to explain what they all mean.

  • Visual Studio release year (the “marketing version number”), e.g. Visual Studio 2022
  • Visual Studio actual version number, e.g. Visual Studio 17.0
  • Visual C++ (MSVC) version, e.g. MSVC 14.30
  • Toolset version, e.g. toolset 143
  • Compiler version, e.g. cl.exe 19.30

Visual Studio versions

What most people will see first is the Visual Studio release year. You’ll download Visual Studio 2022, Visual Studio 2019 etc. These however also have a more normal major.minor versioning scheme, and they bump the major version for every release year. So for instance VS 2017 is version 15, VS 2019 is version 16, and VS 2022 is version 17. Note that the year and the major version are not correlated in any way, except that Visual Studio 2010 just happened to also be version 10.

Visual Studio also has minor releases of each major version. Some examples (there are more minor releases per major than shown here):

Year                 Version
Visual Studio 2017   15.0
Visual Studio 2017   15.3
Visual Studio 2019   16.0
Visual Studio 2019   16.1
Visual Studio 2022   17.0
Visual Studio 2022   17.1

source: Wikipedia

Visual C++ versions

Microsoft Visual C++, aka MSVC, ships as a part of Visual Studio, but has its own versioning scheme. Importantly, the major number signifies ABI compatibility, so something compiled with MSVC at one major version number can be linked against something compiled with any other MSVC at the same major version. (Some restrictions apply.) The MSVC major version number luckily gets bumped a lot less often than the Visual Studio version itself. As of Visual Studio 2015, they have kept the MSVC major version at 14. The first digit of the minor version seems to be bumped for each major version of Visual Studio itself. The Visual C++ version number is also used for the Visual C++ Redistributable.

Some examples:

VS Year              VS version   MSVC version
Visual Studio 2017   15.0         14.1
Visual Studio 2017   15.3         14.11
Visual Studio 2019   16.0         14.20
Visual Studio 2019   16.1         14.21
Visual Studio 2022   17.0         14.30
Visual Studio 2022   17.1         14.31

source: Wikipedia

C++ toolset versions

Closely related to the MSVC version number is the C++ toolset version number. I can’t find a good source for it, but from Microsoft’s article it seems that the toolset version is made up of the MSVC major version and the first digit of the MSVC minor version. Some examples:

VS Year              VS version   MSVC version   Toolset version
Visual Studio 2017   15.0         14.1           141
Visual Studio 2017   15.3         14.11          141
Visual Studio 2019   16.0         14.20          142
Visual Studio 2019   16.1         14.21          142
Visual Studio 2022   17.0         14.30          143
Visual Studio 2022   17.1         14.31          143

Source: Microsoft
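The apparent rule can be written down as a short Python sketch. It assumes the toolset number really is just the MSVC major version followed by the first digit of the minor version, which is inferred from the table above rather than documented; the function name toolset_version is made up for illustration.

```python
def toolset_version(msvc_version: str) -> str:
    """Derive the toolset number from an MSVC version such as "14.30"."""
    major, minor = msvc_version.split(".")[:2]
    # Assumption: only the first digit of the minor version matters.
    return major + minor[0]


# The MSVC versions from the table above.
for msvc in ["14.1", "14.11", "14.20", "14.21", "14.30", "14.31"]:
    print(msvc, "->", toolset_version(msvc))
```

If the guess is right, every MSVC 14.3x release maps to toolset 143, which matches the table.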

The linker (link.exe) also uses the C++ toolset version number as its version number, so e.g. for toolset 14.32 I might see link.exe version 14.32.31332.0.

Compiler versions

Finally, there’s the compiler version, which is what cl.exe reports. E.g. 19.16.27048. The major.minor version scheme correlates with the _MSC_VER macro which you can check in your source code (godbolt). So e.g. cl.exe version 19.21 has _MSC_VER 1921. (I’ll be nice and count those as one version number.)

VS Year              VS version   MSVC version   Toolset version   Compiler version
Visual Studio 2017   15.0         14.1           141               19.10
Visual Studio 2017   15.3         14.11          141               19.11
Visual Studio 2019   16.0         14.20          142               19.20
Visual Studio 2019   16.1         14.21          142               19.21
Visual Studio 2022   17.0         14.30          143               19.30
Visual Studio 2022   17.1         14.31          143               19.31

The _MSC_VER version number is incremented monotonically at each Visual C++ toolset update, so if you want to only compile some stuff if the compiler is new enough, you can do e.g. #if _MSC_VER >= 1930.
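Outside of C++ source, for example in a build script, the same comparison can be modelled from the version string that cl.exe reports. The following Python sketch assumes the major.minor to _MSC_VER correspondence described above; the helper names msc_ver and is_at_least are hypothetical.

```python
def msc_ver(cl_version: str) -> int:
    """Map a cl.exe version such as "19.16.27048" to its _MSC_VER value."""
    major, minor = (int(part) for part in cl_version.split(".")[:2])
    # 19.21 -> 1921: the major version supplies the hundreds.
    return major * 100 + minor


def is_at_least(cl_version: str, minimum_msc_ver: int) -> bool:
    """Mirror of the preprocessor check #if _MSC_VER >= minimum_msc_ver."""
    return msc_ver(cl_version) >= minimum_msc_ver


print(msc_ver("19.21"))
print(is_at_least("19.31", 1930))
```

The build number (the third component) is ignored here, since _MSC_VER only encodes major and minor.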

Appendix: Running out of version numbers

Interestingly, the scheme where they bump the first digit of the Visual C++ minor version for each major release of Visual Studio means that they can only have ten minor versions of MSVC (x0 through x9) per Visual Studio major version! And looking at Wikipedia, it seems they actually ran out of toolset versions at the end of Visual Studio 2019 and reused 14.28 and 14.29 for the final four Visual Studio 2019 releases (Visual Studio 16.8 and 16.9 had MSVC 14.28, Visual Studio 16.10 and 16.11 had MSVC 14.29).

semgrep: the future of static analysis tools

Derek Jones from The Shape of Code

When searching for a pattern that might be present in source code contained in multiple files, what is the best tool to use?

The obvious answer is grep, and grep is great for character-based pattern searches. But patterns that are token based, or include information on language semantics, fall outside grep’s model of pattern recognition (which does not stop people trying to cobble something together, perhaps with the help of complicated sed scripts).

Those searching source code written in C have the luxury of being able to use Coccinelle, an industrial strength C language aware pattern matching tool. It is widely used by the Linux kernel maintainers and people researching complicated source code patterns.

Over the 15+ years that Coccinelle has been available, there has been a lot of talk about supporting other languages, but nothing ever materialized.

About six months ago, I noticed semgrep and thought it interesting enough to add to my list of tool bookmarks. Then, a few days ago, I read a brief blog post that was interesting enough for me to check out other posts at that site, and this one by Yoann Padioleau really caught my attention. Yoann worked on Coccinelle (we had an interesting email exchange some 13 years ago, when I was analyzing if-statement usage), subsequently worked on various static analysis tools, and is now working on semgrep. Most static analysis tools are created by somebody spending a year or so working on the implementation, making all the usual mistakes, before abandoning it to go off and do other things. High quality tools come from people with experience, who have invested lots of time learning their trade.

The documentation contains lots of examples, and working on the assumption that things would be a lot like using Coccinelle, I jumped straight in.

The pattern I chose to search for, using semgrep, involved counting the number of clauses contained in Python if-statement conditionals, e.g., the condition in: if a==1 and b==2: contains two clauses (i.e., a==1, b==2). My interest in this usage comes from ideas about if-statement nesting depth and clause complexity. The intended use case of semgrep is security researchers checking for vulnerabilities in code, but I’m sure those developing it are happy for source code researchers to use it.

As always, I first tried building the source from the GitHub repo (note: the Makefile expects a git clone install, not an unzipped directory), but got fed up with having to incrementally discover and install lots of dependencies (like Coccinelle, the code is written in OCaml {93k+ lines} and Python {13k+ lines}). I joined the unwashed masses and used pip install.

The pattern rules have a yaml structure, specifying the rule name, language(s), message to output when a match is found, and the pattern to search for.

After sorting out various finger problems, writing C rather than Python, and misunderstanding the semgrep output (some of which feels like internal developer output, rather than tool user developer output), I had a set of working patterns.

The following two patterns match if-statements containing a single clause (if.subexpr-1), and two clauses (if.subexpr-2). The option commutative_boolop is set to true to allow the matching process to treat Python’s or/and as commutative, which they are not, but it reduces the number of rules that need to be written to handle all the cases when ordering of these operators is not relevant (rules+test).

rules:
- id: if.subexpr-1
  languages: [python]
  message: if-cond1
  patterns:
   - pattern: |
      if $COND1:  # we found an if statement
         $BODY
   - pattern-not: |
      if $COND2 or $COND3: # must not contain more than one condition
         $BODY
   - pattern-not: |
      if $COND2 and $COND3:
         $BODY
  severity: INFO

- id: if.subexpr-2
  languages: [python]
  options:
   commutative_boolop: true # Reduce combinatorial explosion of rules
  message: if-cond2
  pattern-either:
   - patterns:
      - pattern: |
         if $COND1 or $COND2: # if statement containing two conditions
            $BODY
      - pattern-not: |
         if $COND3 or $COND4 or $COND5: # must not contain more than two conditions
            $BODY
      - pattern-not: |
         if $COND3 or $COND4 and $COND5:
            $BODY
   - patterns:
      - pattern: |
         if $COND1 and $COND2:
            $BODY
      - pattern-not: |
         if $COND3 and $COND4 and $COND5:
            $BODY
      - pattern-not: |
         if $COND3 and $COND4 or $COND5:
            $BODY
  severity: INFO

The rules would be simpler if it were possible for a pattern to not be applied to code that earlier matched another pattern (in my example, one containing more clauses). This functionality is supported by Coccinelle, and I’m sure it will eventually appear in semgrep.
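For Python source specifically, the clause counts can be cross-checked without semgrep by walking the tree produced by the standard ast module. This is a different technique from the rules above (it is not how semgrep works internally), and it is only a sketch: the function names are made up, and a negated group such as not (a and b) is counted as a single clause.

```python
import ast


def count_clauses(test: ast.expr) -> int:
    """Count the operands joined by and/or in an if-statement condition."""
    if isinstance(test, ast.BoolOp):
        # a or b and c parses as Or(a, And(b, c)), so recurse into operands.
        return sum(count_clauses(value) for value in test.values)
    return 1


def if_clause_counts(source: str) -> list:
    """Return the clause count of every if/elif condition in the source."""
    tree = ast.parse(source)
    return [count_clauses(node.test)
            for node in ast.walk(tree) if isinstance(node, ast.If)]


example = "if a==1 and b==2:\n    pass\nif c:\n    pass\n"
print(if_clause_counts(example))
```

Because the recursion handles any number of operands, one function covers what would otherwise need a separate rule per clause count.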

This tool has lots of rough edges, and is still rapidly evolving; I’m using version 0.82, released four days ago. What’s exciting is the support for multiple languages (ten are listed, with experimental support for twelve more, and three in beta). Roughly what happens is that source code is mapped to an abstract syntax tree that is common to all supported languages, which is then pattern matched. Supporting a new language involves writing code to perform the mapping to this common AST.

It’s not too difficult to map different languages to a common AST that contains just tokens, e.g., identifiers and their spelling, literals and their value, and keywords. Many languages use the same operator precedence and associativity as C, plus their own extras, and they tend to share the same kinds of statements; however, declarations can be very diverse, which makes life difficult for supporting a generic AST.

An awful lot of useful things can be done with a tool that is aware of expression/statement syntax and matches at the token level. More refined semantic information (e.g., a variable’s type) can be added in later versions. The extent to which an investment is made to support the various subtleties of a particular language will depend on its economic importance to those involved in supporting semgrep (Return to Corp is a VC backed company).

Outside of a few languages that have established tools doing deep semantic analysis (i.e., C and C++), semgrep has the potential to become the go-to static analysis tool for source code. It will benefit from the network effects of contributions from lots of people each working in one or more languages, taking their semgrep skills and rules from one project to another (with source code language ceasing to be a major issue). Developers using niche languages with poor or no static analysis tool support will add semgrep support for their language because it will be the lowest cost path to accessing an industrial strength tool.

How are the VC backers going to make money from funding the semgrep team? The traditional financial exit for static analysis companies is selling to a much larger company. Why would a large company buy them, when they could just fork the code (other company sales have involved closed-source tools)? Perhaps those involved think they can make money by selling services (assuming semgrep becomes the go-to tool). I have a terrible track record for making business predictions, so I will stick to the technical stuff.

Pricing by quantity of source code

Derek Jones from The Shape of Code

Software tool vendors have traditionally licensed their software on a per-seat basis, i.e., the cost increases with the number of concurrent users. Per-seat licensing works well when there is substantial user interaction, because the usage time is long enough for concurrent usage to build up. When a tool can be run non-interactively in the cloud (for instance, a tool that checks source code for suspicious constructs), its use is effectively instantaneous, so concurrent usage never builds up. Charging by lines of code processed is a pricing model used by some tool vendors.

Charging by lines of code processed creates an incentive to reduce the number of lines. This incentive was once very common, when screens supporting 24 lines of 80 characters were considered a luxury, or the BASIC interpreter limited programs to 1023 lines, or a hobby computer used a TV for its screen (a ‘tiny’ CRT screen, not a big flat one).

It’s easy enough to splice adjacent lines together, and halve the cost. Well, ease of splicing depends on programming language; various edge cases have to be handled (somebody is bound to write a tool that does a good job).

How does the tool vendor respond to a (potential) halving of their revenue?

Blindly splicing pairs of lines creates some easily detectable patterns in the generated source. In fact, some of these patterns are likely to be flagged as suspicious, e.g., if (x) a=1;b=2; (did the developer forget to bracket the two statements with { }).

The plot below shows the number of lines in gcc 2.95 containing a given number of characters (left, including indentation), and the same count after even-numbered lines (with leading whitespace removed) have been appended to odd-numbered lines (code+data, this version of gcc was used in my C book):

North Star Horizon with cover removed.

The obvious change is the introduction of a third straight’ish line segment (the increase in the offset of the sharp decline might be explained away as a consequence of developers using wider windows). By only splicing the ‘right’ pairs of lines together, the obvious patterns won’t be present.
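The blind splicing measured above can be reproduced with a few lines of Python. This is a sketch of the measurement only, assuming each even-numbered line has its leading whitespace stripped and is appended directly to the preceding odd-numbered line; it makes no attempt to keep the result compilable (a trailing // comment or a preprocessor directive would break it), and the function names are invented for illustration.

```python
from collections import Counter


def splice_pairs(lines):
    """Append each even-numbered line (1-based) to the odd-numbered line before it."""
    spliced = []
    for i in range(0, len(lines), 2):
        joined = lines[i]
        if i + 1 < len(lines):
            # Leading whitespace of the second line of the pair is dropped.
            joined += lines[i + 1].lstrip()
        spliced.append(joined)
    return spliced


def length_histogram(lines):
    """Map line length in characters to the number of lines of that length."""
    return Counter(len(line) for line in lines)


source = ["if (x)", "   a=1;", "b=2;"]
print(splice_pairs(source))
print(length_histogram(splice_pairs(source)))
```

Comparing length_histogram before and after splice_pairs on a real code base gives the two curves being discussed.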

Using lines of code for pricing has the advantage of being easy to explain to management, the people who sign off the expense, who might not know much about source code. There are other metrics that are much harder for developers to game. Counting tokens is the obvious one, but has developer perception issues: brackets, both round and curly. In the grand scheme of things, the use/non-use of brackets where they are optional has a minor impact on the token count, but brackets have an oversized presence in developers’ psyche.

Counting identifiers avoids the brackets issue, along with other developer perceptions associated with punctuation tokens, e.g., a null statement in an else arm.
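To see how the three metrics respond to line splicing, here is a small sketch using Python's standard tokenize module. Measuring Python source rather than C is an assumption made purely to keep the example self-contained, and count_metrics is a made-up name; a vendor's tool would use a tokenizer for the language being billed.

```python
import io
import keyword
import tokenize

# Token types that carry layout rather than content.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
          tokenize.COMMENT, tokenize.ENDMARKER}


def count_metrics(source: str) -> dict:
    """Count lines, tokens and identifiers in a piece of Python source."""
    tokens = identifiers = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue
        tokens += 1
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            identifiers += 1
    return {"lines": len(source.splitlines()),
            "tokens": tokens, "identifiers": identifiers}


print(count_metrics("a = 1\nb = 2\n"))   # two statements on two lines
print(count_metrics("a = 1; b = 2\n"))   # the same statements spliced
```

Splicing halves the line count, adds one punctuation token (the semicolon), and leaves the identifier count unchanged.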

If the amount charged is low enough, social pressure comes into play. Would you want to work for a company that penny pinches to save such a small amount of money?

As a former tool vendor, I’m strongly in favour of tool vendors making a healthy profit.

Creating an effective static analysis tool requires paying lots of attention to lots of details, which is very time-consuming. There are lots of not particularly good open source tools out there; the implementers did all the interesting stuff, and then moved on. I know of several groups who got together to build tools for Java when it started to take off in the mid-90s. When they went to market, they quickly found out that Java developers expected their tools to be free, and would not pay for claimed better versions. By making good enough Java tools freely available, Sun killed the commercial market for sales of Java tools (some companies used their own tools as a unique component of their consulting or service offerings).

Could vendors charge by the number of problems found in the code? This would create an incentive for them to report trivial issues, or be overly pessimistic about flagging issues that could occur (rather than will occur).

Why try selling a tool, why not offer a service selling issues found in code?

Back in the day a living could be made by offering a go-faster service, i.e., turn up at a company and reduce the usage cost of a company’s applications, or reduce the turn-around time (e.g., getting the daily management numbers to appear in less than 24 hours). This was back when mainframes ruled the computing world, and usage costs could be eye-watering.

Some companies offer bug-bounties to the first person reporting a serious vulnerability. These public offers are only viable when the source is publicly available.

There are companies who offer a code review service. Having people review code is very expensive; tools are good at finding certain kinds of problem, and investing in tools makes sense for companies looking to reduce review turn-around time, along with checking for more issues.

Simple Tables From JSON Data With JQ and Column

Chris Oldwood from The OldWood Thing

My current role is more of a DevOps role and I’m spending more time than usual monitoring and administrating various services, such as the GitLab instance we use for source control, build pipelines, issue management, etc. While the GitLab UI is very useful for certain kinds of tasks, the rich RESTful API allows you to easily build your own custom tools to monitor, analyse, and investigate the things you’re particularly interested in.

For example one of the first views I wanted was an alphabetical list of all runners with their current status so that I could quickly see if any had gone AWOL during the night. The alphabetical sorting requirement is not something the standard UI view provides hence I needed to use the REST API or hope that someone had already done something similar first.

GitLab Clients

I quickly found two candidates: python-gitlab and go-gitlab-client which looked promising but they only really wrap the API – I’d still need to do some heavy lifting myself and understand what the GitLab API does. Given how simple the examples were, even with curl, it felt like I wasn’t really saving myself anything at this point, e.g.

curl --header "PRIVATE-TOKEN: $token" "https://gitlab.example.com/api/v4/runners"

So I decided to go with a wrapper script [1] approach instead and find a way to prettify the JSON output so that the script encapsulated a shell one-liner that would request the data and format the output in a simple table. Here is the kind of JSON the GitLab API would return for the list of runners:

[
  {
   "id": 6,
   "status": "online"
   . . .
  }
,
  {
   "id": 8,
   "status": "offline"
   . . .
  }
]

JQ – The JSON Tool

I’d come across the excellent JQ tool for querying JSON payloads many years ago so that was my first thought for at least simplifying the JSON payloads to the fields I was interested in. However on further reading I found it could do some simple formatting too. At first I thought the compact output using the -c option was what I needed (perhaps along with some tr magic to strip the punctuation), e.g.

$ echo '[{"id":1, "status":"online"}]' |\
  jq -c
[{"id":1,"status":"online"}]

but later I discovered the -r option provided raw output which formatted the values as simple text and removed all the JSON punctuation, e.g.

$ echo '[{"id":1, "status":"online"}]' |\
  jq -r '( .[] | "\(.id) \(.status)" )'
1 online

Naturally my first thought for the column headings was to use a couple of echo statements before the curl pipeline but I also discovered that you can mix and match string literals with the output from the incoming JSON stream, e.g.

$ echo '[{"id":1, "status":"online"}]' |\
   jq -r '"ID Status",
          "-- ------",
          ( .[] | "\(.id) \(.status)" )'
ID Status
-- ------
1 online

This way the headings were only output if the command succeeded.

Neater Tables with Column

While these crude tables were readable and simple enough for further processing with grep and awk they were still pretty unsightly when the values of a column were too varied in length such as a branch name or description field. Putting them on the right hand side kind of worked but I wondered if I could create fixed width fields à la printf via jq.

At this point I stumbled across the StackOverflow question How to format a JSON string as a table using jq? where one of the later answers mentioned a command line tool called “column” which takes rows of text values and arranges them as columns of similar width by adjusting the spacing between elements.

This almost worked except for the fact that some fields had spaces in their input and column would treat them by default as separate elements. A simple change of field separator from a space to a tab meant that I could have my cake and eat it, e.g.

$ echo '[ {"id":1, "status":"online"},
          {"id":2, "status":"offline"} ]' |\
  jq -r '"ID\tStatus",
         "--\t-------",
         ( .[] | "\(.id)\t\(.status)" )' |\
  column -t -s $'\t'
ID  Status
--  -------
1   online
2   offline
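For comparison, the same table can be built in plain Python when jq and column are not to hand. This is only a sketch of the alignment step (pad every cell to the widest value in its column), not a replacement for the one-liner above; format_table and its (heading, key) argument shape are invented for the example.

```python
import json


def format_table(rows, columns):
    """Render a list of dicts as aligned text; columns is a list of (heading, key)."""
    headings = [heading for heading, _ in columns]
    body = [[str(row.get(key, "")) for _, key in columns] for row in rows]
    # Each column is as wide as its longest cell, heading included.
    widths = [max(len(cells[i]) for cells in [headings] + body)
              for i in range(len(columns))]
    rule = ["-" * width for width in widths]
    lines = []
    for cells in [headings, rule] + body:
        padded = "  ".join(cell.ljust(width) for cell, width in zip(cells, widths))
        lines.append(padded.rstrip())
    return "\n".join(lines)


payload = '[{"id":1, "status":"online"}, {"id":2, "status":"offline"}]'
table = format_table(json.loads(payload), [("ID", "id"), ("Status", "status")])
print(table)
```

Since the cells are padded rather than split on a separator, values containing spaces need no special handling here.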

Sorting and Limiting

While I was happy to order many of the views by ID, which is often the default for the API, or in the case of jobs and pipelines was a proxy for “start time”, there were cases where I needed to control the sorting. For example we used the runner description to store the hostname (or host + container name) so it made sense to order by that, e.g.

jq 'sort_by(.description|ascii_downcase)'

For the runner’s jobs the job ID ordering wasn’t that useful as the IDs were allocated up front but the job might start much later if it’s a later part of the pipeline so I chose to order by the job start time instead with descending order so the most recent jobs were listed first, e.g.

jq 'sort_by(.started_at) | reverse'

One final trick that proved useful occasionally when there was no limiting in the API was to do it with jq instead, e.g.

jq "sort_by(.name) | [limit($max; .[])]"
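The three jq filters above have direct Python equivalents, sketched here for anyone post-processing the JSON in a script instead. The field names description and started_at come from the examples above; the sample data, the function names, and the choice to sort jobs with no start time last are assumptions.

```python
def sort_runners(runners):
    """Order runners by description, ignoring case (like ascii_downcase)."""
    return sorted(runners, key=lambda r: (r.get("description") or "").lower())


def latest_jobs(jobs, max_jobs=None):
    """Most recently started jobs first, optionally limited to max_jobs."""
    # ISO 8601 timestamps in the same time zone sort correctly as strings;
    # a missing start time is treated as the oldest possible value.
    ordered = sorted(jobs, key=lambda j: j.get("started_at") or "", reverse=True)
    return ordered if max_jobs is None else ordered[:max_jobs]


# Hypothetical sample data.
runners = [{"id": 8, "description": "build-02"},
           {"id": 6, "description": "Build-01"}]
jobs = [{"id": 1, "started_at": "2020-05-01T09:00:00Z"},
        {"id": 2, "started_at": None},
        {"id": 3, "started_at": "2020-05-01T10:30:00Z"}]

print([r["id"] for r in sort_runners(runners)])
print([j["id"] for j in latest_jobs(jobs, 2)])
```

Note that str.lower also folds non-ASCII letters, which ascii_downcase does not, so the two orderings can differ for such descriptions.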

 

[1] See my 2013 article “In The Toolbox – Wrapper Scripts” for more about this common technique of simplifying tools.

Automating Windows VM Creation on Ubuntu

Chris Oldwood from The OldWood Thing

TL;DR you can find my resulting Oz and Packer configuration files in this Oz gist and this Packer gist on my GitHub account.

As someone who has worked almost exclusively on Windows for the last 25 years I was somewhat surprised to find myself needing to create Windows VMs on Linux. Ultimately these were to be build server agents and therefore I needed to automate everything from creating the VM image, to installing Windows, and eventually the build toolchain. This post looks at the first two aspects of this process.

I did have a little prior experience with Packer, but that was on AWS where the base AMIs you’re provided have already got you over the initial OS install hurdle and you can focus on baking in your chosen toolchain and application. This time I was working on-premise and so needed to unpick the Linux virtualization world too.

In the end I managed to get two approaches working – Oz and Packer – on the Ubuntu 18.04 machine I was using. (You may find these instructions useful for other distributions but I have no idea how portable this information is.)

QEMU/KVM/libvirt

On the Windows-as-host side (until fairly recently) virtualization boiled down to a few classic options, such as Hyper-V and Virtual Box. The addition of Docker-style Windows containers, along with Hyper-V containers has padded things out a bit more but to me it’s still fairly manageable.

In contrast on the Linux front, where this technology has been maturing for much longer, we have far more choice, and ultimately, for a Linux n00b like me [1], this means far more noise to wade through on top of the usual “which distribution are you running” type questions. In particular the fact that any documentation on “virtualization” could be referring to containers or hypervisors (or something in-between), when you’re only concerned with hypervisors for running Windows VMs, doesn’t exactly aid comprehension.

Luckily I was pointed towards KVM as a good starting point on the Linux hypervisor front. QEMU is one of those minor distractions as it can provide full emulation, but it also provides the other bit KVM needs to be useful in practice – device emulation. (If you’re feeling nostalgic you can fire up an MS-DOS recovery boot-disk from “All Boot Disks” under QEMU/KVM with minimal effort which gives you a quick sense of achievement.)

What I also found mentioned in the same breath as these two was a virtualization “add-on layer” called libvirt which provides a layer on top of the underlying technology so that you can use more technology agnostic tools. Confusingly you might notice that Packer doesn’t mention libvirt, presumably because it already has providers that work directly with the lower layer.

In summary, using apt, we can install this lot with:

$ sudo apt install qemu qemu-kvm libvirt-bin bridge-utils virt-manager -y

Windows ISO & Product Key

We’re going to need a Windows ISO along with a related product key to make this work. While in the end you’ll need a proper license key I found the Windows 10 Evaluation Edition was perfect for experimentation as the VM only lasts for a few minutes before you bin it and start all over again.

You can download the latest Windows image from the MS downloads page which, if you’ve configured your browser’s User-Agent string to appear to be from a non-Windows OS, will avoid all the sign-up nonsense. Alternatively google for “care.dlservice.microsoft.com” and you’ll find plenty of public build scripts that have direct download URLs which are beneficial for automation.

Although the Windows 10 evaluation edition doesn’t need a specific license key you will need a product key to stick in the autounattend.xml file when we get to that point. Luckily you can easily get that from the MS KMS client keys page.

Windows Answer File

By default Windows presents a GUI to configure the OS installation, but if you give it a special XML file known as autounattend.xml (in a special location, which we’ll get to later) all the configuration settings can go in there and the OS installation will be hands-free.

There is a specific Windows tool you can use to generate this file, but an online version in the guise of the Windows Answer File Generator produced a working file with fairly minimal questions. You can also generate one for different versions of the Windows OS which is important as there are many examples that appear on the Internet but it feels like pot-luck as to whether it would work or not as the format changes slightly between releases and it’s not easy to discover where the impedance mismatch lies.

So, at this point we have our Linux hypervisor installed, and downloaded a Windows installation .iso along with a generated autounattend.xml file to drive the Windows install. Now we can get onto building the VM, which I managed to do with two different tools – Oz and Packer.

Oz

I was flicking through a copy of Mastering KVM Virtualization and it mentioned a tool called Oz which was designed to make it easy to build a VM along with installing an OS. More importantly it listed having support for most Windows editions too! Plus it’s been around for a fairly long time so is relatively mature. You can install it with apt:

$ sudo apt install oz -y

To use it you create a simple configuration file (.tdl) with the basic VM details such as CPU count, memory, disk size, etc. along with the OS details, .iso filename, and product key (for Windows), and then run the tool:

$ oz-install -d2 -p windows.tdl -x windows.libvirt.xml

If everything goes according to plan you end up with a QEMU disk image and an .xml file for the VM (called a “domain”) that you can then register with libvirt:

$ virsh define windows.libvirt.xml

Finally you can start the VM via libvirt with:

$ virsh start windows-vm

I initially tried this with the Windows 8 RTM evaluation .iso and it worked right out of the box with the Oz built-in template! However, when it came to Windows 10 the Windows installer complained about there being no product key, despite the Windows 10 template having a placeholder for it and the key being defined in the .tdl configuration file.

It turns out, as you can see from Issue #268 (which I raised in the Oz GitHub repo) that the Windows 10 template is broken. The autounattend.xml file also wants the key in the <UserData> section too it seems. Luckily for me oz-install can accept a custom autounattend.xml file via the -a option as long as we fill in any details manually, like the <AutoLogin> account username / password, product key, and machine name.

$ oz-install -d2 -p windows.tdl -x windows.libvirt.xml -a autounattend.xml

That Oz GitHub issue only contains my suggestions as to what I think needs fixing in the autounattend.xml file; I also have a personal gist on GitHub that contains both the .tdl and .xml files that I successfully used. (Hopefully I’ll get a chance to submit a formal PR at some point so we can get it properly fixed; it also needs a tweak to the Python code as well I believe.)

Note: while I managed to build the basic VM I didn’t try to do any post-processing, e.g. using WinRM to drive the installation of applications and tools from the outside.

Packer

I had originally put Packer to one side because of difficulties getting anything working under Hyper-V on Windows but with my new found knowledge I decided to try again on Linux. What I hadn’t appreciated was quite how much Oz was actually doing for me under the covers.

If you use the Packer documentation [2] [3] and online examples you should happily get the disk image allocated and the VM to fire up in VNC and sit there waiting for you to configure the Windows install. However, after selecting your locale and keyboard you’ll probably find the disk partitioning step stumps you. Even if you follow some examples and put an autounattend.xml on a floppy drive you’ll still likely hit a <DiskConfiguration> error during set-up. The reason is probably because you don’t have the right Windows driver available for it to talk to the underlying virtual disk device (unless you’re lucky enough to pick an IDE based example).

One of the really cool things Oz appears to do is handle this nonsense along with the autounattend.xml file which it also slips into the .iso that it builds on-the-fly. With Packer you have to be more aware and fetch the drivers yourself (which come as part of another .iso) and then mount that explicitly as another CD-ROM drive by using the qemuargs section of the Packer builder config. (In my example it’s mapped as drive E: inside Windows.)

[ "-drive", "file=./virtio-win.iso,media=cdrom,index=3" ]

Luckily you can download the VirtIO drivers .iso from a Fedora page and stick it alongside the Windows .iso. That’s still not quite enough though, we also need to tell the Windows installer where our drivers are located; we do that with a special section in the autounattend.xml file.

<DriverPaths>
  <PathAndCredentials wcm:action="add" wcm:keyValue="1">
    <Path>E:\NetKVM\w10\amd64\</Path>

Finally, in case you’ve not already discovered it, the autounattend.xml file is presented by Packer to the Windows installer as a file in the root of a floppy drive. (The floppy drive and extra CD-ROM drives both fall away once Windows has bootstrapped itself.)

"floppy_files":
[
  "autounattend.xml",

Once again, as mentioned right at the top, I have a personal gist on GitHub that contains the files I eventually got working.
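To show how the fragments above sit together, here is a Python sketch that assembles them into one JSON template. It is deliberately incomplete and is not the contents of the gist: only the floppy_files and qemuargs values come from this post, the disk_size figure is a hypothetical value in megabytes (see [2]), and a working QEMU builder needs further settings (the Windows .iso location and checksum, for instance) that are left out rather than guessed.

```python
import json

builder = {
    "type": "qemu",
    # Hypothetical size; per footnote [2] the value is in megabytes.
    "disk_size": 15000,
    # Presented to the Windows installer in the root of a floppy drive.
    "floppy_files": ["autounattend.xml"],
    # Each inner list is one extra QEMU argument and its value; this one
    # mounts the VirtIO driver .iso as an additional CD-ROM drive.
    "qemuargs": [
        ["-drive", "file=./virtio-win.iso,media=cdrom,index=3"],
    ],
}

template = {"builders": [builder]}
print(json.dumps(template, indent=2))
```

Generating the template from a script also makes it easy to swap the driver .iso path or answer file per Windows version.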

With the QEMU/KVM image built we can then register it with libvirt by using virt-install. I thought the --import switch would be enough here as we now have a runnable image, but that option appears to be for a different scenario [4]; instead we have to take two steps – generate the libvirt XML config file using the --print-xml option, and then apply it:

$ virt-install --vcpus ... --disk ... --print-xml > windows.libvirt.xml
$ virsh define windows.libvirt.xml

Once again you can start the finalised VM via libvirt with:

$ virsh start windows-vm

Epilogue

While having lots of documentation is generally A Good Thing™, when it’s spread out over a considerable time period it’s sometimes difficult to know if the information you’re reading still applies today. This is particularly true when looking at other people’s example configuration files alongside reading the docs. The long-winded route might still work but the tool might also do it automatically now if you just let it, which keeps your source files much simpler.

Since getting this working I’ve seen other examples which suggest I may have fallen foul of this myself and what I’ve written up may also still be overly complicated! Please feel free to use the comments section on this blog or my gists to inform any other travellers of your own wisdom in any of this.

 

[1] That’s not entirely true. I ran Linux on an Atari TT and a circa v0.85 Linux kernel on a 386 PC in the early-to-mid ‘90s.

[2] The Packer docs can be misleading. For example it says the disk_size is in bytes and you can use suffixes like M or G to simplify matters. Except they don’t work and the value is actually in megabytes. No wonder a value of 15,000,000,000 didn’t work either :o).

[3] Also be aware that the version of Packer available via apt is only 1.0.x and you need to manually download the latest 1.4.x version and unpack the .zip. (I initially thought the bug in [2] was down to a stale version but it’s not.)

[4] The --import switch still fires up the VM as it appears to assume you’re going to add to the current image, not that it is the final image.


CI/CD Server Inline Scripts

Chris Oldwood from The OldWood Thing

As you might have already gathered if you’d read my 2014 post “Building the Pipeline - Process Led or Product Led?” I’m very much in favour of developing a build and deployment process locally first, then automating that, rather than clicking buttons in a dedicated CI/CD tool and hoping I can debug it later. I usually end up at least partially scripting builds anyway [1] to save time waiting for the IDE to open [2] when I just need some binaries for a dependency, so it’s not wasted effort.

Inline Scripts

If other teams prefer to configure their build or deployment through a tool’s UI I don’t really have a problem with that if I know I can replay the same steps locally should I need to test something out as the complexity grows. What I do find disturbing though is when some of the tasks use inline scripts to do something non-trivial, like perform the entire deployment. What’s even more disturbing is when that task script is then duplicated across environments and maintained independently.

Versioning

There are various reasons why we use a version control tool, but first and foremost they provide a history, which implies that we can trace back any changes that have been made and we have a natural backup should we need to roll back or restore the build server.

Admittedly most half-decent build and deployment tools come with some form of versioning built in, which gives you that safety net. However having that code versioned in a separate tool and repository from the main codebase means that you have to work harder to correlate what version of the system requires what version of the build process. CI/CD tools tend to present you with a fancy UI for looking at the history rather than giving you direct access to, say, its internal git repo. And even then what the tool usually gives you is “what” changed, but does not also provide the commentary on “why” it was changed. Much of what I wrote in my “Commit Checklist” applies as much to build and deployment scripts as it does to production code.

Although Jenkins isn’t the most polished of tools compared to, say, TeamCity it is pretty easy to configure one of the 3rd party plugins to yank the configuration files out and check them into the same repo as the source code along with a suitable comment. As a consequence any time the repo is tagged due to a build being promoted the Jenkins build configuration gets included for free.

Duplication

My biggest gripe is not with the versioning aspect though, which I believe is pretty important for any non-trivial process, but it’s when the script is manually duplicated across environments. Having no single point of truth, from a logic perspective, is simply asking for trouble. The script will start to drift as subtleties in the environmental differences become enshrined directly in the logic rather than becoming parameterised behaviours.

The tool’s text editor for inline script blocks is usually a simple edit box designed solely for trivial changes; anything more significant is expected to be handled by pasting into a real editor instead. But we all know different people like different editors and so this becomes another unintentional source of difference as tabs and spaces fight for domination.

Fundamentally there should be one common flow of logic that works for every environment. The differences between them should boil down to simple settings, like credentials, or cardinality of resources, e.g. the number of machines in the cluster. Occasionally there may be custom branches in logic, such as the need for a proxy server, but it should be treated as a minor deviation that could apply to any environment, but just happens to only be applicable to, say, one at the moment.
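As a rough sketch of that idea, here is one deployment routine serving every environment, with the differences confined to data. It is written in Python purely for illustration, and the field names (node_count, credentials_key, proxy_url) and environment values are hypothetical rather than taken from any real system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EnvironmentSettings:
    """Everything that legitimately differs between environments (hypothetical fields)."""
    name: str
    node_count: int                   # cardinality of resources
    credentials_key: str              # a lookup key for a secret store, not the secret itself
    proxy_url: Optional[str] = None   # only one environment happens to need this today


def plan_deployment(settings: EnvironmentSettings) -> List[str]:
    """Return the ordered steps for one environment; the flow is identical for all."""
    steps = [f"fetch credentials '{settings.credentials_key}'"]
    if settings.proxy_url is not None:
        # A minor deviation that any environment could opt into, not a forked script.
        steps.append(f"configure proxy {settings.proxy_url}")
    for node in range(1, settings.node_count + 1):
        steps.append(f"deploy to {settings.name}-node{node}")
    return steps


dev = EnvironmentSettings(name="dev", node_count=1, credentials_key="dev-deploy")
prod = EnvironmentSettings(name="prod", node_count=3, credentials_key="prod-deploy",
                           proxy_url="http://proxy.example:8080")
```

The proxy is modelled as an optional setting rather than a separate copy of the script, so a second environment that later needs one only requires a data change.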

Testability

This naturally leads onto the inherent lack of testability outside of the tool and workflow. It’s even worse if the script makes use of some variable substitution system that the CI/CD tool provides because that means you have to manually fix-up the code before running it outside the tool, or keep running it in the tool and use printf() style debugging by looking at the task’s output.

All script engines I’m aware of accept arguments, so why not run the script as an external script and pass the arguments from the tool in the tried and tested way? This means the tool runs it pretty much the same way you do except perhaps for some minor environmental differences, like user account or current working directory which are all common problems and easily overcome. Most modern scripting languages come with a debugger too which seems silly to give up.

Of course this doesn’t mean that you have to make every single configuration setting a separate parameter to the script, that would be overly complicated too. Maybe you just provide one parameter which is a settings file for the environment with a bunch of key/value pairs. You can then tweak the settings as appropriate while you test and debug. While idempotence and the ideas behind Desired State Configuration (DSC) are highly desirable, there is no reason we can’t also borrow from the Design for Testability guidebook here too by adding features making it easier to test.
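A minimal sketch of the single-parameter approach might look like the following. It assumes a plain text file of "key = value" lines; the file format and the function names are illustrative choices, not something prescribed by any particular CI/CD tool.

```python
import argparse
from pathlib import Path
from typing import Dict, List


def load_settings(path: Path) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def main(argv: List[str]) -> Dict[str, str]:
    """Entry point: the only argument is the settings file for the target environment."""
    parser = argparse.ArgumentParser(description="Deploy using one settings file.")
    parser.add_argument("settings_file", type=Path,
                        help="key/value settings for the target environment")
    args = parser.parse_args(argv)
    settings = load_settings(args.settings_file)
    # The real deployment steps would go here, driven only by 'settings'.
    return settings
```

The CI/CD tool and a developer at a prompt would then both call the same entry point, differing only in which settings file they pass.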

Don’t forget that scripting languages often come with unit test frameworks these days too which can allow you to mock out code which has nasty side-effects so you can check your handling and orchestration logic. For example PowerShell has Pester which really helps bring some extra discipline to script development; an area which has historically been tough due to the kinds of side-effects created by executing the code.
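The same idea can be sketched in Python, using the standard unittest.mock module in place of Pester. The service-ctl command is made up and simply stands in for any call with a nasty side-effect; the point is that only the orchestration logic runs under test.

```python
import subprocess
from typing import List
from unittest import mock


def restart_services(names: List[str]) -> List[str]:
    """Restart each service in turn and return the names that failed."""
    failed = []
    for name in names:
        # 'service-ctl' is a hypothetical command standing in for a real side-effect.
        result = subprocess.run(["service-ctl", "restart", name])
        if result.returncode != 0:
            failed.append(name)
    return failed


def test_failed_restarts_are_reported() -> None:
    outcomes = [mock.Mock(returncode=0), mock.Mock(returncode=1)]
    # Replace the side-effecting call so nothing is actually restarted.
    with mock.patch("subprocess.run", side_effect=outcomes) as fake_run:
        assert restart_services(["web", "queue"]) == ["queue"]
    assert fake_run.call_count == 2
```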

Complexity

When an inline script has grown beyond the point where Hoare suggests “there are obviously no deficiencies”, which is probably anything more than a trivial calculation or invocation of another tool, then it should be decomposed into smaller functional units. Then each of these units can be tested and debugged in isolation and perhaps the inline script then merely contains a couple of lines of orchestration code, which would be trivial to replicate at a REPL / prompt.

For example anything around manipulating configuration files is a perfect candidate for factoring out into a function or child script. It might be less efficient to invoke the same function a few times rather than read and write the file once, but in the grand scheme of things I’d bet it’s marginal in comparison to the rest of the build or deployment process.

Many modern scripting languages have a mechanism for loading some sort of module or library of code. Setting up an internal package manager is a pretty heavyweight option in comparison to publishing a .zip file of scripts but if it helps keep the script complexity under control and provides a versioned repository that can be reliably queried at execution time, then why not go for that instead?

Scripts are Artefacts

It’s easy to see how these things happen. What starts off as a line or two of script code eventually turns into a behemoth before anyone realises it’s not been versioned and there are multiple copies. After all, the deployment requirements historically come up at the end of the journey, after the main investment in the feature has already happened. The pressure is then on to get it live, and build & deployment, like tests, is often just another second class citizen.

The Walking Skeleton came about in part to push back against this attitude and make the build pipeline and tests part and parcel of the whole delivery process; it should not be an afterthought. This means it deserves the same rigour we apply elsewhere in our process.

Personally I like to see everything go through the pipeline, by which I mean that source code, scripts, configuration, etc. all enter the pipeline as versioned inputs and are passed along until the deployed product pops out the other end. The way you build your artefacts is inherently tied to the source code and project configuration that produces it. Configuration, whether it be infrastructure or application settings, is also linked to the version of the tools, scripts, and code which consumes it. It’s more awkward to inject version numbers into scripts, like you do with binaries, but even pushing them through the pipeline in a .zip file with version number in the filename makes a big difference to tracking the “glue”.
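As an illustration of the .zip approach, a packaging step could be as small as the sketch below. The deploy-scripts prefix and the function name are assumptions made for the example; the version string would normally come from the pipeline itself.

```python
import zipfile
from pathlib import Path


def package_scripts(source_dir: Path, output_dir: Path, version: str) -> Path:
    """Zip every file under source_dir into deploy-scripts-<version>.zip."""
    output_dir.mkdir(parents=True, exist_ok=True)
    package = output_dir / f"deploy-scripts-{version}.zip"
    with zipfile.ZipFile(package, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir so the layout survives unpacking.
                archive.write(path, path.relative_to(source_dir).as_posix())
    return package
```

The sketch assumes output_dir sits outside source_dir, otherwise the archive would try to include itself.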

Ultimately any piece of the puzzle that directly affects the ability to safely deliver continuous increments of a product needs to be held in high regard and treated with the respect it deserves.

 

[1] See “Cleaning the Workspace” for more about why I don’t trust my IDE to clean up after itself.

[2] I’m sure I could load Visual Studio, etc. in “safe mode” to avoid waiting for all the plug-ins and extensions to initialise but it still seems “wrong” to load an entire IDE just to invoke the same build tool I could invoke almost directly from the command line myself.

Surgical Support Needs Surgical Tools

Chris Oldwood from The OldWood Thing

In the world of IT support there is the universal solution to every problem – turn it off and on again. You would hope that this kind of drastic action is only drawn upon when all other options have been explored or that the problem is known a priori to require such a response and is the least disruptive course of action. Sadly what was often just a joke in the past has become everyday life for many systems as rebooting or restarting is woven into the daily support routine.

In my professional career I have worked on a number of systems where there are daily scheduled jobs to reboot machines or restart services. I’m not talking about the modern, proactive kind like the Chaos Monkey which is used to probe for weaknesses, or when you force cluster failovers to check everything’s healthy; I’m talking about jobs where the restart is required to ensure the correct functioning of the system – disabling them would cripple it.

Sledgehammers

The need for the restart is usually to overcome some kind of corrupt or immutable internal state, or to compensate for a resource leak, such as memory, which degrades the service to an unacceptable level. An example of the former I’ve seen is to flush a poisoned cache or pick up the change in date, while for the latter it might be unclosed database connections or file handles. The notion of “recycling” processes to overcome residual effects has become so prominent that services like IIS provide explicit support for it [1].

Depending on where the service sits in the stack the restart could be somewhat disruptive, if it’s located on the edge, or largely benign if it’s purely internal, say, a background message queue listener. For example I once worked on a compute farm where one of the front-end services was restarted every night and that caused all clients to drop their connection resulting in a support email being sent due to the “unhandled exception”. Needless to say everyone just ignored all the emails as they only added to the background noise making genuine failures harder to spot.

These kinds of draconian measures to try and attain some system stability actually make matters worse as the restarts then begin to hide genuine stability issues, which eventually start happening during business hours as well and therefore cause grief for customers as unplanned downtime starts occurring. The impetus for one of my first ACCU articles “Utilising More Than 4GB of Memory in a 32-bit Windows Process” came from just such an issue where a service suddenly started failing with out-of-memory errors, even after a restart, if the load was awkwardly skewed. It took almost four weeks to diagnose and properly fix the issue during which there were no acceptable workarounds – just constant manual intervention from the support team.

I also lost quite a few hours on the system I mentioned earlier debugging a problem in the caching mechanism which was masked by a restart and only surfaced because the restart failed to occur. No one had remembered about this failure mode because everyone was so used to the restart hiding it. Having additional complexity in the code for a feature that will never be used in practice is essentially waste.

Cracking Nuts

Although it’s not true in all cases (the memory problem just described being a good example) the restart option may be avoidable if the process exposed additional behaviours that allowed for a more surgical approach to recovery to take place. Do you really need to restart the entire process just to flush some internal cache, or maybe just a small part of it? Similarly if you need to bump the business date via an external stimulus can that not be done through a “discoverable” API instead of hidden as part of a service restart [2]?
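To make the business date example concrete, here is a small sketch in which the date is held as explicit state with a named operation to move it on, rather than being sampled once at start-up. The class and method names are hypothetical, and rolling to the next calendar day is a simplifying assumption (a real system would probably consult a holiday calendar).

```python
import datetime
from typing import Optional


class BusinessCalendar:
    """Holds the service's notion of 'today' instead of sampling it at start-up."""

    def __init__(self, today: datetime.date) -> None:
        self._today = today

    @property
    def today(self) -> datetime.date:
        return self._today

    def roll_date(self, new_date: Optional[datetime.date] = None) -> datetime.date:
        """Admin operation: move to new_date, or to the next calendar day by default."""
        if new_date is None:
            new_date = self._today + datetime.timedelta(days=1)
        if new_date <= self._today:
            raise ValueError("business date can only move forwards")
        self._today = new_date
        return self._today
```

Because the date is supplied from outside, a test can construct the calendar with any date it likes and get deterministic behaviour.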

In some of my previous posts and articles, e.g. “From Test Harness To Support Tool”, “Home-Grown Tools” and more recently in “Libraries, Console Apps, and GUIs”, I have described how useful I have found writing simple custom tools to be for development and deployment but also, more importantly, for support. What I think was missing from those posts that I have tried to capture in this one, most notably through its title, is to focus on resolving system problems with the minimal amount of disruption. Assuming that you cannot actually fix the underlying design issue without a serious amount of time and effort can you at least alleviate the problem in the shorter term by adding simple endpoints and tools that can be used to make surgical-like changes inside the critical parts of the system?

For example, imagine that you’re working on a grid computing system where tasks are dished out to different processes and the results are aggregated. Ideally you would assume that failures are going to happen and that some kind of timeout and retry mechanism would be in place so that when a process dies the system recovers automatically [3]. However, if for some reason that automatic mechanism does not exist how are you going to recover? Given that someone (or something) somewhere is waiting for their results how are you going to “unblock the system” and allow them to make progress, without also disturbing all your other users who are unaffected?

You can either try and re-submit the failed task and allow the entire job to complete or kill the entire job and get the client to re-submit its job. As we’ve seen one way to achieve the latter would be to restart parts of the system thereby flushing the job from it. When this is a once in a million event it might make sense [4] but once the failures start racking up throwing away every in-flight request just to fix the odd broken one becomes more and more disruptive. Instead you need a way to identify the failed task (perhaps using the logs) and then instruct the system to recover such as by killing just that job or by asking it to resubmit it to another node.

Hence, ideally you’d just like to hit one admin endpoint and say something like this:

> admin-cli kill-job --server prod --job-id 12345

If that’s not easily achievable and there is distributed state to clear up you might need a few separate little tools instead that can target each part of the system, for example:

> admin-cli remove-node -s broker-prod --node NODE99
> admin-cli remove-results -s results-prod --job 12345
> admin-cli remove-tasks -s taskq-prod --job 12345
> admin-cli reset-client -s ui-prod --client CLT42
> admin-cli add-node -s broker-prod --node NODE99

This probably seems like a lot of work to write all these little tools but what I’ve found in practice is that usually most of the tricky logic in the services already exists – you just need to find a way to invoke it externally with the correct arguments.

These days it’s far easier to graft a simple administration endpoint onto an existing service. There are plenty of HTTP libraries available that will allow you to expose a very basic API which you could even poke with CURL. If you’re already using something more meaty like WCF then adding another interface should be trivial too.
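By way of illustration, the sketch below grafts a single admin command onto a process using nothing but Python’s standard library. The /admin/flush-cache path and the price_cache dictionary are invented for the example; in a real service the handler would call whatever flushing logic already exists.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Stand-in for the service's real internal state.
price_cache = {"ITEM-A": 2.55, "ITEM-B": 1500.0}
cache_lock = threading.Lock()


class AdminHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path == "/admin/flush-cache":
            with cache_lock:
                flushed = len(price_cache)
                price_cache.clear()
            self._reply(200, {"flushed": flushed})
        else:
            self._reply(404, {"error": "unknown admin command"})

    def _reply(self, status: int, body: dict) -> None:
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet; a real service would log admin calls


def start_admin_endpoint(port: int = 0) -> ThreadingHTTPServer:
    """Serve admin commands on localhost in a background thread (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), AdminHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the endpoint running on a known port, something like curl -X POST http://127.0.0.1:8081/admin/flush-cache would flush the cache without touching the rest of the process. Binding to localhost only is a deliberate choice here; anything more exposed would need proper authentication.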

Modern systems are becoming more and more decoupled through the use of message queues which also adds a natural extension point as they typically already do all the heavy lifting and you just need to add new message handlers for the extra behaviours. One of the earliest distributed systems I worked on used pub/sub on a system-wide message bus both for functional and administrative use. Instead of using lots of different tools we had a single admin command line tool that the run playbook generally used (even for some classic sysadmin stuff like service restarts) as it made the whole support experience simpler.

Once you have these basic tools it then becomes easy to compose and automate them. For example on a similar system I started by adding a simple support stored procedure to find failed tasks in a batch processing system. This was soon augmented by another one to resubmit a failed task, which was then automated by a simple script. Eventually it got “productionised” and became a formal part of the system providing the “slow retry” path [3] for automatic recovery from less transient outages.
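The composition step can be sketched as follows, with an in-memory list standing in for the database and plain functions standing in for the stored procedures. The Task fields and the max_attempts limit are assumptions added for the example, not details of the original system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Task:
    task_id: int
    status: str          # "queued", "done" or "failed" in this sketch
    attempts: int = 1


def find_failed(tasks: Iterable[Task]) -> List[Task]:
    """First support tool: list the tasks that need attention."""
    return [task for task in tasks if task.status == "failed"]


def resubmit(task: Task) -> None:
    """Second support tool: put one failed task back on the queue."""
    task.status = "queued"
    task.attempts += 1


def retry_failed(tasks: Iterable[Task], max_attempts: int = 3,
                 resubmit_one: Callable[[Task], None] = resubmit) -> List[int]:
    """The automation: resubmit failed tasks that still have attempts left."""
    retried = []
    for task in find_failed(tasks):
        if task.attempts < max_attempts:
            resubmit_one(task)
            retried.append(task.task_id)
    return retried
```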

Design for Supportability

One of the design concepts I’ve personally found really helpful is Design for Testability; something which came out of the hardware world and pushes us to design our systems in a way that makes them much easier to test. A knock-on effect of this is that you can reduce your end-to-end testing burden and deliver quicker.

A by-product of Design for Testability is that it causes you to design your system in a way that allows internal behaviours to be observed and controlled in isolation. These same behaviours are often the same ones that supporting a system in a more fine-grained manner will also require. Hence by thinking about how you test your system components you are almost certainly thinking about how they would need to be supported too.

Ultimately of course those same thoughts should also be making you think about how the system will likely fail and therefore what needs to be put in place beforehand to help it recover automatically. In the unfortunate event that you can’t recover automatically in the short term you should still have some tools handy that should facilitate a swift and far less disruptive manual recovery.

 

[1] Note that this is different from a process restarting itself because it has detected that it might have become unstable, perhaps through the use of the Circuit Breaker pattern.

[2] Aside from the benefits of comprehension this makes the system far more testable as it means you can control the date and therefore write deterministic tests.

[3] See “When Does a Transient Failure Stop Being Transient” for a tangent about fast and slow retries.

[4] Designing any distributed system that does not tolerate network failures is asking for trouble in my book but the enterprise has a habit of believing the network is “reliable enough”.

Support-Friendly Tooling

Chris Oldwood from The OldWood Thing

One of the techniques I briefly mentioned in my last post “Treat All Test Environments Like Production” was how constraining the test environments by adhering to the Principle of Least Privilege drove us to add diagnostic specific features to our services and tools.

In some cases that might be as simple as exposing some existing functionality through an extra command line verb or service endpoint. For example a common technique these days is to add a “version” verb or “--version” switch to allow you to check which build of a particular tool or service you have deployed [1].
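In Python, for instance, the argparse module has this switch built in. The sketch below assumes a tool called mytool and a hard-coded version string; in practice the version would be stamped in by the build.

```python
import argparse

TOOL_VERSION = "1.4.2"  # hypothetical; normally injected by the build process


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mytool")
    # argparse's built-in 'version' action prints the string and exits.
    parser.add_argument("--version", action="version",
                        version=f"%(prog)s {TOOL_VERSION}")
    return parser
```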

As Bertrand Meyer suggests in his notion of Command/Query Separation (CQS) any behaviour which is a query in nature should have no side-effects and therefore could also be freely available to use for diagnostic purposes – security and performance issues notwithstanding. Naturally these queries would be over-and-above any queries you might run directly against your various data stores, i.e. databases, file-system, etc. using the vendor’s own lower-level tools.

Where it gets a little more tricky is on the “command” side as we might need to investigate the operation but without disturbing the current state of the system. In an ideal world it should be possible to execute them against a part of the system reserved for such eventualities, e.g. a special customer or schema that looks and acts like a real one but is owned by the team and therefore its side-effects are invisible to any real users. (This is one of the techniques that falls under the in-vogue term of “testing in production”.)

If the issue can be isolated to a particular component then it’s probably more effective to focus on that part of the system by replaying the operation whilst simultaneously redirecting the side-effects somewhere else (or avoiding them altogether) so that the investigation can be safely repeated. One technique here is to host the component in another type of process, such as a GUI or command line tool and provide a variety of widgets or switches to control the input and output locations. Alternatively you could use the Null Object pattern to send the side-effects into oblivion.
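A minimal sketch of the Null Object option, with made-up names throughout: the component only knows about a ResultStore interface, so a host process can hand it either a real store or one that discards every write.

```python
from typing import List, Protocol, Tuple


class ResultStore(Protocol):
    def save(self, key: str, value: float) -> None: ...


class InMemoryStore:
    """Stands in for the real back-end store in this sketch."""

    def __init__(self) -> None:
        self.saved: List[Tuple[str, float]] = []

    def save(self, key: str, value: float) -> None:
        self.saved.append((key, value))


class NullStore:
    """Null Object: same interface, but every write is silently discarded."""

    def save(self, key: str, value: float) -> None:
        pass


def process(inputs: List[float], store: ResultStore) -> float:
    """The component under investigation: compute, then persist (the side-effect)."""
    total = sum(inputs)
    store.save("total", total)
    return total
```

Calling process(inputs, NullStore()) replays the calculation as often as needed without anything reaching the back-end.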

In its simplest form it might be a case of adding a “--ReadOnly” switch that disables all attempts to write to back-end stores (but leaves logging intact if that won’t interfere). This would give you the chance to safely debug the process locally using production inputs. As an aside this idea has been formalised in the PowerShell world via the “-WhatIf” switch which allows you to run a script whilst disabling (where supported) the write actions of any cmdlets.
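Such a switch might be wired up along these lines. This is only a sketch: the --read-only spelling, the store list and the upper-casing "processing" are all placeholders for whatever the real tool does.

```python
import argparse
from typing import Callable, List

store: List[str] = []   # stand-in for a real back-end store


def write_to_store(record: str) -> None:
    store.append(record)


def skip_write(record: str) -> None:
    # Logging is left intact so you can still see what would have happened.
    print(f"[read-only] would have written: {record}")


def run(argv: List[str]) -> List[str]:
    parser = argparse.ArgumentParser(prog="processor")
    parser.add_argument("--read-only", action="store_true",
                        help="disable all writes to back-end stores")
    parser.add_argument("records", nargs="*")
    args = parser.parse_args(argv)

    # Choose the write behaviour once, up front, rather than testing the flag everywhere.
    write: Callable[[str], None] = skip_write if args.read_only else write_to_store
    processed = []
    for record in args.records:
        result = record.upper()     # stand-in for the real processing
        write(result)
        processed.append(result)
    return processed
```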

If the operation requires some form of bulk processing where there is likely to be far too much output for stdout or because you need a little more structure to the data then you can add multiple switches instead, such as the folder to write to and perhaps even a different format to use which is easier to analyse with the usual UNIX command line tools. If implementing a whole different persistence mechanism for support is considered excessive [2] you could just allow, say, an alternative database connection string to be provided for the writing side and point to a local instance.

Earlier I mentioned that the Principle of Least Privilege helped drive out the need for these customisations and that’s because restricting your access affects you in two ways. The first is that by not allowing you to make unintentional changes you cannot make the situation worse simply through your analysis. For example if you happened to be mistaken that a particular operation had no side-effects but it actually does now, then they would be blocked as a matter of security and an error reported. If done in the comfort of a test environment you now know what else you need to “mock out” to be able to execute the operation safely in future. And if the mocking feature somehow gets broken, your lack of privilege has always got your back. This is essentially just the principle of Defence in Depth applied for slightly different reasons.

The second benefit you get is a variation of yet another principle – Design for Testability. To support such features we need to be able to substitute alternative implementations for the real ones, which effectively means we need to “program to an interface, not an implementation”. Of course this will likely already be a by-product of any unit tests we write, but it’s good to know that it serves another purpose outside that use case.

What I’ve described might seem like a lot of work but you don’t have to go the whole hog and provide a separate host for the components and a variety of command-line switches to enable these behaviours, you could probably get away with just tweaking various configuration settings, which is the approach that initially drove my 2011 post “Testing Drives the Need for Flexible Configuration”. What has usually caused me to go the extra step though is the need to use these features more than just once in a blue moon, often to automate their use for longer term use. This is something I covered in much more detail very recently in “Libraries, Console Apps & GUIs”.

 

[1] Version information has been embedded in Windows binaries since the 3.x days back in the ‘90s but accessing it easily usually involved using the GUI shell (i.e. Explorer) unless the machine is remote and has limited access, e.g. the cloud. Luckily PowerShell provides an alternative route here and I’m sure there are plenty of third party command line tools as well.

[2] Do not underestimate how easy it is these days to serialise whole object graphs into JSON files and then process them with tools like JQ.

Don’t Hide the Solution Structure

Chris Oldwood from The OldWood Thing

Whenever you join an existing team and start work on their codebase you need to orientate yourself so that you have a feel for the system’s architecture and design. If you’re lucky there is some documentation, perhaps nice diagrams to give you an overview. Hopefully you also have an extensive suite of tests to tell you how the system behaves.

More than likely there is nothing or very little to go on, and if it’s a truly legacy system any documentation could well be way out of date. At this point you pretty much only have the source code to work from. Whilst this is the source of truth, the amount of code you need to read to become au fait with all the various high-level concepts depends in part on how well it’s laid out.

Static Structure

Irrespective of whether you like to think of your layers in terms of onions or brick walls, all code essentially gets organised on disk and that means the solution structure is hierarchical in nature. In the most popular languages that support namespaces, these are also hierarchical and are commonly laid out on disk to reflect the same hierarchy [1].

Although the compiler is happy to just hoover up source code from the entire solution and largely ignore the relative position of the callers and callees there are useful conventions, which if honoured, allow you to reason and refactor the code more easily due to lower coupling. For example, defining an interface in the same source file as a class that implements it suggests a different inheritance use than when the interface sits externally further up the hierarchy. Also, seeing code higher up the hierarchy referencing types deeper down in an unrelated branch is another smell, of an abstraction potentially depending on an implementation detail.

Navigating the Structure

One of the things I’ve noticed in recent years whilst pairing is that many developers appear to navigate the source code solely through their IDE, and within the IDE by using features like “go to definition (implementation)”. Some very rarely see the solution structure because they hide it to gain more screen real estate for the source file of current interest [2].

Hence the only time the solution structure is visible is when there is a need to add a new source file. My purely anecdotal evidence suggests that this will be added without a great deal of thought as the code can be easily located in future directly by the author through its class name or another reference; they never have to consider where it “logically” resides.

Sprawling Suburbs

The net result is that namespaces and packages suffer from urban sprawl as they slowly accrete more and more code. This newer code adds more dependencies and so the package as a whole acquires an ever increasing number of dependencies. Left unchecked this can lead to horrible cyclic dependencies that are a nightmare to resolve.
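If you can extract the package-level dependency graph from your build files, spotting a cycle early is cheap. The sketch below is a plain depth-first search over a dictionary of package names; how you obtain that dictionary is left open, and the package names in the usage note are invented.

```python
from typing import Dict, List, Optional, Set


def find_cycle(dependencies: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of package names, or None if acyclic."""
    visiting: List[str] = []        # the current depth-first path
    finished: Set[str] = set()      # packages already proven cycle-free

    def visit(package: str) -> Optional[List[str]]:
        if package in finished:
            return None
        if package in visiting:
            # We have walked back onto our own path, so that slice is the cycle.
            return visiting[visiting.index(package):] + [package]
        visiting.append(package)
        for dependency in dependencies.get(package, []):
            cycle = visit(dependency)
            if cycle:
                return cycle
        visiting.pop()
        finished.add(package)
        return None

    for package in dependencies:
        cycle = visit(package)
        if cycle:
            return cycle
    return None
```

For example, given {"App": ["Core"], "Core": ["Utils"], "Utils": ["Core"]} it reports the Core, Utils, Core loop, which is the kind of check that could run as part of the build.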

I recently had the opportunity to revisit the codebase for a greenfield system I had started a few years before. We initially partitioned the code into a few key assemblies to get ourselves going and so I was somewhat surprised to still see the same assemblies a few years later, albeit massively overgrown with extra responsibilities. As a consequence even their simple home-grown tools had bizarre dependencies dragged in through bloated shared libraries [3].

Take a Stroll

So in future, instead of taking the Underground (subway) through your codebase every day, stop, and take a stroll every now-and-then around the paths. The same rules about cohesion within the methods of a class also apply at the higher levels of design – classes in a namespace, namespaces in an assembly, assemblies in a solution, etc. Then you’ll find that as the system grows it’s easier to refactor at the package level [4].

(For more on this topic see my older post “Who’s Maintaining the 100 Foot View?”.)

 

[1] Annoyingly this is not a common practice in the C++ codebases I’ve worked on.

[2] If I was being flippant I might suggest that if you really need the space the code may be too complicated, as I once did on Twitter here.

[3] I once dragged in a project’s shared library for a few useful extension methods to use in a simple console app and found I had pulled in an IoC container and almost a dozen other NuGet dependencies!

[4] In C# the internal access modifier has zero effect if you stick all your code into one assembly.

Excel-style DDE Requests

Chris Oldwood from The OldWood Thing

Despite being over 2 decades old Microsoft’s Dynamic Data Exchange (DDE) in Windows still seems to be in use for Windows IPC by a not insignificant number of companies. At least, if the frequency of DDE questions in my inbox is anything to go by [1][2].

Earlier this year I got a question from someone who was trying to use my DDE Command tool (a command line tool for querying DDE servers) to get data out of the MetaTrader 4 platform. Finance is the area I first came across DDE in anger and it still seems to be a popular choice there even to this day.

Curious Behaviour

The problem was that when they used the ddecmd “request” verb to send an XTYP_REQUEST message to the MetaTrader 4 DDE Server (MT4) for a symbol they always got an immediate result of “N/A”. As a workaround they tried using the “advise” verb, which sends an XTYP_ADVSTART, to listen for updates for a short period instead. This worked for symbols which changed frequently but missed those that didn’t change during the interval. Plus this was a dirty hack as they had to find a way to send a CTRL+C to my tool to stop it after this short interval.

Clearly the MetaTrader DDE server couldn’t be this broken, and the proof was that it worked fine with Microsoft Excel – the other stalwart of the finance industry. Hence the question posed to me was why Excel appeared to work, but sending a request from my tool didn’t, i.e. was there a bug in my tool?

Reproducing the Problem

Given the popularity of the MetaTrader platform and Microsoft Excel, Occam’s Razor suggested that a bug in my tool was the most likely answer, so I investigated…

Luckily MetaTrader 4 is a free download, and they will even give you a demo account to play with. That is super welcome for people like me who only want to use the platform to fix interop problems in their own tools and don’t actually want to trade with it.

I quickly reproduced the problem by sending a DDE request for a common symbol:

> DDECmd.exe request -s MT4 -t QUOTE -i COPPER
N\A

And then I used the DDE advise command to see it working for background updates:

> DDECmd.exe advise -s MT4 -t QUOTE -i COPPER
2.5525 2.5590
2.5520 2.5590
. . . 

I also tried it in Excel and saw that it successfully requested the current value, even for slow-ticking symbols.

How Excel Requests Data Via DDE

My DDE Command tool has a nice feature where it can also act as a DDE server and logs the different requests sent to it. This was originally added by me to help diagnose problems in my own DDE client code but it’s also been useful to see how other DDE clients behave.

As you can see below, when Excel opens a DDE link (=TEST|TEST!X) it actually sends a number of different XTYP_ADVSTART messages as it tries to find the highest fidelity format to receive the data in:

> DDECmd.exe listen -s TEST -t TEST
XTYP_CONNECT: 'TEST', 'TEST'
XTYP_CONNECT_CONFIRM: 'TEST', 'TEST'
XTYP_ADVSTART: 'T…', 'T…', 'StdDocumentName', '49157'
XTYP_ADVSTART: 'TEST', 'TEST', 'X', '50018'
. . .
XTYP_REQUEST: 'TEST', 'TEST', 'X', '50018'

After it manages to set up the initial advise loop it then goes on to send a one-off XTYP_REQUEST to retrieve the initial value. So, apart from the funky data formats it asks for, there is nothing unusual about the DDE request Excel seems to make.
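The sequence in that log can be sketched as a small simulation. This is only an illustration of a client trying formats in order of preference and then making a one-off request. It is not Excel’s actual algorithm and it does not use the real DDEML API. The LoggingServer class, the open_link function and the format names RICH_A, RICH_B and TEXT are all hypothetical placeholders.

```python
from typing import List, Optional, Set, Tuple


class LoggingServer:
    """Hypothetical stand-in for a DDE server that logs each message it is sent."""

    def __init__(self, supported_formats: Set[str]) -> None:
        self.supported_formats = supported_formats
        self.log: List[Tuple[str, str, str]] = []

    def advise_start(self, item: str, fmt: str) -> bool:
        # Accept the advise loop only if the requested format is supported.
        self.log.append(("XTYP_ADVSTART", item, fmt))
        return fmt in self.supported_formats

    def request(self, item: str, fmt: str) -> Optional[str]:
        # Return a dummy value when the format is supported, otherwise nothing.
        self.log.append(("XTYP_REQUEST", item, fmt))
        return "42" if fmt in self.supported_formats else None


def open_link(
    server: LoggingServer, item: str, preferred_formats: List[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each format, best first, then request the initial value
    in the first format the server accepts (assumed behaviour)."""
    for fmt in preferred_formats:
        if server.advise_start(item, fmt):
            return fmt, server.request(item, fmt)
    return None, None


# Placeholder format names, listed from highest to lowest fidelity.
server = LoggingServer(supported_formats={"TEXT"})
chosen_format, initial_value = open_link(server, "X", ["RICH_A", "RICH_B", "TEXT"])

for message in server.log:
    print(message)
```

In this sketch the server’s log ends up with several XTYP_ADVSTART entries followed by a single XTYP_REQUEST, which is the same overall shape as the trace above.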

Advise Before Request

And then it dawned on me: what if the MetaTrader DDE server required an advise loop to be established on a symbol before you’re allowed to request it? Sure enough, when I hacked a bit of code into my request command to start an advise loop first, the subsequent DDE request succeeded.

I don’t know if this is a bug in the MetaTrader 4 DDE server or the intended behaviour. I suspect the fact that it works with Excel covers the vast majority of users so maybe it’s never been a priority to support one-off data requests. The various other financial DDE servers I coded against circa 2000 never exhibited this kind of requirement – you could make one-off requests for data with a standalone XTYP_REQUEST message.

The New Fetch Command

The original intent of my DDE Command tool was to provide a tool that allows each XTYP_* message to be sent to a DDE server in isolation, mostly for testing purposes. As such the tool’s verbs pretty much have a one-to-one correspondence with the DDE messages you might send yourself.

Allowing people to snapshot data from the MetaTrader 4 platform with my tool would therefore mean making some kind of small change. I did consider adding various special switches to the existing request and advise verbs, either to force an advise first or to force a request if no immediate update was received, but that seemed to go against the ethos a bit.

In the end I decided to add a new verb called “fetch” which acts just like “request”, but starts an advise loop for every item first, then sends a request message for the latest value, thereby directly mimicking Excel.

> DDECmd.exe fetch -s MT4 -t QUOTE -i COPPER -i SILVER
COPPER|2.6075 2.6145
SILVER|16.771 16.821

Hey presto, it now works!
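The difference between the two verbs can be sketched with a mock server that, like MT4 appeared to, only answers requests for items that have an active advise loop. This is a simulation in Python rather than the tool’s real implementation. The AdviseGatedServer class and the plain_request and fetch functions are hypothetical names, and stopping the advise loops afterwards is an assumption about sensible clean-up, not something the tool is known to do.

```python
from typing import Dict, List, Set


class AdviseGatedServer:
    """Hypothetical mock of a server that returns "N/A" for any item
    without an active advise loop (the behaviour observed with MT4)."""

    def __init__(self, quotes: Dict[str, str]) -> None:
        self._quotes = quotes
        self._advised: Set[str] = set()

    def advise_start(self, item: str) -> bool:
        # Only known items can be subscribed to.
        if item not in self._quotes:
            return False
        self._advised.add(item)
        return True

    def advise_stop(self, item: str) -> None:
        self._advised.discard(item)

    def request(self, item: str) -> str:
        # A request only succeeds while the item is being advised.
        if item in self._advised:
            return self._quotes[item]
        return "N/A"


def plain_request(server: AdviseGatedServer, items: List[str]) -> List[str]:
    """Mimics the "request" verb: a standalone request per item."""
    return [server.request(item) for item in items]


def fetch(server: AdviseGatedServer, items: List[str]) -> List[str]:
    """Mimics the "fetch" verb: advise on every item first, then request each."""
    for item in items:
        server.advise_start(item)
    try:
        return [f"{item}|{server.request(item)}" for item in items]
    finally:
        # Assumed clean-up: tear the advise loops down again.
        for item in items:
            server.advise_stop(item)


# Quote values copied from the example output above.
mt4 = AdviseGatedServer({"COPPER": "2.6075 2.6145", "SILVER": "16.771 16.821"})

print(plain_request(mt4, ["COPPER"]))
for line in fetch(mt4, ["COPPER", "SILVER"]):
    print(line)
```

Against this mock a plain request yields “N/A”, while fetch returns one ITEM|value line per item.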

This feature was released in DDE Command v1.6.

 

[1] This is a bit of artistic licence :o). They are not a daily occurrence, but once every couple of months wouldn’t be far off. So yes, “DDE Is Still Alive & Kicking”.

[2] Most recently it seems quite a few people are beginning to discover that Microsoft dropped NetDDE support way back in Windows Vista.