Writing: Becoming a Better Programmer

Pete Goodliffe from Pete Goodliffe

It's finally here!

My new book, Becoming a Better Programmer, is fully edited, laid out, and is now available as a final product for your reading pleasure, published by O'Reilly. You can purchase it in printed form or as a digital version for your e-reader of choice.

Find out more about the book from the O'Reilly product page. You can view the full table of contents there. Or head over to Amazon to purchase. If you are a Safari subscriber, you can read it here. Grab your iBook here.

The cover image is a flying fish. I'll leave it to your imagination to work out the significance.

It's great to finally see this labour of love come to fruition, and I do hope that stands as a useful resource for programmers today.

One of my favourite parts of the book is a family in-joke in the "advance praise" at the front. Nestled amongst the luminaries and expert programmers who graciously contributed their honest thoughts on the book is another very honest opinion:

How Visual Lint parses projects and makefiles

Products, the Universe and Everything from Products, the Universe and Everything

Code analysis tools can require a lot of configuration to be useful. Whilst some (e.g. Vera++ or cpplint.py) need very little configuration to make use if effectively, others such as PC-lint (and even, to a lesser extent, CppCheck) may need to be fed pretty much the same configuration as the compiler itself to be useful. As a result the command lines you need to use with some analysis tools are pretty complex in practice (which is of course a disincentive to using them, and so where Visual Lint comes in. But I digress...). In fact, more capable code analysis tools may need to be given even more information than the compiler iteself to generate meaningful results - for example they may need to be configured with the built-in preprocessor settings of the compiler itself. In a PC-lint analysis configuration, this is part of the job compiler indirect files such as co-msc110.lnt do. As a result of the above, a product such as Visual Lint which effectively acts as a "front end" to complex (and almost infinitely configurable) code analysis tools like PC-lint needs to care about the details of how your compiler actually sees your code every bit as much as the analysis tool itself does. What this means in practice is that Visual Lint needs be able to determine the properties of each project it sees and understand how they are affected by the properties of project platforms, configurations and whatever compiler(s) a project uses. When it generates an analysis command line, it may need to reflect those properties so that the analysis tool sees the details of the preprocessor symbols, include folders etc. each file in the project would normally be compiled with - including not only the properties exposed in the corresponding project or makefile, but also built-in symbols etc. normally provided by the corresponding compiler. That's a lot of data to get right, and inevitably sometimes there will be edge cases where don't quite get it right the first time. It goes without saying that if you find one of those edge cases - please tell us! So, background waffle over - in this post I'm going to talk about one of the things Visual Lint does - parsing project files to identify (among other things) the preprocessor symbols and include folders for each file in order to be able to reflect this information in the analysis tool configuration. When analysing with CppCheck, preprocessor and include folder data read in this way can be included on the generated command line as -D and -I directives. Although we could do the same with PC-lint, it is generally better to write the preprocessor and include folder configuration to an indirect ("project.lnt") file which also includes details of which implementation (.c, .cpp, .cxx etc.) files are included in the project -as well as any other project specific options. For example:
// Generated by Visual Lint from file: SourceVersioner_vs90.vcproj
// -dConfiguration=Release|Win32
-si4 -sp4 // Platform = "Win32"
+ffb // ForceConformanceInForLoopScope = "TRUE"
-D_UNICODE;UNICODE // CharacterSet = "1"
-DWIN32;NDEBUG;_CONSOLE // PreprocessorDefinitions = "WIN32;NDEBUG;_CONSOLE"
-D_CPPRTTI // RuntimeTypeInfo = "TRUE"
-D_MT // RuntimeLibrary = "0"
// AdditionalIncludeDirectories = ..\Include"
-save -e686 //
-i"..\Include" //
-restore //
// SystemIncludeDirectories = "
// F:\Projects\Libraries\boost\boost_1_55_0;
// C:\Program Files (x86)\
// Microsoft Visual Studio 9.0\VC\include;
// C:\Program Files (x86)\
// Microsoft Visual Studio 9.0\VC\atlmfc\include;
// C:\Program Files\
// Microsoft SDKs\Windows\v6.0A\include;
// F:\Projects\Libraries\Wtl\8.1\Include"
-save -e686 //
+libdir("C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include")
+libdir("C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\atlmfc\include")
+libdir("C:\Program Files\Microsoft SDKs\Windows\v6.0A\include")
+libdir("C:\Program Files\Microsoft SDKs\Windows\v6.0A\include")
-i"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include"
-i"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\atlmfc\include"
-i"C:\Program Files\Microsoft SDKs\Windows\v6.0A\include"
-restore //
SourceVersioner.cpp // RelativePath = "SourceVersioner.cpp"
SourceVersionerImpl.cpp // RelativePath = "SourceVersionerImpl.cpp"
stdafx.cpp // RelativePath = "stdafx.cpp"
Shared\FileUtils.cpp // RelativePath = "Shared\FileUtils.cpp"
Shared\FileVersioner.cpp // RelativePath = "Shared\FileVersioner.cpp"
Shared\PathUtils.cpp // RelativePath = "Shared\PathUtils.cpp"
// RelativePath = "Shared\ProjectConfiguration.cpp"
Shared\ProjectFileReader.cpp // RelativePath = "Shared\ProjectFileReader.cpp"
Shared\SolutionFileReader.cpp // RelativePath = "Shared\SolutionFileReader.cpp"
Shared\SplitPath.cpp // RelativePath = "Shared\SplitPath.cpp"
Shared\StringUtils.cpp // RelativePath = "Shared\StringUtils.cpp"
Shared\XmlUtils.cpp // RelativePath = "Shared\XmlUtils.cpp"
A file like this is written for every project configuration we analyse, but the mechanism used to actually read the configuration data from projects varies slightly depending on the project type and structure. In the case of the Visual Studio 2002, 2003, 2005 and 2008 .vcproj files Visual Lint was originally designed to work with, this is straightforward as the project file contains virtually all of the information needed in a simple to read, declarative form. Hence to parse a .vcproj file we simply load its contents into an XML DOM object and query the properties in a straightforward manner. Built-in preprocessor symbols such as _CPPUNWIND are defined by reading the corresponding properties (EnableExceptions in the case above) or inferred from the active platform etc. Similarly, for Visual C++ 6.0 and eMbedded Visual C++ 4.0 project files we just parse the compiler command lines in the .dsp or .vcp file directly. This is pretty straightforward as well as although .dsp and .vcp files are really makefiles they have a very predictable structure. Some other development environments (e.g. Green Hills MULTI, CodeVisionAVR) have bespoke project file formats which are generally easy to parse using conventional techniques. Visual Studio 2010, 2012 and 2013 .vcxproj project files are far more tricky, as the MSBuild XML format they use is effectively a scripting language rather than a declarative format. To load them, we effectively have to execute them (but obviously without running the commands they contain). As you can imagine this is rather tricky! We basically had to write a complete MSBuild parsing engine* for this task, which effectively executes the MSBuild script in memory with defined parameters to identifiy its detailed properties. * Although there are MSBuild APIs which could conceivably help, there are implemented in managed code - which we can't use in a plug-in environment due to .NET framework versioning restrictions. Instead, our MSBuild parsing engine is written natively in C++. To work out the parameters to apply to the MSBuild script, we prescan the contents of the .vcxproj file to identify the configurations and platforms available. Once we have those, we can run the script in memory and inspect the properties it generates for the required build configuration. This process is also used with CodeGear C++ and Atmel Studio projects, both of which are MSBuild based. Eclipse projects (.project and .cproject files) are also XML based and are not too difficult to parse, but whereas it is straightforward to work out how things are defined in a .vcproj file, Eclipse project files are much less clearly defined and more variable (not surprising as they effectively represent serialised Java object data). To make matters worse, each compiler toolchain can have its own sub-format, so things can change in very subtle ways between based projects for different Eclipse based environments such as Eclipse/CDT, QNX Momentics and CodeWarrior. In addition, Eclipse C/C++ projects come in two flavours - managed builder and makefile. Of the two, managed builder projects are easy to deal with - for example to determine the built-in settings of the compiler in a managed builder we read the scanner (*.sc) file produced by the IDE when it builds the project, and add that data to the configuration data we have been able to read from the project file. The scanner (.sc) file is a simple XML file located within the Eclipse workspace .metadata folder. Here's an extract to give you an idea what it looks like:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?scdStore version="2"?>
<scannerInfo id="org.eclipse.cdt.make.core.discoveredScannerInfo">
<instance id="cdt.managedbuild.config.gnu.mingw.exe.debug.116390618;
<collector id="org.eclipse.cdt.make.core.PerProjectSICollector">
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include/c++"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include/c++/mingw32"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include/c++/backward"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/../../../../include"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include"/>
<includePath path="c:\mingw\bin\../lib/gcc/mingw32/4.5.2/include-fixed"/>
<definedSymbol symbol="__STDC__=1"/>
<definedSymbol symbol="__cplusplus=1"/>
<definedSymbol symbol="__STDC_HOSTED__=1"/>
<definedSymbol symbol="__GNUC__=4"/>
<definedSymbol symbol="__GNUC_MINOR__=5"/>
<definedSymbol symbol="__GNUC_PATCHLEVEL__=2"/>
<definedSymbol symbol="__GNUG__=4"/>
<definedSymbol symbol="__SIZE_TYPE__=unsigned int"/>
Unfortunately scanner files are not generated for makefile projects, and the enclosing project files do not give details of the preprocessor symbols and include folders needed to analyse them. So how do we do it? The answer is similar to the way we load MSBuild projects - by running the makefile without executing the commands within. In this case however the make tools themselves can (fortunately!) lend a hand, as both NMake and GNU Make have an option which runs the makefile and echoes the commands which would be executed without actually running them. For NMake this is /n and GNU Make -n or --just-print. The /B (NMake) -B (GNU Make) switch can also be used to ensure that all targets are executed regardless of the build status of the project (otherwise we wouldn't be able to read anything if the project build was up to date). If we run a makefile with these switches and capture the output we can then parse the compiler command lines themselves and extract preprocessor symbols etc. from it. As so many compilers have similar command line switches for things like preprocessor symbols and include folders this is generally a pretty straightforward matter. Many variants of GCC also have command line options which can report details of built-in preprocessor symbols and system include folders, which we can obviously make use of - so the fact that many embedded C/C++ compilers today are based on GCC is very useful indeed. There's a lot of detail I've not covered here, but I hope you get the general idea. It follows from the above that adding support for new project file types to Visual Lint is generally not difficult - provided we have access to representative project files and (preferably) a test installation of the IDE in question. If there is a project file type you would like us to look at, please tell us.

A More Full-Featured Emacs company-mode Backend

Austin Bingham from Good With Computers

In the first article in this series we looked at how to define the simplest company-mode backend. [1] This backend drew completion candidates from a predefined list of options, and allowed you to do completion in buffers in fundamental mode. The main purpose of that article was to introduce the essential plumbing of a company-mode backend.

In this article we'll expand upon the work of the first, adding some useful UI elements like annotations and metadata. We'll also implement a rough form of fuzzy matching, wherein candidates will be presented to the user when they mostly match the prefix. After this article you'll know almost everything you need to know about writing company-mode backends, and you'll be in a great position to learn the rest on your own.

Most of what we'll be doing in the article revolves around handling completion candidate "metadata", data associated in some way with our completion candidates. In practice this kind of data covers things like documentation strings, function signatures, symbols types, and so forth, but for our purposes we'll simply associate some biographical data with the names in our completion set sample-completions.

company-mode provides affordances for displaying metadata as part of the completion process. For example, if your backend is showing completions for function names, you could display the currently-selected function's signature in the echo area. We'll develop a backend that displays a sentence about the selected candidate in the echo area, and we'll also display their initials as an annotation in the candidate selection popup menu.

Adding more data to our completion candidates

First we need to add some metadata to our existing completion candidates. To do this we'll use Emacs text properties. ((Text properties allow you to associate arbitrary data with strings. You can read about them here. Specifically, we use the special read syntax for text properties.)) For each completion candidate we define an :initials property containing their initials and a :summary property containing a one-sentence summary of the candidate. [2] To add these properties, update sample-completions to look like this:

(defconst sample-completions
  '(#("alan" 0 1
      (concat "Alan Mathison Turing, OBE, FRS (/ˈtjʊərɪŋ/ "
              "tewr-ing; 23 June 1912 – 7 June 1954) was a "
              "British mathematician, logician, cryptanalyst, "
              "philosopher, pioneering computer scientist, "
              "mathematical biologist, and marathon and ultra "
              "distance runner.")))
    #("john" 0 1
      (concat "John von Neumann (/vɒn ˈnɔɪmən/; December 28, "
              "1903 – February 8, 1957) was a Hungarian and "
              "American pure and applied mathematician, physicist, "
              "inventor and polymath.")))
    #("ada" 0 1
      (concat "Augusta Ada King, Countess of Lovelace (10 December "
              "1815 – 27 November 1852), born Augusta Ada Byron "
              "and now commonly known as Ada Lovelace, was an "
              "English mathematician and writer chiefly known for "
              "her work on Charles Babbage's early mechanical "
              "general-purpose computer, the Analytical Engine.")))
    #("don" 0 1
      (concat "Donald Ervin Knuth (/kəˈnuːθ/[1] kə-nooth; born "
              "January 10, 1938) is an American computer "
              "scientist, mathematician, and Professor Emeritus "
              "at Stanford University.")))))

Attaching properties like this is a very convenient way to store metadata for completion candidates. Of course in a real backend you probably wouldn't have a hard-coded list of candidates, and you'd be fetching them dynamically from a server, database, or external process. In that case, you'd need to also dynamically fetch the metadata you want and attach it to the candidate strings you serve through your backend. In the end, text properties work well in this context because they transparently transport the metadata - which company-mode doesn't know about - with the completion strings that company-mode definitely knows about.

Adding completion menu annotations

This change by itself doesn't really do anything, of course. All we've done is add properties to some strings, and we need to instruct company-mode on how to actually use them for display. The first way we'll use this metadata, then, is to add a small annotation to each entry in the popup menu used for candidate selection. To add this annotation, we need to update company-sample-backend to respond to the annotation command. This command should resolve to the annotation you want to use for the given candidate. Typically this means calling a function taking the completion candidate string arg and returning the annotation string.

First let's define a function that takes a completion candidate string and returns an annotation. Remember that our candidate strings store their metadata as text properties, so fundamentally this function simply needs to extract a property. For the annotation, we'll extract the :initials property and return it (prefixed with a blank.) That function looks like this:

(defun sample-annotation (s)
  (format " [%s]" (get-text-property 0 :initials s)))

Next we need to update our backend to respond to the annotation command like this:

(defun company-sample-backend (command &optional arg &rest ignored)
  (interactive (list 'interactive))``

  (case command
    (interactive (company-begin-backend 'company-sample-backend))
    (prefix (and (eq major-mode 'fundamental-mode)
      (lambda (c) (string-prefix-p arg c))
    (annotation (sample-annotation arg))))

In the last line we tell the backend to call sample-annotation with the candidate string to produce an annotation.

Now when we do completion we see the candidates' initials in the popup menu:


Displaying metadata in the echo area

Where the annotation command adds a small annotation to the completion popup menu, the meta backend command produces text to display in the echo area. [3] The process for producing the metadata string is almost exactly like that of producing the annotation string. First we write a function that extracts the string from the candidate text properties. Then we wire that function into the backend through the meta command.

As you've probably guessed, the function for extracting the metadata string will simply read the :summary property from a candidate string. It looks like this:

(defun sample-meta (s)
  (get-text-property 0 :summary s))

The changes to the backend look like this:

(defun company-sample-backend (command &optional arg &rest ignored)
  (interactive (list 'interactive))

  (case command
    (interactive (company-begin-backend 'company-sample-backend))
    (prefix (and (eq major-mode 'fundamental-mode)
      (lambda (c) (string-prefix-p arg c))
    (annotation (sample-annotation arg))
    (meta (sample-meta arg))))

As before, in the last line we associate the meta command with our sample-meta function.

Here's how the metadata looks when displayed in the echo area:

Screen Shot 2014-11-03 at 12.02.10 PM

Fuzzy matching

As a final improvement to our backend, let's add support for fuzzy matching. This will let us do completion on prefixes which don't exactly match a candidate, but which are close enough. [4] For our purposes we'll implement a very crude form of fuzzy matching wherein a prefix matches a candidate if the set of letters in the prefix is a subset of the set of letters in the candidate. The function for performing fuzzy matching looks like this:

(defun sample-fuzzy-match (prefix candidate )
  (cl-subsetp (string-to-list prefix)
              (string-to-list candidate)))

Now we just need to modify our backend a bit. First we need to modify our response to the candidates command to use our new fuzzy matcher. Then we need to respond to the no-cache command by returning true. [5] Here's how that looks:

(defun company-sample-backend (command &optional arg &rest ignored)
  (interactive (list 'interactive))

  (case command
    (interactive (company-begin-backend 'company-sample-backend))
    (prefix (and (eq major-mode 'fundamental-mode)
      (lambda (c) (sample-fuzzy-match arg c))
    (annotation (sample-annotation arg))
    (meta (sample-meta arg))
    (no-cache 't)))

As you can see, we've replaced string-prefix-p in the candidates response with sample-fuzzy-match, and we've added (no-cache 't).

Here's how our fuzzy matching looks in action:

Screen Shot 2014-11-03 at 12.23.45 PM

That's all, folks

We've seen how to use Emacs' text properties to attach metadata to candidate strings. This is a really useful technique to use when developing company-mode backends, and one that you'll see used in real-world backends. With that metadata in place, we've also seen that it's very straightforward to tell your backend to display annotations in popup menus and metadata in the echo area. Once you've got the basic techniques under your belt, you can display anything you want as part of completion.

There are still more aspects to developing company-mode backends, but with what we've covered in this series you can get very far. More importantly, you know the main concepts and infrastructure for the backends, so you can learn the rest on your own. If you want to delve into all of the gory details, you'll need to read the company-mode source code, and specifically the documentation for company-backends. [6]

For an example of a fairly full-featured backend implementation that's currently in use (and under active development), you can see the emacs-ycmd project. [7] Happy hacking!

[1]The first article in this series..
[2]The summaries are simply the first sentences of the respective Wikipedia articles.
[3]The Emacs manual entry on the echo area.
[4]Fuzzy matching is commonly used for completion tools because it addresses common cases where users transpose characters, accidentally leave characters out, or consciously leverage fuzzy matching for increased speed.
[5]The details of why this is the case are murky, but the company-mode source code specifically states this. The same source also says that we technically should be implementing a response to match, but that doesn't seem to affect this implementation.
[6]The company-mode project page.
[7]The emacs-ycmd project is on github. In particular, see company-ycmd.el.