Automate Only What You Need To

Chris Oldwood from The OldWood Thing

The meme tells us to “automate all the things” and it’s a noble cause which has sprung up as a backlash against the ridiculous amount of manual work we’ve often had to do in the past. However in our endeavour to embrace the meme we should not go overboard and lose sight of what we’re automating and why.

The Value

The main reason we tend to automate things is to save ourselves time (and by extension, money) by leveraging tools that can perform tasks quicker than we can, but also with more determinism and reliability, thereby saving even more time. For example, pasting a complex set of steps off a wiki page into a command prompt to perform a task is slower than an interpreter running a script and is fraught with danger as we might screw up at various points along the way and so end up not doing exactly what we’d intended. Ultimately computers are good at boring repetitive tasks whilst we humans are not.

However if we only do this operation once every six months and there are too many potential points of failure we might spend far longer trying to automate it than it actually takes to do carefully, manually. It’s a classic trade-off and like most things in IT there are some XKCD’s for that – “Automation” and “Is It Worth the Time”. They make sobering reading when you’re trying to work out how much time you might save automating something and therefore also gives a good indication of the maximum amount of time you should spend on achieving that.

Orchestration First, Actor Later

Where I think the meme starts to break down is when we get this balance wrong and begin to lose sight of where the real value is, thereby wasting time trying to automate not only all the steps but also wire it into some job scheduling system (e.g. CI server) so that once in a blue moon we can push a button and the whole task from start to finish is executed for us without further intervention.

The dream suggests at that point we can go off and do something else more valuable instead. Whilst this notion of autonomy is idyllic it can also come with a considerable extra up-front cost and any shortcuts are likely to buy us false security (i.e. it silently fails and we lose time investigating downstream failures instead).

For example there are many crude command prompt one-liners I’ve written in the past to pick up common mistakes that are trivial for me to run because they’ve been written to automate the expensive bit, not the entire problem. I often rely on my own visual system to filter out the noise and compensate for the impurities within the process. Removing these wrinkles is often where the proverbial “last 10% that takes 90% of the time” goes [1].

It’s all too easy to get seduced by the meme and believe that no automation task is truly complete until it’s fully automated.

An Example

In .Net when you publish shared libraries as NuGet packages you have a .nuspec file which lists the package dependencies. The library .csproj build file also has project dependencies for use with compilation. However these two sets of dependencies should be kept in sync [2].

Initially with only a couple of NuGet packages it was easy to do manually as I knew it was unlikely to change. However once the monolithic library got split it up the dependencies started to grow and manually comparing the relevant sections got harder and more laborious.

Given the text based nature of the two files (XML) it was pretty easy to write a simple shell one-liner to grep the values from the two sets of relevant XML tags, dump them in a file, and then use diff to show a side-by-side comparison. Then it just needed wrapping in a for loop to traverse the solution workspace.

Because the one-liner was mine I got to take various shortcuts like hardcoding the input path and temporary files along with “knowing” that a certain project was always misreported. At this point a previously manual process has largely been automated and as long as I run it regularly will catch any mistakes.

Of course it’s nice to share things like this so that others can take advantage after I’m gone, and it might be even better if the process can be added as a build step so that it’s caught the moment the problem surfaces rather than later in response to a more obscure issue. Now things begin to get tricky and we start to see diminishing returns.

First, the Gnu on Windows (GoW) toolset I used isn’t standard on Windows so now I need to make the one-liner portable or make everyone else match my tooling choice [3]. I also need to fix the hard coded paths and start adding a bit of error handling. I also need to find a way to remove the noise caused by the one “awkward” project.

None of this is onerous, but this all takes time and whilst I’m doing it I’m not doing something (probably) more valuable. The majority of the value was in being able to scale out this safety check, there is (probably) far less value in making it portable and making it run reliably as part of an automated build. This is because essentially it only needs to be run whenever the project dependencies change and that was incredibly rare once the initial split was done. Additionally the risk of not finding an impedance mismatch was small and should be caught by other automated aspects of the development process, i.e. the deployment and test suite.

Knowing When to Automate More

This scenario of cobbling something together and then finding you need to do it more often is the bread and butter of build & deployment pipelines. You often start out with a bunch of hacked together scripts which do just enough to allow the team to bootstrap itself in to an initial fluid state of delivery. This is commonly referred to as a walking skeleton because it forms the basis for the entire development process.

The point of starting with the walking skeleton rather than just diving headlong into features is to try and tackle some of the problems that historically got left until it was too late, such as packaging and deployment. In the modern era of continuous delivery we look to deliver a thin slice of functionality quickly and then build upon it piecemeal.

However it’s all too easy to get bogged down early on in a project and spend lots of time just getting the build pipeline up and running and have nothing functional to show for it. This has always made me feel a little uncomfortable as it feels as though we should be able to get away with far less than perhaps we think we need to.

In “Building the Pipeline - Process Led or Automation Led” and my even earlier post “Layered Builds” I’ve tried to promote a more organic approach that focuses on what I think really matters most which is a consistent and extensible approach. In essence we focus first on producing a simple, repeatable process that can be used locally to enable the application skeleton to safely evolve and then balance the need for automating this further along with the other features. If quality or speed of delivery drops and more automation looks to be the answer then it can be added with the knowledge that it’s being done for deliberate reasons, rather than because we’ve got carried away gold plating the build system based on what other people think it should do (i.e. a cargo cult mentality).

Technical Risk

The one caveat to being leaner about your automation is that you may (accidentally) put off addressing one or more technical risks because you don’t perceive them as risks. This leads us back to why the meme exists in the first place – failing to address certain aspects of software delivery until it’s too late. If there is a technical concern, address it, but only to the extent that the risk is understood, you may not need to do anything about it now.

With a team of juniors there is likely to be far more unknowns [4] than with a team of experienced programmers, therefore the set of perceived risks will be higher. Whilst you might not know the most elegant approach to solving a problem, knowing an approach already reduces the risk because you know that you can trade technical debt in the short term for something else more valuable if necessary.

Everything is Negotiable

The thing I like most about an agile development process is that every trade-off gets put front-and-centre, everything is now negotiable [5]. Every task now comes with an implicit question: is this the most valuable thing we could be doing?

Whilst manually building a private cloud for your production system using a UI is almost certainly not the most scalable approach, neither is starting day one of a project by diving into, say, Terraform when you don’t even know what you’re supposed to be building. There is nothing wrong with starting off manually, you just need to be diligent and ensure that your decision to only automate “enough of the things” is always working in your favour.

 

[1] See “The Curse of NTLM Based HTTP Proxies”.

[2] I’m not aware of Visual Studio doing this yet although there may now be extensions and tools written by others I’m not aware of.

[3] Yes, the Unix command line tools should be ubiquitous and maybe finally they will be with Bash on Windows.

[4] See “Turning Unconscious Incompetence to Conscious Incompetence”.

[5] See “Estimating is Liberating”.

LINQ: Did You Mean First(), or Really Single()?

Chris Oldwood from The OldWood Thing

TL;DR: if you see someone using the LINQ method First() without a comparator it’s probably a bug and they should have used Single().

I often see code where the author “knows” that a sequence (i.e. an Enumerable<T>) will result in just one element and so they use the LINQ method First() to retrieve the value, e.g.

var value = sequence.First();

However there is also the Single() method which could be used to achieve a similar outcome:

var value = sequence.Single();

So what’s the difference and why do I think it’s probably a bug if you used First?

Both First and Single have the same semantics for the case where the the sequence is empty (they throw) and similarly when the sequence contains only a single element (they return it). The difference however is when the sequence contains more than one element – First discards the extra values and Single will throw an exception.

If you’re used to SQL it’s the difference between using “top” to filter and trying extract a single scalar value from a subquery:

select top 1 x as [value] from . . .

and

select a, (select x from . . .) as [value] from . . .

(The latter tends to complain loudly if the result set from the subquery is not just a single scalar value or null.)

While you might argue that in the face of a single-value sequence both methods could be interchangeable, to me they say different things with Single begin the only “correct” choice.

Seeing First says to me that the author knows the sequence might contain multiple values and they have expressed an ordering which ensures the right value will remain after the others have been consciously discarded.

Whereas Single suggests to me that the author knows this sequence contains one (and only one) element and that any other number of elements is wrong.

Hence another big clue that the use of First is probably incorrect is the absence of a comparator function used to order the sequence. Obviously it’s no guarantee as the sequence might be being returned from a remote service or function which will do the sorting instead but I’d generally expect to see the two used together or some other clue (method or variable name, or parameter) nearby which defines the order.

The consequence of getting this wrong is that you don’t detect a break in your expectations (a multi-element sequence). If you’re lucky it will just be a test that starts failing for a strange reason, which is where I mostly see this problem showing up. If you’re unlucky then it will silently fail and you’ll be using the wrong data which will only manifest itself somewhere further down the road where it’s harder to trace back.

Generating Sequences

Anthony Williams from Just Software Solutions Blog

I was having a discussion with my son over breakfast about C++ and Python, and he asked me if C++ had anything equivalent to Python's range() function for generating a sequence of integers. I had to tell him that no, the C++ standard library didn't supply such a function, but there were algorithms for generating sequences (std::generate and std::generate_n) into an existing container, and you could write something that would provide a "virtual" container that would supply a sequence as you iterated over it with range-for.

He was happy with that, but I felt the urge to write such a container, and blog about it. There are existing implementations available (e.g. boost::counting_range), but hopefully people will find this instructive, if not useful.

Iterating over a "virtual" container

Our initial goal is to be able write

for(int x: range(10))
  std::cout<<x<<std::endl;

and have it print out the numbers 0 to 9. This requires that our new range function returns something we can use with range-for — something for which we can call begin and end to get iterators.

    class numeric_range{
    public:
        class iterator;
        iterator begin();
        iterator end();
    };

    numeric_range range(int count);

Our container is virtual, so we don't have real values we can refer to. This means our iterators must be input iterators — forward iterators need real objects to refer to. That's OK though; we're designing this for use with range-for, which only does one pass anyway.

Input Iterator requirements

All iterators have 5 associated types. They can either be provided as member types/typedefs, or by specializing std::iterator_traits<>. If you provide them as part of your iterator class, then the std::iterator_traits<> primary template works fine, so we'll do that. The types are:

iterator_category

The tag type for the iterator category: for input iterators it's std::input_iterator_tag.

value_type

This is the type of the element in the sequence. For an integer sequence, it would be int, but we'll want to allow sequences of other types, such as unsigned, double, or even a custom class.

reference

The result of operator* on an iterator. For forward iterators it has to be an lvalue reference, but for our input iterator it can be anything convertible to value_type. For simplicity, we'll make it the same as value_type.

pointer

The return type of operator-> on an iterator. value_type* works for us, and is the simplest option.

difference_type

The type used to represent the difference between two iterators. For input iterators, this is irrelevant, as they provide a one-pass abstraction with no exposed size. We can therefore use void.

The types aren't the only requirements on input iterators: we also have to meet the requirements on valid expressions.

Input Iterator expression requirements

Input iterators can be valid or invalid. If an iterator is invalid (e.g. because it was invalidated by some operation), then doing anything to it except assigning to it or destroying it is undefined behaviour, so we don't have to worry about that. All requirements below apply only to valid iterators.

Valid iterators can either refer to an element of a sequence, in which case they are dereferencable, or they can be past-the-end iterators, which are not dereferencable.

Comparisons

For iterators a and b, a==b is only a valid expression if a and b refer to the same sequence. If they do refer to the same sequence, a==b is true if and only if both a and b are past-the-end iterators, or both a and b are dereferencable iterators that refer to the same element in the sequence.

a!=b returns !(a==b), so is again only applicable if a and b are from the same sequence.

Incrementing

Only dereferencable iterators can be incremented.

If a is dereferencable, ++a must move a to the next value in the sequence, or make a a past-the-end iterator. Incrementing an iterator a may invalidate all copies of a: input iteration is single-pass, so you cannot use iterators to an element after you've incremented any iterator that referenced that element. ++a returns a reference to a.

(void)a++ is equivalent to (void)++a. The return type is unspecified, and not relevant to the increment.

Dereferencing

For a dereferencable iterator a, *a must return the reference type for that iterator, which must be convertible to the value_type. Dereferencing the same iterator multiple times without modifying the iterator must return an equivalent value each time. If a and b are iterators such that a==b is true, *a and *b must be return equivalent values.

a->m must be equivalent to (*a).m.

*a++ must return a value convertible to the value_type of our iterator, and is equivalent to calling following the function:

iterator::value_type increment_and_dereference(iterator& a) {
  iterator::value_type temp(*a);
  ++a;
  return temp;
}

This is why the return type for a++ is unspecified: for some iterators it will need to be a special type that can generate the required value on dereferencing; for others it can be a reference to a real value_type object.

Basic implementation

Our initial implementation can be really simple. We essentially just need a current value, and a final value. Our dereferencable iterators can then just hold a pointer to the range object, and past-the-end iterators can hold a null pointer. Iterators are thus equal if they are from the same range. Comparing invalidated iterators, or iterators from the wrong range is undefined behaviour, so that's OK.

class numeric_range{
    int current;
    int final;
public:
    numeric_range(int final_):current(0),final(final_){}

    iterator begin(){ return iterator(this); }
    iterator end() { return iterator(nullptr); }

    class iterator{
      numeric_range* range;
    public:
      using value_type = T;
      using reference = T;
      using iterator_category=std::input_iterator_tag;
      using pointer = T*;
      using difference_type = void;

      iterator(numeric_range* range_):range(range_){}

      int operator*() const{ return range->current; }
      int* operator->() const{ return &range->current;}

      iterator& operator++(){ // preincrement
          ++range->current;
          if(range->current==range->final)
            range=nullptr;
            return *this;
      }

      ??? operator++(int); // postincrement

      friend bool operator==(iterator const& lhs,iterator const& rhs){
        return lhs.range==rhs.range;
      }
      friend bool operator!=(iterator const& lhs,iterator const& rhs){
        return !(lhs==rhs);
      }
    };
};

numeric_range range(int max){
  return numeric_range(max);
}

Note that I've left out the definition of the iterator postincrement operator. That's because it warrants a bit more discussion.

Remember the spec for *a++: equivalent to calling

iterator::value_type increment_and_dereference(iterator& a) {
  iterator::value_type temp(*a);
  ++a;
  return temp;
}

Since this is a combination of two operators, we can't do it directly: instead, our postincrement operator has to return a special type to hold the temporary value. Our postincrement operator thus looks like this:

class numeric_range::iterator{
  class postinc_return{
    int value;
  public:
    postinc_return(int value_): value(value_){}
    int operator*(){ return value; }
  };
public:
  postinc_return operator++(int){
    postinc_return temp(range->current);
    ++*this;
    return temp;
  }
};

That's enough for our initial requirement:

int main(){
    for(int x: range(10)){
        std::cout<<x<<",";
    }
    std::cout<<std::endl;
}

will now print 0 to 9.

More complete implementation

The Python range function provides more options than just this, though: you can also specify start and end values, or start, end and step values, and the step can be negative. We might also like to handle alternative types, such as unsigned or double, and we need to ensure we handle things like empty ranges.

Alternative types is mostly easy: just make numeric_range a class template, and replace int with our new template parameter T everywhere.

Setting the initial and final values is also easy: just provide a new constructor that takes both the current and final values, rather than initializing the current value to 0.

Steps are a bit more tricky: if the final value is not initial+n*step for an integer n, then the direct comparison of current==final to check for the end of the sequence won't work, as it will always be false. We therefore need to check for current>=final, to account for overshoot. This then doesn't work for negative steps, so we need to allow for that too: if the step is negative, we check for current<=final instead.

Final code

The final code is available for download with simple examples, under the BSD license.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Visual Lint 6.0 has been released

Products, the Universe and Everything from Products, the Universe and Everything

The first public build of Visual Lint 6.0 has just been uploaded to our website. As of today, Visual Lint 6.0 replaces Visual Lint 5.5 as the current supported Visual Lint version. Customers with active Visual Lint 5.x priority support, floating and site licence subscriptions should shortly be receiving updated licence keys for the new version, and upgrades for Visual Lint 1.x, 2.x, 3.x, 4.x and 5.x per user licences should become available in our online store shortly. If you have purchased any full per-user Visual Lint licences since the start of October, your licence keys are already v6.x compatible so can start using the new version of the software right now without any need to upgrade. If you have purchased any upgrade licences during the same period, you will shortly receive new Visual Lint 6.x compatible licence keys. The new version adds more complete support for the forthcoming PC-lint Plus, as well as support for Visual Studio 2017 RC and many other improvements. Full details of the changes in this version are as follows: Host Environments:
  • Added support for analysing standalone makefile projects.
  • Added support for makefiles within custom project (.vlproj) files. This allows makefiles requiring additional parameters to be analysed more easily.
  • Added support for Visual Studio 2017 RC. Note that the IDE will currently issue a warning that the extension is not compatible during installation.
  • Speeded up the loading of solutions and workspaces by loading the projects within them in parallel.
Analysis Tools:
  • PC-lint Plus crashes ("Serious Errors") are now parsed and reported as Fatal Errors with ID 399.
  • PC-lint Plus messages 890-897 are now correctly recognised as referencing ("supplemental", in PC-lint Plus parlance) issues.
  • Added PC-lint Plus specific compiler indirect files co-rb-vs2012.lnt, co-rb-vs2013.lnt, co-rb-vs2015.lnt and co-rb-vs2017.lnt, as well as a PC-lint Plus version of lib-rb-win32.lnt. All such indirect files are installed into the Visual Lint installation folder and referenced via -i directives on the generated PC-lint Plus command line.
  • When a PC-lint Plus per-project analysis task which uses more than one thread is running, Visual Lint can now dynamically reduce the number of active analysis tasks to take account of the number of analysis threads launched by the analysis task. [Visual Lint Professional and Enterprise Editions]
  • PC-lint and PC-lint Plus project indirect (project.lnt) files are now written when analysis tasks requiring them are created rather than when a solution/workspace is loaded.
Installation:
  • If the Visual Studio plug-in is selected for installation, a VSIX extension (VisualLintPackage.vsix) supporting Visual Studio 2017 RC is now installed into the Visual Lint installation folder. If Visual Studio 2017 is installed on the target system, the installer will automatically attempt to run the appropriate instance of VsixInstaller.exe to install the extension into Visual Studio 2017 RC.
Configuration:
  • When invoked, the Configuration Wizard now opens on the Analysis Tools page. As a result, the "Introductory" page has been removed.
  • The indirect files listed in the Configuration Wizard PC-lint Compiler Configuration page are now sorted primarily by description rather than filename. This works better than sorting by filename as the naming conventions for MSVC compiler indirect files between PC-lint and PC-lint Plus are slightly different.
  • Added an "analysis threads" option to the Command Line Options page to allow the value of the PC-lint Plus -max_threads option to be set on the command line in per-project analysis. [Visual Lint Professional and Enterprise Editions]
  • Renamed the "Maximum background analysis thread count" option on the Analysis Options page to "Maximum background analysis tasks".
User Interface:
  • Added a new Projects Display to VisualLintGui to provide a way of viewing the structure and properties of projects.
  • Added a new Project Properties Dialog to provide a way of viewing (and in the case of custom projects, editing) project properties. The dialog is available via context menu commands in the Projects Display and Analysis Status Display.
  • Added a new Project Variables Dialog to display the build variables defined in the associated project configuration.
  • Added an "Issue IDs" control (similar to the one in the Analysis Statistics Display) to the Display Filter Dialog as an alternative to editing the raw issue filter directly. Note that this control will be hidden if the dialog does not have access to a list of the issues which can be detected by the current analysis tool.
  • Added a new Makefile Properties Dialog. Where appropriate, the makefile command line can be edited here and the makefile output seen immediately.
  • Replaced the discrete "Clear Text Filter" buttons in the primary displays with integrated ones built into the corresponding text filter edit controls.
  • Added infotips to the Analysis Statistics Display.
  • When the current host environment is changed (e.g. when projects for different versions of Visual Studio are loaded) the details are now reflected in the Status View.
  • The "Analysis Tool:" message displayed in the Status View when Visual Lint starts now shows which host environment/project type the active analysis tool relates to.
Commands:
  • Added an "Export Project" command to the Analysis Status Display context menu to export the current project as a custom project (.vlproj) file.
  • Added an "Export Analysis Results" context menu command to the Analysis Status Display context menu.
  • Added a "File | New Project" command to VisualLintGui to allow new custom projects to be created directly within the environment.
  • Rearranged the VisualLintGui "File" menu and added accelerators to the "File | Open | Solution/Workspace/Project..." and "File | Open | Eclipse Workspace..." commands.
  • Added a "Send Feedback" command to the main menus in VisualLintGui and the Visual Studio and Eclipse plug-ins.
Reports:
  • HTML analysis reports are now displayed in the internal browser of the host environment (if available) by default.
Bug Fixes:
  • The platform name is now inferred from the targetPlatform element while reading Eclipse C/C++ (.cproject) project files.
  • Variables within file paths are now expanded correctly while reading IAR Embedded Workbench project files.
  • The Analysis Status Display "Troubleshooting | View Project Indirect File" context menu command is now disabled if Visual Lint has not been configured for PC-lint analysis of the associated project type, or if the project is still loading.
  • Fixed a bug relating to ProjectVersion elements within Atmel AVR Studio 5.0 project files.
  • Fixed a potential bug when interfacing to CppCheck, which could have caused cppcheck.exe to be omitted from the analysis tool pathname under some conditions - potentially causing analysis errors.
Download Visual Lint 6.0.0.273