Building Emacs 25.2 on XUbuntu 17.04

Timo Geusch from The Lone C++ Coder's Blog

I haven’t done much with Ubuntu recently, but had to set up a laptop with XUbuntu 17.04. That came with Emacs 24.5 as the default emacs package, and as skeeto pointed out in the comments, with a separate emacs25 package for Emacs 25.1. I tend to run the latest release Emacs everywhere out of habit, […]

The post Building Emacs 25.2 on XUbuntu 17.04 appeared first on The Lone C++ Coder's Blog.

Automate Only What You Need To

Chris Oldwood from The OldWood Thing

The meme tells us to “automate all the things” and it’s a noble cause which has sprung up as a backlash against the ridiculous amount of manual work we’ve often had to do in the past. However in our endeavour to embrace the meme we should not go overboard and lose sight of what we’re automating and why.

The Value

The main reason we tend to automate things is to save ourselves time (and by extension, money) by leveraging tools that can perform tasks quicker than we can, but also with more determinism and reliability, thereby saving even more time. For example, pasting a complex set of steps off a wiki page into a command prompt to perform a task is slower than an interpreter running a script and is fraught with danger as we might screw up at various points along the way and so end up not doing exactly what we’d intended. Ultimately computers are good at boring repetitive tasks whilst we humans are not.

However if we only do this operation once every six months and there are too many potential points of failure we might spend far longer trying to automate it than it actually takes to do carefully, manually. It’s a classic trade-off and like most things in IT there are some XKCD’s for that – “Automation” and “Is It Worth the Time”. They make sobering reading when you’re trying to work out how much time you might save automating something and therefore also gives a good indication of the maximum amount of time you should spend on achieving that.

Orchestration First, Actor Later

Where I think the meme starts to break down is when we get this balance wrong and begin to lose sight of where the real value is, thereby wasting time trying to automate not only all the steps but also wire it into some job scheduling system (e.g. CI server) so that once in a blue moon we can push a button and the whole task from start to finish is executed for us without further intervention.

The dream suggests at that point we can go off and do something else more valuable instead. Whilst this notion of autonomy is idyllic it can also come with a considerable extra up-front cost and any shortcuts are likely to buy us false security (i.e. it silently fails and we lose time investigating downstream failures instead).

For example there are many crude command prompt one-liners I’ve written in the past to pick up common mistakes that are trivial for me to run because they’ve been written to automate the expensive bit, not the entire problem. I often rely on my own visual system to filter out the noise and compensate for the impurities within the process. Removing these wrinkles is often where the proverbial “last 10% that takes 90% of the time” goes [1].

It’s all too easy to get seduced by the meme and believe that no automation task is truly complete until it’s fully automated.

An Example

In .Net when you publish shared libraries as NuGet packages you have a .nuspec file which lists the package dependencies. The library .csproj build file also has project dependencies for use with compilation. However these two sets of dependencies should be kept in sync [2].

Initially with only a couple of NuGet packages it was easy to do manually as I knew it was unlikely to change. However once the monolithic library got split it up the dependencies started to grow and manually comparing the relevant sections got harder and more laborious.

Given the text based nature of the two files (XML) it was pretty easy to write a simple shell one-liner to grep the values from the two sets of relevant XML tags, dump them in a file, and then use diff to show a side-by-side comparison. Then it just needed wrapping in a for loop to traverse the solution workspace.

Because the one-liner was mine I got to take various shortcuts like hardcoding the input path and temporary files along with “knowing” that a certain project was always misreported. At this point a previously manual process has largely been automated and as long as I run it regularly will catch any mistakes.

Of course it’s nice to share things like this so that others can take advantage after I’m gone, and it might be even better if the process can be added as a build step so that it’s caught the moment the problem surfaces rather than later in response to a more obscure issue. Now things begin to get tricky and we start to see diminishing returns.

First, the Gnu on Windows (GoW) toolset I used isn’t standard on Windows so now I need to make the one-liner portable or make everyone else match my tooling choice [3]. I also need to fix the hard coded paths and start adding a bit of error handling. I also need to find a way to remove the noise caused by the one “awkward” project.

None of this is onerous, but this all takes time and whilst I’m doing it I’m not doing something (probably) more valuable. The majority of the value was in being able to scale out this safety check, there is (probably) far less value in making it portable and making it run reliably as part of an automated build. This is because essentially it only needs to be run whenever the project dependencies change and that was incredibly rare once the initial split was done. Additionally the risk of not finding an impedance mismatch was small and should be caught by other automated aspects of the development process, i.e. the deployment and test suite.

Knowing When to Automate More

This scenario of cobbling something together and then finding you need to do it more often is the bread and butter of build & deployment pipelines. You often start out with a bunch of hacked together scripts which do just enough to allow the team to bootstrap itself in to an initial fluid state of delivery. This is commonly referred to as a walking skeleton because it forms the basis for the entire development process.

The point of starting with the walking skeleton rather than just diving headlong into features is to try and tackle some of the problems that historically got left until it was too late, such as packaging and deployment. In the modern era of continuous delivery we look to deliver a thin slice of functionality quickly and then build upon it piecemeal.

However it’s all too easy to get bogged down early on in a project and spend lots of time just getting the build pipeline up and running and have nothing functional to show for it. This has always made me feel a little uncomfortable as it feels as though we should be able to get away with far less than perhaps we think we need to.

In “Building the Pipeline - Process Led or Automation Led” and my even earlier post “Layered Builds” I’ve tried to promote a more organic approach that focuses on what I think really matters most which is a consistent and extensible approach. In essence we focus first on producing a simple, repeatable process that can be used locally to enable the application skeleton to safely evolve and then balance the need for automating this further along with the other features. If quality or speed of delivery drops and more automation looks to be the answer then it can be added with the knowledge that it’s being done for deliberate reasons, rather than because we’ve got carried away gold plating the build system based on what other people think it should do (i.e. a cargo cult mentality).

Technical Risk

The one caveat to being leaner about your automation is that you may (accidentally) put off addressing one or more technical risks because you don’t perceive them as risks. This leads us back to why the meme exists in the first place – failing to address certain aspects of software delivery until it’s too late. If there is a technical concern, address it, but only to the extent that the risk is understood, you may not need to do anything about it now.

With a team of juniors there is likely to be far more unknowns [4] than with a team of experienced programmers, therefore the set of perceived risks will be higher. Whilst you might not know the most elegant approach to solving a problem, knowing an approach already reduces the risk because you know that you can trade technical debt in the short term for something else more valuable if necessary.

Everything is Negotiable

The thing I like most about an agile development process is that every trade-off gets put front-and-centre, everything is now negotiable [5]. Every task now comes with an implicit question: is this the most valuable thing we could be doing?

Whilst manually building a private cloud for your production system using a UI is almost certainly not the most scalable approach, neither is starting day one of a project by diving into, say, Terraform when you don’t even know what you’re supposed to be building. There is nothing wrong with starting off manually, you just need to be diligent and ensure that your decision to only automate “enough of the things” is always working in your favour.

 

[1] See “The Curse of NTLM Based HTTP Proxies”.

[2] I’m not aware of Visual Studio doing this yet although there may now be extensions and tools written by others I’m not aware of.

[3] Yes, the Unix command line tools should be ubiquitous and maybe finally they will be with Bash on Windows.

[4] See “Turning Unconscious Incompetence to Conscious Incompetence”.

[5] See “Estimating is Liberating”.

Automated Integration Testing with TIBCO

Chris Oldwood from The OldWood Thing

In the past few years I’ve worked on a few projects where TIBCO has been the message queuing product of choice within the company. Naturally being a test-oriented kind of guy I’ve used unit and component tests for much of the donkey work, but initially had to shy away from writing any automated integration tests due to the inherent difficulties of getting the system into a known state in isolation.

Organisational Barriers

For any automated integration tests to run reliably we need to control the whole environment, which ideally is our development workstations but also our CI build environment (see “The Developer’s Sandbox”). The main barriers to this with a commercial product like TIBCO are often technological, but also more often than not, organisational too.

In my experience middleware like this tends to be proprietary, very expensive, and owned within the organisation by a dedicated team. They will configure the staging and production queues and manage the fault-tolerant servers, which is probably what you’d expect as you near production. A more modern DevOps friendly company would recognise the need to allow teams to test internally first and would help them get access to the product and tools so they can build their test scaffolding that provides the initial feedback loop.

Hence just being given the client access libraries to the product is not enough, we need a way to bring up and tear down the service endpoint, in isolation, so that we can test connectivity and failover scenarios and message interoperability. We also need to be able develop and test our logic around poisoned messages and dead-letter queues. And all this needs to be automatable so that as we develop and refactor we can be sure that we’ve not broken anything; manually testing this stuff is not just not scalable in a shared test environment at the pace modern software is developed.

That said, the TIBCO EMS SDK I’ve been working with (v6.3.0) has all the parts I needed to do this stuff, albeit with some workarounds to avoid needing to run the tests with administrator rights which we’ll look into later.

The only other thorny issue is licensing. You would hope that software product companies would do their utmost to get developers on their side and make it easy for them to build and test their wares, but it is often hard to get clarity around how the product can be used outside of the final production environment. For example trying to find out if the TIBCO service can be run on a developer’s workstation or in a cloud hosted VM solely for the purposes of running some automated tests has been a somewhat arduous task.

This may not be solely the fault of the underlying product company, although the old fashioned licensing agreements often do little to distinguish production and modern development use [1]. No, the real difficulty is finding the right person within the client’s company to talk to about such matters. Unless they are au fait with the role modern automated integrated testing takes place in the development process you will struggle to convince them your intended use is in the interests of the 3rd party product, not stealing revenue from them.

Okay, time to step down from the soap box and focus on the problems we can solve…

Hosting TIBEMSD as a Windows Service

From an automated testing perspective what we need access to is the TIBEMSD.EXE console application. This provides us with one or more TIBCO message queues that we can host on our local machine. Owning thing process means we can therefore create, publish to and delete queues on demand and therefore tightly control the environment.

If you only want to do basic integration testing around the sending and receiving of messages you can configure it as a Windows service and just leave it running in the background. Then your tests can just rely on it always being there like a local database or the file-system. The build machine can be configured this way too.

Unfortunately because it’s a console application and not written to be hosted as a service (at least v6.3 isn’t), you need to use a shim like SRVANY.EXE from the Windows 2003 Resource Kit or something more modern like NSSM. These tools act as an adaptor to the console application so that the Windows SCM can control them.

One thing to be careful of when running TIBEMSD in this way is that it will stick its data files in the CWD (Current Working Directory), which for a service is %SystemRoot%\System32, unless you configure the shim to change it. Putting them in a separate folder makes them a little more obvious and easier to delete when having a clear out [2].

Running TIBEMSD On Demand

Running the TIBCO server as a service makes certain kinds of tests easier to write as you don’t have to worry about starting and stopping it, unless that’s exactly the kinds of test you want to write.

I’ve found it’s all too easy when adding new code or during a refactoring to accidentally break the service so that it doesn’t behave as intended when the network goes up and down, especially when you’re trying to handle poisoned messages.

Hence I prefer to have the TIBEMSD.EXE binary included in the source code repository, in a known place so that it can be started and stopped on demand to verify the connectivity side is working properly. For those classes of integration tests where you just need it to be running you can add it to your fixture-level setup and even keep it running across fixtures to ensure the tests running at an adequate pace.

If, like me, you don’t run as an Administrator all the time (or use elevated command prompts by default) you will find that TIBEMSD doesn’t run out-of-the-box in this way. Fortunately it’s easy to overcome these two issues and run in a LUA (Limited User Account).

Only Bind to the Localhost

One of the problems is that by default the server will try and listen for remote connections from anywhere which means it wants a hole in the firewall for its default port. This of course means you’ll get that firewall popup dialog which is annoying when trying to automate stuff. Whilst you could grant it permission with a one-off NETSH ADVFIREWALL command I prefer components in test mode to not need any special configuration if at all possible.

Windows will allow sockets that only listen for connections from the local host to avoid generating the annoying firewall popup dialog (and this was finally extended to include HTTP too). However we need to tell the TIBCO server to do just that, which we can achieve by creating a trivial configuration file (e.g. localhost.conf) with the following entry:

listen=tcp://127.0.0.1:7222

Now we just need to start it with the –conf switch:

> tibemsd.exe -config localhost.conf

Suppressing the Need For Elevation

So far so good but our other problem is that when you start TIBEMSD it wants you to elevate its permissions. I presume this is a legacy thing and there may be some feature that really needs it but so far in my automated tests I haven’t hit it.

There are a number of ways to control elevation for legacy software that doesn’t have a manifest, like using an external one, but TIBEMSD does and that takes priority. Luckily for us there is a solution in the form of the __COMPAT_LAYER environment variable [3]. Setting this, either through a batch file or within our test code, supresses the need to elevate the server and it runs happily in the background as a normal user, e.g.

> set __COMPAT_LAYER=RunAsInvoker
> tibemsd.exe -config localhost.conf

Spawning TIBEMSD From Within a Test

Once we know how to run TIBEMSD without it causing any popups we are in a position to do that from within an automated test running as any user (LUA), e.g. a developer or the build machine.

In C#, the language where I have been doing this most recently, we can either hard-code a relative path [4] to where TIBEMSD.EXE resides within the repo, or read it from the test assembly’s app.config file to give us a little more flexibility.

<appSettings>
  <add key=”tibemsd.exe”
       value=”..\..\tools\TIBCO\tibemsd.exe” />
  <add key=”conf_file”
       value=”..\..\tools\TIBCO\localhost.conf” />
</appSettings>

We can also add our special .conf file to the same folder and therefore find it in the same way. Whilst we could generate it on-the-fly it never changes so I see little point in doing this extra work.

Something to be wary of if you’re using, say, NUnit to write your integration tests is that it (and ReSharper) can copy the test assemblies to a random location to aid in insuring your tests have no accidental dependencies. In this instance we do, and a rather large one at that, so we need the relative distance between where the test assemblies are built and run (XxxIntTests\bin\Debug) and the TIBEMSD.EXE binary to remain fixed. Hence we need to disable this copying behaviour with the /noshadow switch (or “Tools | Unit Testing | Shadow-copy assemblies being tested” in ReSharper).

Given that we know where our test assembly resides we can use Assembly.GetExecutingAssembly() to create a fully qualified path from the relative one like so:

private static string GetExecutingFolder()
{
  var codebase = Assembly.GetExecutingAssembly()
                         .CodeBase;
  var folder = Path.GetDirectoryName(codebase);
  return new Uri(folder).LocalPath;
}
. . .
var thisFolder = GetExecutingFolder();
var tibcoFolder = “..\..\tools\TIBCO”;
var serverPath = Path.Combine(
            thisFolder, tibcoFolder, “tibemsd.exe”);
var configPath = Path.Combine(
            thisFolder, tibcoFolder, “localhost.conf”);

Now that we know where the binary and config lives we just need to stop the elevation by setting the right environment variable:

Environment.SetEnvironmentVariable("__COMPAT_LAYER", "RunAsInvoker");

Finally we can start the TIBEMSD.EXE console application in the background (i.e. no distracting console window) using Diagnostics.Process:

var process = new System.Diagnostics.Process
{
  StartInfo = new ProcessStartInfo(path, args)
  {
    UseShellExecute = false,
    CreateNoWindow = true,
  }
};
process.Start();

Stopping the daemon involves calling Kill(). There are more graceful ways of remotely stopping a console application which you can try first, but Kill() is always the fall-back approach and of course the TIBCO server has been designed to survive such abuse.

Naturally you can wrap this up with the Dispose pattern so that your test code can be self-contained:

// Arrange
using (RunTibcoServer())
{
  // Act
}

// Assert

Or if you want to amortise the cost of starting it across your tests you can use the fixture-level set-up and tear down:

private IDisposable _server;

[FixtureSetUp]
public void GivenMessageQueueIsAvailable()
{
  _server = RunTibcoServer();
}

[FixtureTearDown]
public void StopMessageQueue()
{
  _server?.Dispose();
  _server = null;
}

One final issue to be aware of, and it’s a common one with integration tests like this which start a process on demand, is that the server might still be running unintentionally across test runs. This can happen when you’re debugging a test and you kill the debugger whilst still inside the test body. The solution is to ensure that the server definitely isn’t already running before you spawn it, and that can be done by killing any existing instances of it:

Process.GetProcessesByName(“tibemsd”)
       .ForEach(p => p.Kill());

Naturally this is a sledgehammer approach and assumes you aren’t using separate ports to run multiple disparate instances, or anything like that.

Other Gottchas

This gets us over the biggest hurdle, control of the server process, but there are a few other little things worth noting.

Due to the asynchronous nature and potential for residual state I’ve found it’s better to drop and re-create any queues at the start of each test to flush them. I also use the Assume.That construct in the arrangement to make it doubly clear I expect the test to start with empty queues.

Also if you’re writing tests that cover background connect and failover be aware that the TIBCO reconnection logic doesn’t trigger unless you have multiple servers configured. Luckily you can specify the same server twice, e.g.

var connection= “tcp://localhost,tcp://localhost”;

If you expect your server to shutdown gracefully, even in the face of having no connection to the queue, you might find that calling Close() on the session and/or connection blocks whilst it’s trying to reconnect (at least in EMS v6.3 it does). This might not be an expected production scenario, but it can hang your tests if something goes awry, hence I’ve used a slightly distasteful workaround where the call to Close() happens on a separate thread with a timeout:

Task.Run(() => _connection.Close()).Wait(1000);

Conclusion

Writing automated integration tests against a middleware product like TIBCO is often an uphill battle that I suspect many don’t have the appetite or patience for. Whilst this post tackles the technical challenges, as they are at least surmountable, the somewhat harder problem of tackling the organisation is sadly still left as an exercise for the reader.

 

[1] The modern NoSQL database vendors appear to have a much simpler model – use it as much as you like outside production.

[2] If the data files get really large because you leave test messages in them by accident they can cause your machine to really grind after a restart as the service goes through recovery.

[3] How to Run Applications Manifested as Highest Available With a Logon Script Without Elevation for Members of the Administrators Group

[4] A relative path means the repo can then exist anywhere on the developer’s file-system and also means the code and tools are then always self-consistent across revisions.