Branching 0 – Git 1

Chris Oldwood from The OldWood Thing

My recent tirade against unnecessary branching – “Git is Not the Problem” – might have given the impression that I don’t appreciate the power that git provides. That’s not true and hopefully the following example highlights the appreciation I have for the power git provides but also why I dislike being put in that position in the first place.

The Branching Strategy

I was working in a small team with a handful of experienced developers making an old C++/ATL based GUI more accessible for users with disabilities. Given the codebase was very mature and maintenance was minimal, our remit only extended so far as making the minimal changes we needed to both the code and resource files. Hence this effectively meant no refactoring – a strictly surgical approach.

The set-up involved an integration branch per-project with us on one and the client’s team on another – master was reserved for releases. However, as they were using Stash for their repos they also wanted us to make use of its ability to create separate pull requests (PR) for every feature. This meant we needed to create independent branches for every single feature as we didn’t have permission to push directly to the integration branch even if we wanted to.

The Bottleneck

For those who haven’t had the pleasure of working with Visual Studio and C++/ATL on a native GUI with other people, there are certain files which tend to be a bottleneck, most notably resource.h. This file contains the mapping for the symbols (nay #defines) to the resource file IDs. Whenever you add a new resource, such as a localizable string, you add a new symbol and bump the two “next ID” counters at the bottom. This project ended up with us adding a lot of new resource strings for the various (localizable) annotations we used to make the various dialog controls more accessible [1].

Aside from the more obvious bottleneck this resource.h file creates, in terms of editing it in a team scenario, it also has one other undesirable effect – project rebuilds. Being a header file, and also one that has a habit of being used across most of the codebase (whether intentionally or not) if it changes then most of the codebase needs re-building. On a GUI of the size we were working on, using the development VMs we had been provided, this amounted to 45 minutes of thumb twiddling every time it changed. As an aside we couldn’t use the built-in Visual Studio editor either as the file had been edited by hand for so long that when it was saved by the editor you ended up with the diff from hell [2].

The Side-Effects

Consequently we ran into two big problems working on this codebase that were essentially linked to that one file. The first was that adding new resources meant updating the file in a way that was undoubtedly going to generate a merge conflict with every other branch because most tasks meant adding new resources. Even though we tried to coordinate ourselves by introducing padding into the file and artificially bumping the IDs we still ended up causing merge conflicts most of the time.

In hindsight we probably could have made this idea work if we added a huge amount of padding up front and reserved a large range of IDs but we knew there was another team adding GUI stuff on another branch and we expected to integrate with them more often than we did. (We had no real contact with them and the plethora of open branches made it difficult to see what code they were touching.)

The second issue was around the rebuilds. While you can git checkout –b <branch> to create your feature branch without touching resource.h again, the moment you git pull the integration branch and merge you’re going to have to take the hit [3]. Once your changes are integrated and you push your feature branch to the git server it does the integration branch merge for you and moves it forward.

Back on your own machine you want to re-sync by switching back to the integration branch, which I’d normally do with:

> git checkout <branch>
> git pull --ff-only

…except the first step restores the old resource.h before updating it again in the second step back to where you just were! Except now we’ve got another 45 minute rebuild on our hands [3].

Git to the Rescue

It had been some years since any of us had used Visual Studio on such a large GUI and therefore it took us a while to work out why the codebase always seemed to want rebuilding so much. Consequently I looked to the Internet to see if there was a way of going from my feature branch back to the integration branch (which should be identical from a working copy perspective) without any files being touched. It’s git, of course there was a way, and “Fast-forwarding a branch without checking it out” provided the answer [4]:

> git fetch origin <branch>:<branch>
> git checkout <branch>

The trick is to fetch the branch changes from upstream and point the local copy of that branch to its tip. Then, when you do checkout, only the branch metadata needs to change as the versions of the files are identical and nothing gets touched (assuming no other upstream changes have occurred in the meantime).

Discontinuous Integration

In a modern software development world where we strive to integrate as frequently as possible with our colleagues it’s issues like these that remind us what some of the barriers are for some teams. Visual C++ has been around a long time (since 1993) so this problem is not new. It is possible to break up a GUI project – it doesn’t need to have a monolithic resource file – but that requires time & effort to fix and needs to be done in a timely fashion to reap the rewards. In a product this old which is effectively on life-support this is never going to happen now.

As Gerry Weinberg once said “Things are the way they are because they got that way” which is little consolation when the clock is ticking and you’re watching the codebase compile, again.

 

[1] I hope to write up more on this later as the information around this whole area for native apps was pretty sparse and hugely diluted by the same information for web apps.

[2] Luckily it’s a fairly easy format but laying out controls by calculating every window rectangle is pretty tedious. We eventually took a hybrid approach for more complex dialogs where we used the resource editor first, saved our code snippet, reverted all changes, and then manually pasted our snippet back in thereby keeping the diff minimal.

[3] Yes, you can use touch to tweak the file’s timestamp but you need to be sure you can get away with that by working out what the effects might be.

[4] As with any “googling” knowing what the right terms are, to ask the right question, is the majority of the battle.

Choosing “a” Database, not “the” Database

Chris Oldwood from The OldWood Thing

One thing I’ve run across a few times over the years is the notion that an application or system has one, and only one, database product. It’s as if the answer to the question about where we should store our data must be about where we store “all” our data.

Horses for Courses

I’ve actually touched on this topic before in “Deferring the Database Choice” where our team tried to put off the question as long as possible because of a previous myopic mindset and there was a really strong possibility that we might even have a need for two different styles of database – relational and document-oriented – because we had two different types of data to store with very different constraints.

In that instance, after eventually working out what we really needed, we decided to look at a traditional relational database for the transactional data [1], while we looked towards the blossoming NoSQL crowd for the higher-volume non-transactional data. While one might have sufficed for both purposes the organisational structure and lack of operational experience at the time meant we didn’t feel comfortable putting all our eggs in that one NoSQL basket up front.

As an aside the Solution Architect [2] who was assigned to our team by the client definitely seemed out of their comfort zone with the notion that we might want to use different products for different purposes.

Platform Investment

My more recent example of this line of reasoning around “the one size fits all” misnomer was while doing some consulting at a firm in the insurance sector, an area where mainframes and legacy systems pervade the landscape.

In this particular case I had been asked to help advise on the architecture of a few new internal services they were planning. Two were really just caches of upstream data designed to reduce the per-cost call of 3rd party services while the third would serve up flood related data which was due to be incorporated into insurance pricing.

To me they all seemed like no-brainers. Even the flood data service just felt like it was probably a simple web service (maybe REST) that looks up the data in a document oriented database based on the postcode key. The volume of requests and size of the dataset did not seem remarkable in any way, nor the other caches. The only thing that I felt deserved any real thought was around the versioning of the data, if that was even a genuine consideration. (I was mostly trying to think of any potential risks that might vaguely add to the apparent lack of complexity.)

Given the company already called out from its mainframe to other web services they had built, this was a solved problem, and therefore I felt there was no reason not to start knocking up the flood data service which, given its simplicity, could be done outside-in so that they’d have their first microservice built TDD-style (an approach they wanted to try out anyway). They could even plug it in pretty quickly and just ignore the responses back to the mainframe in the short term so that they could start getting a feel for the operational aspects. In essence it seemed the perfect learning opportunity for many new skills within the department.

An Undercurrent

However, while I saw this as a low-risk venture there were questions from further up effectively about choosing the database. I suspected there were concerns about the cost but some rudimentary calculations based around a three-node cluster with redundant disks versus storage for the mainframe showed that they weren’t even in the same ballpark and we’re not even talking SSDs here either. (This also ignores the fact that they were close to maxing out the mainframe anyway.)

One of the great things about databases in these modern times is that you can download the binaries and just fire one up and get playing. Given the dataset fitted the document-oriented paradigm and there were no transactions to speak of I suggested they pick either MongoDB or Couchbase and just get started as it was the paradigm they most needed to get acquainted with, the specific vendor (to me) was less of a concern in the shorter term as the data model was simple.

Nevertheless, rather than build something first and get a feel for what makes most sense, they wanted to invite the various big NoSQL vendors in and discuss contracts and products up-front. So I arranged for the three main contenders at the time to visit the company’s offices and give a pitch, followed by some Q&A time for the management to ask any burning questions. It was during the first of these three pitches that I began to realise where the disconnect lay between my vision and theirs.

While I had always been working on the assumption that the company was most comfortable with mainframes and relational databases and that they wanted to step outside that and move to a less monolithic architecture, perhaps using the Strangler Pattern to break out the peripheral services into independent self-contained ones, they still saw a single database product sitting at the heart. Yes, the services might be built separately, and the data may well be partitioned via namespaces or collections or whatever, but fundamentally the assumption was that the data storage was still effectively monolithic.

A False Economy

In retrospect I shouldn’t really have been that surprised. The reason the mainframe had probably survived for so long was that the data was seen as the crown jewels and the problems of redundancy and backup had been solved long ago and were pretty robust. In fact if anything went wrong the vendor could helicopter some experts in (which they had done in the past). This was not the level of service offered by the new kids on the block and the company was still far from getting comfortable with cloud hosting and managed service providers which were are starting to spring up.

Hence, where I was looking at the somewhat disposable nature of the new services purely as an opportunity for learning, others higher up were looking at it as a stepping stone to moving all their data across to another platform. Coupled with this was the old-fashioned view that the decision needed to be made up-front and needed to be the right one from the off [3].

A Different Investment

Even with this misconception acknowledged and the shining cost savings to be had there was still a heavy reluctance to go with something new. I believe that in the end they put their investment into more mainframe storage instead of investing in their people and the organisation’s longer term future.

 

[1] There was definitely an element of “availability bias” here as the organisation had a volume licensing agreement with a relational database vendor.

[2] A role which highlighted their Ivory Tower approach at the time but has since fallen away as architecture has thankfully started leaning more towards shared ownership.

[3] Some of the impetus for “Don’t Fail Fast, Learn Cheaply” came from conversations I had with this organisation about their approach to career development.