Describing software engineering is terms of a traditional science

Derek Jones from The Shape of Code

If you were asked to describe the ‘building stuff’ side of software engineering, by comparing it with one of the traditional sciences, which science would you choose?

I think a lot of people would want to compare it with Physics. Yes, physics envy is not restricted to the softer sciences of humanities and liberal arts. Unlike physics, software engineering is not governed by a handful of simple ‘laws’, it’s a messy collection of stuff.

I used to think that biology had all the necessary important characteristics needed to explain software engineering: evolution (of code and products), species (e.g., of editors), lifespan, and creatures are built from a small set of components (i.e., DNA or language constructs).

Now I’m beginning to think that chemistry has aspects that are a better fit for some important characteristics of software engineering. Chemists can combine atoms of their choosing to create whatever molecule takes their fancy (subject to bonding constraints, a kind of syntax and semantics for chemistry), and the continuing existence of a molecule does not depend on anything outside of itself; biological creatures need to be able to extract some form of nutrient from the environment in which they live (which is also a requirement of commercial software products, but not non-commercial ones). Individuals can create molecules, but creating new creatures (apart from human babies) is still a ways off.

In chemistry and software engineering, it’s all about emergent behaviors (in biology, behavior is just too complicated to reliably say much about). In theory the properties of a molecule can be calculated from the known behavior of its constituent components (e.g., the electrons, protons and neutrons), but the equations are so complicated it’s impractical to do so (apart from the most simple of molecules; new properties of water, two atoms of hydrogen and one of oxygen, are still being discovered); the properties of programs could be deduced from the behavior its statements, but in practice it’s impractical.

What about the creative aspects of software engineering you ask? Again, chemistry is a much better fit than biology.

What about the craft aspect of software engineering? Again chemistry, or rather, alchemy.

Is there any characteristic that physics shares with software engineering? One that stands out is the ego of some of those involved. Describing, or creating, the universe nourishes large egos.

object_ptr – a safer replacement for raw pointers

Anthony Williams from Just Software Solutions Blog

Yesterday I uploaded my object_ptr<T> implementation to github under the Boost Software License.

This is an implementation of a class similar to std::experimental::observer_ptr<T> from the Library Fundamentals TS 2, but with various improvements suggested in WG21 email discussions of the feature.

The idea of std::experimental::observer_ptr<T> is that it provides a pointer-like object that does not own the pointee, and thus can be used in place of a raw pointer, but does not allow pointer arithmetic or use of delete, or pointer arithmetic, so is not as dangerous as a raw pointer. Its use also serves as documentation: this object is owned elsewhere, so explicitly check the lifetime of the pointed-to object — there is nothing to prevent a dangling std::experimental::observer_ptr<T>.

My implementation of this concept has a different name (object_ptr<T>). I feel that observer_ptr is a bad name, because it conjures up the idea of the Observer pattern, but it doesn't really "observe" anything. I believe object_ptr is better: it is a pointer to an object, so doesn't have any array-related functionality such as pointer arithmetic, but it doesn't tell you anything about ownership.

It also has slightly different semantics to std::experimental::observer_ptr: it allows incoming implicit conversions, and drops the release() member function. The implicit conversions make it easier to use as a function parameter, without losing any safety, as you can freely pass a std::shared_ptr<T>, std::unique_ptr<T>, or even a raw pointer to a function accepting object_ptr<T>. It makes it much easier to use jss::object_ptr<T> as a drop-in replacement for T* in function parameters. There is nothing you can do with a jss::object_ptr<T> that you can't do with a T*, and in fact there is considerably less that you can do: without explicitly requesting the stored T*, you can only use it to access the pointed-to object, or compare it with other pointers. The same applies with std::shared_ptr<T> and std::unique_ptr<T>: you are reducing functionality, so this is safe, and reducing typing for safe operations is a good thing.

I strongly recommend using object_ptr<T> or an equivalent implementation of the observer_ptr concept anywhere you have a non-owning raw pointer in your codebase that points to a single object.

If you have a raw pointer that does own its pointee, then I would strongly suggest finding a smart pointer class to use as a wrapper to encapsulate that ownership. For example, std::unique_ptr or std::shared_ptr with a custom deleter might well do the job.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

On An Ethereal Orrery – student

student from thus spake a.k.

My fellow students and I have lately been wondering whether we might be able to employ Professor B------'s Experimental Clockwork Mathematical Apparatus to fashion an ethereal orrery, making a model of the heavenly bodies with equations rather than brass.
In particular we have been curious as to whether we might construct such a model using nought but Sir N-----'s law of universal gravitation, which posits that those bodies are attracted to one another with a force that is proportional to the product of their masses divided by the square of the distance between them, and laws of motion, which posit that a body will remain at rest or move with constant velocity if no force acts upon it, that if a force acts upon it then it will be accelerated at a rate proportional to that force divided by its mass in the direction of that force and that it in return exerts a force of equal strength in the opposite direction.

Abstraction with Database Views

Chris Oldwood from The OldWood Thing

After being away from the relational database world for a few years it’s been interesting coming back and working on a mature system with plenty of SQL code. It’s been said that SQL is the assembly language of databases and when SQL code is written only using its primitives (types and tables) it’s easy to see why.

Way back in 2011 I wrote “The Public Interface of a Database” which was a distillation of my thoughts at the time about what I felt was generally wrong with much of the database code I saw. One aspect in particular which I felt was sorely underutilised was the use of views to build a logical model over the top of the physical model to allow a more emergent design to unfold. This post documents some of the ways I’ve found views to be beneficial in supporting a more agile approach to database design.

Views for Code Reuse

The first thing that struck me about the recent SQL code I saw was how much there was of it. Most queries were pretty verbose and as a consequence you had to work hard to comprehend what was going on. Just as you see the same tired examples around Orders => OrderItems => Products so the code had a similar set of 3 table joins over and over again as they formed the basis for so many queries.

One of the primary uses for database views is as a code reuse mechanism. Instead of copy-and-pasting the same bunch of joins everywhere:

FROM Orders o
INNER JOIN OrderItems oi
ON o.Id = oi.OrderId 
INNER JOIN Products p
ON oi.ProductId = p.Id

we could simply say:

FROM OrdersOrderItemsProducts

This one simplification reduces a lot of complexity and means that wherever we see that name we instantly recognise it without mentally working through the joins in our head. Views are composable too meaning that we can implement one view in terms of another rather than starting from scratch every time.

Naming

However, if the name OrdersOrderItemsProducts makes you wince then I don’t blame you because it’s jarring due to its length and unnaturalness. It’s a classic attempt at naming based on how it’s implemented rather than what it means.

I suspect a difficulty in naming views is part of the reason for their lack of use in some cases. For our classic example above I would probably go with OrderedProducts or ProductsOrdered. The latter is probably preferable as the point of focus is the Products “set” with the use of Orders being a means to qualify which products we’re interested in, like “users online”. Of course one could just easily say “unread messages” and therefore we quickly remember why naming is one of the two hardest problems in computer science.

Either way it’s important that we do spend the time required to name our views appropriately as they become the foundation on which we base many of our other queries.

Views for Encapsulation

Using views as a code reuse mechanism is definitely highly beneficial but where I think they start to provide more value are as a mechanism for revealing new, derived sets of data. The name ProductsOrdered is not radically different from the more long-winded OrdersOrderItemsProducts and therefore it still heavily reflects the physical relationship of the underlying tables.

Now imagine a cinema ticketing system where you have two core relationships: Venue => Screen => SeatingPlan and Film => Screening => Ticket => Seat. By navigating these two relationships it is possible to determine the occupancy of the venue, screen, showing, etc. and yet the term Occupancy says nothing about how that is achieved. In essence we have revealed a new abstraction (Occupancy) which can be independently queried and therefore elevates our thinking to a higher plane instead of getting bogged down in the lengthy chain of joins across a variety of base tables.

Views for Addressing Uncertainty

We can also turn this thinking upside down, so that rather than creating something new by hiding the underlying existing structure, we can start with something concrete and re-organise how things work underneath. This is the essence of refactoring – changing the design without changing the behaviour.

When databases were used as a point of integration this idea of hiding the underlying schema from “consumers” made sense as it gave you more room to change the schema without breaking a bunch of queries your consumers had already created. But even if you have sole control over your schema there is still a good reason why you might want to hide the schema, nay implementation, even from much of your own code.

Imagine you are developing a system where you need to keep daily versions of your customer’s details easily accessible because you regularly perform computations across multiple dates [1] and you need to use the correct version of each customer’s data for the relevant date. When you start out you may not know what the most appropriate way to store them because you do not know how frequently they change, what kinds of changes are made, or how the data will be used in practice.

If you assume that most attributes change most days you may well plump to just store them daily, in full, e.g.

| Date       | Name      | Valuation | ... | 
| 2019-03-01 | Company A | £102m     | ... |  
| 2019-03-01 | Company B | £47m      | ... |  
| 2019-03-02 | Company A | £105m     | ... |  
| 2019-03-02 | Company B | £42m      | ... |  
| 2019-03-03 | Company A | £105m     | ... |  
| 2019-03-03 | Company B | £42m      | ... |

On the contrary, if the attributes rarely change each day then maybe we can version the data instead:

| Name      | Version | Valuation | ... |
| Company A | 1       | £147m     | ... |
| Company A | 2       | £156m     | ... |
| Company B | 1       | £27m      | ... |

So far so good, but how do we track which version belongs to which date? Once again I can think of two obvious choices. The first is much like the original verbose table and we record it on a daily basis:

| Date       | Name      | Version |
| 2019-03-01 | Company A | 1       |
| 2019-03-01 | Company B | 1       |
| 2019-03-02 | Company A | 1       |
| 2019-03-02 | Company B | 2       |

The second is to coalesce dates with the same version creating a much more compact form:

| From       | To         | Name      | Version |
| 2019-03-01 | (null)     | Company A | 1       |
| 2019-03-01 | 2019-03-01 | Company B | 1       |
| 2019-03-02 | (null)     | Company B | 2       |

Notice how we have yet another design choice to make here – whether to use NULL to represent “the future”, or whether to put today’s date as the upper bound and bump it on a daily basis [2].

So, with all those choices how do we make a decision? What if we don’t need to make a decision, now? What if we Use Uncertainty as a Driver and create a design that is easily changeable when we know more about the shape of the data and how it’s used?

What we do know is that we need to process customer data on a per-date basis, therefore, instead of starting with a Customer table we start with a Customer view which has the shape we’re interested in:

| Date | Name | Valuation | ... | 

We can happily use this view wherever we like knowing that the underlying structure could change without us needing to fix up lots of code. Naturally some code will be dependent on the physical structure, but the point is that we’ve kept it to a bare minimum. If we need to transition from one design to another, but can’t take the downtime to rewrite all the data up-front, that can often be hidden behind the view too.

Views as Interfaces

It’s probably my background [3] but I can’t help but notice a strong parallel in the latter two examples with the use of interfaces in object-oriented code. George Box reminds us that “all models are wrong, but some are useful” and so we should be careful not to strain the analogy too far but I think there is some value in considering the relationship between views and tables as somewhat akin to interfaces and classes, at least for the purposes of encapsulation as described above.

On a similar note we often strive to create and use the narrowest interface that solves our problem and that should be no different in the database world either. Creating narrower interfaces (views) allows us to remain more in control of our implementation by leaking less.

One final type related comparison that I think worthy of mention is that it’s easier to spot structural problems when you have a “richer type system”, i.e. many well-named views. For example, if a query joins through ProductsOrdered to get to UserPreferences you can easily see something funky is going on.

Embracing Change

When you work alongside a database where the SQL code and schema gets refactored almost as heavily as the services that depend on it is a pleasurable experience [4]. Scott Ambler wrote a couple of books over a decade ago (Refactoring Databases: Evolutionary Database Design and Agile Database Techniques) which convinced me long ago that it was possible to design databases that could embrace change. Making judicious use of views certainly helped achieve that in part by keeping the accidental complexity down.

Admittedly performance concerns, still a dark art in the world of databases, gets in the way every now and but I’d rather try to make the database a better place for my successors rather than assume it can’t be done.

 

[1] In investment banking it’s common to re-evaluate trades and portfolios on historical dates both for regulatory and analytical purposes.

[2] Some interesting scenarios crop up here when repeatability matters and you have an unreliable upstream data source.

[3] I’m largely a self-taught, back-end developer with many years of writing C++ and C# based services.

[4] Having a large suite of database unit tests, also written in T-SQL, really helped as we could use TDD on the database schema too.

Tyranny of the backlog

Allan Kelly from Allan Kelly Associates

BurningDownTheBacklog-2019-03-14-16-19.jpg

The backlog is a great idea: all the things we think the team will build, or perhaps: things they might build, and it might contain other work, like evaluation or reviews. Yes, the backlog is a great idea, all the stuff the team might do, well perhaps not all, it is seldom complete, after all, as they say “stuff happens”.

The truth is: backlogs have a tendency to grow. All too often I find teams who are struggling under the weight of their backlog. They can’t spare time to do experiments or learn something because there is stuff to do. The backlog becomes a tyrannical ruler and all of it MUST BE DONE.

Look at that hypothetical burn-down chart above. By sprint 15 the team is well on its way to completing all the original work. But the amount of work they need to do is higher than when they started. It is not as if the team have been doing nothing. Look at the next chart, it shows how, most weeks, more work is added than is done.

DoneVAdded-2019-03-14-16-19.jpg

To my mind finding more work isn’t a problem, indeed teams should be finding more work. Problems stem from the fact that backlogs – and tracking mechanisms like burn-down charts – try to full-fill competing needs.

  1. The backlog is used as a store of ideas for work to do. This makes sense, you can’t do everything today so postpone some to the future. A backlog allows you t move some things from peak time to off-peak, although software development teams rarely seem to see off-peak time.

Plus, having a backlog makes it easier to say:

“Thanks for your suggestion Fred, I’ve added it to the backlog, I’ll let you know when we get to it.”

Rather than:

“Thanks for your e-mail Fred, we deleted it once we stopped laughing.”

It makes sense to give a new idea a quick once-over. But doing a proper analysis is time consuming:Discussing what is being asked for takes time, as does setting acceptance criteria are. And then there is business value to assess and other work priorities to consider. Therefore, put it in the backlog and do all that later (if it ever gets scheduled.)

Without a backlog we would be forced to make a binary decision: do it and do it now or reject it.

In fact the backlog can become a natural filter: as stories age in the backlog some items will jexpire. Unfortunately many Product Owners don’t feel they have the authority to delete old requests so the backlog grows and grows.

I call such a “constipated backlog”: work goes in but very little comes out. When the only way for items to leave the backlog is by doing them the rate of return falls.

  1. The backlog fills another role because so many teams are still expected to meet project success criteria which ask for “everything to be done.” The backlog becomes a tyrant when people believe that one day it will all be done. Worse still, some people plan using this assumption.

People want to know when “it” will be done, how long it will take and how much it will cost. It takes time to answer those questions and if the backlog is growing any answer is going to be wrong.

In fact, it is probably wrong to think everything will ever be done. Unless one freezes the backlog and refuses to add new work then it is likely that low value items will be postponed while new, more valuable items, will take priority.

As an industry we need to drop the idea that a backlog will ever be done: the backlog as repository of ideas is at odds with the backlog as a measure of completeness.

Think about it this way: some of the items in the backlog are very valuable. Some items are worth very little. Some will cost more effort than the benefit they bring. If we do everything then the low value and the high value will all get done. Conversely, if we encourage new ideas and weed-out as many low value items as possible our rate of return will be higher.

But very few teams follow this model. Many more teams are slaves to the backlog, and their quest for an empty backlog is doomed.


Like this post?

Like to receive these posts by e-mail?

Subscribe to my newsletter & receive a free eBook “Xanpan: Team Centric Agile Software Development”

The post Tyranny of the backlog appeared first on Allan Kelly Associates.

Altruistic innovation and the study of software economics

Derek Jones from The Shape of Code

Recently, I have been reading rather a lot of papers that are ostensibly about the economics of markets where applications, licensed under an open source license, are readily available. I say ostensibly, because the authors have some very odd ideas about the activities of those involved in the production of open source.

Perhaps I am overly cynical, but I don’t think altruism is the primary motivation for developers writing open source. Yes, there is an altruistic component, but I would list enjoyment as the primary driver; developers enjoy solving problems that involve the production of software. On the commercial side, companies are involved with open source because of naked self-interest, e.g., commoditizing software that complements their products.

It may surprise you to learn that academic papers, written by economists, tend to be knee-deep in differential equations. As a physics/electronics undergraduate I got to spend lots of time studying various differential equations (each relating to some aspect of the workings of the Universe). Since graduating, I have rarely encountered them; that is, until I started reading economics papers (or at least trying to).

Using differential equations to model problems in economics sounds like a good idea, after all they have been used to do a really good job of modeling how the universe works. But the universe is governed by a few simple principles (or at least the bit we have access to is), and there is lots of experimental data about its behavior. Economic issues don’t appear to be governed by a few simple principles, and there is relatively little experimental data available.

Writing down a differential equation is easy, figuring out an analytic solution can be extremely difficult; the Navier-Stokes equations were written down 200-years ago, and we are still awaiting a general solution (solutions for a variety of special cases are known).

To keep their differential equations solvable, economists make lots of simplifying assumptions. Having obtained a solution to their equations, there is little or no evidence to compare it against. I cannot speak for economics in general, but those working on the economics of software are completely disconnected from reality.

What factors, other than altruism, do academic economists think are of major importance in open source? No, not constantly reinventing the wheel-barrow, but constantly innovating. Of course, everybody likes to think they are doing something new, but in practice it has probably been done before. Innovation is part of the business zeitgeist and academic economists are claiming to see it everywhere (and it does exist in their differential equations).

The economics of Linux vs. Microsoft Windows is a common comparison, i.e., open vs. close source; I have not seen any mention of other open source operating systems. How might an economic analysis of different open source operating systems be framed? How about: “An economic analysis of the relative enjoyment derived from writing an operating system, Linux vs BSD”? Or the joy of writing an editor, which must be lots of fun, given how many have text editors are available.

I have added the topics, altruism and innovation to my list of indicators of poor quality, used to judge whether its worth spending more than 10 seconds reading a paper.

The ACCU’s Overload magazine

Frances Buontempo from BuontempoConsulting

ACCU is an organisation for programmers. Its original focus was C and C++, but now members use a variety of languages, talk about testing and process and how to keep learning. ACCU holds an annual conference in the UK, attended by people from around the world. There's even a YouTube channel of recorded talks from this.

As a member you get a discount for the conference, can volunteer to do book reviews, can participate in study groups, though these have been quiet lately, and get two magazines; the CVu members' magazine and Overload, which is open to anyone. There are also several local groups if you want to come along and meet us.

I've been a member for several years now. It's been a great networking opportunity and I have learn so much from other members. I love the magazines, and by starting to write for them myself, I stepped up my game. I began by writing book reviews, then tried some of the Student Code Critiques in CVu. Eventually, I wrote an Overload article, pulling together a discussion on the accu-general mailing list about floating point numbers.

I took on the role of Overload editor in 2012. We welcome articles from non-members as well as members. They are peer reviewed, meaning the author gets feedback, questions and suggestions. A surprisingly high number of writers have gone on to write books, myself included. (I mentioned I wrote a book about genetic algorithms and machine learning, yes?)

If you have an article you'd like to get published, let me know. We do accept existing blog posts, but the review team might well ask for slight improvements. There are some submission instructions here.

We welcome established writers as well as new writers. If you've never written an article, give it a go. You can learn a lot by trying to write something up. For example, as you try to explain something you may find gaps in your knowledge and understanding. Questions and suggestions from the review team will make your article better.

I love the ACCU and am looking forward to this year's conference, in just under a month. If you can't make the conference, find a local group, or consider joining the organisation. Or submit an article for Overload. Get involved.


ACCU Home page

Breakfast with Norman Wilson: Size matters! Why size determines everything…

Paul Grenyer from Paul Grenyer


Breakfast with Norman Wilson: Size matters! Why size determines everything...

When: Tuesday, 12th March @ 7.30am to 8.30am
Where: The Maids Head Hotel, Tombland, Norwich, NR3 1LB
How much: £13.95
RSVP: https://www.meetup.com/Norfolk-Developers-NorDev/events/qqwhznyzfbhb/

Size matters! Why size determines everything in your organisation.
Norman Wilson

An anthropologist and evolutionary psychologist, Dunbar's fame largely focuses around a single number, 150. The theory of Dunbar's Number posits that 150 is the number of individuals with whom any one person can maintain stable relationships.

The Perils of Multi-Phase Construction

Chris Oldwood from The OldWood Thing

I’ve never really been a fan of C#’s object initializer syntax. Yes, it’s a little more convenient to write but it has a big downside which is it forces you to make your types mutable by default. Okay, that’s a bit strong, it doesn’t force you to do anything, but it does promote that way of thinking and allows people to take advantage of mutability outside the initialisation block [1].

This post is inspired by some buggy code I encountered where my suspicion is that the subtleties of the object initialisation syntax got lost along the way and partially constructed objects eventually found their way into the wild.

No Dragons Yet

The method, which was to get the next message from a message queue, was originally written something like this:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  result = new Message
  {
    Priority = message.Priority,
    Type = GetHeader(message, “MessageType”),
    Body = message.Body, 
  };
}

return result;

This was effectively correct. I say “effectively correct” because it doesn’t contain the bug which came later but still relies on mutability which we know can be dangerous.

For example, what would happen if the GetHeader() method threw an exception? At the moment there is no error handling and so the exception propagates out the method and back up the stack. Because we make no effort to recover we let the caller decide what happens when a duff message comes in.

The Dragons Begin Circling

Presumably the behaviour when a malformed message arrived was undesirable because the method was changed slightly to include some recovery fairly soon after:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  try
  {
    result = new Message
    {
      Priority = message.Priority,
      Type = GetHeader(message, “MessageType”),
      Body = message.Body,  
    };
  }
  catch (Exception e)
  {
    Log.Error(“Invalid message. Skipping.”);
  }
}

return result;

Still no bug yet, but that catch handler falling through to the return at the bottom is somewhat questionable; we are making the reader work hard to track what happens to result under the happy / sad paths to ensure it remains correct under further change.

Object Initialisation Syntax

Before showing the bug, here’s a brief refresher on how the object initialisation syntax works under the covers [2] in the context of our example code. Essentially it invokes the default constructor first and then performs assignments on the various other properties, e.g.

var __m = new Message();
__m.Priority = message.Priority;
__m.Type = GetHeader(message, “MessageType”);
__m.Body = message.Body,  
result = __m;

Notice how the compiler introduces a hidden temporary variable during the construction which it then assigns to the target at the end? This ensures that any exceptions during construction won’t create partially constructed objects that are bound to variables by accident. (This assumes you don’t use the constructor or property setter to attach itself to any global variables either.)

Hence, with respect to our example, if any part of the initialization fails then result will be left as null and therefore the message is indeed discarded and the caller gets a null reference back.

The Dragons Surface

Time passes and the code is then updated to support a new property which is also passed via a header. And then another, and another. However, being more complicated than a simple string value the logic to parse it is placed outside the object initialisation block, like this:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  try
  {
    result = new Message
    {
      Priority = message.Priority,
      Type = GetHeader(message, “MessageType”),
      Body = message.Body,  
    };

    var str = GetHeader(message, “SomeIntValue”);
    if (str != null && TryParseInt(str, out var value))
      result.IntValue = value;

    // ... more of the same ...
  }
  catch (Exception e)
  {
    Log.Error(“Invalid message. Skipping.”);
  }
}

return result;

Now the problems start. With the latter header parsing code outside the initialisation block result is assigned a partially constructed object while the remaining parsing code runs. Any exceptions that occur [3] mean that result will be left only partially constructed and the caller will be returned the duff object because the exception handler falls out the bottom.

+1 for Tests

The reason I spotted the bug was because I was writing some tests around the code for a new header which also temporarily needed to be optional, like the others, to decouple the deployments. When running the tests there was an error displayed on the console output [4] telling me the message was being discarded, which I didn’t twig at first. It was when I added a retrospective test for the previous optional fields and I found my new one wasn’t be parsed correctly that I realised something funky was going on.

Alternatives

So, what’s the answer? Well, I can think of a number of approaches that would fix this particular code, ranging from small to large in terms of the amount of code that needs changing and our appetite for it.

Firstly we could avoid falling through in the exception handler and make it easier on the reader to comprehend what would be returned in the face of a parsing error:

catch (Exception e)  
{  
  Log.Error(“Invalid message. Skipping.”);
  return null;
}

Secondly we could reduce the scope of the result variable and return that at the end of the parsing block so it’s also clearer about what the happy path returns:

var result = new Message  
{  
  // . . .  
};

var str = GetHeader(message, “SomeIntValue”);
if (str != null && TryParseInt(str, out var value)
  result.IntValue = value;

return result;

We could also short circuit the original check too and remove the longer lived result variable altogether with:

RawMessage message = queue.Receive();

if (message == null)
    return null;

These are all quite simple changes which are also safe going forward should someone add more header values in the same way. Of course, if we were truly perverse and wanted to show how clever we were, we could fold the extra values back into the initialisation block by doing an Extract Function on the logic instead and leave the original dragons in place, e.g.

try
{  
  result = new Message  
  {  
    Priority = message.Priority,  
    Type = GetHeader(message, “MessageType”),  
    Body = message.Body,
    IntValue = GetIntHeader(message, “SomeIntValue”),
    // ... more of the same ...  
  };
}  
catch (Exception e)  
{  
  Log.Error(“Invalid message. Skipping.”);  
}

But we would never do that because the aim is to write code that helps stop people making these kinds of mistakes in the first place. If we want to be clever we should make it easier for the maintainers to fall into The Pit of Success.

Other Alternatives 

I said at the beginning that I was not a fan of mutability by default and therefore it would be remiss of me not to suggest that the entire Message type be made immutable and all properties set via the constructor instead:

result = new Message  
(  
  priority: message.Priority,  
  type: GetHeader(message, “MessageType”),  
  body: message.Body,
  IntValue: GetIntHeader(message, “SomeIntValue”),
  // ... more of the same ...  
);

Yes, adding a new property is a little more work but, as always, writing the tests to make sure it all works correctly will dominate here.

I would also prefer to see use of an Optional<> type instead of a null reference for signalling “no message” but that’s a different discussion.

Epilogue

While this bug was merely “theoretical” at the time I discovered it [5] it quickly came back to bite. A bug fix I made on the sending side got deployed before the receiving end and so the misleading error popped up in the logs after all.

Although the system appeared to be functioning correctly it had slowed down noticeably which we quickly discovered was down to the receiving process continually restarting. What I hadn’t twigged just from reading this nugget of code was that due to the catch handler falling through and passing the message on it was being acknowledged on the queue twice –– once in that catch handler, and again after processing it. This second acknowledgment attempt generated a fatal error that caused the process to restart. Deploying the fixed receiver code as well sorted the issue out.

Ironically the impetus for my blog post “Black Hole - The Fail Fast Anti-Pattern” way back in 2012 was also triggered by two-phase construction problems that caused a process to go into a nasty failure mode, but that time it processed messages much too quickly and stayed alive failing them all.

 

[1] Generally speaking the setting of multiple properties implies it’s multi-phase construction. The more common term Two-Phase Construction comes (I presume) from explicit constructor methods names like Initialise() or Create() which take multiple arguments, like the constructor, rather than setting properties one-by-one.

[2] This is based on my copy of The C# Programming Language: The Annotated Edition.

[3] When the header was missing it was passing a null byte[] reference into a UTF8 decoder which caused it to throw an ArgumentNullException.

[4] Internally it created a logger on-the-fly so it wasn’t an obvious dependency that initially needed mocking.

[5] It’s old, so possibly it did bite in the past but nobody knew why or it magically fixed itself when both ends where upgraded close enough together.