Comparison of Matrix events before and after “Extensible Events”

Andy Balaam from Andy Balaam's Blog

(Background: Matrix is the awesome open standard for messaging that I get to work on now that I work at Element.)

The Extensible Events (MSC1767) Matrix Spec Change proposal describes a new way of structuring events in matrix that makes it easy to send events that have multiple representations (e.g. something clever like an interactive map, and something simpler like an image of a map).

The main purpose of the change is to make it easy for clients that don’t support some amazing new feature to display something that is still useful.

Since there is an implementation of this change out in the wild (in Element), it seems reasonably likely that this change will be accepted into the Matrix spec.

I really like this change, but I find it hard to understand, so here is a simple example that I have found helpful to think it through.

An old event, and a new event

Here is an old-fashioned event, followed by a new, shiny, extensible version:

{
    "type": "m.room.message",
    "content": {
        "body": "This is the *old* way",
        "format": "org.matrix.custom.html",
        "formatted_body": "This is the <b>old</b> way",
        "msgtype": "m.text"
    },
    ... other properties not relevant to this, e.g. "sender" ...
}
{
    "type": "m.message",
    "content": {
        "m.message": [
            {"mimetype": "text/plain", "body": "This the *new* way"},
            {"mimetype": "text/html", "body": "This is the <b>new</b> way"}
        ],
    }
    ... other properties not relevant to this, e.g. "sender" ...
}

Notice that in the new extensible events, the property within content is the same as the message type (here: m.message).

The point is that as well as the primary event type (here, m.message) we can other representations of the same message, such as an image, location co-ordinates, or something completely different. The client will render the primary event type if it understands it (and is able to show it), but if not, it can look for other types that it does understand.

For example, in Polls when you send a new poll question, it could look like this:

{
    "type": "m.poll.start",
    "content": {
        "m.poll.start": {
            ... The actual poll question etc. ...
        },
        "m.message": [
            ... A text version of the question ...
        ]
    },
    ... other properties not relevant to this, e.g. "sender" ...
}

So clients that don’t know m.poll.start can still display the poll question (if they understand extensible events), instead of completely ignoring event types they don’t know about.

An abbreviated form of the new event

Of course, life is not quite as simple as that.

Because this is a lot of typing:

{
    "type": "m.message",
    "content": {
        "m.message": [
            {"mimetype": "text/plain", "body": "This the *new* way"},
            {"mimetype": "text/html", "body": "This is the <b>new</b> way"}
        ],
    }
    ... other properties not relevant to this, e.g. "sender" ...
}

We have an abbreviated form:

{
    "type": "m.message",
    "content": {
        "m.text": "This the *new* way",
        "m.html": "This is the <b>new</b> way"
    }
    ... other properties not relevant to this, e.g. "sender" ...
}

These two are exactly equivalent.

m.text is an abbreviation for an m.message containing an entry with "mimetype": "text/plain" and the relevant body. Similarly, m.html is an abbreviation for an m.message containing an entry with "mimetype": "text/html" and the relevant body. If you declare both, they effectively get squashed together into one m.message with both entries.

Those 2 are the only abbreviations listed, so they are special cases.

Backwards compatibility

Of course, life is way more complicated than that, so what we’re likely to see around if/when this gets widely adopted is some kind of mashed-together event like this:

{
    "type": "m.room.message",
    "content": {
        "msgtype": "m.text",
        "body": "Hello World",
        "format": "org.matrix.custom.html",
        "formatted_body": "<b>Hello</b> World",
        "m.text": "Hello World",
        "m.html": "<b>Hello</b> World"
    }
}

Note that the type here is m.room.message, where extensible events says it should be m.message. The idea is that an extensible-events-aware client will see "msgtype": "m.text" and know to look for m.message as the primary type. (This is further complicated here by the fact that there isn’t actually a m.message property – this is because m.text and m.html are abbreviated forms of it.)

Also, clients that want to display old events will need to preserve their code that parses the old event types in perpetuity.

Cost-effectiveness decision for fixing a known coding mistake

Derek Jones from The Shape of Code

If a mistake is spotted in the source code of a shipping software system, is it more cost-effective to fix the mistake, or to wait for a customer to report a fault whose root cause turns out to be that particular coding mistake?

The naive answer is don’t wait for a customer fault report, based on the following simplistic argument: C_{fix} < C_{find}+C_{fix}.

where: C_{fix} is the cost of fixing the mistake in the code (including testing etc), and C_{find} is the cost of finding the mistake in the code based on a customer fault report (i.e., the sum on the right is the total cost of fixing a fault reported by a customer).

If the mistake is spotted in the code for ‘free’, then C_{find}==0, e.g., a developer reading the code for another reason, or flagged by a static analysis tool.

This answer is naive because it fails to take into account the possibility that the code containing the mistake is deleted/modified before any customers experience a fault caused by the mistake; let M_{gone} be the likelihood that the coding mistake ceases to exist in the next unit of time.

The more often the software is used, the more likely a fault experience based on the coding mistake occurs; let F_{experience} be the likelihood that a fault is reported in the next time unit.

A more realistic analysis takes into account both the likelihood of the coding mistake disappearing and a corresponding fault being reported, modifying the relationship to: C_{fix} < (C_{find}+C_{fix})*{F_{experience}/M_{gone}}

Software systems are eventually retired from service; the likelihood that the software is maintained during the next unit of time, S_{maintained}, is slightly less than one.

Giving the relationship: C_{fix} < (C_{find}+C_{fix})*{F_{experience}/M_{gone}}*S_{maintained}

which simplifies to: 1 < (C_{find}/C_{fix}+1)*{F_{experience}/M_{gone}}*S_{maintained}

What is the likely range of values for the ratio: C_{find}/C_{fix}?

I have no find/fix cost data, although detailed total time is available, i.e., find+fix time (with time probably being a good proxy for cost). My personal experience of find often taking a lot longer than fix probably suffers from survival of memorable cases; I can think of cases where the opposite was true.

The two values in the ratio F_{experience}/M_{gone} are likely to change as a system evolves, e.g., high code turnover during early releases that slows as the system matures. The value of F_{experience} should decrease over time, but increase with a large influx of new users.

A study by Penta, Cerulo and Aversano investigated the lifetime of coding mistakes (detected by several tools), tracking them over three years from creation to possible removal (either fixed because of a fault report, or simply a change to the code).

Of the 2,388 coding mistakes detected in code developed over 3-years, 41 were removed as reported faults and 416 disappeared through changes to the code: F_{experience}/M_{gone} = 41/416 = 0.1

The plot below shows the survival curve for memory related coding mistakes detected in Samba, based on reported faults (red) and all other changes to the code (blue/green, code+data):

Survival curves of coding mistakes in Samba.

Coding mistakes are obviously being removed much more rapidly due to changes to the source, compared to customer fault reports.

For it to be cost-effective to fix coding mistakes in Samba, flagged by the tools used in this study (S_{maintained} is essentially one), requires: 10 < C_{find}/C_{fix}+1.

Meeting this requirement does not look that implausible to me, but obviously data is needed.

Time For A Chi Test – a.k.

a.k. from thus spake a.k.

A few months ago we explored the chi-squared distribution which describes the properties of sums of squares of standard normally distributed random variables, being those that have means of zero and standard deviations of one.
Whilst I'm very much of the opinion that statistical distributions are worth describing in their own right, the chi-squared distribution plays a pivotal role in testing whether or not the categories into which a set of observations of some variable quantity fall are consistent with assumptions about the expected numbers in each category, which we shall take a look at in this post.