Event-Sourced Domain Models in Python at PyCon UK

Rob Smallshire from Good With Computers

At PyCon UK 2015 I led a very well attended workshop with the goal of introducing Python developers to the tried-and-tested techniques and patterns of Domain Driven Design (DDD), in particular when used as part of an event-sourced architecture.

The two-and-a-half hour workshop was comprised of excerpts from our training course DDD Patterns in Python. Although the workshop material was heavily edited and compressed from the course – I'm confident that the majority of attendees grasped the main principles.

Several attendees have since asked for the introductory slides, which preceded the exercises. Here they are:

Sixty North training materials are for individual use. For training in a commercial setting please contact us to book a training course or obtain a license for the materials.

The super() Mystery Resolved

Austin Bingham from Good With Computers

In the previous articles in this series [1] we uncovered a small mystery regarding how Python's super() works, and we looked at some of the underlying mechanics of how super() really works. In this article we'll see how those details work together to resolve the mystery.

The mystery revisited

As you'll recall, the mystery we uncovered has to do with how a single use of super() seemed to invoke two separate implementations of a method in our SortedIntList example. As a reminder, our class diagram looks like this:

Inheritance graph

SimpleList defines a simplified list API, and IntList and SortedList each enforce specific constraints on the contents of the list. The class SortedIntList inherits from both IntList and SortedList and enforces the constraints of both base classes. Interestingly, though, SortedIntList has a trivial implementation:

class SortedIntList(IntList, SortedList):

How does SortedIntList do what it does? From the code it seems that SortedIntList does no coordination between its base classes, and certainly the base classes don't know anything about each other. Yet somehow SortedIntList manages to invoke both implementations of add(), thereby enforcing all of the necessary constraints.

super() with no arguments

We've already looked at method resolution order, C3 linearization, and the general behavior of super instances. The final bit of information we need in order to resolve the mystery is how super() behaves when called with no arguments. [2] Both IntList and SortedList use super() this way, in both their initializers and in add().

For instance methods such as these, calling super() with no arguments is the same as calling super() with the method's class as the first argument and self as the second. In other words, it constructs a super proxy that uses the MRO from self starting at the class implementing the method. Knowing this, it's easy to see how, in simple cases, using super() is equivalent to "calling the base class implementation". In these cases, type(self) is the same as the class implementing the method, so the method is resolved using everything in that class's MRO except the class itself. The next entry in the MRO will, of course, be the class's first base class, so simple uses of super() are equivalent to invoking a method on the first base class.

A key point to notice here is that type(self) will not always be the same as the class implementing a specific method. If you invoke super() in a method on a class with a subclass, then type(self) may well be that subclass. You can see this in a trivial example:

>>> class Base:
...     def f(self):
...         print('Type of self in Base.f() =', type(self))
>>> class Sub(Base):
...     pass
>>> Sub().f()
Type of self in Base.f() = <class '__main__.Sub'>

Understanding this point is the final key to seeing how SortedIntList works. If type(self) in a base class is not necessarily the type of the class implementing the method, then the MRO that gets used by super() is not necessarily the MRO of the class implementing the method...it may be that of the subclass. Since the entries in type(self).mro() may include entries that are not in the MRO for the implementing class, calls to super() in a base class may resolve to implementations that are not in the base class's own MRO. In other words, Python's method resolution is - as you might have guessed - extremely dynamic and depends not just on a class's base classes but its subclasses as well.

Method resolution order to the rescue

With that in mind, let's finally see how all of the elements - MRO, no-argument super(), and multiple inheritance - coordinate to make SortedIntList work. As we've just seen, when super() is used with no arguments in an instance method, the proxy uses the MRO of self which, of course, will be that of SortedIntList in our example. So the first thing to look at is the MRO of SortedIntList itself:

>>> SortedIntList.mro()
[<class 'SortedIntList'>,
<class 'IntList'>,
<class 'SortedList'>,
<class 'SimpleList'>,
<class 'object'>]

A critical point here is that the MRO contains IntList and SortedList, meaning that both of them can contribute implementations when super() is used.

Next, let's examine the behavior of SortedIntList when we call its add() method. [3] Because IntList is the first class in the MRO which implements add(), a call to add() on a SortedIntList resolves to IntList.add():

class IntList(SimpleList):
    # ...

    def add(self, item):

And this is where things get interesting!

In IntList.add() there is a call to super().add(item), and because of how no-argument super() calls work, this is equivalent to super(IntList, self).add(item). Since type(self) == SortedIntList, this call to super() uses the MRO for SortedIntList and not just IntList. As a result, even though IntList doesn't really "know" anything about SortedList, it can access SortedList methods via a subclass's MRO.

In the end, the call to super().add(item) in IntList.add() takes the MRO of SortedIntList, find's IntList in that MRO, and uses everything after IntList in the MRO to resolve the method invocation. Since that MRO-tail looks like this:

[<class 'SortedList'>, <class 'SimpleList'>, <class 'object'>]

and since method resolution uses the first class in an MRO that implements a method, the call resolves to SortedList.add() which, of course, enforces the sorting constraint.

So by including both of its base classes in its MRO [4] - and because IntList and SortedList use super() in a cooperative way - SortedIntList ensures that both the sorting and type constraint are enforced.

No more mystery

We've seen that a subclass can leverage MRO and super() to do some pretty interesting things. It can create entirely new method resolutions for its base classes, resolutions that aren't apparent in the base class definitions and are entirely dependent on runtime context.

Used properly, this can lead to some really powerful designs. Our SortedIntList example is just one instance of what can be done. At the same time, if used naively, super() can have some surprising and unexpected effects, so it pays to think deeply about the consequences of super() when you use it. For example, if you really do want to just call a specific base class implementation, you might be better off calling it directly rather than leaving the resolution open to the whims of subclass developers. It may be cliche, but it's true: with great power comes great responsibility.

For more information on this topic, you can always see the Inheritance and Subtype Polymorphism module of our Python: Beyond the Basics course on PluralSight. [5]

[1]Part 1 and Part 2
[2]Note that calling super() with no arguments is only supported in Python 3. Python 2 users will need to use the longer explicit form.
[3]The same logic applies to it's __init__() which also involves calls to super().
[4]Thanks, of course, to how C3 works.
[5]Python: Beyond the Basics on PluralSight

Method Resolution Order, C3, and Super Proxies

Austin Bingham from Good With Computers

In the previous article in this series we looked at a seemingly simple class graph with some surprising behavior. The central mystery was how a class with two bases can seem to invoke two different method implementations with just a single invocation of super(). In order to understand how that works, we need to delve into the details of how super() works, and this involves understanding some design details of the Python language itself.

Method Resolution Order

The first detail we need to understand is the notion of method resolution order or simply MRO. Put simply a method resolution order is the ordering of an inheritance graph for the purposes of deciding which implementation to use when a method is invoked on an object. Let's look at that definition a bit more closely.

First, we said that an MRO is an "ordering of an inheritance graph". Consider a simple diamond class structure like this:

>>> class A: pass
>>> class B(A): pass
>>> class C(A): pass
>>> class D(B, C): pass

The MRO for these classes could be, in principle, any ordering of the classes A, B, C, and D (and object, the ultimate base class of all classes in Python.) Python, of course, doesn't just pick the order randomly, and we'll cover how it picks the order in a later section. For now, let's examine the MROs for our classes using the mro() class method:

>>> A.mro()
[<class '__main__.A'>,
<class 'object'>]
>>> B.mro()
[<class '__main__.B'>,
<class '__main__.A'>,
<class 'object'>]
>>> C.mro()
[<class '__main__.C'>,
<class '__main__.A'>,
<class 'object'>]
>>> D.mro()
[<class '__main__.D'>,
<class '__main__.B'>,
<class '__main__.C'>,
<class '__main__.A'>,
<class 'object'>]

We can see that all of our classes have an MRO. But what is it used for? The second half of our definition said "for the purposes of deciding which implementation to use when a method is invoked on an object". What this means is that Python looks at a class's MRO when a method is invoked on an instance of that class. Starting at the head of the MRO, Python examines each class in order looking for the first one which implements the invoked method. That implementation is the one that gets used.

For example, let's augment our simple example with a method implemented in multiple locations:

>>> class A:
...     def foo(self):
...         print('A.foo')
>>> class B(A):
...     def foo(self):
...         print('B.foo')
>>> class C(A):
...     def foo(self):
...         print('C.foo')
>>> class D(B, C):
...     pass

What will happen if we invoke foo() on an instance of D? Remember that the MRO of D was [D, B, C, A, object]. Since the first class in that sequence to support foo() is B, we would expect to see "B.foo" printed, and indeed that is exactly what happens:

>>> D().foo()

What if remove the implementation in B? We would expect to see "C.foo", which again is what happens:

>>> class A:
...     def foo(self):
...         print('A.foo')
>>> class B(A):
...     pass
>>> class C(A):
...     def foo(self):
...         print('C.foo')
>>> class D(B, C):
...     pass
>>> D().foo()

To reiterate, method resolution order is nothing more than some ordering of the inheritance graph that Python uses to find method implementations. It's a relatively simple concept, but it's one that many developers understand only intuitively and partially. But how does Python calculate an MRO? We hinted earlier – and you probably suspected – that it's not just any random ordering, and in the next section we'll look at precisely how Python does this.

C3 superclass linearization

The short answer to the question of how Python determines MRO is "C3 superclass linearization", or simply C3. C3 is an algorithm initially developed for the Dylan programming language [1], and it has since been adopted by several prominent programming languages including Perl, Parrot, and of course Python. [2] We won't go into great detail on how C3 works, though there is plenty of information on the web that can tell you everything you need to know. [3]

What's important to know about C3 is that it guarantees three important features:

  1. Subclasses appear before base classes
  2. Base class declaration order is preserved
  3. For all classes in an inheritance graph, the relative orderings guaranteed by 1 and 2 are preserved at all points in the graph.

In other words, by rule 1, you will never see an MRO where a class is preceded by one of its base classes. If you have this:

>>> class Foo(Fred, Jim, Shiela):
...     pass

you will never see an MRO where Foo comes after Fred, Jim, or Shiela. This, again, is because Fred, Jim, and Shiela are all base classes of Foo, and C3 puts base classes after subclasses.

Likewise, by rule 2, you will never see an MRO where the base classes specified to the class keyword are in a different relative order than that definition. Given the same code above, this means that you will never see and MRO with Fred after either Jim or Shiela. Nor will you see an MRO with Jim after Shiela. This is because the base class declaration order is preserved by C3.

The third constraint guaranteed by C3 simply means that the relative orderings determined by one class in an inheritance graph – i.e. the ordering constraints based on one class's base class declarations – will not be violated in any MRO for any class in that graph.

C3 limits your inheritance options

One interesting side-effect of the use of C3 is that not all inheritance graphs are legal. It's possible to construct inheritance graphs which make it impossible to meet all of the constraints of C3. When this happens, Python raises an exception and prevents the creation of the invalid class:

>>> class A:
...     pass
>>> class B(A):
...     pass
>>> class C(A):
...     pass
>>> class D(B, A, C):
...     pass
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: Cannot create a consistent method resolution
order (MRO) for bases A, C

In this case, we've asked for D to inherit from B, A, and C, in that order. Unfortunately, C3 wants to enforce two incompatible constraints in this case:

  1. It wants to put C before A because A is a base class of C
  2. It wants to put A before C because of D's base class ordering

Since these are obviously mutually exclusive states, C3 rejects the inheritance graph and Python raises a TypeError.

That's about it, really. These rules provide a consistent, predictable basis for calculating MROs. Understanding C3, or even just knowing that it exists, is perhaps not important for day-to-day Python development, but it's an interesting tidbit for those interested in the details of language design.

Super proxies

The third detail we need to understand in order to resolve our mystery is the notion of a "super proxy". When you invoke super() in Python [4], what actually happens is that you construct an object of type super. In other words, super is a class, not a keyword or some other construct. You can see this in a REPL:

>>> s = super(C)
>>> type(s)
<class 'super'>
>>> dir(s)
['__class__', '__delattr__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__',
'__get__', '__getattribute__', '__gt__',
'__hash__', '__init__', '__le__', '__lt__',
'__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__self__',
'__self_class__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__thisclass__']

In most cases, super objects have two important pieces of information: an MRO and a class in that MRO. [5] When you use an invocation like this:

super(a_type, obj)

then the MRO is that of the type of obj, and the class within that MRO is a_type. [6]

Likewise, when you use an invocation like this:

super(type1, type2)

the MRO is that of type2 and the class within that MRO is type1. [7]

Given that, what exactly does super() do? It's hard to put it in a succinct, pithy, or memorable form, but here's my best attempt so far. Given a method resolution order and a class C in that MRO, super() gives you an object which resolves methods using only the part of the MRO which comes after C.

In other words, rather than resolving method invocation using the full MRO like normal, super uses only the tail of an MRO. In all other ways, though, method resolution is occurring exactly as it normally would.

For example, suppose I have an MRO like this:

[A, B, C, D, E, object]

and further suppose that I have a super object using this MRO and the class C in this MRO. In that case, the super instance would only resolve to methods implemented in D, E, or object (in that order.) In other words, a call like this:

super(C, A).foo()

would only resolve to an implementation of foo() in D or E. [8]

Why the name super-proxy

You might wonder why we've been using the name super-proxy when discussing super instances. The reason is that instances of super are designed to respond to any method name, resolving the actual method implementation based on their MRO and class configuration. In this way, super instances act as proxies for all objects. They simply pass arguments through to an underlying implementation.

The mystery is almost resolved!

We now know everything we need to know to resolve the mystery described in the first article in this series. You can (and probably should) see if you can figure it out for yourself at this point. By applying the concepts we discussed in this article - method resolution order, C3, and super proxies - you should be able to see how SortedIntList is able to enforce the constraints of IntList and SortedList even though it only makes a single call to super().

If you'd rather wait, though, the third article in this series will lay it all out for you. Stay tuned!

[1]The Dylan programming language
[2]Presumably starting with the letter "P" is not actually a requirement for using C3 in a language.
[3]Python's introduction of C3 in version 2.3 includes a great description. You can also track down the original research describing C3.
[4]With zero or more arguments. I'm using zero here to keep things simple.
[5]In the case of an unbound super object you don't have the MRO, but that's a detail we can ignore for this article.
[6]Remember that this form of super() requires that isinstance(obj, a_type) be true.
[7]Remember that this form of super() requires that issubclass(type2, type1) be true.
[8]Since object does not (yet...though I'm working on a PEP) implement foo().

Top four JavaZone 2013 talk – The Unreasonable Effectiveness of Dynamic Typing

Rob Smallshire from Good With Computers

I'm very happy to see that my talk on The Unreasonable Effectiveness of Dynamic Typing was rated fourth of all the talks in the show. Thanks to everyone who attended and voted.

This talk is perhaps deliberately provocative, but only with the intention of provoking critical thinking and empiricism around the tools we use. I'm genuinely curious as to why programs in dynamic languages are as reliable as they are, although I confess I don't yet have many of the answers.