Stateful Transducers

Rob Smallshire from Good With Computers

In the previous article in this series on transducers we saw how we can develop the notion of the transducer from a single function which literally transforms reducers to a more capable protocol which supports two further capabilities: first, the association of initial 'seed' values with a reduction operation, and second, the opportunity for clean-up in stateful transducers. So far we've exercised the first capability, but not the second. To demonstrate clean-up, we need to introduce stateful transducers.

The mapping and filtering transducers we have seen so far are stateless. What this means is that the result for the current item being processed depends only on the values of the result accumulated so far and the new item. We can, however, make stateful transducers, and the fact that our Python transducers are classes makes this particularly easy, because it gives us an obvious place to store the state, in instances of those classes. Perhaps the simplest example is an enumerating transducer which keeps track of item indexes and accumulates (index, item) tuple pairs into the result:

class Enumerating:

    def __init__(self, reducer, start):
        self._reducer = reducer
        self._counter = start

    def initial(self):
        return self._reducer.initial()

    def step(self, result, item):
        index = self._counter
        self._counter += 1
        return self._reducer.step(result, (index, item))

    def complete(self, result):
        return self._reducer.complete(result)

def enumerating(start=0):
    """Create a transducer which enumerates items."""

    def enumerating_transducer(reducer):
        return Enumerating(reducer, start)

    return enumerating_transducer

We'll use this by composing it onto the end of our existing transducer chain:

>>> square_primes_transducer = compose(
...     filtering(is_prime),
...     mapping(square))
>>>
>>> enumerated_square_primes_transducer = compose(
...     square_primes_transducer,
...     enumerating())
>>>
>>> appending_reducer = Appending()
>>>
>>> transduce(enumerated_square_primes_transducer,
...     appending_reducer,
...     range(100))
[(0, 4), (1, 9), (2, 25), (3, 49), (4, 121), (5, 169), (6, 289),
(7, 361), (8, 529), (9, 841), (10, 961), (11, 1369), (12, 1681),
(13, 1849), (14, 2209), (15, 2809), (16, 3481), (17, 3721),
(18, 4489), (19, 5041), (20, 5329), (21, 6241), (22, 6889),
(23, 7921), (24, 9409)]

Cleaning up left-over state

So far, the implementations of the complete() method in our transducers haven't been very interesting. They've simply delegated the call to the next reducer in the chain. At the end of the chain, the complete() implementations of the Appending or Conjoining reducers simply return whatever was passed to them.

Sometimes, the state accumulated within the transducer needs to be returned as part of the final result. For example, consider a batching transducer which collects successive items together into non-overlapping groups of a specified size. The transducer maintains a pending batch as internal state, and when the batch has grown to the requisite size, accumulates it into the result. When we reach the end of the input data, there may be a partial batch. If our design calls for returning the partial batch, we need a way to detect the end of processing and deal with any internal state. This is where the complete() method comes into play. Here's our batching transducer and its corresponding transducer factory:

class Batching:

    def __init__(self, reducer, size):
        self._reducer = reducer
        self._size = size
        self._pending = []

    def initial(self):
        return self._reducer.initial()

    def step(self, result, item):
        self._pending.append(item)
        if len(self._pending) == self._size:
            batch = self._pending
            self._pending = []
            return self._reducer.step(result, batch)
        return result

    def complete(self, result):
        r = self._reducer.step(result, self._pending) if len(self._pending) > 0 else result
        return self._reducer.complete(r)

def batching(size):
    """Create a transducer which produces non-overlapping batches."""

    if size < 1:
        raise ValueError("batching() size must be at least 1")

    def batching_transducer(reducer):
        return Batching(reducer, size)

    return batching_transducer

Here we see that the complete() method calls step() on the underlying reducer one more time, to pass on any partial batch. Here it is in action:

>>> batched_primes_transducer = compose(filtering(is_prime), batching(3))
>>> transduce(batched_primes_transducer, Appending(), range(100))
[[2, 3, 5], [7, 11, 13], [17, 19, 23], [29, 31, 37], [41, 43, 47],
[53, 59, 61], [67, 71, 73], [79, 83, 89], [97]]

Notice in particular the partial batch included at the end.
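
Because the batching logic is decoupled from the final reducer, the same transducer can accumulate its batches into a different result type. As a small illustration – assuming the Conjoining reducer and the transduce() function from the previous article are in scope – we can collect the batches into a tuple instead of a list:

>>> transduce(batching(4), Conjoining(), range(10))
([0, 1, 2, 3], [4, 5, 6, 7], [8, 9])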

With stateful transducers and special handling of result completion and clean-up in place, in the next article we'll look at how to signal and detect early termination of a reduction operation, such as occurs when searching for and finding an item in a data series.

Enriching the Transducer Protocol

Rob Smallshire from Good With Computers

In the previous article in the series we looked at improving the experience of composing transducers together in Python, by introducing a compose() function. We finished by showing this snippet, which composes a filtering transducer with a mapping transducer to produce a prime-squaring transducer. Recalling that transducers are used to transform reducers, we pass an appending reducer to the prime-squaring transducer to get a prime-squaring-appending reducer. This is passed in turn to reduce(), along with an input range and an empty seed list for the result:

>>> reduce(compose(filtering(is_prime),
...                mapping(square))
...        (appender), # appender assumes a MUTABLE-sequence
...        range(20),
...        []) # list is a MUTABLE sequence
[4, 9, 25, 49, 121, 169, 289, 361]

And therein lies the rub. There's a fairly well disguised implicit dependency here, between the empty list we've passed as the initial value for the reduction and our passing of appender() as the ultimate reducer. We can illustrate this by passing an immutable sequence type, which doesn't support append(), rather than a mutable sequence type, which does. Look what happens if we pass in an empty tuple instead of an empty list:

>>> reduce(compose(filtering(is_prime),
...                mapping(square))
...        (appender), # appender assumes a MUTABLE-sequence
...        range(20),
...        tuple()) # tuple is an IMMUTABLE sequence
Traceback (most recent call last):
  File "", line 1, in
  File "", line 4, in filter_reducer
  File "", line 4, in map_reducer
  File "", line 2, in appender
AttributeError: 'tuple' object has no attribute 'append'

We can "fix" this by passing another reducer, rather than appender, called conjoiner [1]:

def conjoiner(result, item):
    return result + type(result)((item,))

which we can use like this:

>>> reduce(compose(filtering(is_prime),
...                mapping(square))
...        (conjoiner), # conjoiner assumes an IMMUTABLE-sequence
...        range(20),
...        tuple()) # tuple is an IMMUTABLE sequence
(4, 9, 25, 49, 121, 169, 289, 361)

Notice that the result is now enclosed in parentheses rather than square brackets, indicating that it is a tuple.

In order to address the fact that the ultimate reducer and the seed value will often need to change in sympathy – meaning that if one changes we need to remember to change the other – we'll enrich the transducer interface. It will go from being a simple function call to something that is at once more complex and more capable. To understand what those complexities are, we'll refer back to the Clojure archetype.

Examining the Clojure original

Our code has a very general form, but it is lacking features of the Clojure original, such as early termination of the reduction process. Let's look at the Clojure original for map [2] when called with a single argument:

(defn map
  ([f]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input]
        (rf result (f input)))
      ([result input & inputs]
        (rf result (apply f input inputs))))))

This may not be very clear - even if you can read Clojure! - so let's annotate it with some comments:

(defn map ;; The transducer factory...
  ([f] ;; ...accepts a single argument 'f', the
  ;; transforming function
  (fn [rf] ;; The transducer function accepts a
    ;; reducing function 'rf'
    (fn ;; This is the reducing function returned
      ;; by the transducer

      ([] (rf)) ;; 0-arity : Forward to the zero-arity
      ;; reducing function 'rf'

      ([result] (rf result)) ;; 1-arity : Forward to the one-arity
      ;; reducing function 'rf'

      ([result input] ;; 2-arity : Perform the reduction with
        ;; one arg to 'f'
        (rf result (f input)))

      ([result input & inputs] ;; n-arity : Perform the reduction with
        ;; multiple args to 'f'
        (rf result (apply f input inputs))))))

Here's our reducing function in Python, which only implements the equivalent of the 2-arity version which performs the actual reduction:

def map_reducer(result, item):
    return reducer(result, transform(item))

The Clojure definitions of the zero- and one-arity reduction functions don't provide much clue as to what they are for - they're just contract preserving functions which forward from the 'new' reducer to the underlying reducer which has been wrapped by it.

In fact, the zero-arity function is called to produce the initial seed value when one isn't provided. For example, for addition the seed needs to be zero [3], for multiplication the seed needs to be one [4], and in our Python examples for appending the seed should be the empty list, and for conjoining the seed should be an empty tuple. The map reducer simply delegates this to the underlying reducer, since it can't know – and indeed shouldn't know – upon which kind of data structure it is operating.

The one-arity function, which accepts only the intermediate result and no further input is used to perform transducer chain clean-up or reduction to a final result when processing of the sequence is complete or terminated early. This is useful for certain stateful transducers which need to deal with any left-over state. We'll look at some examples later.

So to document our improved understanding in comments:

(defn map ;; The transducer factory...
  ([f] ;; ...accepts a single argument 'f', the
  ;; transforming function
  (fn [rf] ;; The transducer function accepts a
    ;; reducing function 'rf'
    (fn ;; This is the reducing function returned
      ;; by the transducer

      ([] (rf)) ;; 0-arity : Return a 'seed' value
      ;; obtained from 'rf'

      ([result] (rf result)) ;; 1-arity : Obtain final result from 'rf'
      ;; and clean-up

      ([result input] ;; 2-arity : Perform the reduction with
        ;; one arg to 'f'
        (rf result (f input)))

      ([result input & inputs] ;; n-arity : Perform the reduction with
        ;; multiple args to 'f'
        (rf result (apply f input inputs))))))

In fact, to fully implement the concepts inherent in Clojure reducers and transducers we need to do more work in our Python version to support:

  1. Explicit (although optional) association of the seed value with the reduction operation
  2. Early termination of reduction processes. For example, a search can terminate early without needing to reduce a whole series
  3. Reduction to a final value and opportunity to clean-up left-over state

Clojure supports these distinct behaviours through different arity versions of the same anonymous reducing function. Python doesn't support overloading on arity, and in any case, overloading on arity in order to support different operations can seem obtuse. [5] We have a perfectly good tool for bundling related named functions together in Python, and that tool is the class.

In the next phase, we'll convert our reducing functions into classes and necessarily replace our use of the Python Standard Library reduce() function with something a little more sophisticated which can support our new class-based reducers.

In Python, the conceptual interface to a reducer will look like this:

class Reducer:

    def __init__(self, reducer): # Construct from reducing function
        pass

    def initial(self): # Return the initial seed value
        pass # 0-arity

    def step(self, result, item): # Next step in the reduction
        pass # 2-arity

    def complete(self, result): # Produce a final result and clean up
        pass # 1-arity

Notice that the __init__() function – and therefore the class – is a transducer. It accepts a reducer and returns a reducer!

new_reducer = Reducer(reducer)

It takes a particularly clear-minded and highly-caffeinated state to appreciate that the class is a transducer but instances of the class are reducers! In fact, we've found it so confusing that we generally wrap the constructor call in another function with a more appropriate name:

def transducer(reducer):
    return Reducer(reducer)

More concretely, here is our mapping() transducer factory, the transducing function and the reducer it creates:

def mapping(transform):

    def mapping_transducer(reducer):
        return Mapping(reducer, transform)

    return mapping_transducer

Let's implement our Mapping reducer cum transducer class:

class Mapping:

    def __init__(self, reducer, transform):
        self._reducer = reducer
        self._transform = transform

    def initial(self):
        return self._reducer.initial()

    def step(self, result, item):
        return self._reducer.step(result, self._transform(item))

    def complete(self, result):
        return self._reducer.complete(result)

In the absence of any necessary behaviours specific to a particular reduction algorithm, the initial(), step() and complete() methods simply forward to the next reducer in the chain (self._reducer). The only behaviour here specialised for Mapping is to apply self._transform() to the item before passing the result down the chain.

And here's our filtering transducer-factory together with the Filtering reducer cum transducer:

class Filtering:

    def __init__(self, reducer, predicate):
        self._reducer = reducer
        self._predicate = predicate

    def initial(self):
        return self._reducer.initial()

    def step(self, result, item):
        return self._reducer.step(result, item) if self._predicate(item) else result

    def complete(self, result):
        return self._reducer.complete(result)

def filtering(predicate):

    def filtering_transducer(reducer):
        return Filtering(reducer, predicate)

    return filtering_transducer

To allow the chain of reducers produced by our transducers to terminate in a regular reducer, such as appending, we'll replace our appending and conjoining reducing functions with classes which sport the same interface as our other reducers:

class Appending:

    def initial(self):
        return []

    def step(self, result, item):
        result.append(item)
        return result

    def complete(self, result):
        return result

class Conjoining:

    def initial(self):
        return tuple()

    def step(self, result, item):
        return result + type(result)((item,))

    def complete(self, result):
        return result

These two reducing classes have no internal state, and hence no need for initialisation functions, but crucially, we use the ability afforded by the initial() method to associate a seed value with the reducing operation. Being stateless, we could have decorated the methods of these reducers with @staticmethod; we haven't done so, though, to avoid detracting from the important similarity between our reducer and transducer classes.

To make use of our class-based transducers, we need an alternative to reduce() which understands our new transducer/reducer protocol. Following Clojure's lead, we will call it transduce():

UNSET = object()

def transduce(transducer, reducer, iterable, init=UNSET):
    r = transducer(reducer)
    accumulator = init if (init is not UNSET) else r.initial()
    for item in iterable:
        accumulator = r.step(accumulator, item)
    return r.complete(accumulator)

We supply the reducer separately, rather than bundling it up inside the transducer object, because it contains the knowledge of how to accumulate the final result. Excluding that knowledge from our transducer definition allows us to keep our transducer more general and reusable, without committing to a particular result representation. For example, we might compose a complex transducer and want to keep that separate from whether the final result is accumulated in a list or in a tuple.

Let's try to use our new transduce() function to apply a transducer to a list of numbers. We'll do this step-by-step to keep things clear. First we'll compose the transducer from a filtering and a mapping:

>>> square_primes_transducer = compose(
...     filtering(is_prime),
...     mapping(square))

Then we'll construct the reducer which will accumulate the final result. We want a list, so we'll use Appending:

>>> appending_reducer = Appending()

Now we'll pass these to transduce():

>>> transduce(square_primes_transducer, appending_reducer, range(100))
[4, 9, 25, 49, 121, 169, 289, 361, 529, 841, 961, 1369, 1681, 1849,
2209, 2809, 3481, 3721, 4489, 5041, 5329, 6241, 6889, 7921, 9409]

Et voila!

By using transduce() and enriching our notion of what a reducer looks like, we no longer need to separately specify the seed value. If we want a tuple, we can use a different reducer:

>>> conjoining_reducer = Conjoining()
>>> transduce(square_primes_transducer, conjoining_reducer, range(100))
(4, 9, 25, 49, 121, 169, 289, 361, 529, 841, 961, 1369, 1681, 1849,
2209, 2809, 3481, 3721, 4489, 5041, 5329, 6241, 6889, 7921, 9409)

This decoupling of the transducer processing pipeline from the result type may not seem important in this example, but as we see later, it buys us a great deal of flexibility and re-use potential.

In the next article, we'll look at stateful transducers, and how having our transducers implemented as classes makes this particularly straightforward.

[1]conjoin verb To join together. There is also an equivalent Clojure function conj, and Clojure/conj is a Clojure programming conference.
[2]The definition of the mapping transducer factory source code on Github.
[3]We say the additive identity is zero.
[4]We say the multiplicative identity is one.
[5]It seems I'm not the only person who found Clojure's use of overloading by arity an impediment to understanding transducers. In fact, overloading by arity is incidental to the concept of transducers, and a curiosity of the Clojure archetype.

Improving Transducer Composition

Rob Smallshire from Good With Computers

In the previous article in this series we derived a Python implementation of transducers from first principles. We finished by showing how transducers can be composed together using regular function call application to give us a single composite reducer which can perform many operations with a single pass of reduce(). Specifically, we showed how to filter a range of integers using a primality testing predicate, and then mapped a squaring function over the primes, to give prime squares:

>>> reduce(filtering(
...     predicate=is_prime)(
...         reducer=mapping(
...             transform=square)(
...                 reducer=appender)),
...     range(20),
...     [])
[4, 9, 25, 49, 121, 169, 289, 361]

Although this clearly works, composing transducers this way quickly becomes ungainly and the code certainly has a Lisp-ish flavour. Keeping track of the parentheses in Python, when we have function calls which return functions which we immediately call, is somewhat awkward. Most functional programming languages include a function called "compose" to help with composing functions; many imperative programming languages do not, including Python, so we'll have to write one:

from functools import reduce
from itertools import chain


def compose(f, *fs):
    """Compose functions right to left.

    compose(f, g, h)(x) -> f(g(h(x)))

    Args:
        f, *fs: The head and rest of a sequence of callables. The
            rightmost function passed can accept any arguments and
            the returned function will have the same signature as
            this last provided function. All preceding functions
            must be unary.

    Returns:
        The composition of the argument functions. The returned
        function will accept the same arguments as the rightmost
        passed in function.
    """
    rfs = list(chain([f], fs))
    rfs.reverse()

    def composed(*args, **kwargs):
        return reduce(
            lambda result, fn: fn(result),
            rfs[1:],
            rfs[0](*args, **kwargs))

    return composed

The signature of compose() forces us to accept at least one argument. The first part of the function reassembles the supplied arguments into a single list and reverses it to put it in first-to-last application order. We then define the inner composed() function which uses reduce() over the list of functions to apply each in turn, carrying the intermediate results forward. Any arguments to the composed() function are forwarded to the first function in the chain.
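
As a quick sanity check of the right-to-left application order, here's compose() applied to a couple of simple functions. The square() function is the one defined in the previous article; add_one() is a throwaway helper introduced purely for this illustration:

>>> def add_one(x):
...     return x + 1
...
>>> compose(square, add_one)(3)  # square(add_one(3))
16
>>> compose(add_one, square)(3)  # add_one(square(3))
10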

With compose() in our armoury, it becomes somewhat easier to specify multi-step composite reductions using transducers:

>>> reduce(compose(filtering(is_prime), mapping(square))(appender), range(20), [])
[4, 9, 25, 49, 121, 169, 289, 361]

This becomes even clearer if we put in some line breaks:

>>> reduce(compose(filtering(is_prime),
...                mapping(square))
...        (appender),
...        range(20),
...        [])
[4, 9, 25, 49, 121, 169, 289, 361]

Now it's hopefully easier to see that the call to compose() produces a new transducer to which we pass the appender reducer to get a composite reducer which performs filtering, mapping and appending in one step. It is this composite reducer we pass to reduce() along with the input range and the empty list seed value.

In the next article in our series on transducers, we'll look at fixing some of the not-so-obvious shortcomings of the previous snippet and bringing the capabilities of our Python transducers more in line with those of the Clojure originals.

Deriving Transducers from First Principles

Rob Smallshire from Good With Computers

What is a transducer?

Transducers - a portmanteau of ‘transform reducers’ - are a new functional programming concept introduced into the Clojure programming language. Although transducers are actually pretty straightforward in retrospect, wrapping your brain around them, especially if you’re not already a competent Clojureist, can be challenging. In this series of articles we'll explain transducers by implementing them from first principles in Python.

To understand transducers, we must first understand reducing functions. In a nutshell, a reducing function is any function which takes a partial result and a new piece of information to produce a new result. Reducers are what we pass to the reduce() function in Python and can be thought of as having the following structure:

(result, input) -> result

A transducer is a function which accepts a reducing function and returns a new reducing function:

((result, input) -> result)  ->  ((result, input) -> result)

A transducer function can be used to augment an existing reducing function with some additional reducing behaviour, producing a new reducing function which applies both reducing behaviours in a specific order. In this way, we can use transducers to chain arbitrary reducing functions together. This in turn is useful, because it allows us to perform complex reduction operations using only a single call to reduce(). In other words, transducers are primarily useful because they provide a means of composing reducers.
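
To make that signature concrete in Python terms before we derive anything, here's a minimal sketch of what such a function looks like. The names are purely illustrative, and we'll derive the real thing step by step below:

def doubling_transducer(reducer):
    """Wrap 'reducer' so that each item is doubled before being combined."""
    def doubling_reducer(result, item):
        return reducer(result, item * 2)
    return doubling_reducer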

As we will see, transducers also facilitate a clear separation of concerns between the algorithmic essence of a particular reduction, and the details of where the input data are coming from and where the output data are going.

In this series, we'll introduce transducers by implementing them from scratch in everybody’s favourite executable pseudocode, Python. We’ll start with the familiar staples of functional programming, map(), filter() and reduce(), and derive transducers from first principles. We’ll work towards a set of general tools which work with eager collections, lazy ‘pull’ sequences, and ‘push’ event streams. Along the way we’ll cover stateful transducers and transducer composition, demonstrating that transducers are both more general, and more fundamental, than the functional programming tools baked into Python and many other languages.

By the end of this series, not only should transducers make sense to you, but you’ll have a recipe for implementing transducers in your own favourite programming language.

The main objective of this exercise is to understand transducers. A secondary objective is to see how some more advanced functional programming techniques can be realised in a language like Python. We're not particularly advocating that you should widely adopt transducers in Python instead of, say, generators. For the time being, we expect transducers in Python to remain something of a curiosity. That said, if you do find any interesting applications of transducers in Python, do please let us know.

The origin of transducers

Transducers were introduced by Rich Hickey, creator of the Clojure programming language, in a blog post "Transducers are Coming" [1] on August 6th, 2014.

The best way to understand transducers is to implement them yourself. Unfortunately from an educational point of view, the Clojure implementations are mixed up with some other details of Clojure which are pretty much irrelevant to transducers, and Clojure transducers are heavily implemented using anonymous functions which makes them excessively difficult to discuss.

How does this relate to Python?

Python is a pretty straightforward language, at least on the surface. Although Python supports lambdas and other forms of anonymous functions, in this series we have striven to use named functions to make the intent of the code more obvious, rather than writing Python in the style of Clojure.

Both Clojure and Python happen to be dynamically typed – or uni-typed – languages, and although that's not particularly important here, it does mean that the Python version can stay fairly close to the Clojure original, and we don't need to get sidetracked into a discussion of how transducers are typed.

Reviewing Python's functional programming tools

Let's briefly review three key functional programming tools included with Python, chiefly for the benefit of those people new to Python, and specifically Python 3. We'll look at map(), filter() and reduce() in turn.

map()

First, the map() function:

>>> help(map)

Help on class map in module builtins:

class map(object)
 |  map(func, *iterables) --> map object
 |
 |  Make an iterator that computes the function using arguments from
 |  each of the iterables.  Stops when the shortest iterable is exhausted.

The map() function accepts a function - or more strictly in Python, any callable object – as its first argument, and then any number of iterable series. In Python an iterable is a sort of lazy sequence which supports only forward iteration. For now, we'll pass the built-in function len as the first argument, and we'll stick to using one iterable, which will be a list of words, produced by calling split() on a string object:

>>> map(len, "Compute the length of each word".split())
<map object at 0x102beaf28>

In fact, in Python 3, which we're using here, map() is lazy and produces an iterator. To force evaluation, we'll pass this iterator to the list constructor:

>>> list(map(len, "Compute the length of each word".split()))
[7, 3, 6, 2, 4, 4]

As expected, we get the length of each word.

filter()

Secondly, we'll look at the filter() function:

>>> help(filter)

Help on class filter in module builtins:

class filter(object)
 |  filter(function or None, iterable) --> filter object
 |
 |  Return an iterator yielding those items of iterable for which function(item)
 |  is true. If function is None, return the items that are true.

The filter() function accepts a predicate function – a function which returns True or False – and an iterable series. Each item from the series is passed in turn to the predicate, and filter() returns an iterator over a series which contains only those items from the source series which match the predicate:

>>> list(filter(lambda w: 'o' in w,
... "the quick brown fox jumps over the lazy dog".split()))
['brown', 'fox', 'over', 'dog']

Neither map() nor filter() see much use in contemporary Python owing to the capabilities of list comprehensions, which provide a more natural syntax for mapping functions over, and filtering items from, iterable series:

>>> [len(w) for w in "jackdaws love my big sphinx of quartz".split()
... if 'a' in w]
[8, 6]

reduce()

The built-in reduce() function has had a troubled life [2] in the history of Python, and in the transition from Python 2 to Python 3, it was moved out of the language proper into the Python standard library. In fact, Guido was ready to throw lambda out with the bathwater!

Let's take a closer look at reduce(), which will set us firmly on the road to understanding and implementing transducers:

>>> from functools import reduce
>>> help(reduce)

Help on built-in function reduce in module _functools:

reduce(...)
    reduce(function, sequence[, initial]) -> value

    Apply a function of two arguments cumulatively to the items of a sequence,
    from left to right, so as to reduce the sequence to a single value.
    For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
    ((((1+2)+3)+4)+5).  If initial is present, it is placed before the items
    of the sequence in the calculation, and serves as a default when the
    sequence is empty.

When we execute reduce, we can see that it evaluates its arguments eagerly, immediately producing a result.

>>> reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])
15

The most common uses of reduction in Python, such as summing a series of items, have been replaced by the sum(), any() and all() functions built into the language, so reduce() doesn't see much use either.
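
For instance, any() can be thought of as a reduction over logical 'or':

>>> from operator import or_
>>> reduce(or_, [False, True, False], False)
True
>>> any([False, True, False])
True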

We can also supply an initial value to be used in front of the supplied sequence, which is especially useful if the sequence is empty:

>>> reduce(lambda x, y: x+y, [1, 2, 3, 4, 5], 10)
25
>>> reduce(lambda x, y: x+y, [], 10)
10
>>> reduce(lambda x, y: x+y, [])
Traceback (most recent call last):
  File "", line 1, in
TypeError: reduce() of empty sequence with no initial value
>>>

Notice that with an empty series, the initial value becomes necessary to avoid an error.

It can sometimes be helpful to think of the initial value as being the identity of an operation. This value depends on the operation itself; for example, the additive identity is zero, whereas the multiplicative identity is one.
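
We can see this by reducing with each operation over an empty series, supplying the corresponding identity as the initial value:

>>> reduce(lambda x, y: x + y, [], 0)
0
>>> reduce(lambda x, y: x * y, [], 1)
1
>>> reduce(lambda x, y: x * y, [1, 2, 3, 4], 1)
24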

You might be tempted into thinking that the initial value should be of the same type as the values in the data series, but this isn't necessarily so. For example, using the following function we can accumulate a histogram into a dictionary item by item. In this case the identity value is an empty dictionary (i.e. histogram), not an integer:

>>> def count(hist, item):
...     if item not in hist:
...         hist[item] = 0
...     hist[item] += 1
...     return hist
...
>>> reduce(count, [1, 1, 5, 6, 5, 2, 1], {})
{1: 3, 2: 1, 5: 2, 6: 1}

Note that in production code, this use of reduce() with count() would be much better handled using the Counter type [3] from the Python Standard Library collections module!

Our lambda expression for adding two numbers and our count() function serve as so-called "reducing functions", or simply 'reducers' for short. In fact, for the rest of this series that is what we'll call any function that can be passed to reduce() in this way.

Those of you familiar with functional programming techniques in other languages will recognise Python's reduce() as being equivalent to Haskell's foldl, Clojure's reduce, F#'s reduce or fold, Ruby's inject, C++'s std::accumulate(), or the Aggregate() extension method from .NET's LINQ.

Reduce is more fundamental than map or filter

Reduce is interesting because it is more fundamental than map() or filter(). This is because we can implement map() and filter() in terms of reduce(), but not vice-versa. To do so, we just need to cook up the right reducing functions and supply a suitable initial value. Here is map() implemented in terms of reduce():

"""A module t.py in which we'll do live coding."""

from functools import reduce

def my_map(transform, iterable):

    def map_reducer(sequence, item):
        sequence.append(transform(item))
        return sequence

    return reduce(map_reducer, iterable, [])

Then in the REPL:

>>> my_map(str.upper, "Reduce can perform mapping too!".split())
['REDUCE', 'CAN', 'PERFORM', 'MAPPING', 'TOO!']

Similarly we can make an implementation of filter() in terms of reduce():

def my_filter(predicate, iterable):

    def filter_reducer(sequence, item):
        if predicate(item):
            sequence.append(item)
        return sequence

    return reduce(filter_reducer, iterable, [])

Then in the REPL:

>>> my_filter(lambda x: x % 2 != 0, [1, 2, 3, 4, 5, 6, 7, 8])
[1, 3, 5, 7]

Reducers

In the two previous examples, the local functions map_reducer() and filter_reducer() were used as the reducing functions with reduce(). Our my_map() and my_filter() functions reduce one collection to another, transforming values and collecting the results together. For the time being, we'll stick with the notion that a reducer is any operation you can pass to the reduce() function, although we'll extend and refine this notion later.

Composability is perhaps the most important quality of any programming abstraction. How composable are my_map() and my_filter()? Well, of course, given a predicate function is_prime() and transforming function square() defined like this:

from math import sqrt

def is_prime(x):
    if x < 2:
        return False
    for i in range(2, int(sqrt(x)) + 1):
        if x % i == 0:
            return False
    return True


def square(x):
    return x*x

We can compose this filtering expression,

>>> my_filter(is_prime, range(100))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61,
67, 71, 73, 79, 83, 89, 97]

with a mapping expression using regular function composition:

>>> my_map(square, my_filter(is_prime, range(100)))
[4, 9, 25, 49, 121, 169, 289, 361, 529, 841, 961, 1369, 1681, 1849,
2209, 2809, 3481, 3721, 4489, 5041, 5329, 6241, 6889, 7921, 9409]

One issue here is that, since my_map() and my_filter() are implemented in terms of reduce(), they are eager, and so a full intermediate list of elements is created. Notably, if we'd used the built-in Python 3 versions of map() and filter(), evaluation would have been lazy, and consequently more memory efficient, so we've lost an important quality here.

Furthermore, this sort of functional pipeline can get hard to read in a language like Python as the arguments to the outermost (last applied) function become ever more widely separated. Other "real" functional languages such as Haskell and F# have better support for partial function application and pipelining results through a chain of functions. To some extent, this can be solved in Python by using named intermediate results, although this trades off the readability problem against the difficulty of coming up with meaningful names for the intermediate values.
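
For example, here is the prime-squaring pipeline again, rewritten with named intermediate results; whether names like 'primes' carry their weight is a judgement call:

>>> primes = my_filter(is_prime, range(100))
>>> prime_squares = my_map(square, primes)
>>> prime_squares
[4, 9, 25, 49, 121, 169, 289, 361, 529, 841, 961, 1369, 1681, 1849,
2209, 2809, 3481, 3721, 4489, 5041, 5329, 6241, 6889, 7921, 9409]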

Progressively abstracting reducers away from iterables

Notice that although reduce() abstracts away from some of the details of the operation - specifically how the items are transformed and combined, it is fundamentally coupled to the notion of iterable series. Furthermore, our my_map() and my_filter() functions are closely tied to the notion of mutable sequences, since they both call append():

my_filter(is_prime, my_map(prime_factors, range(32)))

Iterables in Python can be lazy, but our my_filter() and my_map() functions eagerly return fully realised list sequences.

Let's return to our function definitions to see why:

def my_map(transform, iterable):
    def map_reducer(sequence, item):
        sequence.append(transform(item))
        return sequence
    return reduce(map_reducer, iterable, [])

def my_filter(predicate, iterable):
    def filter_reducer(sequence, item):
        if predicate(item):
            sequence.append(item)
        return sequence
    return reduce(filter_reducer, iterable, [])

The my_map() and my_filter() functions have quite a lot in common:

  • Both my_map() and my_filter() call reduce()
  • Both my_map() and my_filter() supply a mutable sequence – specifically an empty list – as the initial value to reduce()
  • Both map_reducer() and filter_reducer() call append(), thereby expecting a mutable sequence

Let's progressively refactor these functions, extracting the common elements as we go. We'll start by moving the use of reduce() out of the functions completely. Instead we'll simply return the inner reducing function object, which can then be used externally. Our outer my_map() and my_filter() functions now become what are essentially reducer factories; we'll reflect this in the code by renaming them to make_mapper() and make_filterer():

def make_mapper(transform):

    def map_reducer(sequence, item):
        sequence.append(transform(item))
        return sequence

    return map_reducer

def make_filterer(predicate):

    def filter_reducer(sequence, item):
        if predicate(item):
            sequence.append(item)
        return sequence

    return filter_reducer

These can be used individually with reduce() like this:

>>> from t import *
>>> reduce(make_mapper(square), range(10), [])
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> reduce(make_filterer(is_prime), range(100), [])
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61,
67, 71, 73, 79, 83, 89, 97]

Unfortunately, it's not possible to compose the mapper and filterer into a single function which can be used with a single application of reduce(). If we want to perform both mapping and filtering we must still call reduce() twice.

We've pointed out that both of our inner reducing functions depend on their first argument supporting an append() method, so we must pass a suitable mutable sequence type – such as a list – as the seed item for reduce(). In this case, append() is just a tool for combining an existing partial result (the list) with a new piece of data (the new item).

Let's abstract away that dependency on sequences, by arranging to pass in this "combining function", which in the case of appending to a sequence, will look like this:

def appender(result, item):
    result.append(item)
    return result

Look carefully at our combining function and recall that 'reducing' is just a synonym for 'combining'. This innocuous function, which does nothing more than wrap a method call and return the modified sequence, is more interesting than it might first appear, because it too is a reducing function that takes an intermediate result (the sequence) and the next piece of input (the item) and returns the next result for any subsequent iterations. Our combining function appender() is itself a reducer! We can prove this to ourselves by using this roundabout way of constructing a list from a range:

>>> from t import *
>>> reduce(appender, range(5), [])
[0, 1, 2, 3, 4]

Having identified appender() as a reducer, we'll add another layer of abstraction, by using another layer of nested functions. The new layer affords us the opportunity to pass in a reducing function, such as appender(). We use a separate layer of inner functions, rather than adding an argument to the outer function, because the point in our client code at which we want to specify the transform or predicate is different from the point at which we want to specify the reducer:

We've made the following changes to make_mapper():

def mapping(transform):

    def make_mapping_reducer(reducer):

        def map_reducer(result, item):
            return reducer(result, transform(item))

        return map_reducer

    return make_mapping_reducer

  1. Renamed make_mapper() to mapping()
  2. Added a new inner function around map_reducer() called make_mapping_reducer(). This function accepts the 'combining' reducer and returns the map_reducer function.
  3. Adjusted the outer mapping() function to return the make_mapping_reducer function.
  4. Reworked the implementation of map_reducer() to be in terms of the passed-in combining reducer instead of append(), thereby breaking the dependency on sequence behaviour.
  5. Renamed the sequence argument of map_reducer() to result, since it contains the partial result and is no longer necessarily a sequence.

And we've made the following changes to make_filterer():

def filtering(predicate):

    def make_filtering_reducer(reducer):

        def filter_reducer(result, item):
            return reducer(result, item) if predicate(item) else result

        return filter_reducer

    return make_filtering_reducer

  1. Renamed make_filterer() to filtering()
  2. Added a new inner function around filter_reducer() called make_filtering_reducer(). This function accepts the combining reducer and returns the filter_reducer function.
  3. Adjusted the outer filtering() function to return the make_filtering_reducer function.
  4. Renamed the sequence argument of filter_reducer() to result.
  5. Reworked the implementation of filter_reducer() to be in terms of the passed-in combining reducer instead of append(), thereby breaking the dependency on sequence behaviour.
  6. Used a conditional expression rather than a conditional statement.

Finally: Transducers!

Recall that the word 'transducer' is a convenience (and contrivance) for functions which "transform reducers". Look closely at our inner functions make_mapping_reducer() and make_filtering_reducer(). Both of these functions accept a reducer and return a new reducer based on the one passed in. These inner functions are therefore our transducers; let's rename them as such to make this completely clear:

def mapping(transform):

    def mapping_transducer(reducer):

        def map_reducer(result, item):
            return reducer(result, transform(item))

        return map_reducer

    return mapping_transducer

def filtering(predicate):

    def filtering_transducer(reducer):

        def filter_reducer(result, item):
            return reducer(result, item) if predicate(item) else result

        return filter_reducer

    return filtering_transducer

With the transducers identified, the outermost functions mapping() and filtering() are exposed for what they are: transducer factories.

The wonderful thing about such transducers is that they can be composed in such a way that the operations proceed element by element, without the need for intermediate sequences. This is achievable because the combining reducer passed to a transducer needn't be our appender(): it could be any reducing function, with appender() only being used at the "bottom" of the composite reducer to place final results into an actual data structure, such as a list. Transducer composition allows us to chain multiple reducing operations together into a single composite reducer.

Let's give our transducers a whirl, using named function arguments for clarity:

>>> from t import *
>>> reduce(filtering(predicate=is_prime)(reducer=appender), range(20), [])
[2, 3, 5, 7, 11, 13, 17, 19]

Note that we pass the last reduction operation we want to perform on each item (appending) as a reducer to the first reduction operation we want to perform (primality testing). This is the opposite of normal function composition.

Instead of appender() we can pass a different reducer - in this case one produced by a transducer made by the mapping() transducer factory:

>>> reduce(filtering(
...     predicate=is_prime)(
...         reducer=mapping(
...             transform=square)(
...                 reducer=appender)),
...     range(20),
...     [])
[4, 9, 25, 49, 121, 169, 289, 361]

What's important to recognise here, is that we have composed the filtering and mapping operations to produce the effects of both using a single pass over the input sequence with a single application of reduce().

In the next article in this series, we'll look at improving on this effective, but somewhat clunky, way of composing transducers.

[1]Transducers are Coming: The blog post introducing transducers to the world.
[2]The fate of reduce() in Python 3000, the Artima Developer posting in which Guido van Rossum explains his "reasons for dropping lambda, map() and filter()" from what was to become Python 3. He goes on "I expect tons of disagreement in the feedback, all from ex-Lisp-or-Scheme folks". Ultimately though, the proposal was watered down.
[3]Python documentation for Counter

The super() Mystery Resolved

Austin Bingham from Good With Computers

In the previous articles in this series [1] we uncovered a small mystery regarding how Python's super() behaves, and we looked at some of the underlying mechanics of how super() really works. In this article we'll see how those details work together to resolve the mystery.

The mystery revisited

As you'll recall, the mystery we uncovered has to do with how a single use of super() seemed to invoke two separate implementations of a method in our SortedIntList example. As a reminder, our class diagram looks like this:

Inheritance graph

SimpleList defines a simplified list API, and IntList and SortedList each enforce specific constraints on the contents of the list. The class SortedIntList inherits from both IntList and SortedList and enforces the constraints of both base classes. Interestingly, though, SortedIntList has a trivial implementation:

class SortedIntList(IntList, SortedList):
    pass

How does SortedIntList do what it does? From the code it seems that SortedIntList does no coordination between its base classes, and certainly the base classes don't know anything about each other. Yet somehow SortedIntList manages to invoke both implementations of add(), thereby enforcing all of the necessary constraints.
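
For reference, the relevant parts of the base classes from earlier in the series look roughly like this – a condensed sketch rather than the full listings, but it captures the important detail that both add() overrides call super():

class SimpleList:
    def __init__(self, items=()):
        self._items = list(items)

    def add(self, item):
        self._items.append(item)

    def sort(self):
        self._items.sort()


class SortedList(SimpleList):
    def __init__(self, items=()):
        super().__init__(items)
        self.sort()

    def add(self, item):
        super().add(item)
        self.sort()


class IntList(SimpleList):
    def __init__(self, items=()):
        for item in items:
            self._validate(item)
        super().__init__(items)

    @staticmethod
    def _validate(item):
        if not isinstance(item, int):
            raise TypeError('IntList only supports integer values.')

    def add(self, item):
        self._validate(item)
        super().add(item)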

super() with no arguments

We've already looked at method resolution order, C3 linearization, and the general behavior of super instances. The final bit of information we need in order to resolve the mystery is how super() behaves when called with no arguments. [2] Both IntList and SortedList use super() this way, in both their initializers and in add().

For instance methods such as these, calling super() with no arguments is the same as calling super() with the method's class as the first argument and self as the second. In other words, it constructs a super proxy that uses the MRO from self starting at the class implementing the method. Knowing this, it's easy to see how, in simple cases, using super() is equivalent to "calling the base class implementation". In these cases, type(self) is the same as the class implementing the method, so the method is resolved using everything in that class's MRO except the class itself. The next entry in the MRO will, of course, be the class's first base class, so simple uses of super() are equivalent to invoking a method on the first base class.
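
A minimal sketch of that equivalence, using illustrative class names rather than the SortedIntList hierarchy:

class Greeter:
    def greet(self):
        return 'Greeter'

class PoliteGreeter(Greeter):
    def greet(self):
        # Inside an instance method, super() with no arguments behaves like
        # super(PoliteGreeter, self): resolution starts in type(self).mro()
        # at the entry *after* PoliteGreeter.
        return super().greet() + ' (politely)'

For a plain PoliteGreeter instance, the next class in the MRO after PoliteGreeter is Greeter, so PoliteGreeter().greet() returns 'Greeter (politely)'.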

A key point to notice here is that type(self) will not always be the same as the class implementing a specific method. If you invoke super() in a method on a class with a subclass, then type(self) may well be that subclass. You can see this in a trivial example:

>>> class Base:
...     def f(self):
...         print('Type of self in Base.f() =', type(self))
...
>>> class Sub(Base):
...     pass
...
>>> Sub().f()
Type of self in Base.f() = <class '__main__.Sub'>

Understanding this point is the final key to seeing how SortedIntList works. If type(self) in a base class is not necessarily the type of the class implementing the method, then the MRO that gets used by super() is not necessarily the MRO of the class implementing the method...it may be that of the subclass. Since the entries in type(self).mro() may include entries that are not in the MRO for the implementing class, calls to super() in a base class may resolve to implementations that are not in the base class's own MRO. In other words, Python's method resolution is - as you might have guessed - extremely dynamic and depends not just on a class's base classes but its subclasses as well.

Method resolution order to the rescue

With that in mind, let's finally see how all of the elements - MRO, no-argument super(), and multiple inheritance - coordinate to make SortedIntList work. As we've just seen, when super() is used with no arguments in an instance method, the proxy uses the MRO of self which, of course, will be that of SortedIntList in our example. So the first thing to look at is the MRO of SortedIntList itself:

>>> SortedIntList.mro()
[<class 'SortedIntList'>,
<class 'IntList'>,
<class 'SortedList'>,
<class 'SimpleList'>,
<class 'object'>]

A critical point here is that the MRO contains IntList and SortedList, meaning that both of them can contribute implementations when super() is used.

Next, let's examine the behavior of SortedIntList when we call its add() method. [3] Because IntList is the first class in the MRO which implements add(), a call to add() on a SortedIntList resolves to IntList.add():

class IntList(SimpleList):
    # ...

    def add(self, item):
        self._validate(item)
        super().add(item)

And this is where things get interesting!

In IntList.add() there is a call to super().add(item), and because of how no-argument super() calls work, this is equivalent to super(IntList, self).add(item). Since type(self) == SortedIntList, this call to super() uses the MRO for SortedIntList and not just IntList. As a result, even though IntList doesn't really "know" anything about SortedList, it can access SortedList methods via a subclass's MRO.

In the end, the call to super().add(item) in IntList.add() takes the MRO of SortedIntList, finds IntList in that MRO, and uses everything after IntList in the MRO to resolve the method invocation. Since that MRO-tail looks like this:

[<class 'SortedList'>, <class 'SimpleList'>, <class 'object'>]

and since method resolution uses the first class in an MRO that implements a method, the call resolves to SortedList.add() which, of course, enforces the sorting constraint.

So by including both of its base classes in its MRO [4] - and because IntList and SortedList use super() in a cooperative way - SortedIntList ensures that both the sorting and type constraint are enforced.

No more mystery

We've seen that a subclass can leverage MRO and super() to do some pretty interesting things. It can create entirely new method resolutions for its base classes, resolutions that aren't apparent in the base class definitions and are entirely dependent on runtime context.

Used properly, this can lead to some really powerful designs. Our SortedIntList example is just one instance of what can be done. At the same time, if used naively, super() can have some surprising and unexpected effects, so it pays to think deeply about the consequences of super() when you use it. For example, if you really do want to just call a specific base class implementation, you might be better off calling it directly rather than leaving the resolution open to the whims of subclass developers. It may be cliche, but it's true: with great power comes great responsibility.
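
As a purely illustrative contrast (the class names here are made up, not part of the SortedIntList example), these two overrides are not equivalent once cooperative subclasses enter the picture:

class LoggingList(SimpleList):
    def add(self, item):
        print('adding', item)
        super().add(item)           # resolved dynamically via type(self).mro()

class PinnedLoggingList(SimpleList):
    def add(self, item):
        print('adding', item)
        SimpleList.add(self, item)  # always SimpleList's implementation

The first version lets a subclass's MRO interpose other implementations of add(); the second always calls SimpleList.add(), whatever the subclasses do.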

For more information on this topic, you can always see the Inheritance and Subtype Polymorphism module of our Python: Beyond the Basics course on PluralSight. [5]

[1]Part 1 and Part 2
[2]Note that calling super() with no arguments is only supported in Python 3. Python 2 users will need to use the longer explicit form.
[3]The same logic applies to its __init__(), which also involves calls to super().
[4]Thanks, of course, to how C3 works.
[5]Python: Beyond the Basics on PluralSight

Method Resolution Order, C3, and Super Proxies

Austin Bingham from Good With Computers

In the previous article in this series we looked at a seemingly simple class graph with some surprising behavior. The central mystery was how a class with two bases can seem to invoke two different method implementations with just a single invocation of super(). In order to understand how that works, we need to delve into the details of how super() works, and this involves understanding some design details of the Python language itself.

Method Resolution Order

The first detail we need to understand is the notion of method resolution order, or simply MRO. Put simply, a method resolution order is an ordering of an inheritance graph for the purposes of deciding which implementation to use when a method is invoked on an object. Let's look at that definition a bit more closely.

First, we said that an MRO is an "ordering of an inheritance graph". Consider a simple diamond class structure like this:

>>> class A: pass
...
>>> class B(A): pass
...
>>> class C(A): pass
...
>>> class D(B, C): pass
...

The MRO for these classes could be, in principle, any ordering of the classes A, B, C, and D (and object, the ultimate base class of all classes in Python.) Python, of course, doesn't just pick the order randomly, and we'll cover how it picks the order in a later section. For now, let's examine the MROs for our classes using the mro() class method:

>>> A.mro()
[<class '__main__.A'>,
<class 'object'>]
>>> B.mro()
[<class '__main__.B'>,
<class '__main__.A'>,
<class 'object'>]
>>> C.mro()
[<class '__main__.C'>,
<class '__main__.A'>,
<class 'object'>]
>>> D.mro()
[<class '__main__.D'>,
<class '__main__.B'>,
<class '__main__.C'>,
<class '__main__.A'>,
<class 'object'>]

We can see that all of our classes have an MRO. But what is it used for? The second half of our definition said "for the purposes of deciding which implementation to use when a method is invoked on an object". What this means is that Python looks at a class's MRO when a method is invoked on an instance of that class. Starting at the head of the MRO, Python examines each class in order looking for the first one which implements the invoked method. That implementation is the one that gets used.

For example, let's augment our simple example with a method implemented in multiple locations:

>>> class A:
...     def foo(self):
...         print('A.foo')
...
>>> class B(A):
...     def foo(self):
...         print('B.foo')
...
>>> class C(A):
...     def foo(self):
...         print('C.foo')
...
>>> class D(B, C):
...     pass
...

What will happen if we invoke foo() on an instance of D? Remember that the MRO of D was [D, B, C, A, object]. Since the first class in that sequence to support foo() is B, we would expect to see "B.foo" printed, and indeed that is exactly what happens:

>>> D().foo()
B.foo

What if we remove the implementation in B? We would expect to see "C.foo", which again is what happens:

>>> class A:
...     def foo(self):
...         print('A.foo')
...
>>> class B(A):
...     pass
...
>>> class C(A):
...     def foo(self):
...         print('C.foo')
...
>>> class D(B, C):
...     pass
...
>>> D().foo()
C.foo

To reiterate, method resolution order is nothing more than some ordering of the inheritance graph that Python uses to find method implementations. It's a relatively simple concept, but it's one that many developers understand only intuitively and partially. But how does Python calculate an MRO? We hinted earlier – and you probably suspected – that it's not just any random ordering, and in the next section we'll look at precisely how Python does this.

C3 superclass linearization

The short answer to the question of how Python determines MRO is "C3 superclass linearization", or simply C3. C3 is an algorithm initially developed for the Dylan programming language [1], and it has since been adopted by several prominent programming languages including Perl, Parrot, and of course Python. [2] We won't go into great detail on how C3 works, though there is plenty of information on the web that can tell you everything you need to know. [3]

What's important to know about C3 is that it guarantees three important features:

  1. Subclasses appear before base classes
  2. Base class declaration order is preserved
  3. For all classes in an inheritance graph, the relative orderings guaranteed by 1 and 2 are preserved at all points in the graph.

In other words, by rule 1, you will never see an MRO where a class is preceded by one of its base classes. If you have this:

>>> class Foo(Fred, Jim, Shiela):
...     pass
...

you will never see an MRO where Foo comes after Fred, Jim, or Shiela. This, again, is because Fred, Jim, and Shiela are all base classes of Foo, and C3 puts base classes after subclasses.

Likewise, by rule 2, you will never see an MRO where the base classes named in a class definition appear in a different relative order than in that definition. Given the same code above, this means you will never see an MRO with Fred after either Jim or Shiela, nor one with Jim after Shiela. This is because base class declaration order is preserved by C3.

The third constraint guaranteed by C3 simply means that the relative orderings determined by one class in an inheritance graph – i.e. the ordering constraints based on one class's base class declarations – will not be violated in any MRO for any class in that graph.
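
As a quick check of rules 1 and 2, we can define stand-in classes with those names (they are just empty placeholders for this illustration, not classes from any real library) and inspect the resulting MRO:

>>> class Fred: pass
...
>>> class Jim: pass
...
>>> class Shiela: pass
...
>>> class Foo(Fred, Jim, Shiela): pass
...
>>> [c.__name__ for c in Foo.mro()]
['Foo', 'Fred', 'Jim', 'Shiela', 'object']

Foo comes before all of its bases, and Fred, Jim, and Shiela appear in the order they were declared.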

C3 limits your inheritance options

One interesting side-effect of the use of C3 is that not all inheritance graphs are legal. It's possible to construct inheritance graphs which make it impossible to meet all of the constraints of C3. When this happens, Python raises an exception and prevents the creation of the invalid class:

>>> class A:
...     pass
...
>>> class B(A):
...     pass
...
>>> class C(A):
...     pass
...
>>> class D(B, A, C):
...     pass
...
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: Cannot create a consistent method resolution
order (MRO) for bases A, C

In this case, we've asked for D to inherit from B, A, and C, in that order. Unfortunately, C3 wants to enforce two incompatible constraints:

  1. It wants to put C before A because A is a base class of C
  2. It wants to put A before C because of D's base class ordering

Since these are obviously mutually exclusive states, C3 rejects the inheritance graph and Python raises a TypeError.

That's about it, really. These rules provide a consistent, predictable basis for calculating MROs. Understanding C3, or even just knowing that it exists, is perhaps not important for day-to-day Python development, but it's an interesting tidbit for those interested in the details of language design.

Super proxies

The third detail we need to understand in order to resolve our mystery is the notion of a "super proxy". When you invoke super() in Python [4], what actually happens is that you construct an object of type super. In other words, super is a class, not a keyword or some other construct. You can see this in a REPL:

>>> s = super(C)
>>> type(s)
<class 'super'>
>>> dir(s)
['__class__', '__delattr__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__',
'__get__', '__getattribute__', '__gt__',
'__hash__', '__init__', '__le__', '__lt__',
'__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__self__',
'__self_class__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__thisclass__']

In most cases, super objects have two important pieces of information: an MRO and a class in that MRO. [5] When you use an invocation like this:

super(a_type, obj)

then the MRO is that of the type of obj, and the class within that MRO is a_type. [6]

Likewise, when you use an invocation like this:

super(type1, type2)

the MRO is that of type2 and the class within that MRO is type1. [7]

Given that, what exactly does super() do? It's hard to put it in a succinct, pithy, or memorable form, but here's my best attempt so far. Given a method resolution order and a class C in that MRO, super() gives you an object which resolves methods using only the part of the MRO which comes after C.

In other words, rather than resolving a method invocation against the full MRO as it normally would, super uses only the tail of an MRO. In all other ways, though, method resolution occurs exactly as normal.

For example, suppose I have an MRO like this:

[A, B, C, D, E, object]

and further suppose that I have a super object using this MRO and the class C in this MRO. In that case, the super instance would only resolve to methods implemented in D, E, or object (in that order.) In other words, a call like this:

super(C, A).foo()

would only resolve to an implementation of foo() in D or E. [8]
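
To make this concrete, we can return to the A, B, C, and D classes defined above (using the most recent definitions, where B no longer implements foo()) and bind super proxies at different points in D's MRO of [D, B, C, A, object]. The session below is illustrative rather than definitive:

>>> d = D()
>>> super(B, d).foo()
C.foo
>>> super(C, d).foo()
A.foo

super(B, d) only considers the classes after B, so resolution starts at C; super(C, d) only considers the classes after C, so it finds A's implementation.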

Why the name super-proxy

You might wonder why we've been using the name super-proxy when discussing super instances. The reason is that instances of super are designed to respond to any method name, resolving the actual implementation from their MRO and class configuration. In this way, super instances act as proxies: they accept the same method calls as the underlying object, but forward each call, arguments and all, to the first implementation found in the remaining portion of the MRO.

The mystery is almost resolved!

We now know everything we need to know to resolve the mystery described in the first article in this series. You can (and probably should) see if you can figure it out for yourself at this point. By applying the concepts we discussed in this article - method resolution order, C3, and super proxies - you should be able to see how SortedIntList is able to enforce the constraints of IntList and SortedList even though it only makes a single call to super().

If you'd rather wait, though, the third article in this series will lay it all out for you. Stay tuned!

[1]The Dylan programming language
[2]Presumably starting with the letter "P" is not actually a requirement for using C3 in a language.
[3]Python's introduction of C3 in version 2.3 includes a great description. You can also track down the original research describing C3.
[4]With zero or more arguments. I'm using zero here to keep things simple.
[5]In the case of an unbound super object you don't have the MRO, but that's a detail we can ignore for this article.
[6]Remember that this form of super() requires that isinstance(obj, a_type) be true.
[7]Remember that this form of super() requires that issubclass(type2, type1) be true.
[8]Since object does not (yet...though I'm working on a PEP) implement foo().

Rational Computational Geometry in Python

Rob Smallshire from Good With Computers

In the previous article, we looked at a standard technique for determining the collinearity of points, based on computing the sign of the area of the triangle formed by two points on the line and a third query point. We discovered that, when used with Python's float type [1], the routine was unreliable in a region close to the line. This shortcoming has nothing to do with Python specifically and everything to do with the finite precision of the float number type. This time, we'll examine the behaviour of the algorithm more systematically using the following program:

def sign(x):
    """Determine the sign of x.

    Returns:
        -1 if x is negative, +1 if x is positive or 0 if x is zero.
    """
    return (x > 0) - (x < 0)


def orientation(p, q, r):
    """Determine the orientation of three points in the plane.

    Args:
      p, q, r: Two-tuples representing coordinate pairs of three points.

    Returns:
        -1 if p, q, r is a turn to the right, +1 if p, q, r is a turn to the
        left, otherwise 0 if p, q, and r are collinear.
    """
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return sign(d)


def main():
    """
    Test whether points immediately above and below the point (0.5, 0.5)
    lie above, on, or below the line through (12.0, 12.0) and (24.0, 24.0).
    """
    px = 0.5

    pys = [0.49999999999999,
          0.49999999999999006,
          0.4999999999999901,
          0.4999999999999902,
          0.49999999999999023,
          0.4999999999999903,
          0.49999999999999034,
          0.4999999999999904,
          0.49999999999999045,
          0.4999999999999905,
          0.49999999999999056,
          0.4999999999999906,
          0.4999999999999907,
          0.49999999999999073,
          0.4999999999999908,
          0.49999999999999084,
          0.4999999999999909,
          0.49999999999999095,
          0.499999999999991,
          0.49999999999999106,
          0.4999999999999911,
          0.4999999999999912,
          0.49999999999999123,
          0.4999999999999913,
          0.49999999999999134,
          0.4999999999999914,
          0.49999999999999145,
          0.4999999999999915,
          0.49999999999999156,
          0.4999999999999916,
          0.4999999999999917,
          0.49999999999999173,
          0.4999999999999918,
          0.49999999999999184,
          0.4999999999999919,
          0.49999999999999195,
          0.499999999999992,
          0.49999999999999206,
          0.4999999999999921,
          0.4999999999999922,
          0.49999999999999223,
          0.4999999999999923,
          0.49999999999999234,
          0.4999999999999924,
          0.49999999999999245,
          0.4999999999999925,
          0.49999999999999256,
          0.4999999999999926,
          0.4999999999999927,
          0.49999999999999273,
          0.4999999999999928,
          0.49999999999999284,
          0.4999999999999929,
          0.49999999999999295,
          0.499999999999993,
          0.49999999999999306,
          0.4999999999999931,
          0.49999999999999317,
          0.4999999999999932,
          0.4999999999999933,
          0.49999999999999334,
          0.4999999999999934,
          0.49999999999999345,
          0.4999999999999935,
          0.49999999999999356,
          0.4999999999999936,
          0.49999999999999367,
          0.4999999999999937,
          0.4999999999999938,
          0.49999999999999384,
          0.4999999999999939,
          0.49999999999999395,
          0.499999999999994,
          0.49999999999999406,
          0.4999999999999941,
          0.49999999999999417,
          0.4999999999999942,
          0.4999999999999943,
          0.49999999999999434,
          0.4999999999999944,
          0.49999999999999445,
          0.4999999999999945,
          0.49999999999999456,
          0.4999999999999946,
          0.49999999999999467,
          0.4999999999999947,
          0.4999999999999948,
          0.49999999999999484,
          0.4999999999999949,
          0.49999999999999495,
          0.499999999999995,
          0.49999999999999506,
          0.4999999999999951,
          0.49999999999999517,
          0.4999999999999952,
          0.4999999999999953,
          0.49999999999999534,
          0.4999999999999954,
          0.49999999999999545,
          0.4999999999999955,
          0.49999999999999556,
          0.4999999999999956,
          0.49999999999999567,
          0.4999999999999957,
          0.4999999999999958,
          0.49999999999999584,
          0.4999999999999959,
          0.49999999999999595,
          0.499999999999996,
          0.49999999999999606,
          0.4999999999999961,
          0.49999999999999617,
          0.4999999999999962,
          0.4999999999999963,
          0.49999999999999634,
          0.4999999999999964,
          0.49999999999999645,
          0.4999999999999965,
          0.49999999999999656,
          0.4999999999999966,
          0.49999999999999667,
          0.4999999999999967,
          0.4999999999999968,
          0.49999999999999684,
          0.4999999999999969,
          0.49999999999999695,
          0.499999999999997,
          0.49999999999999706,
          0.4999999999999971,
          0.49999999999999717,
          0.4999999999999972,
          0.4999999999999973,
          0.49999999999999734,
          0.4999999999999974,
          0.49999999999999745,
          0.4999999999999975,
          0.49999999999999756,
          0.4999999999999976,
          0.49999999999999767,
          0.4999999999999977,
          0.4999999999999978,
          0.49999999999999784,
          0.4999999999999979,
          0.49999999999999795,
          0.499999999999998,
          0.49999999999999806,
          0.4999999999999981,
          0.49999999999999817,
          0.4999999999999982,
          0.4999999999999983,
          0.49999999999999833,
          0.4999999999999984,
          0.49999999999999845,
          0.4999999999999985,
          0.49999999999999856,
          0.4999999999999986,
          0.49999999999999867,
          0.4999999999999987,
          0.4999999999999988,
          0.49999999999999883,
          0.4999999999999989,
          0.49999999999999895,
          0.499999999999999,
          0.49999999999999906,
          0.4999999999999991,
          0.49999999999999917,
          0.4999999999999992,
          0.4999999999999993,
          0.49999999999999933,
          0.4999999999999994,
          0.49999999999999944,
          0.4999999999999995,
          0.49999999999999956,
          0.4999999999999996,
          0.49999999999999967,
          0.4999999999999997,
          0.4999999999999998,
          0.49999999999999983,
          0.4999999999999999,
          0.49999999999999994,  # The previous representable float < 0.5
          0.5,
          0.5000000000000001,   # The next representable float > 0.5
          0.5000000000000002,
          0.5000000000000003,
          0.5000000000000004,
          0.5000000000000006,
          0.5000000000000007,
          0.5000000000000008,
          0.5000000000000009,
          0.500000000000001,
          0.5000000000000011,
          0.5000000000000012,
          0.5000000000000013,
          0.5000000000000014,
          0.5000000000000016,
          0.5000000000000017,
          0.5000000000000018,
          0.5000000000000019,
          0.500000000000002,
          0.5000000000000021,
          0.5000000000000022,
          0.5000000000000023,
          0.5000000000000024,
          0.5000000000000026,
          0.5000000000000027,
          0.5000000000000028,
          0.5000000000000029,
          0.500000000000003,
          0.5000000000000031,
          0.5000000000000032,
          0.5000000000000033,
          0.5000000000000034,
          0.5000000000000036,
          0.5000000000000037,
          0.5000000000000038,
          0.5000000000000039,
          0.500000000000004,
          0.5000000000000041,
          0.5000000000000042,
          0.5000000000000043,
          0.5000000000000044,
          0.5000000000000046,
          0.5000000000000047,
          0.5000000000000048,
          0.5000000000000049,
          0.500000000000005,
          0.5000000000000051,
          0.5000000000000052,
          0.5000000000000053,
          0.5000000000000054,
          0.5000000000000056,
          0.5000000000000057,
          0.5000000000000058,
          0.5000000000000059,
          0.500000000000006,
          0.5000000000000061,
          0.5000000000000062,
          0.5000000000000063,
          0.5000000000000064,
          0.5000000000000066,
          0.5000000000000067,
          0.5000000000000068,
          0.5000000000000069,
          0.500000000000007,
          0.5000000000000071,
          0.5000000000000072,
          0.5000000000000073,
          0.5000000000000074,
          0.5000000000000075,
          0.5000000000000077,
          0.5000000000000078,
          0.5000000000000079,
          0.500000000000008,
          0.5000000000000081,
          0.5000000000000082,
          0.5000000000000083,
          0.5000000000000084,
          0.5000000000000085,
          0.5000000000000087,
          0.5000000000000088,
          0.5000000000000089,
          0.500000000000009,
          0.5000000000000091,
          0.5000000000000092,
          0.5000000000000093,
          0.5000000000000094,
          0.5000000000000095,
          0.5000000000000097,
          0.5000000000000098,
          0.5000000000000099,
          0.50000000000001]

    q = (12.0, 12.0)
    r = (24.0, 24.0)

    for py in pys:
        p = (px, py)
        o = orientation(p, q, r)
        print("orientation(({p[0]:>3}, {p[1]:<19}) q, r) -> {o:>2}".format(
              p=p, o=o))


if __name__ == '__main__':
    main()

The program includes definitions of our sign() and orientation() functions, together with a main() function which runs the test. The main function includes a list of the 271 nearest representable \(y\)-coordinate values to 0.5. We haven't included the code used to generate these successive float values because it's somewhat beside the point; we referenced the necessary technique in the previous article, and one possible approach is sketched below.
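
For reference, here is one way such a sequence could be produced. This is a sketch rather than the code actually used for the list above, and it assumes Python 3.9 or later for math.nextafter() (numpy.nextafter() is an alternative for older versions):

import math

def floats_around(x, count):
    """Yield the `count` nearest representable floats below x,
    then x itself, then the `count` nearest representable floats
    above x, in increasing order."""
    below = []
    v = x
    for _ in range(count):
        v = math.nextafter(v, -math.inf)
        below.append(v)
    yield from reversed(below)
    yield x
    v = x
    for _ in range(count):
        v = math.nextafter(v, math.inf)
        yield v

# list(floats_around(0.5, 3)) gives:
# [0.49999999999999983, 0.4999999999999999, 0.49999999999999994,
#  0.5, 0.5000000000000001, 0.5000000000000002, 0.5000000000000003]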

The program iterates over these py values and performs the orientation test each time, printing the result. The complex format string is used to get readable output which lines up in columns. When we look at that output we see an intricate pattern of results emerge, which isn't even symmetrical around the central 0.5 value:

orientation((0.5, 0.50000000000001   ) q, r) ->  1
orientation((0.5, 0.5000000000000099 ) q, r) ->  1
orientation((0.5, 0.5000000000000098 ) q, r) ->  1
orientation((0.5, 0.5000000000000097 ) q, r) ->  1
orientation((0.5, 0.5000000000000095 ) q, r) ->  1
orientation((0.5, 0.5000000000000094 ) q, r) ->  1
orientation((0.5, 0.5000000000000093 ) q, r) ->  1
orientation((0.5, 0.5000000000000092 ) q, r) ->  1
orientation((0.5, 0.5000000000000091 ) q, r) ->  1
orientation((0.5, 0.500000000000009  ) q, r) ->  1
orientation((0.5, 0.5000000000000089 ) q, r) ->  1
orientation((0.5, 0.5000000000000088 ) q, r) ->  1
orientation((0.5, 0.5000000000000087 ) q, r) ->  1
orientation((0.5, 0.5000000000000085 ) q, r) ->  1
orientation((0.5, 0.5000000000000084 ) q, r) ->  1
orientation((0.5, 0.5000000000000083 ) q, r) ->  1
orientation((0.5, 0.5000000000000082 ) q, r) ->  1
orientation((0.5, 0.5000000000000081 ) q, r) ->  1
orientation((0.5, 0.500000000000008  ) q, r) ->  1
orientation((0.5, 0.5000000000000079 ) q, r) ->  1
orientation((0.5, 0.5000000000000078 ) q, r) ->  1
orientation((0.5, 0.5000000000000077 ) q, r) ->  1
orientation((0.5, 0.5000000000000075 ) q, r) ->  1
orientation((0.5, 0.5000000000000074 ) q, r) ->  1
orientation((0.5, 0.5000000000000073 ) q, r) ->  1
orientation((0.5, 0.5000000000000072 ) q, r) ->  1
orientation((0.5, 0.5000000000000071 ) q, r) ->  1
orientation((0.5, 0.500000000000007  ) q, r) ->  1
orientation((0.5, 0.5000000000000069 ) q, r) ->  1
orientation((0.5, 0.5000000000000068 ) q, r) ->  1
orientation((0.5, 0.5000000000000067 ) q, r) ->  1
orientation((0.5, 0.5000000000000066 ) q, r) ->  1
orientation((0.5, 0.5000000000000064 ) q, r) ->  1
orientation((0.5, 0.5000000000000063 ) q, r) ->  1
orientation((0.5, 0.5000000000000062 ) q, r) ->  1
orientation((0.5, 0.5000000000000061 ) q, r) ->  1
orientation((0.5, 0.500000000000006  ) q, r) ->  1
orientation((0.5, 0.5000000000000059 ) q, r) ->  1
orientation((0.5, 0.5000000000000058 ) q, r) ->  1
orientation((0.5, 0.5000000000000057 ) q, r) ->  1
orientation((0.5, 0.5000000000000056 ) q, r) ->  1
orientation((0.5, 0.5000000000000054 ) q, r) ->  1
orientation((0.5, 0.5000000000000053 ) q, r) ->  1
orientation((0.5, 0.5000000000000052 ) q, r) ->  1
orientation((0.5, 0.5000000000000051 ) q, r) ->  1
orientation((0.5, 0.500000000000005  ) q, r) ->  1
orientation((0.5, 0.5000000000000049 ) q, r) ->  1
orientation((0.5, 0.5000000000000048 ) q, r) ->  1
orientation((0.5, 0.5000000000000047 ) q, r) ->  1
orientation((0.5, 0.5000000000000046 ) q, r) ->  1
orientation((0.5, 0.5000000000000044 ) q, r) ->  0
orientation((0.5, 0.5000000000000043 ) q, r) ->  0
orientation((0.5, 0.5000000000000042 ) q, r) ->  0
orientation((0.5, 0.5000000000000041 ) q, r) ->  0
orientation((0.5, 0.500000000000004  ) q, r) ->  0
orientation((0.5, 0.5000000000000039 ) q, r) ->  0
orientation((0.5, 0.5000000000000038 ) q, r) ->  0
orientation((0.5, 0.5000000000000037 ) q, r) ->  0
orientation((0.5, 0.5000000000000036 ) q, r) ->  0
orientation((0.5, 0.5000000000000034 ) q, r) ->  0
orientation((0.5, 0.5000000000000033 ) q, r) ->  0
orientation((0.5, 0.5000000000000032 ) q, r) ->  0
orientation((0.5, 0.5000000000000031 ) q, r) ->  0
orientation((0.5, 0.500000000000003  ) q, r) ->  0
orientation((0.5, 0.5000000000000029 ) q, r) ->  0
orientation((0.5, 0.5000000000000028 ) q, r) ->  0
orientation((0.5, 0.5000000000000027 ) q, r) ->  0
orientation((0.5, 0.5000000000000026 ) q, r) ->  0
orientation((0.5, 0.5000000000000024 ) q, r) ->  0
orientation((0.5, 0.5000000000000023 ) q, r) ->  0
orientation((0.5, 0.5000000000000022 ) q, r) ->  0
orientation((0.5, 0.5000000000000021 ) q, r) ->  0
orientation((0.5, 0.500000000000002  ) q, r) ->  0
orientation((0.5, 0.5000000000000019 ) q, r) ->  0
orientation((0.5, 0.5000000000000018 ) q, r) ->  1
orientation((0.5, 0.5000000000000017 ) q, r) ->  1
orientation((0.5, 0.5000000000000016 ) q, r) ->  1
orientation((0.5, 0.5000000000000014 ) q, r) ->  1
orientation((0.5, 0.5000000000000013 ) q, r) ->  1
orientation((0.5, 0.5000000000000012 ) q, r) ->  1
orientation((0.5, 0.5000000000000011 ) q, r) ->  1
orientation((0.5, 0.500000000000001  ) q, r) ->  1
orientation((0.5, 0.5000000000000009 ) q, r) ->  0
orientation((0.5, 0.5000000000000008 ) q, r) ->  0
orientation((0.5, 0.5000000000000007 ) q, r) ->  0
orientation((0.5, 0.5000000000000006 ) q, r) ->  0
orientation((0.5, 0.5000000000000004 ) q, r) ->  0
orientation((0.5, 0.5000000000000003 ) q, r) ->  0
orientation((0.5, 0.5000000000000002 ) q, r) ->  0
orientation((0.5, 0.5000000000000001 ) q, r) ->  0
orientation((0.5, 0.5                ) q, r) ->  0
orientation((0.5, 0.49999999999999994) q, r) ->  0
orientation((0.5, 0.4999999999999999 ) q, r) ->  0
orientation((0.5, 0.49999999999999983) q, r) ->  0
orientation((0.5, 0.4999999999999998 ) q, r) ->  0
orientation((0.5, 0.4999999999999997 ) q, r) ->  0
orientation((0.5, 0.49999999999999967) q, r) ->  0
orientation((0.5, 0.4999999999999996 ) q, r) ->  0
orientation((0.5, 0.49999999999999956) q, r) ->  0
orientation((0.5, 0.4999999999999995 ) q, r) ->  0
orientation((0.5, 0.49999999999999944) q, r) ->  0
orientation((0.5, 0.4999999999999994 ) q, r) ->  0
orientation((0.5, 0.49999999999999933) q, r) ->  0
orientation((0.5, 0.4999999999999993 ) q, r) ->  0
orientation((0.5, 0.4999999999999992 ) q, r) ->  0
orientation((0.5, 0.49999999999999917) q, r) ->  0
orientation((0.5, 0.4999999999999991 ) q, r) ->  0
orientation((0.5, 0.49999999999999906) q, r) -> -1
orientation((0.5, 0.499999999999999  ) q, r) -> -1
orientation((0.5, 0.49999999999999895) q, r) -> -1
orientation((0.5, 0.4999999999999989 ) q, r) -> -1
orientation((0.5, 0.49999999999999883) q, r) -> -1
orientation((0.5, 0.4999999999999988 ) q, r) -> -1
orientation((0.5, 0.4999999999999987 ) q, r) -> -1
orientation((0.5, 0.49999999999999867) q, r) -> -1
orientation((0.5, 0.4999999999999986 ) q, r) -> -1
orientation((0.5, 0.49999999999999856) q, r) -> -1
orientation((0.5, 0.4999999999999985 ) q, r) -> -1
orientation((0.5, 0.49999999999999845) q, r) -> -1
orientation((0.5, 0.4999999999999984 ) q, r) -> -1
orientation((0.5, 0.49999999999999833) q, r) -> -1
orientation((0.5, 0.4999999999999983 ) q, r) -> -1
orientation((0.5, 0.4999999999999982 ) q, r) -> -1
orientation((0.5, 0.49999999999999817) q, r) ->  0
orientation((0.5, 0.4999999999999981 ) q, r) ->  0
orientation((0.5, 0.49999999999999806) q, r) ->  0
orientation((0.5, 0.499999999999998  ) q, r) ->  0
orientation((0.5, 0.49999999999999795) q, r) ->  0
orientation((0.5, 0.4999999999999979 ) q, r) ->  0
orientation((0.5, 0.49999999999999784) q, r) ->  0
orientation((0.5, 0.4999999999999978 ) q, r) ->  0
orientation((0.5, 0.4999999999999977 ) q, r) ->  0
orientation((0.5, 0.49999999999999767) q, r) ->  0
orientation((0.5, 0.4999999999999976 ) q, r) ->  0
orientation((0.5, 0.49999999999999756) q, r) ->  0
orientation((0.5, 0.4999999999999975 ) q, r) ->  0
orientation((0.5, 0.49999999999999745) q, r) ->  0
orientation((0.5, 0.4999999999999974 ) q, r) ->  0
orientation((0.5, 0.49999999999999734) q, r) ->  0
orientation((0.5, 0.4999999999999973 ) q, r) ->  0
orientation((0.5, 0.4999999999999972 ) q, r) ->  0
orientation((0.5, 0.49999999999999717) q, r) ->  0
orientation((0.5, 0.4999999999999971 ) q, r) ->  0
orientation((0.5, 0.49999999999999706) q, r) ->  0
orientation((0.5, 0.499999999999997  ) q, r) ->  0
orientation((0.5, 0.49999999999999695) q, r) ->  0
orientation((0.5, 0.4999999999999969 ) q, r) ->  0
orientation((0.5, 0.49999999999999684) q, r) ->  0
orientation((0.5, 0.4999999999999968 ) q, r) ->  0
orientation((0.5, 0.4999999999999967 ) q, r) ->  0
orientation((0.5, 0.49999999999999667) q, r) ->  0
orientation((0.5, 0.4999999999999966 ) q, r) ->  0
orientation((0.5, 0.49999999999999656) q, r) ->  0
orientation((0.5, 0.4999999999999965 ) q, r) ->  0
orientation((0.5, 0.49999999999999645) q, r) ->  0
orientation((0.5, 0.4999999999999964 ) q, r) ->  0
orientation((0.5, 0.49999999999999634) q, r) ->  0
orientation((0.5, 0.4999999999999963 ) q, r) ->  0
orientation((0.5, 0.4999999999999962 ) q, r) ->  0
orientation((0.5, 0.49999999999999617) q, r) ->  0
orientation((0.5, 0.4999999999999961 ) q, r) ->  0
orientation((0.5, 0.49999999999999606) q, r) ->  0
orientation((0.5, 0.499999999999996  ) q, r) ->  0
orientation((0.5, 0.49999999999999595) q, r) ->  0
orientation((0.5, 0.4999999999999959 ) q, r) ->  0
orientation((0.5, 0.49999999999999584) q, r) ->  0
orientation((0.5, 0.4999999999999958 ) q, r) ->  0
orientation((0.5, 0.4999999999999957 ) q, r) ->  0
orientation((0.5, 0.49999999999999567) q, r) ->  0
orientation((0.5, 0.4999999999999956 ) q, r) ->  0
orientation((0.5, 0.49999999999999556) q, r) ->  0
orientation((0.5, 0.4999999999999955 ) q, r) -> -1
orientation((0.5, 0.49999999999999545) q, r) -> -1
orientation((0.5, 0.4999999999999954 ) q, r) -> -1
orientation((0.5, 0.49999999999999534) q, r) -> -1
orientation((0.5, 0.4999999999999953 ) q, r) -> -1
orientation((0.5, 0.4999999999999952 ) q, r) -> -1
orientation((0.5, 0.49999999999999517) q, r) -> -1
orientation((0.5, 0.4999999999999951 ) q, r) -> -1
orientation((0.5, 0.49999999999999506) q, r) -> -1
orientation((0.5, 0.499999999999995  ) q, r) -> -1
orientation((0.5, 0.49999999999999495) q, r) -> -1
orientation((0.5, 0.4999999999999949 ) q, r) -> -1
orientation((0.5, 0.49999999999999484) q, r) -> -1
orientation((0.5, 0.4999999999999948 ) q, r) -> -1
orientation((0.5, 0.4999999999999947 ) q, r) -> -1
orientation((0.5, 0.49999999999999467) q, r) -> -1
orientation((0.5, 0.4999999999999946 ) q, r) -> -1
orientation((0.5, 0.49999999999999456) q, r) -> -1
orientation((0.5, 0.4999999999999945 ) q, r) -> -1
orientation((0.5, 0.49999999999999445) q, r) -> -1
orientation((0.5, 0.4999999999999944 ) q, r) -> -1
orientation((0.5, 0.49999999999999434) q, r) -> -1
orientation((0.5, 0.4999999999999943 ) q, r) -> -1
orientation((0.5, 0.4999999999999942 ) q, r) -> -1
orientation((0.5, 0.49999999999999417) q, r) -> -1
orientation((0.5, 0.4999999999999941 ) q, r) -> -1
orientation((0.5, 0.49999999999999406) q, r) -> -1
orientation((0.5, 0.499999999999994  ) q, r) -> -1
orientation((0.5, 0.49999999999999395) q, r) -> -1
orientation((0.5, 0.4999999999999939 ) q, r) -> -1
orientation((0.5, 0.49999999999999384) q, r) -> -1
orientation((0.5, 0.4999999999999938 ) q, r) -> -1
orientation((0.5, 0.4999999999999937 ) q, r) -> -1
orientation((0.5, 0.49999999999999367) q, r) -> -1
orientation((0.5, 0.4999999999999936 ) q, r) -> -1
orientation((0.5, 0.49999999999999356) q, r) -> -1
orientation((0.5, 0.4999999999999935 ) q, r) -> -1
orientation((0.5, 0.49999999999999345) q, r) -> -1
orientation((0.5, 0.4999999999999934 ) q, r) -> -1
orientation((0.5, 0.49999999999999334) q, r) -> -1
orientation((0.5, 0.4999999999999933 ) q, r) -> -1
orientation((0.5, 0.4999999999999932 ) q, r) -> -1
orientation((0.5, 0.49999999999999317) q, r) -> -1
orientation((0.5, 0.4999999999999931 ) q, r) -> -1
orientation((0.5, 0.49999999999999306) q, r) -> -1
orientation((0.5, 0.499999999999993  ) q, r) -> -1
orientation((0.5, 0.49999999999999295) q, r) -> -1
orientation((0.5, 0.4999999999999929 ) q, r) -> -1
orientation((0.5, 0.49999999999999284) q, r) -> -1
orientation((0.5, 0.4999999999999928 ) q, r) -> -1
orientation((0.5, 0.49999999999999273) q, r) -> -1
orientation((0.5, 0.4999999999999927 ) q, r) -> -1
orientation((0.5, 0.4999999999999926 ) q, r) -> -1
orientation((0.5, 0.49999999999999256) q, r) -> -1
orientation((0.5, 0.4999999999999925 ) q, r) -> -1
orientation((0.5, 0.49999999999999245) q, r) -> -1
orientation((0.5, 0.4999999999999924 ) q, r) -> -1
orientation((0.5, 0.49999999999999234) q, r) -> -1
orientation((0.5, 0.4999999999999923 ) q, r) -> -1
orientation((0.5, 0.49999999999999223) q, r) -> -1
orientation((0.5, 0.4999999999999922 ) q, r) -> -1
orientation((0.5, 0.4999999999999921 ) q, r) -> -1
orientation((0.5, 0.49999999999999206) q, r) -> -1
orientation((0.5, 0.499999999999992  ) q, r) -> -1
orientation((0.5, 0.49999999999999195) q, r) -> -1
orientation((0.5, 0.4999999999999919 ) q, r) -> -1
orientation((0.5, 0.49999999999999184) q, r) -> -1
orientation((0.5, 0.4999999999999918 ) q, r) -> -1
orientation((0.5, 0.49999999999999173) q, r) -> -1
orientation((0.5, 0.4999999999999917 ) q, r) -> -1
orientation((0.5, 0.4999999999999916 ) q, r) -> -1
orientation((0.5, 0.49999999999999156) q, r) -> -1
orientation((0.5, 0.4999999999999915 ) q, r) -> -1
orientation((0.5, 0.49999999999999145) q, r) -> -1
orientation((0.5, 0.4999999999999914 ) q, r) -> -1
orientation((0.5, 0.49999999999999134) q, r) -> -1
orientation((0.5, 0.4999999999999913 ) q, r) -> -1
orientation((0.5, 0.49999999999999123) q, r) -> -1
orientation((0.5, 0.4999999999999912 ) q, r) -> -1
orientation((0.5, 0.4999999999999911 ) q, r) -> -1
orientation((0.5, 0.49999999999999106) q, r) -> -1
orientation((0.5, 0.499999999999991  ) q, r) -> -1
orientation((0.5, 0.49999999999999095) q, r) -> -1
orientation((0.5, 0.4999999999999909 ) q, r) -> -1
orientation((0.5, 0.49999999999999084) q, r) -> -1
orientation((0.5, 0.4999999999999908 ) q, r) -> -1
orientation((0.5, 0.49999999999999073) q, r) -> -1
orientation((0.5, 0.4999999999999907 ) q, r) -> -1
orientation((0.5, 0.4999999999999906 ) q, r) -> -1
orientation((0.5, 0.49999999999999056) q, r) -> -1
orientation((0.5, 0.4999999999999905 ) q, r) -> -1
orientation((0.5, 0.49999999999999045) q, r) -> -1
orientation((0.5, 0.4999999999999904 ) q, r) -> -1
orientation((0.5, 0.49999999999999034) q, r) -> -1
orientation((0.5, 0.4999999999999903 ) q, r) -> -1
orientation((0.5, 0.49999999999999023) q, r) -> -1
orientation((0.5, 0.4999999999999902 ) q, r) -> -1
orientation((0.5, 0.4999999999999901 ) q, r) -> -1
orientation((0.5, 0.49999999999999006) q, r) -> -1
orientation((0.5, 0.49999999999999   ) q, r) -> -1

Reading the result column, +1 means the algorithm reckons the point is above the line, 0 means it is on the line, and -1 means it is below. The only point which is actually on the line is (0.5, 0.5), yet whole bands of its neighbours are reported as collinear, and in places points further from the line are classified as collinear while nearer ones are not.

By this point you should at least be wary of using floating point arithmetic for geometric computation. Lest you think this can easily be solved by introducing a tolerance value, or some other clunky solution, we'll save you the bother by pointing out that doing so merely moves these fringing effects to the edge of the tolerance zone, as the sketch below suggests.
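
Here is a sketch of the tolerance-based variant we're advising against; the eps threshold is an arbitrary value invented purely for illustration:

def orientation_with_tolerance(p, q, r, eps=1e-12):
    """Like orientation(), but treat any determinant within eps of
    zero as collinear. This does not remove the fringing seen above;
    it merely relocates it, because points whose determinant
    magnitude is close to eps are still classified erratically."""
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    if abs(d) <= eps:
        return 0
    return 1 if d > 0 else -1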

What to do? Fortunately, as we alluded to at the beginning of this tale, Python gives us a solution in the form of rational numbers, implemented by the Fraction type in the fractions module.

Let's make a small change to our program, converting all numbers to Fractions before proceeding with the computation. We'll do this by modifying orientation() to convert each of its three arguments from a tuple containing a pair of numeric objects into a pair of Fractions. The Fraction constructor accepts a selection of numeric types, including float:

from fractions import Fraction


def orientation(p, q, r):
    """Determine the orientation of three points in the plane.

    Args:
      p, q, r: Two-tuples representing coordinate pairs of three points.

    Returns:
        -1 if p, q, r is a turn to the right, +1 if p, q, r is a turn to the
        left, otherwise 0 if p, q, and r are collinear.
    """
    p = (Fraction(p[0]), Fraction(p[1]))
    q = (Fraction(q[0]), Fraction(q[1]))
    r = (Fraction(r[0]), Fraction(r[1]))

    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return sign(d)

The variable d will now also be a Fraction, and the sign() function works as expected with this type since it uses only comparisons with zero.
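
It's worth noting that constructing a Fraction from a float is exact: it captures the precise binary value the float actually holds, rather than the decimal literal you typed. A short REPL session illustrates this:

>>> from fractions import Fraction
>>> Fraction(0.5)
Fraction(1, 2)
>>> Fraction(0.5000000000000001)
Fraction(4503599627370497, 9007199254740992)
>>> Fraction(0.1)
Fraction(3602879701896397, 36028797018963968)

So the arithmetic inside orientation() is performed on exactly the values the input floats represent, with no further rounding.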

Let's run our modified example:

orientation((0.5, 0.49999999999999   ) q, r) -> -1
orientation((0.5, 0.49999999999999006) q, r) -> -1
orientation((0.5, 0.4999999999999901 ) q, r) -> -1
orientation((0.5, 0.4999999999999902 ) q, r) -> -1
orientation((0.5, 0.49999999999999023) q, r) -> -1
orientation((0.5, 0.4999999999999903 ) q, r) -> -1
orientation((0.5, 0.49999999999999034) q, r) -> -1
orientation((0.5, 0.4999999999999904 ) q, r) -> -1
orientation((0.5, 0.49999999999999045) q, r) -> -1
orientation((0.5, 0.4999999999999905 ) q, r) -> -1
orientation((0.5, 0.49999999999999056) q, r) -> -1
orientation((0.5, 0.4999999999999906 ) q, r) -> -1
orientation((0.5, 0.4999999999999907 ) q, r) -> -1
orientation((0.5, 0.49999999999999073) q, r) -> -1
orientation((0.5, 0.4999999999999908 ) q, r) -> -1
orientation((0.5, 0.49999999999999084) q, r) -> -1
orientation((0.5, 0.4999999999999909 ) q, r) -> -1
orientation((0.5, 0.49999999999999095) q, r) -> -1
orientation((0.5, 0.499999999999991  ) q, r) -> -1
orientation((0.5, 0.49999999999999106) q, r) -> -1
orientation((0.5, 0.4999999999999911 ) q, r) -> -1
orientation((0.5, 0.4999999999999912 ) q, r) -> -1
orientation((0.5, 0.49999999999999123) q, r) -> -1
orientation((0.5, 0.4999999999999913 ) q, r) -> -1
orientation((0.5, 0.49999999999999134) q, r) -> -1
orientation((0.5, 0.4999999999999914 ) q, r) -> -1
orientation((0.5, 0.49999999999999145) q, r) -> -1
orientation((0.5, 0.4999999999999915 ) q, r) -> -1
orientation((0.5, 0.49999999999999156) q, r) -> -1
orientation((0.5, 0.4999999999999916 ) q, r) -> -1
orientation((0.5, 0.4999999999999917 ) q, r) -> -1
orientation((0.5, 0.49999999999999173) q, r) -> -1
orientation((0.5, 0.4999999999999918 ) q, r) -> -1
orientation((0.5, 0.49999999999999184) q, r) -> -1
orientation((0.5, 0.4999999999999919 ) q, r) -> -1
orientation((0.5, 0.49999999999999195) q, r) -> -1
orientation((0.5, 0.499999999999992  ) q, r) -> -1
orientation((0.5, 0.49999999999999206) q, r) -> -1
orientation((0.5, 0.4999999999999921 ) q, r) -> -1
orientation((0.5, 0.4999999999999922 ) q, r) -> -1
orientation((0.5, 0.49999999999999223) q, r) -> -1
orientation((0.5, 0.4999999999999923 ) q, r) -> -1
orientation((0.5, 0.49999999999999234) q, r) -> -1
orientation((0.5, 0.4999999999999924 ) q, r) -> -1
orientation((0.5, 0.49999999999999245) q, r) -> -1
orientation((0.5, 0.4999999999999925 ) q, r) -> -1
orientation((0.5, 0.49999999999999256) q, r) -> -1
orientation((0.5, 0.4999999999999926 ) q, r) -> -1
orientation((0.5, 0.4999999999999927 ) q, r) -> -1
orientation((0.5, 0.49999999999999273) q, r) -> -1
orientation((0.5, 0.4999999999999928 ) q, r) -> -1
orientation((0.5, 0.49999999999999284) q, r) -> -1
orientation((0.5, 0.4999999999999929 ) q, r) -> -1
orientation((0.5, 0.49999999999999295) q, r) -> -1
orientation((0.5, 0.499999999999993  ) q, r) -> -1
orientation((0.5, 0.49999999999999306) q, r) -> -1
orientation((0.5, 0.4999999999999931 ) q, r) -> -1
orientation((0.5, 0.49999999999999317) q, r) -> -1
orientation((0.5, 0.4999999999999932 ) q, r) -> -1
orientation((0.5, 0.4999999999999933 ) q, r) -> -1
orientation((0.5, 0.49999999999999334) q, r) -> -1
orientation((0.5, 0.4999999999999934 ) q, r) -> -1
orientation((0.5, 0.49999999999999345) q, r) -> -1
orientation((0.5, 0.4999999999999935 ) q, r) -> -1
orientation((0.5, 0.49999999999999356) q, r) -> -1
orientation((0.5, 0.4999999999999936 ) q, r) -> -1
orientation((0.5, 0.49999999999999367) q, r) -> -1
orientation((0.5, 0.4999999999999937 ) q, r) -> -1
orientation((0.5, 0.4999999999999938 ) q, r) -> -1
orientation((0.5, 0.49999999999999384) q, r) -> -1
orientation((0.5, 0.4999999999999939 ) q, r) -> -1
orientation((0.5, 0.49999999999999395) q, r) -> -1
orientation((0.5, 0.499999999999994  ) q, r) -> -1
orientation((0.5, 0.49999999999999406) q, r) -> -1
orientation((0.5, 0.4999999999999941 ) q, r) -> -1
orientation((0.5, 0.49999999999999417) q, r) -> -1
orientation((0.5, 0.4999999999999942 ) q, r) -> -1
orientation((0.5, 0.4999999999999943 ) q, r) -> -1
orientation((0.5, 0.49999999999999434) q, r) -> -1
orientation((0.5, 0.4999999999999944 ) q, r) -> -1
orientation((0.5, 0.49999999999999445) q, r) -> -1
orientation((0.5, 0.4999999999999945 ) q, r) -> -1
orientation((0.5, 0.49999999999999456) q, r) -> -1
orientation((0.5, 0.4999999999999946 ) q, r) -> -1
orientation((0.5, 0.49999999999999467) q, r) -> -1
orientation((0.5, 0.4999999999999947 ) q, r) -> -1
orientation((0.5, 0.4999999999999948 ) q, r) -> -1
orientation((0.5, 0.49999999999999484) q, r) -> -1
orientation((0.5, 0.4999999999999949 ) q, r) -> -1
orientation((0.5, 0.49999999999999495) q, r) -> -1
orientation((0.5, 0.499999999999995  ) q, r) -> -1
orientation((0.5, 0.49999999999999506) q, r) -> -1
orientation((0.5, 0.4999999999999951 ) q, r) -> -1
orientation((0.5, 0.49999999999999517) q, r) -> -1
orientation((0.5, 0.4999999999999952 ) q, r) -> -1
orientation((0.5, 0.4999999999999953 ) q, r) -> -1
orientation((0.5, 0.49999999999999534) q, r) -> -1
orientation((0.5, 0.4999999999999954 ) q, r) -> -1
orientation((0.5, 0.49999999999999545) q, r) -> -1
orientation((0.5, 0.4999999999999955 ) q, r) -> -1
orientation((0.5, 0.49999999999999556) q, r) -> -1
orientation((0.5, 0.4999999999999956 ) q, r) -> -1
orientation((0.5, 0.49999999999999567) q, r) -> -1
orientation((0.5, 0.4999999999999957 ) q, r) -> -1
orientation((0.5, 0.4999999999999958 ) q, r) -> -1
orientation((0.5, 0.49999999999999584) q, r) -> -1
orientation((0.5, 0.4999999999999959 ) q, r) -> -1
orientation((0.5, 0.49999999999999595) q, r) -> -1
orientation((0.5, 0.499999999999996  ) q, r) -> -1
orientation((0.5, 0.49999999999999606) q, r) -> -1
orientation((0.5, 0.4999999999999961 ) q, r) -> -1
orientation((0.5, 0.49999999999999617) q, r) -> -1
orientation((0.5, 0.4999999999999962 ) q, r) -> -1
orientation((0.5, 0.4999999999999963 ) q, r) -> -1
orientation((0.5, 0.49999999999999634) q, r) -> -1
orientation((0.5, 0.4999999999999964 ) q, r) -> -1
orientation((0.5, 0.49999999999999645) q, r) -> -1
orientation((0.5, 0.4999999999999965 ) q, r) -> -1
orientation((0.5, 0.49999999999999656) q, r) -> -1
orientation((0.5, 0.4999999999999966 ) q, r) -> -1
orientation((0.5, 0.49999999999999667) q, r) -> -1
orientation((0.5, 0.4999999999999967 ) q, r) -> -1
orientation((0.5, 0.4999999999999968 ) q, r) -> -1
orientation((0.5, 0.49999999999999684) q, r) -> -1
orientation((0.5, 0.4999999999999969 ) q, r) -> -1
orientation((0.5, 0.49999999999999695) q, r) -> -1
orientation((0.5, 0.499999999999997  ) q, r) -> -1
orientation((0.5, 0.49999999999999706) q, r) -> -1
orientation((0.5, 0.4999999999999971 ) q, r) -> -1
orientation((0.5, 0.49999999999999717) q, r) -> -1
orientation((0.5, 0.4999999999999972 ) q, r) -> -1
orientation((0.5, 0.4999999999999973 ) q, r) -> -1
orientation((0.5, 0.49999999999999734) q, r) -> -1
orientation((0.5, 0.4999999999999974 ) q, r) -> -1
orientation((0.5, 0.49999999999999745) q, r) -> -1
orientation((0.5, 0.4999999999999975 ) q, r) -> -1
orientation((0.5, 0.49999999999999756) q, r) -> -1
orientation((0.5, 0.4999999999999976 ) q, r) -> -1
orientation((0.5, 0.49999999999999767) q, r) -> -1
orientation((0.5, 0.4999999999999977 ) q, r) -> -1
orientation((0.5, 0.4999999999999978 ) q, r) -> -1
orientation((0.5, 0.49999999999999784) q, r) -> -1
orientation((0.5, 0.4999999999999979 ) q, r) -> -1
orientation((0.5, 0.49999999999999795) q, r) -> -1
orientation((0.5, 0.499999999999998  ) q, r) -> -1
orientation((0.5, 0.49999999999999806) q, r) -> -1
orientation((0.5, 0.4999999999999981 ) q, r) -> -1
orientation((0.5, 0.49999999999999817) q, r) -> -1
orientation((0.5, 0.4999999999999982 ) q, r) -> -1
orientation((0.5, 0.4999999999999983 ) q, r) -> -1
orientation((0.5, 0.49999999999999833) q, r) -> -1
orientation((0.5, 0.4999999999999984 ) q, r) -> -1
orientation((0.5, 0.49999999999999845) q, r) -> -1
orientation((0.5, 0.4999999999999985 ) q, r) -> -1
orientation((0.5, 0.49999999999999856) q, r) -> -1
orientation((0.5, 0.4999999999999986 ) q, r) -> -1
orientation((0.5, 0.49999999999999867) q, r) -> -1
orientation((0.5, 0.4999999999999987 ) q, r) -> -1
orientation((0.5, 0.4999999999999988 ) q, r) -> -1
orientation((0.5, 0.49999999999999883) q, r) -> -1
orientation((0.5, 0.4999999999999989 ) q, r) -> -1
orientation((0.5, 0.49999999999999895) q, r) -> -1
orientation((0.5, 0.499999999999999  ) q, r) -> -1
orientation((0.5, 0.49999999999999906) q, r) -> -1
orientation((0.5, 0.4999999999999991 ) q, r) -> -1
orientation((0.5, 0.49999999999999917) q, r) -> -1
orientation((0.5, 0.4999999999999992 ) q, r) -> -1
orientation((0.5, 0.4999999999999993 ) q, r) -> -1
orientation((0.5, 0.49999999999999933) q, r) -> -1
orientation((0.5, 0.4999999999999994 ) q, r) -> -1
orientation((0.5, 0.49999999999999944) q, r) -> -1
orientation((0.5, 0.4999999999999995 ) q, r) -> -1
orientation((0.5, 0.49999999999999956) q, r) -> -1
orientation((0.5, 0.4999999999999996 ) q, r) -> -1
orientation((0.5, 0.49999999999999967) q, r) -> -1
orientation((0.5, 0.4999999999999997 ) q, r) -> -1
orientation((0.5, 0.4999999999999998 ) q, r) -> -1
orientation((0.5, 0.49999999999999983) q, r) -> -1
orientation((0.5, 0.4999999999999999 ) q, r) -> -1
orientation((0.5, 0.49999999999999994) q, r) -> -1
orientation((0.5, 0.5                ) q, r) ->  0
orientation((0.5, 0.5000000000000001 ) q, r) ->  1
orientation((0.5, 0.5000000000000002 ) q, r) ->  1
orientation((0.5, 0.5000000000000003 ) q, r) ->  1
orientation((0.5, 0.5000000000000004 ) q, r) ->  1
orientation((0.5, 0.5000000000000006 ) q, r) ->  1
orientation((0.5, 0.5000000000000007 ) q, r) ->  1
orientation((0.5, 0.5000000000000008 ) q, r) ->  1
orientation((0.5, 0.5000000000000009 ) q, r) ->  1
orientation((0.5, 0.500000000000001  ) q, r) ->  1
orientation((0.5, 0.5000000000000011 ) q, r) ->  1
orientation((0.5, 0.5000000000000012 ) q, r) ->  1
orientation((0.5, 0.5000000000000013 ) q, r) ->  1
orientation((0.5, 0.5000000000000014 ) q, r) ->  1
orientation((0.5, 0.5000000000000016 ) q, r) ->  1
orientation((0.5, 0.5000000000000017 ) q, r) ->  1
orientation((0.5, 0.5000000000000018 ) q, r) ->  1
orientation((0.5, 0.5000000000000019 ) q, r) ->  1
orientation((0.5, 0.500000000000002  ) q, r) ->  1
orientation((0.5, 0.5000000000000021 ) q, r) ->  1
orientation((0.5, 0.5000000000000022 ) q, r) ->  1
orientation((0.5, 0.5000000000000023 ) q, r) ->  1
orientation((0.5, 0.5000000000000024 ) q, r) ->  1
orientation((0.5, 0.5000000000000026 ) q, r) ->  1
orientation((0.5, 0.5000000000000027 ) q, r) ->  1
orientation((0.5, 0.5000000000000028 ) q, r) ->  1
orientation((0.5, 0.5000000000000029 ) q, r) ->  1
orientation((0.5, 0.500000000000003  ) q, r) ->  1
orientation((0.5, 0.5000000000000031 ) q, r) ->  1
orientation((0.5, 0.5000000000000032 ) q, r) ->  1
orientation((0.5, 0.5000000000000033 ) q, r) ->  1
orientation((0.5, 0.5000000000000034 ) q, r) ->  1
orientation((0.5, 0.5000000000000036 ) q, r) ->  1
orientation((0.5, 0.5000000000000037 ) q, r) ->  1
orientation((0.5, 0.5000000000000038 ) q, r) ->  1
orientation((0.5, 0.5000000000000039 ) q, r) ->  1
orientation((0.5, 0.500000000000004  ) q, r) ->  1
orientation((0.5, 0.5000000000000041 ) q, r) ->  1
orientation((0.5, 0.5000000000000042 ) q, r) ->  1
orientation((0.5, 0.5000000000000043 ) q, r) ->  1
orientation((0.5, 0.5000000000000044 ) q, r) ->  1
orientation((0.5, 0.5000000000000046 ) q, r) ->  1
orientation((0.5, 0.5000000000000047 ) q, r) ->  1
orientation((0.5, 0.5000000000000048 ) q, r) ->  1
orientation((0.5, 0.5000000000000049 ) q, r) ->  1
orientation((0.5, 0.500000000000005  ) q, r) ->  1
orientation((0.5, 0.5000000000000051 ) q, r) ->  1
orientation((0.5, 0.5000000000000052 ) q, r) ->  1
orientation((0.5, 0.5000000000000053 ) q, r) ->  1
orientation((0.5, 0.5000000000000054 ) q, r) ->  1
orientation((0.5, 0.5000000000000056 ) q, r) ->  1
orientation((0.5, 0.5000000000000057 ) q, r) ->  1
orientation((0.5, 0.5000000000000058 ) q, r) ->  1
orientation((0.5, 0.5000000000000059 ) q, r) ->  1
orientation((0.5, 0.500000000000006  ) q, r) ->  1
orientation((0.5, 0.5000000000000061 ) q, r) ->  1
orientation((0.5, 0.5000000000000062 ) q, r) ->  1
orientation((0.5, 0.5000000000000063 ) q, r) ->  1
orientation((0.5, 0.5000000000000064 ) q, r) ->  1
orientation((0.5, 0.5000000000000066 ) q, r) ->  1
orientation((0.5, 0.5000000000000067 ) q, r) ->  1
orientation((0.5, 0.5000000000000068 ) q, r) ->  1
orientation((0.5, 0.5000000000000069 ) q, r) ->  1
orientation((0.5, 0.500000000000007  ) q, r) ->  1
orientation((0.5, 0.5000000000000071 ) q, r) ->  1
orientation((0.5, 0.5000000000000072 ) q, r) ->  1
orientation((0.5, 0.5000000000000073 ) q, r) ->  1
orientation((0.5, 0.5000000000000074 ) q, r) ->  1
orientation((0.5, 0.5000000000000075 ) q, r) ->  1
orientation((0.5, 0.5000000000000077 ) q, r) ->  1
orientation((0.5, 0.5000000000000078 ) q, r) ->  1
orientation((0.5, 0.5000000000000079 ) q, r) ->  1
orientation((0.5, 0.500000000000008  ) q, r) ->  1
orientation((0.5, 0.5000000000000081 ) q, r) ->  1
orientation((0.5, 0.5000000000000082 ) q, r) ->  1
orientation((0.5, 0.5000000000000083 ) q, r) ->  1
orientation((0.5, 0.5000000000000084 ) q, r) ->  1
orientation((0.5, 0.5000000000000085 ) q, r) ->  1
orientation((0.5, 0.5000000000000087 ) q, r) ->  1
orientation((0.5, 0.5000000000000088 ) q, r) ->  1
orientation((0.5, 0.5000000000000089 ) q, r) ->  1
orientation((0.5, 0.500000000000009  ) q, r) ->  1
orientation((0.5, 0.5000000000000091 ) q, r) ->  1
orientation((0.5, 0.5000000000000092 ) q, r) ->  1
orientation((0.5, 0.5000000000000093 ) q, r) ->  1
orientation((0.5, 0.5000000000000094 ) q, r) ->  1
orientation((0.5, 0.5000000000000095 ) q, r) ->  1
orientation((0.5, 0.5000000000000097 ) q, r) ->  1
orientation((0.5, 0.5000000000000098 ) q, r) ->  1
orientation((0.5, 0.5000000000000099 ) q, r) ->  1
orientation((0.5, 0.50000000000001   ) q, r) ->  1

Using Fraction internally, our orientation() function gets the full benefit of exact arithmetic with arbitrary precision, and consequently produces an exact result: only one position of p, the point (0.5, 0.5), is reported as collinear with q and r.

In the next article, we'll explore the behaviour of the non-robust float-based version of this function graphically, to get an impression of how lines are 'seen' by floating-point geometric functions.

[1]Python's float is an IEEE-754 double precision 64-bit float.

Python’s super(): Not as Simple as You Thought

Austin Bingham from Good With Computers

Python's super() is one of those aspects of the language that many developers use without really understanding what it does or how it works. [1] To many people, super() is simply how you access your base-class's implementation of a method. And while this is true, it's far from the full story.

In this series I want to look at the mechanics and some of the theory behind super(). I want to show that, far from just letting you access your base-class, Python's super() is the key to some interesting and elegant design options that promote composability, separation of concerns, and long-term maintainability. In this first article I'll introduce a small set of classes, and in these classes we'll find a bit of a mystery. In subsequent articles we'll investigate this mystery by seeing how Python's super() really works, looking at topics like method resolution order, the C3 algorithm, and proxy objects.

In the end, you'll find that super() is both more complex than you probably expected, yet also surprisingly elegant and easy to understand. This improved understanding of super() will help you understand and appreciate Python at a deeper level, and it will give you new tools for developing your own Python code.

A note on Python versions

This series is written using Python 3, and some of the examples and concepts don't apply completely to Python 2. In particular, this series assumes that classes are "new-style" classes. (For a discussion of the difference between "old-style" and "new-style" classes, see the Python wiki.) In Python 3 all classes are new-style, while in Python 2 you have to explicitly inherit from object to be a new-style class.

For example, where we might use the following Python 3 code in this series:

class IntList:
    . . .

the equivalent Python 2 code would be:

class IntList(object):
    . . .

Also, throughout this series we'll call super() with no arguments. This is only supported in Python 3, but it has an equivalent form in Python 2. In general, when you see a method like this:

class IntList:
    def add(self, x):
        super().add(x)

the equivalent Python 2 code has to pass the class name and self to super(), like this:

class IntList:
    def add(self, x):
        super(IntList, self).add(x)

If any other Python2/3 differences occur in the series, I'll be sure to point them out. [2]

The Mystery of the SortedIntList

To begin, we're going to define a small family of classes that implement some constrained list-like functionality. These classes form a diamond inheritance graph like this:

[Figure: the diamond inheritance graph formed by SimpleList, SortedList, IntList, and SortedIntList]

At the root of these classes is SimpleList:

class SimpleList:
    def __init__(self, items):
        self._items = list(items)

    def add(self, item):
        self._items.append(item)

    def __getitem__(self, index):
        return self._items[index]

    def sort(self):
        self._items.sort()

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return "{}({!r})".format(
            self.__class__.__name__,
            self._items)

SimpleList uses a standard list internally, and it provides a smaller, more limited API for interacting with the list data. This may not be a very realistic class from a practical point of view, but, as you'll see, it lets us explore some interesting aspects of inheritance relationships in Python.

Next let's create a subclass of SimpleList which keeps the list contents sorted. We'll call this class SortedList:

class SortedList(SimpleList):
    def __init__(self, items=()):
        super().__init__(items)
        self.sort()

    def add(self, item):
        super().add(item)
        self.sort()

The initializer calls SimpleList's initializer and then immediately uses SimpleList.sort() to sort the contents. SortedList also overrides the add method on SimpleList to ensure that the list always remains sorted.

In SortedList we already see a call to super(), and the intention of this code is pretty clear. In SortedList.add(), for example, super() is used to call SimpleList.add() - that is, deferring to the base-class - before sorting the list contents. There's nothing mysterious going on...yet.
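
A quick REPL session, with arbitrarily chosen values, confirms the behaviour:

>>> sl = SortedList([4, 3, 78, 11])
>>> sl
SortedList([3, 4, 11, 78])
>>> sl.add(-42)
>>> sl
SortedList([-42, 3, 4, 11, 78])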

Next let's define IntList, a SimpleList subclass which only allows integer elements. This list subclass prevents the insertion of non-integer elements, and it does so by using the isinstance() function:

class IntList(SimpleList):
    def __init__(self, items=()):
        for item in items: self._validate(item)
        super().__init__(items)

    @classmethod
    def _validate(cls, item):
        if not isinstance(item, int):
            raise TypeError(
                '{} only supports integer values.'.format(
                    cls.__name__))

    def add(self, item):
        self._validate(item)
        super().add(item)

You'll immediately notice that IntList is structurally similar to SortedList. It provides its own initializer and, like SortedList, overrides the add() method to perform extra checks. In this case, IntList calls its _validate() method on every item that goes into the list. _validate() uses isinstance() to check the type of the candidates, and if a candidate is not an instance of int, _validate() raises a TypeError.
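
Again, a short illustrative session shows IntList accepting integers and rejecting everything else (traceback abbreviated):

>>> il = IntList([1, 2, 3])
>>> il.add(19)
>>> il
IntList([1, 2, 3, 19])
>>> il.add('5')
Traceback (most recent call last):
  ...
TypeError: IntList only supports integer values.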

Chances are, neither SortedList nor IntList are particularly surprising. They both use super() for the job that most people find most natural: calling base-class implementations. With that in mind, then, let's introduce one more class, SortedIntList, which inherits from both SortedList and IntList, and which enforces both constraints at once:

class SortedIntList(IntList, SortedList):
    pass

It doesn't look like much, does it? We've simply defined a new class and given it two base classes. In fact, we haven't added any new implementation code at all. But if we go to the REPL we can see that it works as we expect. The initializer sorts the input sequence:

>>> sil = SortedIntList([42, 23, 2])
>>> sil
SortedIntList([2, 23, 42])

but rejects non-integer values:

>>> SortedIntList([3, 2, '1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sorted_int_list.py", line 43, in __init__
    for x in items: self._validate(x)
  File "sorted_int_list.py", line 49, in _validate
    raise TypeError(
                '{} only supports integer values.'.format(
                    cls.__name__))
TypeError: SortedIntList only supports integer values.

Likewise, add() maintains both the sorting and type constraints defined by the base classes:

>>> sil.add(-1234)
>>> sil
SortedIntList([-1234, 2, 23, 42])
>>> sil.add('the smallest uninteresting number')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sorted_int_list.py", line 45, in add
    self._validate(item)
  File "sorted_int_list.py", line 42, in _validate
    raise TypeError(
                '{} only supports integer values.'.format(
                    cls.__name__))
TypeError: SortedIntList only supports integer values.

You should spend some time playing with SortedIntList to convince yourself that it works as expected. You can get the code here.

It may not be immediately apparent how all of this works, though. After all, both IntList and SortedList define add(). How does Python know which add() to call? More importantly, since both the sorting and type constraints are being enforced by SortedIntList, how does Python seem to know to call both of them? This is the mystery that this series will unravel, and the answers to these questions have to do with the method resolution order we mentioned earlier, along with the details of how super() really works. So stay tuned!

[1]If you do know how it works, then congratulations! This series probably isn't for you. But believe me, lots of people don't.
[2]For a detailed look at the differences between the language versions, see Guido's list of differences.

The Folly of Floating-Point for Robust Geometric Computation

Rob Smallshire from Good With Computers

Computational geometry - a world where lines have zero thickness, circles are perfectly round and points are dimensionless. Creating robust geometric algorithms using finite precision number types such as float is fiendishly difficult because it's not possible to exactly represent numbers such as one-third, which rather gets in the way of performing seemingly simple operations like dividing a line into exactly three equal segments. In this short series of posts, we'll look at some of the pitfalls of geometric computation, with examples in Python, although the key messages are true with finite-precision floating point numbers in any language.

Rational numbers, modelled by Python's Fraction [1] type, can be useful for implementing robust geometric algorithms. These algorithms are often deeply elegant and surprising because they must avoid any detour into the realm of irrational numbers, which cannot be represented in finite precision. This means that seemingly innocuous operations like taking a square root, for example to determine the length of a line using Pythagoras' theorem, are not permitted.
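
As a quick illustration of the difference (a REPL sketch, not part of the algorithm we develop below), arithmetic on Fractions is exact where binary floating point is not:

>>> from fractions import Fraction
>>> 0.1 + 0.2 == 0.3
False
>>> Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
True
>>> Fraction(1, 3) + Fraction(1, 3) + Fraction(1, 3) == 1
True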

One algorithm which benefits from rational numbers is a simple collinearity test which determines whether three points lie on the same line. This can be further refined to consider whether a query point \(p\) is above, exactly on, or below the line. Now there are many ways to approach writing such a function, and as with many questions in computational geometry the naïve approaches are either overly complex, inefficient, or just plain wrong, albeit in subtle ways. I won't cover the story of how to arrive at a robust algorithm; that story is entertainingly covered in Writing Programs for "The Book" by Brian Hayes. [2] Rather, we'll start where Brian leaves off by showing how to implement the algorithm in Python using both floating-point and exact arithmetic, so we can understand the trade-offs between performance and correctness inherent in these choices. Along the way, we'll perhaps touch on some aspects of Python which may be new to you.

Collinearity test

Whether p is above, exactly on, or below the line through q and r can be determined from the orientation of the triangle p, q, r.

You don't need to understand the mathematics of the orientation test to appreciate the point of what we're about to demonstrate. Suffice it to say that the orientation of three two-dimensional points can be concisely found by computing the sign of the determinant of a three-by-three matrix containing the \(x\) and \(y\) coordinates of the points in question, where the determinant happens to be the signed area of the triangle formed by the three points:

\begin{equation*} \newcommand{\sgn}{\mathop{\rm sgn}\nolimits} o(p, q, r) = \sgn \begin{vmatrix} 1 & p_x & p_y \\ 1 & q_x & q_y \\ 1 & r_x & r_y \end{vmatrix} \end{equation*}

The function \(o\) returns \(+1\) if the polyline \(p\), \(q\), \(r\) executes a left turn and the loop is counterclockwise, \(0\) if the polyline is straight, or \(-1\) if the polyline executes a right turn and the loop is clockwise. These values can in turn be interpreted in terms of whether the query point \(p\) is above, on, or below the line through \(q\) and \(r\).

To cast this formula in Python, we need a sign function and a means of computing the determinant. Both of these are straightforward, although perhaps not obvious, and give us the opportunity to explore a little-appreciated aspect of Python. First, the sign() function. You may be surprised to learn (and you wouldn't be alone) that there is no built-in or library function in Python which returns the sign of a number as \(-1\), \(0\) or \(+1\), so we need to roll our own. The simplest solution is probably something like this:

>>> def sign(x):
...     if x < 0:
...         return -1
...     elif x > 0:
...         return 1
...     return 0
...
>>> sign(5)
1
>>> sign(-5)
-1
>>> sign(0)
0

This works well enough. A more elegant solution would be to exploit an interesting behaviour of the bool type, specifically how it behaves under subtraction. Let's do a few experiments:

>>> False - False
0
>>> False - True
-1
>>> True - False
1
>>> True - True
0

Intriguingly, subtraction of bool objects has an integer result! In fact, bool is a subclass of int, so when used in arithmetic operations this way, True is equivalent to one and False is equivalent to zero. We can use this behaviour to implement a most elegant sign() function:

>>> def sign(x):
...     return (x > 0) - (x < 0)
...
>>> sign(-5)
-1
>>> sign(5)
1
>>> sign(0)
0

Now we need to compute the determinant. In our case this reduces to simply:

\begin{equation*} \det = (q_x - p_x)(r_y - p_y) - (q_y - p_y)(r_x - p_x) \end{equation*}

so the definition of our orientation() function using tuple coordinate pairs for each point becomes just:

def orientation(p, q, r):
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return sign(d)

Let's test this on some examples. First we set up three points a, b and c:

>>> a = (0, 0)
>>> b = (4, 0)
>>> c = (4, 3)

Now we test the orientation of a ➔ b ➔ c:

>>> orientation(a, b, c)
1

This represents a left turn, so the function returns positive one. On the other hand the orientation of a ➔ c ➔ b is negative one:

>>> orientation(a, c, b)
-1

Let's introduce a fourth point d which is collinear with a and c. As expected our orientation() function returns zero for the group a ➔ c ➔ d:

>>> d = (8, 6)
>>> orientation(a, c, d)
0

So far so good. Everything we have done so far has used integers, which in Python have arbitrary precision. Our function uses only multiplication and subtraction, with no division which could produce float values, so all of that precision is preserved. But what happens if we use floating-point values as our input data? Here are three points which lie on a diagonal line:

>>> e = (0.5, 0.5)
>>> f = (12.0, 12.0)
>>> g = (24.0, 24.0)

As we would expect, our orientation test determines that these points are collinear:

>>> orientation(e, f, g)
0

Furthermore, moving the point e up a little, by increasing its \(y\) coordinate by even a tiny amount, gives the answer we would expect:

>>> e = (0.5, 0.5000000000000018)
>>> orientation(e, f, g)
1

Now let's increase the \(y\) coordinate just a little more. In fact, we'll increase it by the smallest possible amount to the next representable [3] floating point number:

>>> e = (0.5, 0.5000000000000019)
>>> orientation(e, f, g)
0

Wow! According to our orientation function the points e, f and g are collinear again. This cannot possibly be! In fact, we can go through the next 23 successive floating point values up to,

>>> e = (0.5, 0.5000000000000044)
>>> orientation(e, f, g)
0

with our function still reporting that the three points are collinear, until we get to this value,

>>> e = (0.5, 0.5000000000000046)

at which point things settle down and become well behaved again:

>>> orientation(e, f, g)
1

What's happening here is that we've run into problems with the finite precision of Python floats at points very close to the diagonal line, and the mathematical assumptions we make in our formula about how numbers work break down because floating-point numbers are far from a perfect model of the real numbers. Next time, we'll investigate more thoroughly the behaviour of this orientation test at the limits of floating-point precision.
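
As a taste of the exact approach, here's a minimal sketch which feeds the very same orientation() function Fraction coordinates; because the function uses only subtraction and multiplication, the arithmetic stays exact and the troublesome point is classified correctly:

>>> from fractions import Fraction
>>> e = (Fraction(0.5), Fraction(0.5000000000000019))  # the exact rational values of these floats
>>> f = (Fraction(12), Fraction(12))
>>> g = (Fraction(24), Fraction(24))
>>> orientation(e, f, g)
1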

[1]The Fraction type which models rational numbers is defined in the Python Standard Library fractions module.
[2]Hayes, Brian. (2007) Writing Programs for "The Book". In: Oram, A. & Wilson, G., eds. Beautiful Code O'Reilly Media. pp. 539–551.
[3]In C we could use the nextafter() function to generate the next representable floating point number. Unfortunately, nextafter() is not exposed to Python before version 3.9, which added math.nextafter(). Various workarounds are available for older versions, including a version built into Numpy, directly calling the C version using ctypes and a pure Python implementation.

How to write Boost.Python type converters

Austin Bingham from Good With Computers

Boost.Python [1] makes it possible to write C++ that "feels" like Python. The library is powerful and sometimes subtle. Compare this with the Python C API, where the experience is very far removed from writing Python code.

Part of making C++ feel more like Python is allowing natural assignment of C++ objects to Python variables. For instance, assigning a standard library string to a Python object looks like this:

// Create a C++ string
std::string msg("Hello, Python");

// Assign it to a python object
boost::python::object py_msg = msg;

Likewise (though somewhat less naturally), it is also important to be able to extract C++ objects from Python objects. Boost.Python provides the extract [2] type for this:

boost::python::object obj = ... ;
std::string msg = boost::python::extract<std::string>(obj);

To allow this kind of natural assignment, Boost.Python provides a system for registering converters between the languages. Unfortunately, the Boost.Python documentation does a pretty poor job of describing how to write them. A bit of searching on the internet will turn up a few links. [3]

While these are fine (and, in truth, are the basis for what I know about the conversion system), they are not as explicit as I would like.

So, in an effort to clarify the conversion system both for myself and (hopefully) others, I wrote this little primer. I'll step through a full example showing how to write converters for Qt's QString [4] class. In the end, you should have all the information you need to write and register your own converters.

Converting QString

A Boost.Python type converter consists of two major parts. The first part, which is generally the simpler of the two, converts a C++ type into a Python type. I'll refer to this as the to-python converter. The second part converts a Python object into a C++ type. I'll refer to this as the from-python converter.

In order to have your converters be used at runtime, the Boost.Python framework requires you to register them. The Boost.Python API provides separate methods for registering to-python and from-python converters. Because of this, you are free to provide conversion in only one direction for a type if you so choose.

Note that, for certain elements of what I'm about to describe, there is more than one way to do things. For example, in some cases where I choose to use static member functions, you could also use free functions. I won't point these out, but if you wear your C++ thinking-cap you should be able to see what is mandatory and what isn't.

To-python Converters

A to-python converter converts a C++ type to a Python object. From an API perspective, a to-python converter is used any time that you construct a boost::python::object [5] from another C++ type. For example:

// Construct object from an int
boost::python::object int_obj(42);

// Construct object from a string
boost::python::object str_obj = std::string("llama");

// Construct object from a user-defined type
Foo foo;
boost::python::object foo_obj(foo);

You implement a to-python converter using a struct with a static member function named convert(), which takes the C++ object to be converted as its argument and returns a PyObject*. A to-python converter for QStrings looks like this:

/* to-python convert to QStrings */
struct QString_to_python_str
{
    static PyObject* convert(QString const& s)
    {
        return boost::python::incref(
            boost::python::object(
                s.toLatin1().constData()).ptr());
    }
};

The crux of what this does is as follows:

  1. Extract the QString's underlying character data using toLatin1().constData()
  2. Construct a boost::python::object with the character data
  3. Retrieve the boost::python::object's PyObject* with ptr()
  4. Increment the reference count on the PyObject* and return that pointer.

That last step bears a little explanation. Suppose that you didn't increment the reference count on the returned pointer. As soon as the function returned, the boost::python::object in the function would destruct, thereby reducing the ref-count to zero. When the PyObject's reference count goes to zero, Python will consider the object dead and it may be garbage-collected, meaning you would return a deallocated object from convert().

Once you've written the to-python converter for a type, you need to register it with Boost.Python's runtime. You do this with the aptly-named to_python_converter [6] template:

// register the QString-to-python converter
boost::python::to_python_converter<
    QString,
    QString_to_python_str>();

The first template parameter is the C++ type for which you're registering a converter. The second is the converter struct. Notice that this registration process is done at runtime; you need to call the registration functions before you try to do any custom type converting.

From-python Converters

From-python converters are slightly more complex because, beyond simply providing a function to convert from Python to C++, they also have to provide a function that determines if a Python type can safely be converted to the requested C++ type. Likewise, they often require more knowledge of the Python C API.

From-python converters are used whenever Boost.Python's extract type is called. For example:

// get an int from a python object
int x = boost::python::extract<int>(int_obj);

// get an STL string from a python object
std::string s = boost::python::extract<std::string>(str_obj);

// get a user-defined type from a python object
Foo foo = boost::python::extract<Foo>(foo_obj);

The recipe I use for creating from-python converters is similar to to-python converters: create a struct with some static methods and register those with the Boost.Python runtime system.

The first method you'll need to define is used to determine whether an arbitrary Python object is convertible to the type you want to extract. If the conversion is OK, this function should return the PyObject*; otherwise, it should return NULL. So, for QStrings you would write:

struct QString_from_python_str
{

    . . .

    // Determine if obj_ptr can be converted into a QString
    static void* convertible(PyObject* obj_ptr)
    {
        if (!PyString_Check(obj_ptr)) return 0;
        return obj_ptr;
    }

    . . .

};

This simply says that a PyObject* can be converted to a QString if it is a Python string.

The second method you'll need to write does the actual conversion. The primary trick in this method is that Boost.Python will provide you with a chunk of memory into which you must in-place construct your new C++ object. All of the funny "rvalue_from_python" stuff just has to do with Boost.Python's method for providing you with that memory chunk:

struct QString_from_python_str
{

    . . .

    // Convert obj_ptr into a QString
    static void construct(
        PyObject* obj_ptr,
        boost::python::converter::rvalue_from_python_stage1_data* data)
    {
        // Extract the character data from the python string
        const char* value = PyString_AsString(obj_ptr);

        // Verify that obj_ptr is a string (should be ensured by convertible())
        assert(value);

        // Grab pointer to memory into which to construct the new QString
        void* storage = (
            (boost::python::converter::rvalue_from_python_storage<QString>*)
            data)->storage.bytes;

        // in-place construct the new QString using the character data
        // extracted from the python object
        new (storage) QString(value);

        // Stash the memory chunk pointer for later use by boost.python
        data->convertible = storage;
    }

  . . .

};

The final step for from-python converters is, of course, to register the converter. To do this, you use boost::python::converter::registry::push_back(). [7] The first argument is a pointer to the function which tests for convertibility, the second is a pointer to the conversion function, and the third is a boost::python::type_id for the C++ type. In this case, we'll put the registration into the constructor for the struct we've been building up:

struct QString_from_python_str
{
    QString_from_python_str()
    {
        boost::python::converter::registry::push_back(
            &convertible,
            &construct,
            boost::python::type_id<QString>());
    }

    . . .

};

Now, if you simply construct a single QString_from_python_str object in your initialization code (just as you called to_python_converter() for the to-python registration), conversion from Python strings to QString will be enabled.

Taking a reference to the PyObject in construct()

One gotcha to be aware of in your construct() function is that the PyObject argument is a 'borrowed' reference. That is, its reference count has not already been incremented for you. [8] If you plan to keep a reference to that object, you must use Boost.Python's borrowed construct. For example:

class MyClass
{
public:
    MyClass(boost::python::object obj) : obj_ (obj) {}

private:
    boost::python::object obj_;
};

struct MyClass_from_python
{
    . . .

    static void construct(
        PyObject* obj_ptr,
        boost::python::converter::rvalue_from_python_stage1_data* data)
    {
        using namespace boost::python;

        void* storage = (
            (converter::rvalue_from_python_storage<MyClass>*)
                data)->storage.bytes;

        // Use borrowed to construct the object so that a reference
        // count will be properly handled.
        handle<> hndl(borrowed(obj_ptr));
        new (storage) MyClass(object(hndl));

        data->convertible = storage;
    }
};

Failing to use borrowed() in this situation will generally lead to memory corruption and/or garbage collection errors in the Python runtime.

There are a number of useful resources on the web for finding more information on Boost.Python objects, handles, and reference counting. [9]

When converters don't exist

Finally, a cautionary note. The Boost.Python type-conversion system works well, not only at moving objects across the C++/Python language barrier, but also at making code easier to read and understand. You must always keep in mind, though, that this comes at the cost of very little compile-time checking.

That is, the boost::python::object copy-constructor is templatized and accepts any type without complaint. This means that your code will compile just fine even if you're constructing boost::python::object instances from types that have no registered converter. At runtime these constructors will find that they have no converter for the requested type, and this will result in exceptions.

These exceptions [10] will tend to happen in unexpected places, and you could spend quite a bit of time trying to figure them out. I say all of this so that maybe, when you encounter strange exceptions when using Boost.Python, you'll remember to check that your converters are registered first. Hopefully it'll save you some time.

Resources

Boost.Python is fairly complex and can be difficult to understand all at once. Here are a few more useful resources that might help you come up to speed on this useful technology:

  • This IPython notebook-based tutorial covers a lot of the major (and some of the more obscure) topics in Boost.Python.
  • The Boost.Python wiki contains a lot of collected Boost.Python knowledge.
  • And of course, the Boost.Python documentation itself is very useful.

Appendix: Full code for QString converter

struct QString_to_python_str
{
    static PyObject* convert(QString const& s)
    {
        return boost::python::incref(
            boost::python::object(
                s.toLatin1().constData()).ptr());
    }
};

struct QString_from_python_str
{
    QString_from_python_str()
    {
        boost::python::converter::registry::push_back(
            &convertible,
            &construct,
            boost::python::type_id<QString>());
    }

    // Determine if obj_ptr can be converted into a QString
    static void* convertible(PyObject* obj_ptr)
    {
        if (!PyString_Check(obj_ptr)) return 0;
        return obj_ptr;
    }

    // Convert obj_ptr into a QString
    static void construct(
        PyObject* obj_ptr,
        boost::python::converter::rvalue_from_python_stage1_data* data)
    {
        // Extract the character data from the python string
        const char* value = PyString_AsString(obj_ptr);

        // Verify that obj_ptr is a string (should be ensured by convertible())
        assert(value);

        // Grab pointer to memory into which to construct the new QString
        void* storage = (
            (boost::python::converter::rvalue_from_python_storage<QString>*)
            data)->storage.bytes;

        // in-place construct the new QString using the character data
        // extracted from the python object
        new (storage) QString(value);

        // Stash the memory chunk pointer for later use by boost.python
        data->convertible = storage;
    }
};

void initializeConverters()
{
    using namespace boost::python;

    // register the to-python converter
    to_python_converter<
        QString,
        QString_to_python_str>();

    // register the from-python converter
    QString_from_python_str();
}
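
To round things off, here's a sketch of what the converters buy you on the Python side. It assumes a hypothetical Boost.Python extension module my_qt_module whose module-init function calls initializeConverters() and which exposes a set_title() function taking a QString; the module and function names are purely illustrative:

# Hypothetical usage from Python. Assumes an extension module "my_qt_module"
# whose BOOST_PYTHON_MODULE init calls initializeConverters() and exposes a
# set_title() function taking a QString; these names are illustrative only.
import my_qt_module

# The from-python converter turns this Python string into a QString
# transparently; no wrapping code is needed at the call site.
my_qt_module.set_title("Hello from Python")
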
[1]The Boost.Python homepage.
[2]boost::python::extract<> documentation.
[3]For example the Boost.Python FAQ.
[4]The Qt QString documentation.
[5]The boost::python::object documentation.
[6]The to_python_converter documentation.
[7]The boost::python::converter::registry documentation.
[8]Python reference counting details.
[9]For example, this discussion from the C++-sig discussion list, the Boost.Python documentation, and David Abrahams' guidelines for handle<> on the Python wiki.
[10]Boost.Python uniformly uses boost::python::error_already_set to communicate exceptions from Python to C++.