Rob Smallshire – Page 2 – ACCU World of Code

In this series we take an in-depth look at transducers. Transducers - a portmanteau of "transform reducers" - are a new functional programming concept introduced into the Clojure programming language. Although transducers are actually pretty straightforward in retrospect, wrapping your brain around them, especially if you're not already a competent Clojureist, can be challenging.

In this series, we introduce transducers by implementing them from scratch in everybody's favourite executable pseudocode, Python. We'll start with the familiar staples of functional programming, map(), filter() and reduce(), and derive transducers from first principles. We'll work towards a set of general tools which work with eager collections, lazy "pull" sequences, and "push" event streams. Along the way we'll cover stateful transducers and transducer composition, demonstrating that transducers are both more general, and more fundamental, than the functional programming tools baked into Python and many other languages.

By the end of this series, not only should transducers make sense to you, but you'll have a recipe for implementing transducers in your own favourite programming language.

September 11, 2014

Predictive Models of Development Teams and the Systems They Build

Rob Smallshire from Good With Computers

In 1968 Melvin Conway pointed out a seemingly inevitable symmetry between organisations and the software systems they construct. Organisations today are more fluid than 40 years ago, with short developer tenure, and frequent migration of individuals between projects and employers. In this article we'll examine data on the tenure and productivity of programmers and use this to gain insight into codebases, by simulating their growth with simple stochastic models. From such models, we can make important predictions about the maintainability and long-term viability of software systems, with implications for how we approach software design, documentation and how we assemble teams.

Legacy systems

I've always been interested in legacy software systems, primarily because legacy software systems are those which have proven to be valuable over time. The reason they become legacy – and get old – is that they continue to be useful.

I have an interest in that as a software engineer, having worked on legacy systems as an employee for various software product companies, and more recently as a consultant with Sixty North, helping out with the problems that such systems inevitably raise.

I called myself a "software engineer", although I use the term somewhat loosely. To call what many developers do "engineering" is a bit of a stretch. Engineer or not, my academic training was as a scientist, which is perhaps reflected in the content of this article. Most readers will be familiar with the structure of the scientific method: We ask questions. We formulate hypotheses which propose answers to those questions. We design experiments to test our hypotheses. We collect data from the experiments. And we draw conclusions from the data. This done, having learned something about how our world works, we go round again.

I would like to be able to apply this powerful tool to what we do as "software engineers" or developers. Unfortunately for our industry, it's very difficult to do experimental science – still less randomised controlled trials – on the process of software development, for a whole host of reasons: Developers don't like to be watched. We can't eliminate extraneous factors. The toy problems we use in experiments aren't realistic. No two projects are the same. The subjects are often students who have little experience.

Even on the rare occasions we do perform experiments, there are many threats to the validity of such experiments, so the results tend not to be taken very seriously. Addressing the weaknesses of the experimental design would be prohibitively expensive, if possible at all.

The role of models

Fortunately, there's another way of doing science, which doesn't rely on the version of the scientific method just mentioned. It's the same type of science we do in astronomy, or geology where we can't run experiments because we don't have enough time, we don't have enough money, or we just don't have anywhere big enough to perform the experiment. Experimentally colliding galaxies, or experimenting with the initiation of plate tectonics are simply in the realms of science fiction on the money, time and space axes.

In such cases, we have to switch to a slightly different version of the scientific method, which looks like this: We make a prediction about how the universe works, where our 'universe' could be galactic collisions, or the more prosaic world of software development. We then make a model of that situation either through physical analogy or in a computer. By executing this model we can predict the outcome based on the details of a specific scenario. Lastly, we compare the results from the model with reality and either reject the model completely if it just doesn't work, or tune the model by refining, updating or tweaking it, until we have a model that is a good match for reality.

The aim here is to come up with a useful model. Models are by their very nature simplifications or abstractions of reality. So long as we bear this in mind, even a simple (although not simplistic) model can have predictive power.

Essentially, all models are wrong, but some are useful

—George E. P. Box [Box, G. E. P., and Draper, N. R., (1987), Empirical Model Building and Response Surfaces, John Wiley & Sons, New York, NY.]

A model of software development

A key factor in modelling the development of legacy software systems is the fact that although such systems may endure for decades - we developers tend not to endure them for decades. In other words, the tenure of software developers is typically much shorter than the life span of software systems.

But how much shorter?

Whenever I speak publicly on this topic with an audience of developers, I like to perform a simple experiment with my audience. My assumption is that the turnover of developers can be modelled as if developers have a half-life within organizations. The related concept of residence time [1] is probably a better approach, but most people have a grasp of half-life, and it avoids a tedious digression into explaining something that is ultimately tangential to the main discussion. In any case, a catchy hook is important when you're going for audience participation, so half-life it is.

I start by asking everyone in the audience who has moved from working on one codebase – for example a product – to another (including the transition to working on their first codebase), at any time in the preceding 32 years to raise their hands. This starting point is intended to catch the vast majority of typical tech conference audience members, and indeed it does, although arguably in the spirit of inclusiveness I should start with 64 years. Now all of the audience have raised hands.

Next I ask the audience to keep their hands raised if this is still true for 16 years: Most of the hands are still raised. Now eight years: Some hands are going down. Now four years: A small majority of hands are still raised. Then two years: At this point, a large minority still have raised hands, but we have crossed the half-way threshold and established that the 'half-life' of developers is somewhere between two and four years. This result has been consistent on over 90% of the occasions I've performed this experiment. The only notable deviation was a large Swedish software consultancy where we established the half-life was around six months!

In fact, what little research there has been into developer tenure indicates that the half-life across the industry is about 3.2 years, which fits nicely with what I see out in the field.

One way to think about this result concretely is as follows: If you work on a team that numbers ten developers in total, you can expect half of them - five in this case - to leave at some point in the next 3.2 years. Obviously, if the team size is to remain constant, they will need to be replaced.

Note that saying that turnover of half of a population of developers will take 3.2 years is not the same as claiming that the average tenure of a developer is 3.2 years. In fact, mean tenure will be \(3.2 / \ln 2\) which is about 4.6 years. You might want to compare that figure against your own career so far.

If you're concerned that developers don't behave very much like radionuclides then rest assured that the notion of half-life follows directly from an assumption that the decay of a particle (or departure of a developer) follows exponential decay, which in turn follows from the notion of a constant probability of decay per unit time. All we're saying is that in a given time interval there is a fixed probability that a particle will decay (or a developer will depart), so it is actually a very simple model.
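
As a minimal sketch of this exponential model, the following Python computes the expected fraction of a cohort remaining after a given time, and the mean tenure implied by a half-life. The function names are illustrative choices, not taken from the original simulation.

```python
import math

HALF_LIFE_YEARS = 3.2  # industry-wide half-life quoted in the text


def fraction_remaining(years, half_life=HALF_LIFE_YEARS):
    """Expected fraction of an initial cohort still present after `years`."""
    return 0.5 ** (years / half_life)


def mean_tenure(half_life=HALF_LIFE_YEARS):
    """Mean of an exponential distribution with the given half-life."""
    return half_life / math.log(2)


# A team of ten: how many of the original members do we expect after 3.2 years?
print(round(10 * fraction_remaining(3.2), 1))  # 5.0
print(round(mean_tenure(), 1))                 # 4.6
```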

Notably, the half-life of developers is shorter than the half-life of almost anything else in our industry, including CEOs, lines of code, megacorps or classes.

Half-lives in years of various entities in and around the software industry. Developers are one of the most short-lived.

Productivity

If we're going to have a stab at modelling software developers as part of the software development process, we're going to need some measure of productivity. I'm going to use - and you can save your outrage for later - lines of code. To repurpose a famous phrase by Winston Churchill: "Lines of code is the worst programmer productivity metric, except for all the others". Now I know, as well as you do, that what ultimately matters to the customers of software systems is value for money, return on investment, and all those other good things. The problem is that it's notoriously hard to tie any of those things back in a rigorous way to what individual developers do on a day-to-day basis, which should be to design and code software systems based on an understanding of the problem at hand. On the other hand, I think I'm on fairly safe ground in assuming that software systems with zero lines of code deliver no value, and proportionally more complex problems can be solved (and hence more value delivered) by larger software systems.

Furthermore, there's some evidence that the number of lines of code cut by a particular developer per day is fairly constant irrespective of which programming language they're working in. So five lines of F# might do as much 'work' as 10 lines of Python or 20 lines of C++. This is an alternative phrasing of the notion of 'expressiveness' in programming languages, and it is why we tend to feel that expressiveness - or semantic density - is important. We can often deliver as much value with 5 lines of F# as with 20 lines of C++, yet it will take a quarter of the effort to put together.

Now, however you choose to measure productivity, not all developers are equally productive on the same code base, and the same developer will demonstrate different productivity on different code bases, even if they are in the same programming language. In fact, as I'm sure many of us have experienced, the principal control on our productivity is simply the size of the code base at hand. Writing a hundred lines of code for a small script is usually much less work than adding 100 lines to a one million line system.

We can capture this variance by looking to what little literature there is on the topic [2], and using this albeit sparse data to build some simple developer productivity distributions.

For example, we know that on a small 10,000 line code base, the least productive developer will produce about 2000 lines of debugged and working code in a year, the most productive developer will produce about 29,000 lines of code in a year, and the typical (or average) developer will produce about 3200 lines of code in a year. Notice that the distribution is highly skewed toward the low productivity end, and the multiple between the typical and most productive developers corresponds to the fabled 10x programmer.

Given only these three numbers and in the absence of any more information on the shape of the distribution, we'll follow a well-trodden path and use them to erect a triangular probability density function (PDF) characterised by the minimum, modal and maximum productivity values. Based on this PDF it's straightforward to compute the corresponding cumulative distribution function (CDF) which we can use to construct simulated "teams" of developers, by using the CDF (strictly, its inverse) to transform uniformly distributed samples on the cumulative probability axis into samples on the productivity axis. In a real simulation where we wanted to generate many typical teams, we would generate uniform random numbers between zero and one and transform them into productivity values using the CDF, although for clarity in the illustration that follows, I've used evenly distributed samples from which to generate the productivity values.
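
The transformation just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code behind the figures: the function names are invented here, and the choice of evenly spaced sample points (midpoints of equal-probability bands) is a guess, since the text does not say exactly which points were used.

```python
import math

# Minimum, modal and maximum lines per year on a 10,000 line code base,
# as quoted in the text.
LOW, MODE, HIGH = 2000, 3200, 29000


def triangular_inverse_cdf(u, low=LOW, mode=MODE, high=HIGH):
    """Map a cumulative probability u in [0, 1] onto the productivity axis."""
    split = (mode - low) / (high - low)  # value of the CDF at the mode
    if u < split:
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1 - u) * (high - low) * (high - mode))


def evenly_spaced_team(size):
    """Productivities at evenly spaced cumulative probabilities.

    Assumption: sample at the midpoint of each of `size` equal bands.
    """
    return [triangular_inverse_cdf((i + 0.5) / size) for i in range(size)]


for productivity in evenly_spaced_team(10):
    print(round(productivity))
```

For randomly drawn teams, the evenly spaced probabilities would be replaced by uniform random numbers; the standard library's random.triangular(low, high, mode) draws from the same distribution directly.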

Programmer productivity in lines of code per year for a team of ten developers on a 10000 line project.

As you can see, the resulting productivity values for a team of ten developers cluster around the modal productivity value, with comparatively few developers of very high productivity.

Perhaps more intuitively, a software development team of ten developers looks like this:

A typical team of ten developers would look like this, if their contributions in lines of code were represented as circular areas.

This typical team has only a couple of people responsible for the majority of the output. Again, it might be interesting to compare this to your own situation. At the very least, it shows how the 'right' team of two developers can be competitive with a much larger team; a phenomenon you may have witnessed for yourselves.

Overall, this team produces about 90,000 lines of code in a year.

Incorporating growth of software

Of course, the story doesn't end there. Once our team has written 90,000 lines of code, they're no longer working on a 10,000 line code base, they're working on a 100,000 line code base! This causes their productivity to drop, so we now have a modified description of their productivities and a new distribution from which to draw a productivity if somebody new joins the team. But more of that in a moment. We don't have much in the way of published data for productivity on different sizes of code base, but we can interpolate between and extrapolate from the data we do have, without any of the assumptions involved in such extrapolation looking too outlandish. As you can see, we can put three straight lines through the minimums, modes and maximums respectively to facilitate determination of a productivity distribution for any code size up to about 10 million lines of code. (Note that we shouldn't infer from these straight lines anything about the mathematical nature of how productivity declines with increasing code size - there be dragons! [3])

Productivity is a function of codebase size. Developers are dramatically less productive on larger bodies of code.

When performing a simulation of growth of software in the computer, we can get more accurate results by reducing the time-step on which we adjust programmer productivity downwards from once per year as in the example above, to just once per day: At the end of every simulated day, we know how much code we have, so we can predict the productivity of our developers on the following day, and so on.

Incorporating turnover

We've already stated our assumption that the probability of a developer departing is constant per unit time, together with our half-life figure of 3.2 years. Given this, it's straightforward to compute the probability of a developer leaving on any given day, which is about 0.001, or once in every thousand days. As we all know, when a particular developer leaves an organisation and is replaced by a new recruit, there's no guarantee that their replacement will have the same level of productivity. In this event, our simulation will draw a new developer at random from the distribution of developer productivity for the current code base size, so it's likely that a very low productivity developer will be replaced with a higher productivity developer and that a very high productivity developer will be replaced with a lower productivity developer; an example of regression to mediocrity. [4]
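
The per-day probability follows from the half-life by requiring that the chance of surviving every day of one half-life is exactly one half. The sketch below shows that calculation; the function name is illustrative, and the number of days counted per year is an assumption, since the text does not say whether calendar or working days are meant. Either way the result comes out somewhat under one in a thousand, the same order of magnitude as the figure quoted above.

```python
def daily_departure_probability(half_life_years, days_per_year):
    """Probability of leaving on any one day, given a half-life in years."""
    half_life_days = half_life_years * days_per_year
    # Surviving half_life_days consecutive days must have probability 0.5.
    return 1 - 0.5 ** (1 / half_life_days)


calendar = daily_departure_probability(3.2, 365)
working = daily_departure_probability(3.2, 250)  # 250 working days: an assumption
print(f"{calendar:.5f} per calendar day, {working:.5f} per working day")
```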

Simulating a project

With components of variance in developer productivity, its relationship to code base size and a simple model of developer turnover we're ready to run a simulation of a project. To do so, we initialize the model with the number of developers in the development team, and set it running. The simulator starts by randomly drawing a team of developers of the required size from the productivity distribution for a zero-size code base, and computes how much code they will have produced after one day. At the end of the time step, the developer productivities are updated to the new distribution; each developer's quantile within the distribution remains fixed, but the underlying CDF is updated to yield a new productivity value. The next time step for day two then begins, with each developer producing a little less code than on the previous day.

On each day, there is a fixed probability that a developer will leave the team. When this occurs, they are immediately replaced the following day by a new hire whose productivity will be drawn anew from the productivity distribution. For small teams, this can shift the overall team productivity significantly and, more often than not, towards the mean.
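
The daily loop described above might be sketched as follows. This is not the simulator used to produce the figures in this article. In particular, placeholder_bounds is a hypothetical stand-in for the fitted relationship between productivity and code size: its decline factor is invented purely so that the sketch runs, and it assumes the 10,000 line figures also apply to an empty code base. Calendar days and same-day replacement are further simplifying assumptions.

```python
import random

DAYS_PER_YEAR = 365  # assumption: calendar days
P_LEAVE = 1 - 0.5 ** (1 / (3.2 * DAYS_PER_YEAR))  # from the 3.2 year half-life


def inverse_cdf(u, low, mode, high):
    """Inverse CDF of a triangular distribution on [low, high]."""
    split = (mode - low) / (high - low)
    if u < split:
        return low + (u * (high - low) * (mode - low)) ** 0.5
    return high - ((1 - u) * (high - low) * (high - mode)) ** 0.5


def placeholder_bounds(code_size):
    """HYPOTHETICAL (min, mode, max) lines per year at a given code size.

    The decline factor is invented for illustration only; it is not the
    relationship fitted to published data in the article.
    """
    factor = 1 / (1 + code_size / 100_000)
    return 2000 * factor, 3200 * factor, 29000 * factor


def simulate(team_size, days, bounds=placeholder_bounds, seed=None):
    """Return (lines of code, total contributors) after `days` days."""
    rng = random.Random(seed)
    # Each developer keeps a fixed quantile within the distribution.
    quantiles = [rng.random() for _ in range(team_size)]
    code_size = 0.0
    contributors = team_size
    for _ in range(days):
        # Productivity for today depends on the code size reached so far.
        low, mode, high = bounds(code_size)
        yearly = sum(inverse_cdf(q, low, mode, high) for q in quantiles)
        code_size += yearly / DAYS_PER_YEAR
        # Each developer may leave; the replacement gets a fresh quantile.
        for seat in range(team_size):
            if rng.random() < P_LEAVE:
                quantiles[seat] = rng.random()
                contributors += 1
    return code_size, contributors


size, people = simulate(team_size=7, days=5 * DAYS_PER_YEAR, seed=1)
print(f"{size:.0f} lines from {people} contributors")
```

Storing a quantile per developer, rather than a productivity, is what lets each developer's position in the distribution stay fixed while the distribution itself shifts with code size.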

Let's look at an example: If we configure a simulation with a team of seven developers, and let it run for five years, we get something like this:

Streamed code contributions of a team of seven developers over five years. A total of 19 people contribute to this codebase.

This figure has time running from left to right, and the coloured streams show the growing contributions over time of individual developers. We start on the left with no code and the original seven developers in the team, from top to bottom sporting the colours brown, orange, green, red, blue, purple and yellow. The code base grows quickly at first, but soon slows. About 180 days into the project the purple developer quits, indicated by a black, vertical bar across their stream. From this point on, their contribution remains static and is shown in a lighter shade. Vertically below this terminator we see a new stream beginning, coloured pink, which represents the contribution of the recruit who is purple's replacement. As you can see, they are about three times more productive (measured in lines of code at least), than their predecessor, although pink only sticks around for around 200 days before moving on and being replaced by the upper blue stream.

In this particular scenario, at the end of the five year period, we find our team of seven has churned through a grand total of 19 developers. In fact the majority of the extant code was written by people no longer with the organisation; only 37% of the code was programmed by people still present at the end. This is perhaps motivation for getting documentation in place as systems are developed, while the people who are doing the development are still around, rather than at the end of the effort - if at all - as is all too common.

Monte Carlo simulation

Being drawn randomly, each scenario such as the one outlined above is different, although in aggregate they vary in a predictable way according to the distributions we are using. The above scenario was typical, insofar as it produced, compared to all identically configured simulations, an average amount of code, although it did happen to get through a rather high number of developers. Of course, individual scenarios such as this, although interesting, can never be indicative of what will actually happen. For that, we need to turn to Monte Carlo modelling: Run many thousands of simulations - all with configurations drawn randomly from identical distributions - and look at the results in aggregate either graphically or using various statistical tools.

When we run 1000 simulations of a seven person project run over three years, the following statistics emerge: We can expect our team of seven to see four people leave and be replaced during the project. In fact, the total number of contributors will be 11 ± 2 at one standard deviation (1σ). The total body of code produced in three years will be 157,000 ± 23,000 lines @ 1σ. The proportion of the code written by contributors present at the end will be 70% ± 14% @ 1σ.

Perhaps a more useful question might be to ask "How long is it likely to take to produce 100,000 lines of code?" By answering this question for each simulation, we can build a histogram (actually we use a kernel density estimate here, to give a smooth, rather than binned, result).

How long does it take a team of seven to deliver one-hundred thousand lines of code?

Although this gives a good intuitive sense of when the team will reach the 100 k threshold, a more useful chart is the cumulative distribution of finishing time, which allows us to easily recognise that while there is a probability of 20% of finishing in 330 days, for a much more secure 80% probability, we should allow for 470 days - some 42% longer and correspondingly more costly.

Cumulative distribution of 100,000 LOC delivery times

Cumulative distribution function showing probability of delivery of one-hundred thousand lines of code before a particular day. Based on 10 000 simulations.
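
Reading a finishing time off such a cumulative distribution amounts to taking an empirical quantile of the simulated results. A minimal sketch, with illustrative function names and a tiny made-up sample standing in for real simulation output:

```python
import math


def empirical_cdf(samples, x):
    """Fraction of simulated finishing times less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def days_for_confidence(samples, probability):
    """Smallest simulated finishing time by which at least the given
    fraction of runs had finished."""
    ordered = sorted(samples)
    index = max(0, math.ceil(probability * len(ordered)) - 1)
    return ordered[index]


# Made-up finishing times in days, for illustration only.
finishing_days = [300, 350, 400, 450, 500]
print(empirical_cdf(finishing_days, 400))        # 0.6
print(days_for_confidence(finishing_days, 0.5))  # 400
```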

Finally, looking at the proportion of the code base that was, at any time, written by the current team, we see an exponential decline in this fraction, leaving us with a headline figure of 20% after 20 years.

Proportion of code written by current team

The proportion of code written by the current team. In other words, for how much of your code can you easily talk to the author? Blue and green lines show plus and minus one standard deviation around the mean. Based on 10,000 simulations.

That's right, on a 20 year old code base only one fifth of the code will have been created by the current team. This resonates with my own experience, and quantitatively explains why working on large legacy systems can be a lonely, disorienting and confusing experience.

July 3, 2014

Rational Computational Geometry in Python

Rob Smallshire from Good With Computers

In the previous article, we looked at a standard technique for determining the collinearity of points, based on computing the sign of the area of the triangle formed by two points on the line and a third query point. We discovered that, when used with Python's float type [1], the routine was unreliable in a region close to the line. This shortcoming has nothing to do with Python specifically and everything to do with the finite precision of the float number type. This time, we'll examine the behaviour of the algorithm more systematically using the following program:

def sign(x):
    """Determine the sign of x.

    Returns:
        -1 if x is negative, +1 if x is positive or 0 if x is zero.
    """
    return (x > 0) - (x < 0)


def orientation(p, q, r):
    """Determine the orientation of three points in the plane.

    Args:
      p, q, r: Two-tuples representing coordinate pairs of three points.

    Returns:
        -1 if p, q, r is a turn to the right, +1 if p, q, r is a turn to the
        left, otherwise 0 if p, q, and r are collinear.
    """
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return sign(d)


def main():
    """
    Test whether points immediately above and below the point (0.5, 0.5)
    lie above, on, or below the line through (12.0, 12.0) and (24.0, 24.0).
    """
    px = 0.5

    pys = [0.49999999999999,
          0.49999999999999006,
          0.4999999999999901,
          0.4999999999999902,
          0.49999999999999023,
          0.4999999999999903,
          0.49999999999999034,
          0.4999999999999904,
          0.49999999999999045,
          0.4999999999999905,
          0.49999999999999056,
          0.4999999999999906,
          0.4999999999999907,
          0.49999999999999073,
          0.4999999999999908,
          0.49999999999999084,
          0.4999999999999909,
          0.49999999999999095,
          0.499999999999991,
          0.49999999999999106,
          0.4999999999999911,
          0.4999999999999912,
          0.49999999999999123,
          0.4999999999999913,
          0.49999999999999134,
          0.4999999999999914,
          0.49999999999999145,
          0.4999999999999915,
          0.49999999999999156,
          0.4999999999999916,
          0.4999999999999917,
          0.49999999999999173,
          0.4999999999999918,
          0.49999999999999184,
          0.4999999999999919,
          0.49999999999999195,
          0.499999999999992,
          0.49999999999999206,
          0.4999999999999921,
          0.4999999999999922,
          0.49999999999999223,
          0.4999999999999923,
          0.49999999999999234,
          0.4999999999999924,
          0.49999999999999245,
          0.4999999999999925,
          0.49999999999999256,
          0.4999999999999926,
          0.4999999999999927,
          0.49999999999999273,
          0.4999999999999928,
          0.49999999999999284,
          0.4999999999999929,
          0.49999999999999295,
          0.499999999999993,
          0.49999999999999306,
          0.4999999999999931,
          0.49999999999999317,
          0.4999999999999932,
          0.4999999999999933,
          0.49999999999999334,
          0.4999999999999934,
          0.49999999999999345,
          0.4999999999999935,
          0.49999999999999356,
          0.4999999999999936,
          0.49999999999999367,
          0.4999999999999937,
          0.4999999999999938,
          0.49999999999999384,
          0.4999999999999939,
          0.49999999999999395,
          0.499999999999994,
          0.49999999999999406,
          0.4999999999999941,
          0.49999999999999417,
          0.4999999999999942,
          0.4999999999999943,
          0.49999999999999434,
          0.4999999999999944,
          0.49999999999999445,
          0.4999999999999945,
          0.49999999999999456,
          0.4999999999999946,
          0.49999999999999467,
          0.4999999999999947,
          0.4999999999999948,
          0.49999999999999484,
          0.4999999999999949,
          0.49999999999999495,
          0.499999999999995,
          0.49999999999999506,
          0.4999999999999951,
          0.49999999999999517,
          0.4999999999999952,
          0.4999999999999953,
          0.49999999999999534,
          0.4999999999999954,
          0.49999999999999545,
          0.4999999999999955,
          0.49999999999999556,
          0.4999999999999956,
          0.49999999999999567,
          0.4999999999999957,
          0.4999999999999958,
          0.49999999999999584,
          0.4999999999999959,
          0.49999999999999595,
          0.499999999999996,
          0.49999999999999606,
          0.4999999999999961,
          0.49999999999999617,
          0.4999999999999962,
          0.4999999999999963,
          0.49999999999999634,
          0.4999999999999964,
          0.49999999999999645,
          0.4999999999999965,
          0.49999999999999656,
          0.4999999999999966,
          0.49999999999999667,
          0.4999999999999967,
          0.4999999999999968,
          0.49999999999999684,
          0.4999999999999969,
          0.49999999999999695,
          0.499999999999997,
          0.49999999999999706,
          0.4999999999999971,
          0.49999999999999717,
          0.4999999999999972,
          0.4999999999999973,
          0.49999999999999734,
          0.4999999999999974,
          0.49999999999999745,
          0.4999999999999975,
          0.49999999999999756,
          0.4999999999999976,
          0.49999999999999767,
          0.4999999999999977,
          0.4999999999999978,
          0.49999999999999784,
          0.4999999999999979,
          0.49999999999999795,
          0.499999999999998,
          0.49999999999999806,
          0.4999999999999981,
          0.49999999999999817,
          0.4999999999999982,
          0.4999999999999983,
          0.49999999999999833,
          0.4999999999999984,
          0.49999999999999845,
          0.4999999999999985,
          0.49999999999999856,
          0.4999999999999986,
          0.49999999999999867,
          0.4999999999999987,
          0.4999999999999988,
          0.49999999999999883,
          0.4999999999999989,
          0.49999999999999895,
          0.499999999999999,
          0.49999999999999906,
          0.4999999999999991,
          0.49999999999999917,
          0.4999999999999992,
          0.4999999999999993,
          0.49999999999999933,
          0.4999999999999994,
          0.49999999999999944,
          0.4999999999999995,
          0.49999999999999956,
          0.4999999999999996,
          0.49999999999999967,
          0.4999999999999997,
          0.4999999999999998,
          0.49999999999999983,
          0.4999999999999999,
          0.49999999999999994,  # The previous representable float < 0.5
          0.5,
          0.5000000000000001,   # The next representable float > 0.5
          0.5000000000000002,
          0.5000000000000003,
          0.5000000000000004,
          0.5000000000000006,
          0.5000000000000007,
          0.5000000000000008,
          0.5000000000000009,
          0.500000000000001,
          0.5000000000000011,
          0.5000000000000012,
          0.5000000000000013,
          0.5000000000000014,
          0.5000000000000016,
          0.5000000000000017,
          0.5000000000000018,
          0.5000000000000019,
          0.500000000000002,
          0.5000000000000021,
          0.5000000000000022,
          0.5000000000000023,
          0.5000000000000024,
          0.5000000000000026,
          0.5000000000000027,
          0.5000000000000028,
          0.5000000000000029,
          0.500000000000003,
          0.5000000000000031,
          0.5000000000000032,
          0.5000000000000033,
          0.5000000000000034,
          0.5000000000000036,
          0.5000000000000037,
          0.5000000000000038,
          0.5000000000000039,
          0.500000000000004,
          0.5000000000000041,
          0.5000000000000042,
          0.5000000000000043,
          0.5000000000000044,
          0.5000000000000046,
          0.5000000000000047,
          0.5000000000000048,
          0.5000000000000049,
          0.500000000000005,
          0.5000000000000051,
          0.5000000000000052,
          0.5000000000000053,
          0.5000000000000054,
          0.5000000000000056,
          0.5000000000000057,
          0.5000000000000058,
          0.5000000000000059,
          0.500000000000006,
          0.5000000000000061,
          0.5000000000000062,
          0.5000000000000063,
          0.5000000000000064,
          0.5000000000000066,
          0.5000000000000067,
          0.5000000000000068,
          0.5000000000000069,
          0.500000000000007,
          0.5000000000000071,
          0.5000000000000072,
          0.5000000000000073,
          0.5000000000000074,
          0.5000000000000075,
          0.5000000000000077,
          0.5000000000000078,
          0.5000000000000079,
          0.500000000000008,
          0.5000000000000081,
          0.5000000000000082,
          0.5000000000000083,
          0.5000000000000084,
          0.5000000000000085,
          0.5000000000000087,
          0.5000000000000088,
          0.5000000000000089,
          0.500000000000009,
          0.5000000000000091,
          0.5000000000000092,
          0.5000000000000093,
          0.5000000000000094,
          0.5000000000000095,
          0.5000000000000097,
          0.5000000000000098,
          0.5000000000000099,
          0.50000000000001]

    q = (12.0, 12.0)
    r = (24.0, 24.0)

    for py in pys:
        p = (px, py)
        o = orientation(p, q, r)
        print("orientation(({p[0]:>3}, {p[1]:<19}) q, r) -> {o:>2}".format(
              p=p, o=o))


if __name__ == '__main__':
    main()

The program includes definitions of our sign() and orientation() functions, together with a main() function which runs the test. The main function includes a list of the 271 nearest representable \(y\)-coordinate values to 0.5. We haven't included the code to generate these successive float values because it's somewhat beside the point; we've referenced the necessary technique in the previous article.
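For readers who want to regenerate the list, here is a minimal sketch of one way to do it. It is not necessarily the technique referenced in the previous article: it assumes Python 3.9 or later for math.nextafter(), and floats_between() is a hypothetical helper name introduced purely for illustration.

```python
import math


def floats_between(lo, hi):
    """Return every representable float from lo to hi inclusive, ascending."""
    values = []
    x = lo
    while x <= hi:
        values.append(x)
        # Step to the next representable float towards +infinity.
        x = math.nextafter(x, math.inf)
    return values


# The end points are the first and last values in the list used by main().
pys = floats_between(0.49999999999999, 0.50000000000001)
print(len(pys), pys[0], pys[-1])
```

Note that representable floats are twice as densely packed just below 0.5 as they are just above it, so the list contains more values on the low side of 0.5 than on the high side.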

The program iterates over these py values and performs the orientation test each time, printing the result. The complex format string is used to get readable output which lines up in columns. When we look at that output we see an intricate pattern of results emerge, which isn't even symmetrical around the central 0.5 value:

orientation((0.5, 0.50000000000001   ) q, r) ->  1
orientation((0.5, 0.5000000000000099 ) q, r) ->  1
orientation((0.5, 0.5000000000000098 ) q, r) ->  1
orientation((0.5, 0.5000000000000097 ) q, r) ->  1
orientation((0.5, 0.5000000000000095 ) q, r) ->  1
orientation((0.5, 0.5000000000000094 ) q, r) ->  1
orientation((0.5, 0.5000000000000093 ) q, r) ->  1
orientation((0.5, 0.5000000000000092 ) q, r) ->  1
orientation((0.5, 0.5000000000000091 ) q, r) ->  1
orientation((0.5, 0.500000000000009  ) q, r) ->  1
orientation((0.5, 0.5000000000000089 ) q, r) ->  1
orientation((0.5, 0.5000000000000088 ) q, r) ->  1
orientation((0.5, 0.5000000000000087 ) q, r) ->  1
orientation((0.5, 0.5000000000000085 ) q, r) ->  1
orientation((0.5, 0.5000000000000084 ) q, r) ->  1
orientation((0.5, 0.5000000000000083 ) q, r) ->  1
orientation((0.5, 0.5000000000000082 ) q, r) ->  1
orientation((0.5, 0.5000000000000081 ) q, r) ->  1
orientation((0.5, 0.500000000000008  ) q, r) ->  1
orientation((0.5, 0.5000000000000079 ) q, r) ->  1
orientation((0.5, 0.5000000000000078 ) q, r) ->  1
orientation((0.5, 0.5000000000000077 ) q, r) ->  1
orientation((0.5, 0.5000000000000075 ) q, r) ->  1
orientation((0.5, 0.5000000000000074 ) q, r) ->  1
orientation((0.5, 0.5000000000000073 ) q, r) ->  1
orientation((0.5, 0.5000000000000072 ) q, r) ->  1
orientation((0.5, 0.5000000000000071 ) q, r) ->  1
orientation((0.5, 0.500000000000007  ) q, r) ->  1
orientation((0.5, 0.5000000000000069 ) q, r) ->  1
orientation((0.5, 0.5000000000000068 ) q, r) ->  1
orientation((0.5, 0.5000000000000067 ) q, r) ->  1
orientation((0.5, 0.5000000000000066 ) q, r) ->  1
orientation((0.5, 0.5000000000000064 ) q, r) ->  1
orientation((0.5, 0.5000000000000063 ) q, r) ->  1
orientation((0.5, 0.5000000000000062 ) q, r) ->  1
orientation((0.5, 0.5000000000000061 ) q, r) ->  1
orientation((0.5, 0.500000000000006  ) q, r) ->  1
orientation((0.5, 0.5000000000000059 ) q, r) ->  1
orientation((0.5, 0.5000000000000058 ) q, r) ->  1
orientation((0.5, 0.5000000000000057 ) q, r) ->  1
orientation((0.5, 0.5000000000000056 ) q, r) ->  1
orientation((0.5, 0.5000000000000054 ) q, r) ->  1
orientation((0.5, 0.5000000000000053 ) q, r) ->  1
orientation((0.5, 0.5000000000000052 ) q, r) ->  1
orientation((0.5, 0.5000000000000051 ) q, r) ->  1
orientation((0.5, 0.500000000000005  ) q, r) ->  1
orientation((0.5, 0.5000000000000049 ) q, r) ->  1
orientation((0.5, 0.5000000000000048 ) q, r) ->  1
orientation((0.5, 0.5000000000000047 ) q, r) ->  1
orientation((0.5, 0.5000000000000046 ) q, r) ->  1
orientation((0.5, 0.5000000000000044 ) q, r) ->  0
orientation((0.5, 0.5000000000000043 ) q, r) ->  0
orientation((0.5, 0.5000000000000042 ) q, r) ->  0
orientation((0.5, 0.5000000000000041 ) q, r) ->  0
orientation((0.5, 0.500000000000004  ) q, r) ->  0
orientation((0.5, 0.5000000000000039 ) q, r) ->  0
orientation((0.5, 0.5000000000000038 ) q, r) ->  0
orientation((0.5, 0.5000000000000037 ) q, r) ->  0
orientation((0.5, 0.5000000000000036 ) q, r) ->  0
orientation((0.5, 0.5000000000000034 ) q, r) ->  0
orientation((0.5, 0.5000000000000033 ) q, r) ->  0
orientation((0.5, 0.5000000000000032 ) q, r) ->  0
orientation((0.5, 0.5000000000000031 ) q, r) ->  0
orientation((0.5, 0.500000000000003  ) q, r) ->  0
orientation((0.5, 0.5000000000000029 ) q, r) ->  0
orientation((0.5, 0.5000000000000028 ) q, r) ->  0
orientation((0.5, 0.5000000000000027 ) q, r) ->  0
orientation((0.5, 0.5000000000000026 ) q, r) ->  0
orientation((0.5, 0.5000000000000024 ) q, r) ->  0
orientation((0.5, 0.5000000000000023 ) q, r) ->  0
orientation((0.5, 0.5000000000000022 ) q, r) ->  0
orientation((0.5, 0.5000000000000021 ) q, r) ->  0
orientation((0.5, 0.500000000000002  ) q, r) ->  0
orientation((0.5, 0.5000000000000019 ) q, r) ->  0
orientation((0.5, 0.5000000000000018 ) q, r) ->  1
orientation((0.5, 0.5000000000000017 ) q, r) ->  1
orientation((0.5, 0.5000000000000016 ) q, r) ->  1
orientation((0.5, 0.5000000000000014 ) q, r) ->  1
orientation((0.5, 0.5000000000000013 ) q, r) ->  1
orientation((0.5, 0.5000000000000012 ) q, r) ->  1
orientation((0.5, 0.5000000000000011 ) q, r) ->  1
orientation((0.5, 0.500000000000001  ) q, r) ->  1
orientation((0.5, 0.5000000000000009 ) q, r) ->  0
orientation((0.5, 0.5000000000000008 ) q, r) ->  0
orientation((0.5, 0.5000000000000007 ) q, r) ->  0
orientation((0.5, 0.5000000000000006 ) q, r) ->  0
orientation((0.5, 0.5000000000000004 ) q, r) ->  0
orientation((0.5, 0.5000000000000003 ) q, r) ->  0
orientation((0.5, 0.5000000000000002 ) q, r) ->  0
orientation((0.5, 0.5000000000000001 ) q, r) ->  0
orientation((0.5, 0.5                ) q, r) ->  0
orientation((0.5, 0.49999999999999994) q, r) ->  0
orientation((0.5, 0.4999999999999999 ) q, r) ->  0
orientation((0.5, 0.49999999999999983) q, r) ->  0
orientation((0.5, 0.4999999999999998 ) q, r) ->  0
orientation((0.5, 0.4999999999999997 ) q, r) ->  0
orientation((0.5, 0.49999999999999967) q, r) ->  0
orientation((0.5, 0.4999999999999996 ) q, r) ->  0
orientation((0.5, 0.49999999999999956) q, r) ->  0
orientation((0.5, 0.4999999999999995 ) q, r) ->  0
orientation((0.5, 0.49999999999999944) q, r) ->  0
orientation((0.5, 0.4999999999999994 ) q, r) ->  0
orientation((0.5, 0.49999999999999933) q, r) ->  0
orientation((0.5, 0.4999999999999993 ) q, r) ->  0
orientation((0.5, 0.4999999999999992 ) q, r) ->  0
orientation((0.5, 0.49999999999999917) q, r) ->  0
orientation((0.5, 0.4999999999999991 ) q, r) ->  0
orientation((0.5, 0.49999999999999906) q, r) -> -1
orientation((0.5, 0.499999999999999  ) q, r) -> -1
orientation((0.5, 0.49999999999999895) q, r) -> -1
orientation((0.5, 0.4999999999999989 ) q, r) -> -1
orientation((0.5, 0.49999999999999883) q, r) -> -1
orientation((0.5, 0.4999999999999988 ) q, r) -> -1
orientation((0.5, 0.4999999999999987 ) q, r) -> -1
orientation((0.5, 0.49999999999999867) q, r) -> -1
orientation((0.5, 0.4999999999999986 ) q, r) -> -1
orientation((0.5, 0.49999999999999856) q, r) -> -1
orientation((0.5, 0.4999999999999985 ) q, r) -> -1
orientation((0.5, 0.49999999999999845) q, r) -> -1
orientation((0.5, 0.4999999999999984 ) q, r) -> -1
orientation((0.5, 0.49999999999999833) q, r) -> -1
orientation((0.5, 0.4999999999999983 ) q, r) -> -1
orientation((0.5, 0.4999999999999982 ) q, r) -> -1
orientation((0.5, 0.49999999999999817) q, r) ->  0
orientation((0.5, 0.4999999999999981 ) q, r) ->  0
orientation((0.5, 0.49999999999999806) q, r) ->  0
orientation((0.5, 0.499999999999998  ) q, r) ->  0
orientation((0.5, 0.49999999999999795) q, r) ->  0
orientation((0.5, 0.4999999999999979 ) q, r) ->  0
orientation((0.5, 0.49999999999999784) q, r) ->  0
orientation((0.5, 0.4999999999999978 ) q, r) ->  0
orientation((0.5, 0.4999999999999977 ) q, r) ->  0
orientation((0.5, 0.49999999999999767) q, r) ->  0
orientation((0.5, 0.4999999999999976 ) q, r) ->  0
orientation((0.5, 0.49999999999999756) q, r) ->  0
orientation((0.5, 0.4999999999999975 ) q, r) ->  0
orientation((0.5, 0.49999999999999745) q, r) ->  0
orientation((0.5, 0.4999999999999974 ) q, r) ->  0
orientation((0.5, 0.49999999999999734) q, r) ->  0
orientation((0.5, 0.4999999999999973 ) q, r) ->  0
orientation((0.5, 0.4999999999999972 ) q, r) ->  0
orientation((0.5, 0.49999999999999717) q, r) ->  0
orientation((0.5, 0.4999999999999971 ) q, r) ->  0
orientation((0.5, 0.49999999999999706) q, r) ->  0
orientation((0.5, 0.499999999999997  ) q, r) ->  0
orientation((0.5, 0.49999999999999695) q, r) ->  0
orientation((0.5, 0.4999999999999969 ) q, r) ->  0
orientation((0.5, 0.49999999999999684) q, r) ->  0
orientation((0.5, 0.4999999999999968 ) q, r) ->  0
orientation((0.5, 0.4999999999999967 ) q, r) ->  0
orientation((0.5, 0.49999999999999667) q, r) ->  0
orientation((0.5, 0.4999999999999966 ) q, r) ->  0
orientation((0.5, 0.49999999999999656) q, r) ->  0
orientation((0.5, 0.4999999999999965 ) q, r) ->  0
orientation((0.5, 0.49999999999999645) q, r) ->  0
orientation((0.5, 0.4999999999999964 ) q, r) ->  0
orientation((0.5, 0.49999999999999634) q, r) ->  0
orientation((0.5, 0.4999999999999963 ) q, r) ->  0
orientation((0.5, 0.4999999999999962 ) q, r) ->  0
orientation((0.5, 0.49999999999999617) q, r) ->  0
orientation((0.5, 0.4999999999999961 ) q, r) ->  0
orientation((0.5, 0.49999999999999606) q, r) ->  0
orientation((0.5, 0.499999999999996  ) q, r) ->  0
orientation((0.5, 0.49999999999999595) q, r) ->  0
orientation((0.5, 0.4999999999999959 ) q, r) ->  0
orientation((0.5, 0.49999999999999584) q, r) ->  0
orientation((0.5, 0.4999999999999958 ) q, r) ->  0
orientation((0.5, 0.4999999999999957 ) q, r) ->  0
orientation((0.5, 0.49999999999999567) q, r) ->  0
orientation((0.5, 0.4999999999999956 ) q, r) ->  0
orientation((0.5, 0.49999999999999556) q, r) ->  0
orientation((0.5, 0.4999999999999955 ) q, r) -> -1
orientation((0.5, 0.49999999999999545) q, r) -> -1
orientation((0.5, 0.4999999999999954 ) q, r) -> -1
orientation((0.5, 0.49999999999999534) q, r) -> -1
orientation((0.5, 0.4999999999999953 ) q, r) -> -1
orientation((0.5, 0.4999999999999952 ) q, r) -> -1
orientation((0.5, 0.49999999999999517) q, r) -> -1
orientation((0.5, 0.4999999999999951 ) q, r) -> -1
orientation((0.5, 0.49999999999999506) q, r) -> -1
orientation((0.5, 0.499999999999995  ) q, r) -> -1
orientation((0.5, 0.49999999999999495) q, r) -> -1
orientation((0.5, 0.4999999999999949 ) q, r) -> -1
orientation((0.5, 0.49999999999999484) q, r) -> -1
orientation((0.5, 0.4999999999999948 ) q, r) -> -1
orientation((0.5, 0.4999999999999947 ) q, r) -> -1
orientation((0.5, 0.49999999999999467) q, r) -> -1
orientation((0.5, 0.4999999999999946 ) q, r) -> -1
orientation((0.5, 0.49999999999999456) q, r) -> -1
orientation((0.5, 0.4999999999999945 ) q, r) -> -1
orientation((0.5, 0.49999999999999445) q, r) -> -1
orientation((0.5, 0.4999999999999944 ) q, r) -> -1
orientation((0.5, 0.49999999999999434) q, r) -> -1
orientation((0.5, 0.4999999999999943 ) q, r) -> -1
orientation((0.5, 0.4999999999999942 ) q, r) -> -1
orientation((0.5, 0.49999999999999417) q, r) -> -1
orientation((0.5, 0.4999999999999941 ) q, r) -> -1
orientation((0.5, 0.49999999999999406) q, r) -> -1
orientation((0.5, 0.499999999999994  ) q, r) -> -1
orientation((0.5, 0.49999999999999395) q, r) -> -1
orientation((0.5, 0.4999999999999939 ) q, r) -> -1
orientation((0.5, 0.49999999999999384) q, r) -> -1
orientation((0.5, 0.4999999999999938 ) q, r) -> -1
orientation((0.5, 0.4999999999999937 ) q, r) -> -1
orientation((0.5, 0.49999999999999367) q, r) -> -1
orientation((0.5, 0.4999999999999936 ) q, r) -> -1
orientation((0.5, 0.49999999999999356) q, r) -> -1
orientation((0.5, 0.4999999999999935 ) q, r) -> -1
orientation((0.5, 0.49999999999999345) q, r) -> -1
orientation((0.5, 0.4999999999999934 ) q, r) -> -1
orientation((0.5, 0.49999999999999334) q, r) -> -1
orientation((0.5, 0.4999999999999933 ) q, r) -> -1
orientation((0.5, 0.4999999999999932 ) q, r) -> -1
orientation((0.5, 0.49999999999999317) q, r) -> -1
orientation((0.5, 0.4999999999999931 ) q, r) -> -1
orientation((0.5, 0.49999999999999306) q, r) -> -1
orientation((0.5, 0.499999999999993  ) q, r) -> -1
orientation((0.5, 0.49999999999999295) q, r) -> -1
orientation((0.5, 0.4999999999999929 ) q, r) -> -1
orientation((0.5, 0.49999999999999284) q, r) -> -1
orientation((0.5, 0.4999999999999928 ) q, r) -> -1
orientation((0.5, 0.49999999999999273) q, r) -> -1
orientation((0.5, 0.4999999999999927 ) q, r) -> -1
orientation((0.5, 0.4999999999999926 ) q, r) -> -1
orientation((0.5, 0.49999999999999256) q, r) -> -1
orientation((0.5, 0.4999999999999925 ) q, r) -> -1
orientation((0.5, 0.49999999999999245) q, r) -> -1
orientation((0.5, 0.4999999999999924 ) q, r) -> -1
orientation((0.5, 0.49999999999999234) q, r) -> -1
orientation((0.5, 0.4999999999999923 ) q, r) -> -1
orientation((0.5, 0.49999999999999223) q, r) -> -1
orientation((0.5, 0.4999999999999922 ) q, r) -> -1
orientation((0.5, 0.4999999999999921 ) q, r) -> -1
orientation((0.5, 0.49999999999999206) q, r) -> -1
orientation((0.5, 0.499999999999992  ) q, r) -> -1
orientation((0.5, 0.49999999999999195) q, r) -> -1
orientation((0.5, 0.4999999999999919 ) q, r) -> -1
orientation((0.5, 0.49999999999999184) q, r) -> -1
orientation((0.5, 0.4999999999999918 ) q, r) -> -1
orientation((0.5, 0.49999999999999173) q, r) -> -1
orientation((0.5, 0.4999999999999917 ) q, r) -> -1
orientation((0.5, 0.4999999999999916 ) q, r) -> -1
orientation((0.5, 0.49999999999999156) q, r) -> -1
orientation((0.5, 0.4999999999999915 ) q, r) -> -1
orientation((0.5, 0.49999999999999145) q, r) -> -1
orientation((0.5, 0.4999999999999914 ) q, r) -> -1
orientation((0.5, 0.49999999999999134) q, r) -> -1
orientation((0.5, 0.4999999999999913 ) q, r) -> -1
orientation((0.5, 0.49999999999999123) q, r) -> -1
orientation((0.5, 0.4999999999999912 ) q, r) -> -1
orientation((0.5, 0.4999999999999911 ) q, r) -> -1
orientation((0.5, 0.49999999999999106) q, r) -> -1
orientation((0.5, 0.499999999999991  ) q, r) -> -1
orientation((0.5, 0.49999999999999095) q, r) -> -1
orientation((0.5, 0.4999999999999909 ) q, r) -> -1
orientation((0.5, 0.49999999999999084) q, r) -> -1
orientation((0.5, 0.4999999999999908 ) q, r) -> -1
orientation((0.5, 0.49999999999999073) q, r) -> -1
orientation((0.5, 0.4999999999999907 ) q, r) -> -1
orientation((0.5, 0.4999999999999906 ) q, r) -> -1
orientation((0.5, 0.49999999999999056) q, r) -> -1
orientation((0.5, 0.4999999999999905 ) q, r) -> -1
orientation((0.5, 0.49999999999999045) q, r) -> -1
orientation((0.5, 0.4999999999999904 ) q, r) -> -1
orientation((0.5, 0.49999999999999034) q, r) -> -1
orientation((0.5, 0.4999999999999903 ) q, r) -> -1
orientation((0.5, 0.49999999999999023) q, r) -> -1
orientation((0.5, 0.4999999999999902 ) q, r) -> -1
orientation((0.5, 0.4999999999999901 ) q, r) -> -1
orientation((0.5, 0.49999999999999006) q, r) -> -1
orientation((0.5, 0.49999999999999   ) q, r) -> -1

The colour coding (added later) represents whether the algorithm reckons the points are above the line (in blue), on the line (in yellow) or below the line (in red). The only point which is actually on the line is in green.

By this point you should at least be wary of using floating point arithmetic for geometric computation. Lest you think this can easily be solved by introducing a tolerance value, or some other clunky solution, we'll save you the bother by pointing out that doing so merely moves these fringing effects to the edge of the tolerance zone.
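To make the tolerance idea concrete, a tolerance-based variant might look like the sketch below. The function name orientation_with_tolerance() and the default tolerance are assumptions chosen for illustration; they are not part of the program above.

```python
def orientation_with_tolerance(p, q, r, tolerance=1e-12):
    """Float-based orientation test that treats small |d| as collinear.

    The tolerance value is an arbitrary illustrative choice.
    """
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    # d still carries floating-point rounding error, so the comparisons
    # against +tolerance and -tolerance are as fragile as the original
    # comparison against zero was.
    if d > tolerance:
        return 1
    if d < -tolerance:
        return -1
    return 0


q = (12.0, 12.0)
r = (24.0, 24.0)
print(orientation_with_tolerance((0.0, 1.0), q, r))
print(orientation_with_tolerance((0.5, 0.5), q, r))
print(orientation_with_tolerance((1.0, 0.0), q, r))
```

Points well away from the line are classified without trouble, but the decision is now made at two thresholds rather than one, and the computed d near each threshold is subject to the same rounding behaviour as before.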

What to do? Fortunately, as we alluded to at the beginning of this tale, Python gives us a solution in the form of the rational numbers, implemented as the Fraction type.

Let's make a small change to our program, converting all numbers to Fractions before proceeding with the computation. We'll do this by modifying orientation() to convert each of its three arguments from a tuple containing a pair of numeric objects into a pair of Fractions. The Fraction constructor accepts a selection of numeric types, including float:

from fractions import Fraction


def orientation(p, q, r):
    """Determine the orientation of three points in the plane.

    Args:
        p, q, r: Two-tuples representing coordinate pairs of three points.

    Returns:
        -1 if p, q, r is a turn to the right, +1 if p, q, r is a turn to the
        left, otherwise 0 if p, q, and r are collinear.
    """
    # Convert every coordinate to an exact rational before any arithmetic.
    p = (Fraction(p[0]), Fraction(p[1]))
    q = (Fraction(q[0]), Fraction(q[1]))
    r = (Fraction(r[0]), Fraction(r[1]))

    # The determinant is now computed exactly, with no rounding.
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return sign(d)

The variable d will now also be a Fraction and the sign() function will work as expected with this type since it only uses comparison to zero.
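A short sketch shows why this works. Converting a float to a Fraction is exact, so the Fraction holds precisely the value the float stores, and the arithmetic that follows involves no rounding. The point values here are taken from the test program; the snippet is an illustration rather than part of it.

```python
from fractions import Fraction

# Fraction(float) captures the stored binary value exactly, which is
# not always the decimal literal that was typed.
assert Fraction(0.5) == Fraction(1, 2)
assert Fraction(0.1) != Fraction(1, 10)

# The first representable y-coordinate above 0.5, as used in the test.
p = (Fraction(0.5), Fraction(0.5000000000000001))
q = (Fraction(12.0), Fraction(12.0))
r = (Fraction(24.0), Fraction(24.0))

d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
print(type(d).__name__, d > 0)  # Fraction True
```

Because d is tiny but exactly representable as a rational, the comparison with zero gives the geometrically correct answer.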

Let's run our modified example:

orientation((0.5, 0.49999999999999   ) q, r) -> -1
orientation((0.5, 0.49999999999999006) q, r) -> -1
orientation((0.5, 0.4999999999999901 ) q, r) -> -1
orientation((0.5, 0.4999999999999902 ) q, r) -> -1
orientation((0.5, 0.49999999999999023) q, r) -> -1
orientation((0.5, 0.4999999999999903 ) q, r) -> -1
orientation((0.5, 0.49999999999999034) q, r) -> -1
orientation((0.5, 0.4999999999999904 ) q, r) -> -1
orientation((0.5, 0.49999999999999045) q, r) -> -1
orientation((0.5, 0.4999999999999905 ) q, r) -> -1
orientation((0.5, 0.49999999999999056) q, r) -> -1
orientation((0.5, 0.4999999999999906 ) q, r) -> -1
orientation((0.5, 0.4999999999999907 ) q, r) -> -1
orientation((0.5, 0.49999999999999073) q, r) -> -1
orientation((0.5, 0.4999999999999908 ) q, r) -> -1
orientation((0.5, 0.49999999999999084) q, r) -> -1
orientation((0.5, 0.4999999999999909 ) q, r) -> -1
orientation((0.5, 0.49999999999999095) q, r) -> -1
orientation((0.5, 0.499999999999991  ) q, r) -> -1
orientation((0.5, 0.49999999999999106) q, r) -> -1
orientation((0.5, 0.4999999999999911 ) q, r) -> -1
orientation((0.5, 0.4999999999999912 ) q, r) -> -1
orientation((0.5, 0.49999999999999123) q, r) -> -1
orientation((0.5, 0.4999999999999913 ) q, r) -> -1
orientation((0.5, 0.49999999999999134) q, r) -> -1
orientation((0.5, 0.4999999999999914 ) q, r) -> -1
orientation((0.5, 0.49999999999999145) q, r) -> -1
orientation((0.5, 0.4999999999999915 ) q, r) -> -1
orientation((0.5, 0.49999999999999156) q, r) -> -1
orientation((0.5, 0.4999999999999916 ) q, r) -> -1
orientation((0.5, 0.4999999999999917 ) q, r) -> -1
orientation((0.5, 0.49999999999999173) q, r) -> -1
orientation((0.5, 0.4999999999999918 ) q, r) -> -1
orientation((0.5, 0.49999999999999184) q, r) -> -1
orientation((0.5, 0.4999999999999919 ) q, r) -> -1
orientation((0.5, 0.49999999999999195) q, r) -> -1
orientation((0.5, 0.499999999999992  ) q, r) -> -1
orientation((0.5, 0.49999999999999206) q, r) -> -1
orientation((0.5, 0.4999999999999921 ) q, r) -> -1
orientation((0.5, 0.4999999999999922 ) q, r) -> -1
orientation((0.5, 0.49999999999999223) q, r) -> -1
orientation((0.5, 0.4999999999999923 ) q, r) -> -1
orientation((0.5, 0.49999999999999234) q, r) -> -1
orientation((0.5, 0.4999999999999924 ) q, r) -> -1
orientation((0.5, 0.49999999999999245) q, r) -> -1
orientation((0.5, 0.4999999999999925 ) q, r) -> -1
orientation((0.5, 0.49999999999999256) q, r) -> -1
orientation((0.5, 0.4999999999999926 ) q, r) -> -1
orientation((0.5, 0.4999999999999927 ) q, r) -> -1
orientation((0.5, 0.49999999999999273) q, r) -> -1
orientation((0.5, 0.4999999999999928 ) q, r) -> -1
orientation((0.5, 0.49999999999999284) q, r) -> -1
orientation((0.5, 0.4999999999999929 ) q, r) -> -1
orientation((0.5, 0.49999999999999295) q, r) -> -1
orientation((0.5, 0.499999999999993  ) q, r) -> -1
orientation((0.5, 0.49999999999999306) q, r) -> -1
orientation((0.5, 0.4999999999999931 ) q, r) -> -1
orientation((0.5, 0.49999999999999317) q, r) -> -1
orientation((0.5, 0.4999999999999932 ) q, r) -> -1
orientation((0.5, 0.4999999999999933 ) q, r) -> -1
orientation((0.5, 0.49999999999999334) q, r) -> -1
orientation((0.5, 0.4999999999999934 ) q, r) -> -1
orientation((0.5, 0.49999999999999345) q, r) -> -1
orientation((0.5, 0.4999999999999935 ) q, r) -> -1
orientation((0.5, 0.49999999999999356) q, r) -> -1
orientation((0.5, 0.4999999999999936 ) q, r) -> -1
orientation((0.5, 0.49999999999999367) q, r) -> -1
orientation((0.5, 0.4999999999999937 ) q, r) -> -1
orientation((0.5, 0.4999999999999938 ) q, r) -> -1
orientation((0.5, 0.49999999999999384) q, r) -> -1
orientation((0.5, 0.4999999999999939 ) q, r) -> -1
orientation((0.5, 0.49999999999999395) q, r) -> -1
orientation((0.5, 0.499999999999994  ) q, r) -> -1
orientation((0.5, 0.49999999999999406) q, r) -> -1
orientation((0.5, 0.4999999999999941 ) q, r) -> -1
orientation((0.5, 0.49999999999999417) q, r) -> -1
orientation((0.5, 0.4999999999999942 ) q, r) -> -1
orientation((0.5, 0.4999999999999943 ) q, r) -> -1
orientation((0.5, 0.49999999999999434) q, r) -> -1
orientation((0.5, 0.4999999999999944 ) q, r) -> -1
orientation((0.5, 0.49999999999999445) q, r) -> -1
orientation((0.5, 0.4999999999999945 ) q, r) -> -1
orientation((0.5, 0.49999999999999456) q, r) -> -1
orientation((0.5, 0.4999999999999946 ) q, r) -> -1
orientation((0.5, 0.49999999999999467) q, r) -> -1
orientation((0.5, 0.4999999999999947 ) q, r) -> -1
orientation((0.5, 0.4999999999999948 ) q, r) -> -1
orientation((0.5, 0.49999999999999484) q, r) -> -1
orientation((0.5, 0.4999999999999949 ) q, r) -> -1
orientation((0.5, 0.49999999999999495) q, r) -> -1
orientation((0.5, 0.499999999999995  ) q, r) -> -1
orientation((0.5, 0.49999999999999506) q, r) -> -1
orientation((0.5, 0.4999999999999951 ) q, r) -> -1
orientation((0.5, 0.49999999999999517) q, r) -> -1
orientation((0.5, 0.4999999999999952 ) q, r) -> -1
orientation((0.5, 0.4999999999999953 ) q, r) -> -1
orientation((0.5, 0.49999999999999534) q, r) -> -1
orientation((0.5, 0.4999999999999954 ) q, r) -> -1
orientation((0.5, 0.49999999999999545) q, r) -> -1
orientation((0.5, 0.4999999999999955 ) q, r) -> -1
orientation((0.5, 0.49999999999999556) q, r) -> -1
orientation((0.5, 0.4999999999999956 ) q, r) -> -1
orientation((0.5, 0.49999999999999567) q, r) -> -1
orientation((0.5, 0.4999999999999957 ) q, r) -> -1
orientation((0.5, 0.4999999999999958 ) q, r) -> -1
orientation((0.5, 0.49999999999999584) q, r) -> -1
orientation((0.5, 0.4999999999999959 ) q, r) -> -1
orientation((0.5, 0.49999999999999595) q, r) -> -1
orientation((0.5, 0.499999999999996  ) q, r) -> -1
orientation((0.5, 0.49999999999999606) q, r) -> -1
orientation((0.5, 0.4999999999999961 ) q, r) -> -1
orientation((0.5, 0.49999999999999617) q, r) -> -1
orientation((0.5, 0.4999999999999962 ) q, r) -> -1
orientation((0.5, 0.4999999999999963 ) q, r) -> -1
orientation((0.5, 0.49999999999999634) q, r) -> -1
orientation((0.5, 0.4999999999999964 ) q, r) -> -1
orientation((0.5, 0.49999999999999645) q, r) -> -1
orientation((0.5, 0.4999999999999965 ) q, r) -> -1
orientation((0.5, 0.49999999999999656) q, r) -> -1
orientation((0.5, 0.4999999999999966 ) q, r) -> -1
orientation((0.5, 0.49999999999999667) q, r) -> -1
orientation((0.5, 0.4999999999999967 ) q, r) -> -1
orientation((0.5, 0.4999999999999968 ) q, r) -> -1
orientation((0.5, 0.49999999999999684) q, r) -> -1
orientation((0.5, 0.4999999999999969 ) q, r) -> -1
orientation((0.5, 0.49999999999999695) q, r) -> -1
orientation((0.5, 0.499999999999997  ) q, r) -> -1
orientation((0.5, 0.49999999999999706) q, r) -> -1
orientation((0.5, 0.4999999999999971 ) q, r) -> -1
orientation((0.5, 0.49999999999999717) q, r) -> -1
orientation((0.5, 0.4999999999999972 ) q, r) -> -1
orientation((0.5, 0.4999999999999973 ) q, r) -> -1
orientation((0.5, 0.49999999999999734) q, r) -> -1
orientation((0.5, 0.4999999999999974 ) q, r) -> -1
orientation((0.5, 0.49999999999999745) q, r) -> -1
orientation((0.5, 0.4999999999999975 ) q, r) -> -1
orientation((0.5, 0.49999999999999756) q, r) -> -1
orientation((0.5, 0.4999999999999976 ) q, r) -> -1
orientation((0.5, 0.49999999999999767) q, r) -> -1
orientation((0.5, 0.4999999999999977 ) q, r) -> -1
orientation((0.5, 0.4999999999999978 ) q, r) -> -1
orientation((0.5, 0.49999999999999784) q, r) -> -1
orientation((0.5, 0.4999999999999979 ) q, r) -> -1
orientation((0.5, 0.49999999999999795) q, r) -> -1
orientation((0.5, 0.499999999999998  ) q, r) -> -1
orientation((0.5, 0.49999999999999806) q, r) -> -1
orientation((0.5, 0.4999999999999981 ) q, r) -> -1
orientation((0.5, 0.49999999999999817) q, r) -> -1
orientation((0.5, 0.4999999999999982 ) q, r) -> -1
orientation((0.5, 0.4999999999999983 ) q, r) -> -1
orientation((0.5, 0.49999999999999833) q, r) -> -1
orientation((0.5, 0.4999999999999984 ) q, r) -> -1
orientation((0.5, 0.49999999999999845) q, r) -> -1
orientation((0.5, 0.4999999999999985 ) q, r) -> -1
orientation((0.5, 0.49999999999999856) q, r) -> -1
orientation((0.5, 0.4999999999999986 ) q, r) -> -1
orientation((0.5, 0.49999999999999867) q, r) -> -1
orientation((0.5, 0.4999999999999987 ) q, r) -> -1
orientation((0.5, 0.4999999999999988 ) q, r) -> -1
orientation((0.5, 0.49999999999999883) q, r) -> -1
orientation((0.5, 0.4999999999999989 ) q, r) -> -1
orientation((0.5, 0.49999999999999895) q, r) -> -1
orientation((0.5, 0.499999999999999  ) q, r) -> -1
orientation((0.5, 0.49999999999999906) q, r) -> -1
orientation((0.5, 0.4999999999999991 ) q, r) -> -1
orientation((0.5, 0.49999999999999917) q, r) -> -1
orientation((0.5, 0.4999999999999992 ) q, r) -> -1
orientation((0.5, 0.4999999999999993 ) q, r) -> -1
orientation((0.5, 0.49999999999999933) q, r) -> -1
orientation((0.5, 0.4999999999999994 ) q, r) -> -1
orientation((0.5, 0.49999999999999944) q, r) -> -1
orientation((0.5, 0.4999999999999995 ) q, r) -> -1
orientation((0.5, 0.49999999999999956) q, r) -> -1
orientation((0.5, 0.4999999999999996 ) q, r) -> -1
orientation((0.5, 0.49999999999999967) q, r) -> -1
orientation((0.5, 0.4999999999999997 ) q, r) -> -1
orientation((0.5, 0.4999999999999998 ) q, r) -> -1
orientation((0.5, 0.49999999999999983) q, r) -> -1
orientation((0.5, 0.4999999999999999 ) q, r) -> -1
orientation((0.5, 0.49999999999999994) q, r) -> -1
orientation((0.5, 0.5                ) q, r) ->  0
orientation((0.5, 0.5000000000000001 ) q, r) ->  1
orientation((0.5, 0.5000000000000002 ) q, r) ->  1
orientation((0.5, 0.5000000000000003 ) q, r) ->  1
orientation((0.5, 0.5000000000000004 ) q, r) ->  1
orientation((0.5, 0.5000000000000006 ) q, r) ->  1
orientation((0.5, 0.5000000000000007 ) q, r) ->  1
orientation((0.5, 0.5000000000000008 ) q, r) ->  1
orientation((0.5, 0.5000000000000009 ) q, r) ->  1
orientation((0.5, 0.500000000000001  ) q, r) ->  1
orientation((0.5, 0.5000000000000011 ) q, r) ->  1
orientation((0.5, 0.5000000000000012 ) q, r) ->  1
orientation((0.5, 0.5000000000000013 ) q, r) ->  1
orientation((0.5, 0.5000000000000014 ) q, r) ->  1
orientation((0.5, 0.5000000000000016 ) q, r) ->  1
orientation((0.5, 0.5000000000000017 ) q, r) ->  1
orientation((0.5, 0.5000000000000018 ) q, r) ->  1
orientation((0.5, 0.5000000000000019 ) q, r) ->  1
orientation((0.5, 0.500000000000002  ) q, r) ->  1
orientation((0.5, 0.5000000000000021 ) q, r) ->  1
orientation((0.5, 0.5000000000000022 ) q, r) ->  1
orientation((0.5, 0.5000000000000023 ) q, r) ->  1
orientation((0.5, 0.5000000000000024 ) q, r) ->  1
orientation((0.5, 0.5000000000000026 ) q, r) ->  1
orientation((0.5, 0.5000000000000027 ) q, r) ->  1
orientation((0.5, 0.5000000000000028 ) q, r) ->  1
orientation((0.5, 0.5000000000000029 ) q, r) ->  1
orientation((0.5, 0.500000000000003  ) q, r) ->  1
orientation((0.5, 0.5000000000000031 ) q, r) ->  1
orientation((0.5, 0.5000000000000032 ) q, r) ->  1
orientation((0.5, 0.5000000000000033 ) q, r) ->  1
orientation((0.5, 0.5000000000000034 ) q, r) ->  1
orientation((0.5, 0.5000000000000036 ) q, r) ->  1
orientation((0.5, 0.5000000000000037 ) q, r) ->  1
orientation((0.5, 0.5000000000000038 ) q, r) ->  1
orientation((0.5, 0.5000000000000039 ) q, r) ->  1
orientation((0.5, 0.500000000000004  ) q, r) ->  1
orientation((0.5, 0.5000000000000041 ) q, r) ->  1
orientation((0.5, 0.5000000000000042 ) q, r) ->  1
orientation((0.5, 0.5000000000000043 ) q, r) ->  1
orientation((0.5, 0.5000000000000044 ) q, r) ->  1
orientation((0.5, 0.5000000000000046 ) q, r) ->  1
orientation((0.5, 0.5000000000000047 ) q, r) ->  1
orientation((0.5, 0.5000000000000048 ) q, r) ->  1
orientation((0.5, 0.5000000000000049 ) q, r) ->  1
orientation((0.5, 0.500000000000005  ) q, r) ->  1
orientation((0.5, 0.5000000000000051 ) q, r) ->  1
orientation((0.5, 0.5000000000000052 ) q, r) ->  1
orientation((0.5, 0.5000000000000053 ) q, r) ->  1
orientation((0.5, 0.5000000000000054 ) q, r) ->  1
orientation((0.5, 0.5000000000000056 ) q, r) ->  1
orientation((0.5, 0.5000000000000057 ) q, r) ->  1
orientation((0.5, 0.5000000000000058 ) q, r) ->  1
orientation((0.5, 0.5000000000000059 ) q, r) ->  1
orientation((0.5, 0.500000000000006  ) q, r) ->  1
orientation((0.5, 0.5000000000000061 ) q, r) ->  1
orientation((0.5, 0.5000000000000062 ) q, r) ->  1
orientation((0.5, 0.5000000000000063 ) q, r) ->  1
orientation((0.5, 0.5000000000000064 ) q, r) ->  1
orientation((0.5, 0.5000000000000066 ) q, r) ->  1
orientation((0.5, 0.5000000000000067 ) q, r) ->  1
orientation((0.5, 0.5000000000000068 ) q, r) ->  1
orientation((0.5, 0.5000000000000069 ) q, r) ->  1
orientation((0.5, 0.500000000000007  ) q, r) ->  1
orientation((0.5, 0.5000000000000071 ) q, r) ->  1
orientation((0.5, 0.5000000000000072 ) q, r) ->  1
orientation((0.5, 0.5000000000000073 ) q, r) ->  1
orientation((0.5, 0.5000000000000074 ) q, r) ->  1
orientation((0.5, 0.5000000000000075 ) q, r) ->  1
orientation((0.5, 0.5000000000000077 ) q, r) ->  1
orientation((0.5, 0.5000000000000078 ) q, r) ->  1
orientation((0.5, 0.5000000000000079 ) q, r) ->  1
orientation((0.5, 0.500000000000008  ) q, r) ->  1
orientation((0.5, 0.5000000000000081 ) q, r) ->  1
orientation((0.5, 0.5000000000000082 ) q, r) ->  1
orientation((0.5, 0.5000000000000083 ) q, r) ->  1
orientation((0.5, 0.5000000000000084 ) q, r) ->  1
orientation((0.5, 0.5000000000000085 ) q, r) ->  1
orientation((0.5, 0.5000000000000087 ) q, r) ->  1
orientation((0.5, 0.5000000000000088 ) q, r) ->  1
orientation((0.5, 0.5000000000000089 ) q, r) ->  1
orientation((0.5, 0.500000000000009  ) q, r) ->  1
orientation((0.5, 0.5000000000000091 ) q, r) ->  1
orientation((0.5, 0.5000000000000092 ) q, r) ->  1
orientation((0.5, 0.5000000000000093 ) q, r) ->  1
orientation((0.5, 0.5000000000000094 ) q, r) ->  1
orientation((0.5, 0.5000000000000095 ) q, r) ->  1
orientation((0.5, 0.5000000000000097 ) q, r) ->  1
orientation((0.5, 0.5000000000000098 ) q, r) ->  1
orientation((0.5, 0.5000000000000099 ) q, r) ->  1
orientation((0.5, 0.50000000000001   ) q, r) ->  1

Using Fractions internally, our orientation() function gets the full benefit of exact arithmetic with effectively infinite precision and consequently produces an exact result with only one position of p being reported as collinear with q and r.

In the next article, we'll more fully explore the behaviour of the non-robust float-based version of this function graphically, to get an impression of how lines are 'seen' by floating-point geometric functions.

[1]	Python's `float` is an IEEE-754 double precision 64-bit float.

June 20, 2014

The Folly of Floating-Point for Robust Geometric Computation

Rob Smallshire from Good With Computers

Computational geometry - a world where lines have zero thickness, circles are perfectly round and points are dimensionless. Creating robust geometric algorithms using finite precision number types such as float is fiendishly difficult because it's not possible to exactly represent numbers such as one-third, which rather gets in the way of performing seemingly simple operations like dividing a line into exactly three equal segments. In this short series of posts, we'll look at some of the pitfalls of geometric computation, with examples in Python, although the key messages are true with finite-precision floating point numbers in any language.

Rational numbers, modelled by Python's Fraction [1] type, can be useful for implementing robust geometric algorithms. These algorithms are often deeply elegant and surprising because they must avoid any detour into the realm of irrational numbers, which cannot be represented in finite precision. This means that seemingly innocuous operations like square root, for example to determine the length of a line using Pythagoras, are not permitted.
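To make the contrast concrete, the following short sketch compares float and Fraction on the kinds of values mentioned above. The variable names are purely illustrative.

```python
from fractions import Fraction
import math

# The float closest to one-third is not one-third: converting it to an
# exact Fraction reveals a different rational number.
third_float = 1 / 3
third_exact = Fraction(1, 3)
print(Fraction(third_float) == third_exact)  # False

# Exact rational arithmetic behaves as school arithmetic says it should.
print(third_exact * 3 == 1)  # True

# The classic decimal surprise with binary floats.
print(0.1 + 0.2 == 0.3)  # False

# Square root takes us into the irrationals, so even the float result
# squared does not give back the value we started with.
print(math.sqrt(2) ** 2 == 2)  # False
```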

One algorithm which benefits from rational numbers is a simple collinearity test determining whether three points lie on the same line. This can be further refined to consider whether a query point \(p\) is above, exactly on, or below the line. Now there are many ways to approach writing such a function, and like many questions in computational geometry the naïve approaches are either overly complex, inefficient, or just plain wrong, albeit in subtle ways. I won't cover the story of how to arrive at a robust algorithm; that story is entertainingly covered in Writing Programs for "The Book" by Brian Hayes. [2] Rather, we'll start where Brian leaves off by showing how to implement the algorithm in Python using both floating-point and exact arithmetic so we can understand the tradeoffs between performance and correctness inherent in these choices. Along the way, we'll perhaps touch on some aspects of Python which may be new to you.

Whether p is above, exactly on, or below line q, r can be determined from the orientation of triangle p, q, r.

You don't need to understand the mathematics of the orientation test to appreciate the point of what we're about to demonstrate. Suffice to say that the orientation of three two-dimensional points can be concisely found by computing the sign of the determinant of a three by three matrix containing the \(x\) and \(y\) coordinates of the points in question, where the determinant happens to be the signed area of the triangle formed by the three points:

\begin{equation*} \newcommand{\sgn}{\mathop{\rm sgn}\nolimits} o(p, q, r) = \sgn \begin{vmatrix} 1 & p_x & p_y\\1 & q_x & q_y\\1 & r_x & r_y \end{vmatrix} \end{equation*}

The function \(o\) returns \(+1\) if the polyline \(p\), \(q\), \(r\) executes a left turn and the loop is counterclockwise, \(0\) if the polyline is straight, or \(-1\) if the polyline executes a right turn and the loop is clockwise. These values can in turn be interpreted in terms of whether the query point \(p\) is above, on, or below the line through \(q\) and \(r\).

To cast this formula in Python, we need a sign function and a means of computing the determinant. Both of these are straightforward, although perhaps not obvious, and give us the opportunity to explore a little-appreciated aspect of Python. First, the sign() function. You may be surprised to learn (and you wouldn't be alone) that there is no built-in or library function in Python which returns the sign of a number as \(-1\), \(0\) or \(+1\), so we need to roll our own. The simplest solution is probably something like this:

>>> def sign(x):
...     if x < 0:
...         return -1
...     elif x > 0:
...         return 1
...     return 0
...
>>> sign(5)
1
>>> sign(-5)
-1
>>> sign(0)
0

This works well enough. A more elegant solution would be to exploit an interesting behaviour of the bool type, specifically how it behaves under subtraction. Let's do a few experiments:

>>> False - False
0
>>> False - True
-1
>>> True - False
1
>>> True - True
0

Intriguingly, subtraction of bool objects has an integer result! In fact, when used in arithmetic operations this way, True is equivalent to positive one and False is equivalent to zero. We can use this behaviour to implement a most elegant sign() function:

>>> def sign(x):
...     return (x > 0) - (x < 0)
...
>>> sign(-5)
-1
>>> sign(5)
1
>>> sign(0)
0
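The reason this works is that bool is a subclass of int, so the two comparison results can be subtracted like any other integers. The following sketch, offered as an illustration rather than as part of the original session, confirms this and shows that the same sign() works unchanged for floats and Fractions.

```python
from fractions import Fraction


def sign(x):
    # Each comparison yields a bool; bool is a subclass of int, so the
    # subtraction produces a plain int of -1, 0 or 1.
    return (x > 0) - (x < 0)


print(issubclass(bool, int))  # True
print(type(True - False))     # <class 'int'>

values = (-2.5, 0.0, 7, Fraction(-1, 3))
signs = [sign(v) for v in values]
print(signs)  # [-1, 0, 1, -1]
```

One caveat worth knowing: every ordering comparison involving a float NaN is false, so sign(float("nan")) quietly returns 0.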

Now we need to compute the determinant. In our case this turns out to reduce down to simply:

\begin{equation*} \det = (q_x - p_x)(r_y - p_y) - (q_y - p_y)(r_x - p_x) \end{equation*}

so the definition of our orientation() function using tuple coordinate pairs for each point becomes just:

def orientation(p, q, r):
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return sign(d)
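As a sanity check on the reduction of the determinant, the sketch below compares orientation() against a direct cofactor expansion of the three by three matrix over a small grid of integer points. The helper det3() and the choice of grid are illustrative assumptions, not part of the original article.

```python
from itertools import product


def sign(x):
    return (x > 0) - (x < 0)


def orientation(p, q, r):
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return sign(d)


def det3(p, q, r):
    """Expand |1 px py; 1 qx qy; 1 rx ry| along its first column."""
    return ((q[0] * r[1] - q[1] * r[0])
            - (p[0] * r[1] - p[1] * r[0])
            + (p[0] * q[1] - p[1] * q[0]))


# All points with integer coordinates from -2 to 2 inclusive.
grid = list(product(range(-2, 3), repeat=2))

mismatches = [
    (p, q, r)
    for p in grid for q in grid for r in grid
    if sign(det3(p, q, r)) != orientation(p, q, r)
]
print(len(mismatches))  # 0
```

Because the coordinates are integers, both computations are exact, so any disagreement would indicate an algebra error rather than a rounding problem.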

Let's test this on some examples. First we set up three points a, b and c:

>>> a = (0, 0)
>>> b = (4, 0)
>>> c = (4, 3)

Now we test the orientation of a ➔ b ➔ c:

>>> orientation(a, b, c)
1

This represents a left turn, so the function returns positive one. On the other hand the orientation of a ➔ c ➔ b is negative one:

>>> orientation(a, c, b)
-1

Let's introduce a fourth point d which is collinear with a and c. As expected our orientation() function returns zero for the group a ➔ c ➔ d:

>>> d = (8, 6)
>>> orientation(a, c, d)
0

So far so good. Everything we have done so far is using integer numbers which, in Python, have arbitrary precision. Our function only uses multiplication and subtraction, with no division to result in float values, so all of that precision is preserved. But what happens if we use floating point values as our input data? Let's try some different values using floats. Here are three points which lie on a diagonal line:

>>> e = (0.5, 0.5)
>>> f = (12.0, 12.0)
>>> g = (24.0, 24.0)

As we would expect, our orientation test determines that these points are collinear:

>>> orientation(e, f, g)
0

Furthermore, moving the point e up a little, by increasing its \(y\) coordinate by even a tiny amount, gives the answer we would expect:

>>> e = (0.5, 0.5000000000000018)
>>> orientation(e, f, g)
1

Now let's increase the \(y\) coordinate just a little more. In fact, we'll increase it by the smallest possible amount to the next representable [3] floating point number:

>>> e = (0.5, 0.5000000000000019)
>>> orientation(e, f, g)
0

Wow! According to our orientation function the points e, f and g are collinear again. This cannot possibly be! In fact, we can go through the next 23 successive floating point values up to,

>>> e = (0.5, 0.5000000000000044)
>>> orientation(e, f, g)
0

with our function still reporting that the three points are collinear, until we get to this value,

>>> e = (0.5, 0.5000000000000046)

at which point things settle down and become well behaved again:

>>> orientation(e, f, g)
1
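Rather than typing each value by hand, the walk through successive floats can be automated. The sketch below assumes IEEE-754 doubles, for which adjacent values in the interval [0.5, 1) are exactly 2**-53 apart, so 0.5 + k * 2**-53 is the k-th representable float above 0.5. The names STEP and scan() are illustrative.

```python
def sign(x):
    return (x > 0) - (x < 0)


def orientation(p, q, r):
    d = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return sign(d)


STEP = 2.0 ** -53  # assumed spacing between adjacent doubles in [0.5, 1)
f = (12.0, 12.0)
g = (24.0, 24.0)


def scan(first, last):
    """Map k to the orientation of (0.5, 0.5 + k * STEP), f, g."""
    return {k: orientation((0.5, 0.5 + k * STEP), f, g)
            for k in range(first, last + 1)}


results = scan(16, 41)
for k, o in results.items():
    print(k, repr(0.5 + k * STEP), o)
```

Here k = 16 corresponds to 0.5000000000000018 and k = 41 to 0.5000000000000046, so the printed table covers exactly the range described above: a left turn at k = 16, collinear from k = 17 to k = 40, and a left turn again at k = 41.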

What's happening here is that we've run into problems with the finite precision of Python floats at points very close to the diagonal line, and the mathematical assumptions we make in our formula about how numbers work break down because floating point numbers are far from a perfect model of real numbers. Next time, we'll investigate more thoroughly the behaviour of this orientation test at the limits of floating-point precision.

[1]	The `Fraction` type which models rational numbers is defined in the Python Standard Library fractions module.

[2]	Hayes, Brian. (2007) Writing Programs for "The Book". In: Oram, A. & Wilson, G., eds. Beautiful Code. O'Reilly Media. pp. 539–551.

[3] In C we could use the nextafter() function to generate the next representable floating point number. At the time of writing, nextafter() was not exposed to Python; math.nextafter() was later added to the standard library in Python 3.9. Other workarounds include a version built into NumPy, directly calling the C version using ctypes, and a pure Python implementation.

June 20, 2014

Robust Geometric Computation in Python

Rob Smallshire from Good With Computers

In this series we look in detail at the behaviour of one of the simplest geometric functions, demonstrate its flawed behaviour when implemented with floating point numbers, and fix it using an exact number type provided in the Python Standard Library.

April 8, 2014

Top four JavaZone 2013 talk – The Unreasonable Effectiveness of Dynamic Typing

Rob Smallshire from Good With Computers

I'm very happy to see that my talk on The Unreasonable Effectiveness of Dynamic Typing was rated fourth of all the talks in the show. Thanks to everyone who attended and voted.

JavaZone 2013 top talks http://t.co/gfbMnJRsaU - 4th: "The Unreasonable Effectiveness of Dynamic Typing for Practical …" by @robsmallshire
— JavaZone (@javazone) April 8, 2014

This talk is perhaps deliberately provocative, but only with the intention of provoking critical thinking and empiricism around the tools we use. I'm genuinely curious as to why programs in dynamic languages are as reliable as they are, although I confess I don't yet have many of the answers.

Author: Rob Smallshire
