Team DNA-impersonators create a business plan

Derek Jones from The Shape of Code

This weekend I was at the Hack the Police hackthon, sponsored by the Metropolitan Police+other organizations. My plan was to find an interesting problem to help solve, using the data we were told would be available. My previous experience with crime data is that there is not enough of it to allow reliable models to be built, this is a good thing in that nobody wants lots of crime. Talking to a Police intelligence officer, the publicly available data contained crimes (i.e., a court case had found somebody guilty), not reported incidents, and was not large enough to build allow a good model to be built.

Looking for a team to join, I got talking to Joe and Rebecca. Joe had discovered a very interesting possible threat to the existing DNA matching technique, and they were happy for me to join them analyzing this threat model; team DNA-impersonators was go.

Some background (Joe and Rebecca are the team’s genetic experts, I’m a software guy who has read a few books on the subject; all the mistakes in this post are mine). The DNA matching technique used by the Police is based on 17 specific sequences (each around 100 bases, known as loci), within the human genome (which contains around 3 billion bases).

There are companies who synthesize sequences of DNA to order. I knew that machines for doing this existed, but I did not know it was possible to order a bespoke sequence online, and how inexpensive it was.

Some people have had their DNA sequenced, and have allowed it to be published online; Steven Pinker is the most famous person I could find, whose DNA sequence is available online (link not given; it requires work+luck to find). The Personal Genome Projects aims to sequence and make available the complete genomes of 100,000 volunteers (the UK arm of this project is on hold because of lack of funding; master criminals in the UK have a window of opportunity: offer to sponsor the project on condition that their DNA is included in the public data set).

How much would it cost to manufacture bottles of spray-on Steven Pinker DNA? Is there a viable business model selling Pinker No. 5?

The screen shot below shows a quote for 2-nmol of DNA for the sequence of 100 bases that are one of the 17 loci used in DNA matching. This order is for concentrated DNA, and needs to be diluted to the level likely to be found as residue at a crime scene. Joe calculated that 2-nmol can be diluted to produce 60-liters of usable ‘product’.

Quote for synthesis of 100 bases of human DNA.

There was not enough time to obtain sequences for the other 16-loci, and get quotes for them. Information on the 17-loci used for DNA matching is available in research papers; a summer job for a PhD student to sort out the details.

The concentrate from the 17-loci dilutes to 60-liters. Say each spray-on bottle contains 100ml, then an investment of £800 (plus researcher time) generates enough liquid for 600-bottles of Pinker No. 5.

What is the pricing model? Is there a mass market (e.g., Hong Kong protesters wanting to be anonymous), or would it be more profitable to target a few select clients? Perhaps Steven Pinker always wanted to try his hand at safe-cracking in his spare time, but was worried about leaving DNA evidence behind; he might be willing to pay to have the market flooded, so Pinker No. 5 residue becomes a common occurrence at crime scenes (allowing him to plausible claim that any crime scene DNA matches were left behind by other people).

Some of the police officers at the hack volunteered that they knew lots of potential customers; the forensics officer present was horrified.

Before the 1980s, DNA profiling was not available. Will the 2020s be the decade in which DNA profiling ceases being a viable tool for catching competent criminals?

High quality photocopiers manufacturers are required to implement features that make it difficult for people to create good quality copies of paper currency.

What might law enforcement do about this threat to the viability of DNA profiling?

Ideas include:

  • Requiring companies in the bespoke DNA business to report suspicious orders. What is a suspicious order? Are enough companies in business to make it possible to order each of the 17-loci from different company (we think so)?
  • Introducing laws making it illegal to be in possession of diluted forms of other people’s DNA (with provisions for legitimate uses).
  • Attacking the economics of the Pinker No. 5 business model by having more than 17-loci available for use in DNA matching. Perhaps 1,000 loci could be selected as potential match sites, with individual DNA testing kits randomly testing 17 (or more) from this set.

New books and cards – stuff happening

Allan Kelly from Allan Kelly Associates

RetroCardsLores-2019-09-13-16-51.jpg

I am sure some of you have noticed that my blogs have been a less regular the last few months. That is because I’ve been busy on other stuff. So a break from deep thoughts and advice on the software world to mention some other stuff I’ve been working on.

UserStories_Audio-2019-09-13-16-51.jpg
For a start, available right now, my Little Book of Requirements and User Stories is now available in audio format to listen to. Full details – and the FAQ as a free download on my website. You will find links there to buy it on Audible and Apple (its cheaper on Apple, don’t ask me why.)

To my surprise Little Book has long been my best-seller so I teamed up again with Stacy Gonzalez – who voiced Project Myopia for me – to produce an audio version of Little Book. In the few weeks it has been available sales are already outstripping Project Myopia!

AOPO-2019-09-13-16-51.jpg

Second, as some know I’ve been working with Apress to turn The Art of Product Ownership from a LeanPub eBook into a full regular book. That should be out in October, you can pre-order it on Amazon now.

(And if you can’t wait, I’ve got a pre-copy edit version I can share with you provided you promise to write an Amazon review when the book is published. Mail me or use the contact form if you are interested.)

Finally, that picture at the top of the page. I’ve been working with Nicolas Umiastowski to create a playing card retrospective. These are based on my Retrospective Dialogue Sheets In our experiments they have given retrospectives another twist. More about these soon – and details of how you can get a pack (in the mean time get in contact if you are really keen to try them.)


Like this post? – Like to receive these posts by e-mail?

Subscribe to my newsletter & receive a free eBook “Xanpan: Team Centric Agile Software Development”

Check out my latest books – Continuous Digital and Project Myopia – and the Project Myopia audio edition

The post New books and cards – stuff happening appeared first on Allan Kelly Associates.

Natural elimination, or the survival of the good enough

Derek Jones from The Shape of Code

Thanks to Darwin, the world is full of people who think that evolution, in nature, works by: natural selection, or the survival of the fittest. I thought this until I read “Good Enough: The Tolerance for Mediocrity in Nature and Society” by Daniel Milo.

Milo makes a very convincing case that nature actually works by: natural elimination, or the survival of the good enough.

Why might Darwin have gone with natural selection in his book, On the Origin of Species? Milo makes the point that the only real evidence that Darwin had to work with was artificial selection, that is the breeding of farm animals and domestic pets to select for traits that humans found desirable. Darwin’s visit to the Galápagos islands triggered a way of thinking, it did not provide him with the evidence he needed; Darwin’s Finches have become a commonly cited example of natural selection at work, but while Darwin made the observations it was not until 80 years later that somebody else spotted their relevance.

The Origin of Species, or to use its full title: “On the Origin of Species by means of natural selection, or the preservation of favored races in the struggle for life.” is full of examples and terminology relating to artificial selection.

Natural selection, or natural elimination, isn’t the result the same?

Natural selection implies an optimization process, e.g., breeders selecting for a strain of cows that produce the most milk.

Natural elimination is a good enough process, i.e., a creature needs a collection of traits that are good enough for them to create the next generation.

A long-standing problem with natural selection is that it fails to explain the diversity present in a natural population of some breed of animal (there is very little diversity in each breed of farm animal, they have been optimized for consistency). Diversity is not a problem for natural elimination, which does not reduce differences in its search for fitness.

The diversity produced as a consequence of natural elimination creates a population containing many neutral traits (i.e., characteristics that have no positive or negative impact on continuing survival). When a significant change in the environment occurs, one or more of the neutral traits may suddenly have positive or negative survival consequences; the creatures with the positive traits have opportunity time to adapt to the changed environment. A population whose members possess a diverse range of neutral traits has a higher chance of long-term survival than a population where diversity has been squeezed in the quest for the fittest.

I think that natural elimination also applies within software ecosystems. Commercial products survive if enough customers buy them, software developers need good enough know-how to get the job done.

I’m sure customers would prefer software ecosystems to operate on the principle of survival of the fittest (it reduces their costs). Over the long term is society best served by diverse software ecosystems or softwaremonocultures? Diversity is a way of encouraging competition, but over time there is diminishing returns on the improvements.

Cut Price Clusterings – a.k.

a.k. from thus spake a.k.

Last month we saw how we could efficiently generate hierarchical clusterings, which are sequences of sets of clusters, which are themselves subsets of a set of data that each contain elements that are similar to each other, such that if a pair of data are in the same clustering at one step then they must be in the same clustering in the next which will always be the case if we move from one step to the next by merging the closest pairs of clusters. Specifically, we used our ak.minHeap implementation of the min-heap structure to cache the distances between clusters, saving us the expense of recalculating them for clusters that don't change from one step in the hierarchy to the next.
Recall that we used three different schemes for calculating the distance between a pair of clusters, the average distance between their members, known as average linkage, the distance between their closest members, known as single linkage, and the distance between their farthest members, known as complete linkage, and that I concluded by noting that our algorithm was about as efficient as possible in general but that there is a much more efficient scheme for single linkage clusterings; efficient enough that sorting the clusters in each clustering by size would be the most costly operation and so in this post we shall implement objects to represent clusterings that don't do that.

The problem of the problem

Allan Kelly from Allan Kelly Associates

Light-2019-09-3-19-31.jpg

Some years ago I was managing a team at an internet TV pioneer. Shortly after a release our biggest customer – ITV – was on the phone complaining about a bug. Unfortunately one set of data fields were retaining their value when they should be wiped clear. It took a couple of months before we could include a fix in the release but we were confident we would make ITV happy.

A couple of hours after the fix went live ITV were on the phone. Why had we removed the daily cache? They weren’t just “working around” the bug, they were utilising it as a feature.

Sometimes it seems the problem you think you are dealing with isn’t the problem you are dealing with. Sometimes the way to solve a problem is not to fix the problem head on. Rather the solution comes from reframing the problem so that it is easier to solve. To put it another way: the problem you are trying to solve is knowing the problem you are trying to solve.

I sometimes think of this as “go and look at it from a different place”. Whatever you are looking at – problem, opportunity, objective, way of life – looks different if you go and stand somewhere else. The more different the place you stand in the more different the thing looks.

Clearly one way of solving a problem is to define it as not a problem: “it is not a bug, it is a feature.” As the story above shows, one person’s “bug” can be another person’s “feature.”

The question is then: is the reframing acceptable to others? – can others share the reframed perspective?

By way of example, lets apply this to agile.

A big part of agile is focus: small user stories, morning stand-ups and sprint planning all help create focus. Once you decide what you are going to tackle you focus on it, push other things out of the way and do it. And do it to completion.

You might call this “eating the elephant”: don’t eat it all at once, carve off a small piece, eat and repeat.

Pushed to either extreme this approach has bad side effects. Focus too tightly or too inflexibility and you might deliver a thing but not a thing which others recognise as a solution to the problem. But taking a view too broadly, or being very flexible, negates the way of working because you don’t deliver anything, – or the solution you deliver doesn’t please anyone (the “you can’t please all the people all the time” problem.)

Reframing can be a powerful technique but it can also work against you if other’s reframe the problem.

So, how do you know, or rather how do you frame and decide what your objective is?

Deciding what the problem is requires some imagination, it requires focus and flexibility – to stare at the problem but be flexible in how you see it. It also requires understanding and then the ability to communicate that understanding to others.

While I often hear people say “we should focus on the problem” I wish we could spend more time focusing on the problem of knowing what the problem is.

Reframing the problem is an inherently human thing. While machines may now, or in the near future, be able to solve many problems there is a problem of knowing what problem to present to the machine.

A big part of human work – often managerial work – is framing problems and deciding which to solve and which not to solve. Framing can make a small problem big, or a big problem small. That is inherently a human activity.

So, how do you know what problems to address? how do you know where to look for opportunities?

Now isn’t that a problem in itself? – surely we could set that up as an objective in its own right. Again it needs defining and framing and… So how does one decide to eat an elephant? And which elephant? And even, why eat an elephant at all?

It all becomes recursive. You can’t recurse for ever so hopefully sooner or later you need to come to the top – otherwise you have just reinvented paralysis by analysis. Detailed problem thinking needs to be combined with vaguer, even oblique, thinking.

To make this very real: I’m a big fan of metrics, they help focus, they help you know if you are going in the right direction. But I detest metrics too because they are a blunt instrument which can mislead so easily.

Metrics are great for focus but they need to be combined with a healthy dose of scepticism and oblique thinking. Metrics have limitations.

Sorry no solutions today. Just awareness of the problem of problems.


Like this post? – Like to receive these posts by e-mail?

Subscribe to my newsletter & receive a free eBook “Xanpan: Team Centric Agile Software Development”

Check out my latest books – Continuous Digital and Project Myopia – and the Project Myopia audio edition

The post The problem of the problem appeared first on Allan Kelly Associates.