Team DNA-impersonators create a business plan

Derek Jones from The Shape of Code

This weekend I was at the Hack the Police hackathon, sponsored by the Metropolitan Police+other organizations. My plan was to find an interesting problem to help solve, using the data we were told would be available. My previous experience with crime data is that there is not enough of it to allow reliable models to be built, which is a good thing in that nobody wants lots of crime. Talking to a Police intelligence officer, I learned that the publicly available data contained crimes (i.e., cases where a court had found somebody guilty), not reported incidents, and was not large enough to allow a good model to be built.

Looking for a team to join, I got talking to Joe and Rebecca. Joe had discovered a very interesting possible threat to the existing DNA matching technique, and they were happy for me to join them analyzing this threat model; team DNA-impersonators was go.

Some background (Joe and Rebecca are the team’s genetic experts, I’m a software guy who has read a few books on the subject; all the mistakes in this post are mine). The DNA matching technique used by the Police is based on 17 specific sequences (each around 100 bases, known as loci), within the human genome (which contains around 3 billion bases).

There are companies who synthesize sequences of DNA to order. I knew that machines for doing this existed, but I did not know it was possible to order a bespoke sequence online, or how inexpensive it was.

Some people have had their DNA sequenced, and have allowed it to be published online; Steven Pinker is the most famous person I could find whose DNA sequence is available online (link not given; it requires work+luck to find). The Personal Genome Project aims to sequence and make available the complete genomes of 100,000 volunteers (the UK arm of this project is on hold because of lack of funding; master criminals in the UK have a window of opportunity: offer to sponsor the project on condition that their DNA is included in the public data set).

How much would it cost to manufacture bottles of spray-on Steven Pinker DNA? Is there a viable business model selling Pinker No. 5?

The screen shot below shows a quote for 2-nmol of DNA for the sequence of 100 bases that make up one of the 17 loci used in DNA matching. This order is for concentrated DNA, which needs to be diluted to the level likely to be found as residue at a crime scene. Joe calculated that 2-nmol can be diluted to produce 60-liters of usable ‘product’.

Quote for synthesis of 100 bases of human DNA.

There was not enough time to obtain sequences for the other 16-loci, and get quotes for them. Information on the 17-loci used for DNA matching is available in research papers; a summer job for a PhD student to sort out the details.

The concentrate from the 17-loci dilutes to 60-liters. Say each spray-on bottle contains 100ml, then an investment of £800 (plus researcher time) generates enough liquid for 600-bottles of Pinker No. 5.
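
As a back-of-the-envelope check on those numbers (the per-bottle material cost is my own division, not a figure from the quote), a minimal sketch:

    #include <iostream>

    int main() {
        // Figures from the post: 60 liters of diluted product from the
        // 17-loci concentrate, 100ml bottles, roughly £800 of synthesis costs.
        const double usable_liters    = 60.0;
        const double bottle_ml        = 100.0;
        const double synthesis_pounds = 800.0;

        const double bottles = usable_liters * 1000.0 / bottle_ml;   // 600 bottles
        std::cout << "bottles of Pinker No. 5: " << bottles << "\n";
        std::cout << "raw material cost per bottle: GBP "
                  << synthesis_pounds / bottles << "\n";              // ~1.33
    }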

What is the pricing model? Is there a mass market (e.g., Hong Kong protesters wanting to be anonymous), or would it be more profitable to target a few select clients? Perhaps Steven Pinker always wanted to try his hand at safe-cracking in his spare time, but was worried about leaving DNA evidence behind; he might be willing to pay to have the market flooded, so Pinker No. 5 residue becomes a common occurrence at crime scenes (allowing him to plausibly claim that any crime scene DNA matches were left behind by other people).

Some of the police officers at the hack volunteered that they knew lots of potential customers; the forensics officer present was horrified.

Before the 1980s, DNA profiling was not available. Will the 2020s be the decade in which DNA profiling ceases being a viable tool for catching competent criminals?

Manufacturers of high-quality photocopiers are required to implement features that make it difficult for people to create good quality copies of paper currency.

What might law enforcement do about this threat to the viability of DNA profiling?

Ideas include:

  • Requiring companies in the bespoke DNA business to report suspicious orders. What is a suspicious order? Are enough companies in business to make it possible to order each of the 17-loci from a different company (we think so)?
  • Introducing laws making it illegal to be in possession of diluted forms of other people’s DNA (with provisions for legitimate uses).
  • Attacking the economics of the Pinker No. 5 business model by having more than 17-loci available for use in DNA matching. Perhaps 1,000 loci could be selected as potential match sites, with individual DNA testing kits randomly testing 17 (or more) from this set.
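
A rough sketch of how that last idea might work (the 1,000-locus pool and the 17-loci-per-kit figure come from the suggestion above; everything else is made up for illustration). Randomly assigning loci to each test kit means an impersonator no longer knows which loci a kit will check, so they would have to synthesize and dilute all 1,000:

    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        // Hypothetical pool of 1,000 candidate loci, identified here by index only.
        std::vector<int> pool(1000);
        std::iota(pool.begin(), pool.end(), 0);

        // Each individual test kit checks a random subset of 17 loci from the pool.
        std::vector<int> kit;
        std::mt19937 rng{std::random_device{}()};
        std::sample(pool.begin(), pool.end(), std::back_inserter(kit), 17, rng);

        for (int locus : kit)
            std::cout << locus << ' ';
        std::cout << '\n';
    }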

New books and cards – stuff happening

Allan Kelly from Allan Kelly Associates

Retrospective playing cards.

I am sure some of you have noticed that my blog posts have been a little less regular over the last few months. That is because I’ve been busy on other stuff. So a break from deep thoughts and advice on the software world to mention some other stuff I’ve been working on.

For a start, my Little Book of Requirements and User Stories is now available in audio format. Full details – and the FAQ as a free download – are on my website. You will find links there to buy it on Audible and Apple (it’s cheaper on Apple, don’t ask me why).

To my surprise Little Book has long been my best-seller so I teamed up again with Stacy Gonzalez – who voiced Project Myopia for me – to produce an audio version of Little Book. In the few weeks it has been available sales are already outstripping Project Myopia!


Second, as some of you know, I’ve been working with Apress to turn The Art of Product Ownership from a LeanPub eBook into a full regular book. That should be out in October; you can pre-order it on Amazon now.

(And if you can’t wait, I’ve got a pre-copy edit version I can share with you provided you promise to write an Amazon review when the book is published. Mail me or use the contact form if you are interested.)

Finally, that picture at the top of the page: I’ve been working with Nicolas Umiastowski to create a playing card retrospective. The cards are based on my Retrospective Dialogue Sheets. In our experiments they have given retrospectives another twist. More about these soon – and details of how you can get a pack (in the meantime, get in contact if you are really keen to try them).



Natural elimination, or the survival of the good enough

Derek Jones from The Shape of Code

Thanks to Darwin, the world is full of people who think that evolution, in nature, works by: natural selection, or the survival of the fittest. I thought this until I read “Good Enough: The Tolerance for Mediocrity in Nature and Society” by Daniel Milo.

Milo makes a very convincing case that nature actually works by: natural elimination, or the survival of the good enough.

Why might Darwin have gone with natural selection in his book, On the Origin of Species? Milo makes the point that the only real evidence that Darwin had to work with was artificial selection, that is, the breeding of farm animals and domestic pets to select for traits that humans found desirable. Darwin’s visit to the Galápagos islands triggered a way of thinking; it did not provide him with the evidence he needed. Darwin’s Finches have become a commonly cited example of natural selection at work, but while Darwin made the observations it was not until 80 years later that somebody else spotted their relevance.

The Origin of Species, or to use its full title, “On the Origin of Species by means of natural selection, or the preservation of favored races in the struggle for life”, is full of examples and terminology relating to artificial selection.

Natural selection or natural elimination: isn’t the result the same?

Natural selection implies an optimization process, e.g., breeders selecting for a strain of cows that produce the most milk.

Natural elimination is a good enough process, i.e., a creature needs a collection of traits that are good enough for them to create the next generation.

A long-standing problem with natural selection is that it fails to explain the diversity present in a natural population of some breed of animal (there is very little diversity in each breed of farm animal, they have been optimized for consistency). Diversity is not a problem for natural elimination, which does not reduce differences in its search for fitness.

The diversity produced as a consequence of natural elimination creates a population containing many neutral traits (i.e., characteristics that have no positive or negative impact on continuing survival). When a significant change in the environment occurs, one or more of the neutral traits may suddenly have positive or negative survival consequences; the creatures with the positive traits have an opportunity to adapt to the changed environment. A population whose members possess a diverse range of neutral traits has a higher chance of long-term survival than a population where diversity has been squeezed out in the quest for the fittest.

I think that natural elimination also applies within software ecosystems. Commercial products survive if enough customers buy them, and software developers need good enough know-how to get the job done.

I’m sure customers would prefer software ecosystems to operate on the principle of survival of the fittest (it reduces their costs). Over the long term, is society best served by diverse software ecosystems or software monocultures? Diversity is a way of encouraging competition, but over time there are diminishing returns on the improvements.

Cut Price Clusterings – a.k.

a.k. from thus spake a.k.

Last month we saw how we could efficiently generate hierarchical clusterings, which are sequences of sets of clusters, which are themselves subsets of a set of data that each contain elements that are similar to each other, such that if a pair of data are in the same cluster at one step then they must be in the same cluster in the next, which will always be the case if we move from one step to the next by merging the closest pairs of clusters. Specifically, we used our ak.minHeap implementation of the min-heap structure to cache the distances between clusters, saving us the expense of recalculating them for clusters that don't change from one step in the hierarchy to the next.
Recall that we used three different schemes for calculating the distance between a pair of clusters, the average distance between their members, known as average linkage, the distance between their closest members, known as single linkage, and the distance between their farthest members, known as complete linkage, and that I concluded by noting that our algorithm was about as efficient as possible in general but that there is a much more efficient scheme for single linkage clusterings; efficient enough that sorting the clusters in each clustering by size would be the most costly operation and so in this post we shall implement objects to represent clusterings that don't do that.
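
For readers who missed the earlier posts, the three linkage schemes are easy to state in code. The sketch below uses one-dimensional data for brevity and is standalone C++ rather than the ak library code used in the series; it simply restates the definitions:

    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <vector>

    using Cluster = std::vector<double>;

    double dist(double a, double b) { return std::abs(a - b); }

    // Single linkage: the distance between the clusters' closest members.
    double singleLinkage(const Cluster &x, const Cluster &y) {
        double d = std::numeric_limits<double>::infinity();
        for (double a : x) for (double b : y) d = std::min(d, dist(a, b));
        return d;
    }

    // Complete linkage: the distance between the clusters' farthest members.
    double completeLinkage(const Cluster &x, const Cluster &y) {
        double d = 0.0;
        for (double a : x) for (double b : y) d = std::max(d, dist(a, b));
        return d;
    }

    // Average linkage: the mean distance over all pairs of members.
    double averageLinkage(const Cluster &x, const Cluster &y) {
        double sum = 0.0;
        for (double a : x) for (double b : y) sum += dist(a, b);
        return sum / static_cast<double>(x.size() * y.size());
    }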

The problem of the problem

Allan Kelly from Allan Kelly Associates


Some years ago I was managing a team at an internet TV pioneer. Shortly after a release our biggest customer – ITV – was on the phone complaining about a bug. Unfortunately, one set of data fields retained their values when they should have been wiped clear. It took a couple of months before we could include a fix in the release, but we were confident we would make ITV happy.

A couple of hours after the fix went live ITV were on the phone. Why had we removed the daily cache? They weren’t just “working around” the bug, they were utilising it as a feature.

Sometimes it seems the problem you think you are dealing with isn’t the problem you are dealing with. Sometimes the way to solve a problem is not to fix the problem head on. Rather the solution comes from reframing the problem so that it is easier to solve. To put it another way: the problem you are trying to solve is knowing the problem you are trying to solve.

I sometimes think of this as “go and look at it from a different place”. Whatever you are looking at – problem, opportunity, objective, way of life – looks different if you go and stand somewhere else. The more different the place you stand in the more different the thing looks.

Clearly one way of solving a problem is to define it as not a problem: “it is not a bug, it is a feature.” As the story above shows, one person’s “bug” can be another person’s “feature.”

The question is then: is the reframing acceptable to others? – can others share the reframed perspective?

By way of example, let’s apply this to agile.

A big part of agile is focus: small user stories, morning stand-ups and sprint planning all help create focus. Once you decide what you are going to tackle you focus on it, push other things out of the way and do it. And do it to completion.

You might call this “eating the elephant”: don’t eat it all at once, carve off a small piece, eat and repeat.

Pushed to either extreme this approach has bad side effects. Focus too tightly or too inflexibly and you might deliver a thing, but not a thing which others recognise as a solution to the problem. But take a view too broadly, or be very flexible, and you negate the way of working because you don’t deliver anything – or the solution you deliver doesn’t please anyone (the “you can’t please all the people all the time” problem).

Reframing can be a powerful technique, but it can also work against you if others reframe the problem.

So, how do you know, or rather how do you frame and decide what your objective is?

Deciding what the problem is requires some imagination, it requires focus and flexibility – to stare at the problem but be flexible in how you see it. It also requires understanding and then the ability to communicate that understanding to others.

While I often hear people say “we should focus on the problem” I wish we could spend more time focusing on the problem of knowing what the problem is.

Reframing the problem is an inherently human thing. While machines may now, or in the near future, be able to solve many problems there is a problem of knowing what problem to present to the machine.

A big part of human work – often managerial work – is framing problems and deciding which to solve and which not to solve. Framing can make a small problem big, or a big problem small. That is inherently a human activity.

So, how do you know what problems to address? how do you know where to look for opportunities?

Now isn’t that a problem in itself? – surely we could set that up as an objective in its own right. Again it needs defining and framing and… So how does one decide to eat an elephant? And which elephant? And even, why eat an elephant at all?

It all becomes recursive. You can’t recurse for ever so hopefully sooner or later you need to come to the top – otherwise you have just reinvented paralysis by analysis. Detailed problem thinking needs to be combined with vaguer, even oblique, thinking.

To make this very real: I’m a big fan of metrics, they help focus, they help you know if you are going in the right direction. But I detest metrics too because they are a blunt instrument which can mislead so easily.

Metrics are great for focus but they need to be combined with a healthy dose of scepticism and oblique thinking. Metrics have limitations.

Sorry no solutions today. Just awareness of the problem of problems.



Ecosystems chapter of “evidence-based software engineering” reworked

Derek Jones from The Shape of Code

The Ecosystems chapter of my evidence-based software engineering book has been reworked (I have given up on the idea that this second pass is also where the polishing happens; polishing still needs to happen, and there might be more material migration between chapters); download here.

I have been reading books on biological ecosystems, and a few on social ecosystems. These contain lots of interesting ideas, but the problem is, software ecosystems are just very different, e.g., replication costs are effectively zero, source code does not replicate itself (and is not self-evolving; evolution happens because people change it), and resources are exchanged rather than flowing (e.g., people make deals, they don’t get eaten for lunch). Lots of caution is needed when applying ecosystem related theories from biology, the underlying assumptions probably don’t hold.

There is a surprising amount of discussion on the computing world as it was many decades ago. This is because ecosystem evolution is path dependent; understanding where we are today requires knowing something about what things were like in the past. Computer memory capacity used to be a big thing (because it was often measured in kilobytes); memory does not get much publicity because the major cpu vendor (Intel) spends a small fortune on telling people that the processor is the most important component inside a computer.

There are a huge variety of software ecosystems, but you would not know this after reading the ecosystems chapter. This is because the work of most researchers has been focused on what used to be called the desktop market, though over the last few years the focus has been shifting to mobile. There is not much software engineering research focusing on embedded systems (a vast market), or supercomputers (a small market, with lots of money), or mainframes (yes, this market is still going strong). As the author of an evidence-based book, I have to go where the data takes me; no data, then I don’t have anything to say.

Empirical research (as it’s known in academia) needs data, and the ‘easy’ to get data is what most researchers use. For instance, researchers analyzing invention and innovation invariably use data on patents granted, because this data is readily available (plus everybody else uses it). For empirical research on software ecosystems, the readily available data are package repositories and the Google/Apple Apps stores (which is what everybody uses).

The major software ecosystems barely mentioned by researchers are the customer ecosystem (the people who pay for everything), the vendors (the companies in the software business) and the developer ecosystem (the people who do the work).

Next, the Projects chapter.

More Productive C++ with TDD

Phil Nash from level of indirection

The title might read a little like click-bait, and there are certainly some nuances and qualifications here. But, hey! That's what the article is for.

Those that know me know that I have been a practitioner of, and advocate for, TDD in C++ for many years. I originally introduced Catch in an attempt to make that easier. I've given many talks that touch on the subject, as well as giving coaching and consultancy. For the last year I've added to that with a workshop class that I have given at several conferences - in both one-day and two-day forms (at CppCon you can do either!).

But why am I all in on TDD? Is it really that good?

What has TDD ever done for me?

Most of the time, especially for new code (including new parts being added to legacy code) the benefits of using TDD include (but are not limited to):

  1. A decent set of tests with high coverage (100% if you're following a strict approach).
  2. Well factored, "clean code".
  3. (an aspect of 2) code that is easy to change.
  4. A more thoughtful design that is easier to work with.

But attaining these benefits is not automatic. Applying the discipline of TDD steers you towards the "pit of success" - but you still have to jump in! Fully achieving all the benefits relies on a certain amount of experience - with TDD itself - but also with a range of other design principles. TDD is like a spirit guide, nudging you in the right direction. It's down to you to listen to what TDD is telling you and take the right actions.

This is the first hurdle that people trying TDD fall down at. The core steps to TDD are simple and can be taught in about 10-20 minutes. Getting all the benefits from them takes experience. If you start trying to apply TDD to your production code straight away you will almost certainly just see overhead and constraints - with little-to-no benefit!
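
For a sense of just how small those core steps are, here is a minimal red-green-refactor cycle using Catch2 (v2, single header); the add function is a toy example of mine, not anything from the workshop:

    #define CATCH_CONFIG_MAIN  // ask Catch2 to supply main()
    #include "catch.hpp"

    // Step 2 (green): just enough production code to make the test pass.
    int add(int a, int b) { return a + b; }

    // Step 1 (red): this test was written first, and failed until add() existed.
    TEST_CASE("add sums two integers") {
        REQUIRE(add(2, 2) == 4);
        REQUIRE(add(-1, 1) == 0);
    }

    // Step 3 (refactor): tidy the code while keeping the tests green -
    // nothing to tidy in an example this small.

The point is not the example itself, but that the mechanical loop really is this simple; the judgement about what to test next, and what each test is telling you about your design, is where the experience comes in.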

On your own this may just be a case of doing enough katas and side-projects until you feel confident to slowly start using it in more critical areas. If you're able to take a class (whether mine or someone else's), you can be helped past this initial stage - and also shown the cloud of additional things around TDD that will let you get the best out of it.

Types, Tests and EOP

First of all, I don't consider TDD to be the complete picture. There are some cases where it's just not practical (although these are often not the ones people think - it takes experience to make that call). Where it is practical (the majority of cases, in my experience) it can form the backbone for your code-design approach - but should be complemented by other techniques.

If you think of these things as a toolbox to select from, then combining the right tools is the only way to be fully effective. Proficiency - if not mastery - with those tools is also necessary.

The tools that I find, again-and-again, work well with TDD are: using the type system to reduce - or eliminate - potential errors, and what I call Expression-Oriented Programming, which is really a distillation of Functional Programming.

These are two big topics in their own right, and I plan to follow up with deeper dives on each. In the meantime you'll get a better idea of my views on FP from my talk, "Functional Programming for Fun & Profit". I've yet to do a talk specifically on my views on how the type system can help, but there are elements in my recent series on Error Handling.

The bottom line, for now, is that the more you can lean on Types and FP, the less there will be left that needs testing.
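
As one tiny illustration of that bottom line (my own example, not taken from the talks referenced above), a value type that makes invalid states unrepresentable removes a whole class of tests that would otherwise be needed at every call site:

    #include <cstdint>
    #include <stdexcept>

    // Instead of passing a raw int and testing that every caller validates it,
    // rule invalid values out at the boundary, once.
    class Percentage {
    public:
        explicit Percentage(std::uint8_t value) : value_(value) {
            if (value > 100) throw std::invalid_argument("percentage out of range");
        }
        std::uint8_t value() const { return value_; }
    private:
        std::uint8_t value_;
    };

    // Any function taking a Percentage no longer needs tests for
    // out-of-range input: the type has already eliminated that case.
    double asFraction(Percentage p) { return p.value() / 100.0; }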

TDD or not TDD

Alas, poor Yorick...

That's the question.

I've already hinted that it may not always be the best approach (but usually is). But even when it is - it is not sufficient. That is, the tests you are left with through pure TDD are only half the story. They helped you arrive at the design, gave you a user's perspective on your code, and the confidence to make changes. But they won't find the edges of your design that you didn't explicitly consider. TDD might help you think of some pathological cases you might have otherwise left out (and leaning on the type system may eliminate many more) - but you're going to have to consider other forms of testing to find the things you didn't design for.

Then there's the thorny issue of how to get legacy code under test - or should you even try? There are more tools in the toolbox to help with all of these things. Even though they are not part of TDD itself, to practice TDD well you'll need to be able to weave it into this bigger picture.

Short-cutting the Gordian Knot

So while learning TDD, technically, is easy - mastering it is a far richer proposition. Returning to our opening question - is it worth it? My experience has been a resounding "yes"! All the pieces that come into play (and we've not mentioned them all, here) will give you more tools, greater experience and a better understanding of the bigger picture - even if you choose not to use TDD in a particular case.

But how can we get up to speed? You could do what I did and pick up bits here and there, reconcile it with your own years of experience, observe the results, adjust course and try again - eventually settling on a pattern that works.

Or you could join me on one of my workshops as I try to give you the benefit of that experience in a distilled form. You won't walk out a seasoned expert, but I think I can help you get well down the road.

My next outings are:

Or if you'd like me to come into your company to deliver on-site training you can reach me on atdd@philnash.me.

Compiler validation used to be a big thing

Derek Jones from The Shape of Code

Compiler validation used to be a big thing; a NIST quarterly validated products list could run to nearly 150 pages, and approaching 1,000 products (not all were compilers).

Why did compiler validation stop being a thing?

Running a compiler validation service (NIST was also involved with POSIX, graphics, and computer security protocols validation) costs money. If there are enough people willing to pay (NIST charged for the validation service), the service pays for itself.

The 1990s was a period of consolidation, lots of computer manufacturers went out of business and Micro Focus grew to dominate the Cobol compiler business. The number of companies willing to pay for validation fell below the number needed to maintain the service; the service was terminated in 1998.

The source code of the Cobol, Fortran and SQL + other tests that vendors had to pass (to appear for 12 months in the validated products list) is still available; the C validation suite costs money. But passing these tests, then paying NIST’s fee for somebody to turn up and watch the compiler pass the tests, no longer gets your product’s name in lights (or on the validated products list).

At the time, those involved lamented the demise of compiler validation. However, compiler validation was only needed because many vendors failed to implement parts of the language standard, or implemented them differently. In many ways, reducing the number of vendors is a more effective means of ensuring consistent compiler behavior. Compiler monoculture may spell doom for those in the compiler business (and language standards), but is desirable from the developers’ perspective.

How do we know whether today’s compilers implement the requirements contained in the corresponding ISO language standard? You could argue that this is a pointless question, i.e., gcc and llvm are the language standard; but let’s pretend this is not the case.

Fuzzing is good for testing code generation. Checking language semantics still requires expert human effort, and lots of it. People have to extract the requirements contained in the language specification, and write code that checks whether the required behavior is implemented. As far as I know, there are only commercial groups doing this, i.e., nothing in the open source world; pointers welcome.
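
For a flavour of what hand-written semantic checks look like (these particular assertions are my own illustrations, not taken from any validation suite), each one encodes a requirement that the C++ standard actually imposes:

    #include <cassert>
    #include <climits>

    // Requirements lifted straight from the standard.
    static_assert(CHAR_BIT >= 8, "a byte must be at least 8 bits");
    static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long must be at least 64 bits wide");

    int main() {
        // Integral promotion: unsigned char promotes to int before arithmetic
        // (assuming int can represent all unsigned char values, as it can on
        // common platforms), so this sum does not wrap to 0.
        unsigned char c = 255;
        assert(c + 1 == 256);
        return 0;
    }

Multiply this by the thousands of requirements in a language standard and the scale of the expert effort involved becomes clear.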

CppCon 2019 Class, Presentation and Book Signing

Anthony Williams from Just Software Solutions Blog

It is now less than a month to this year's CppCon, which is going to be in Aurora, Colorado, USA for the first time this year, in a change from Bellevue where it has been for the last few years.

The main conference runs from 15th-20th September 2019, but there are also pre-conference classes on 13th and 14th September, and post-conference classes on 21st and 22nd September.

I will be running a 2-day pre-conference class, entitled More Concurrent Thinking in C++: Beyond the Basics, which is for those looking to move beyond the basics of threads and locks to the next level: high level library and application design, as well as lock-free programming with atomics. You can book your place as part of the normal CppCon registration.

I will also be presenting a session during the main conference on "Concurrency in C++20 and beyond".

Finally, I will also be signing copies of the second edition of my book C++ Concurrency In Action now that it is in print.

I look forward to seeing you there!


Breakfast: One for the bikers with Matt Leach of Geotekk

Paul Grenyer from Paul Grenyer



When: Tuesday, September 3, 2019 - 7:30am to 8:30am
Where: The Maids Head Hotel, Tombland, Norwich, NR3 1LB
How much: £13.95
RSVP: https://www.meetup.com/Norfolk-Developers-NorDev/events/qqwhznyzmbfb/

Matt will talk about Geotekk’s product design and fund-raising journey, and how the company has developed through a belief that anything which serves to reduce stress and worry in everyday lives enables a happier life, empowering us to “Live More”.

Matt is co-founder of Geotekk, a company specialising in smart alarms for bikes. Geotekk was founded in 2015 in response to ever-rising levels of bike theft; Matt and his co-founder James strive to provide customers with freedom and peace of mind by creating an affordable, versatile and best-in-class smart alarm, one which combines and improves on the most effective features of other security products in one multi-functional package.