CppOnSea

Frances Buontempo from BuontempoConsulting

CppOnSea 2019

Phil Nash organised a new conference, CppOnSea, this year. I was lucky enough to be accepted to speak, so I attended the two conference days, but not the workshops.


There were three tracks, along with a beginners track, run by Tristan Brindle, who organises the C++ London Uni, which is a great resource for people who want to learn C++. I'll do a brief write-up of the talks I attended.


The opening keynote was by Kate Gregory, called "Oh The Humanity!" She made us think about the words we use. For example, Foo and Bar trace back to military usage, hinting at people putting their lives on the line. Perhaps we need better names for our variables and functions. We are not fighting a war. What about one-letter variable names? 'k. Nuff said. What about errorMessage? If you call it helpMessage instead, how does that affect your thinking?
Kate also talked about trying to keep the code base friendly, to increase confidence.



I went to see Kevlin Henney next. I have no idea how to summarise this: he covered so many things. What does structured programming really mean? By looking back at various uses and abuses of goto, and highlighting the structure in various code snippets, he showed that some styles make the structure and intent easier to see.

I saw Andreas Fertig next. Inspired by Matt Godbolt's Compiler Explorer, he has created https://cppinsights.io/. Try it out. It unwraps some of the syntactic sugar, so you can see what the compiler has created for, say, a range-based for loop. This can remind you where you might be creating temporaries, or have references instead of copies or vice versa, without dropping down into assembly. Do you know the full horror of what might be going on inside a Singleton?

This led to an aside about statics and the double-checked locking pattern. His headline point was that the spirit of C++ is "you pay only for what you use", so be clear about what you are using. The point isn't that the new language features are expensive. They are often cheaper than old-school ways of doing things. Just try to be clear about what's going on under the hood. Play with the insights tool.

Next up, I saw Barney Dellar, talking about strong types in C++. His slides are probably clear enough by themselves, since they have thorough speaker notes. My main note to myself says "MIB: mishap investigation board", which amused me. He was talking about the trouble that can happen if you have doubles, or similar, for all your types, like mass and force:

double CalculateForce(double mass);

It's really easy to use the wrong units or pass things in the wrong order. By creating different types, known as strong types, you can get the compiler to stop you making mistakes. Using a template with tags, you can write clear code that avoids these mistakes.
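
The talk was about C++ templates with tag types, but the underlying idea can be sketched in any language. Here is a minimal Python illustration, with hypothetical Mass and Force wrapper types (not from any library), showing how a distinct type catches the "bare double" mistake:

```python
# A sketch of the strong-types idea, translated into Python for
# illustration (the talk itself used C++ templates with tag types).
# Mass, Force and calculate_force are hypothetical names for the example.

class Mass:
    def __init__(self, value: float):
        self.value = value

class Force:
    def __init__(self, value: float):
        self.value = value

GRAVITY = 9.81  # m/s^2, assumed constant for the example

def calculate_force(mass: Mass) -> Force:
    # Refuse anything that isn't wrapped in the Mass type.
    if not isinstance(mass, Mass):
        raise TypeError("calculate_force expects a Mass")
    return Force(mass.value * GRAVITY)

# OK:
f = calculate_force(Mass(10.0))
# A bare number now fails loudly instead of silently doing the wrong thing:
# calculate_force(10.0)  raises TypeError
```

In C++ the check happens at compile time rather than at run time, which is the real win, but the shape of the fix is the same: give each quantity its own type.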

Next up was a plenary talk by Patricia Aas on Deconstructing Privilege. She's given the talk before, so you'll be able to find the slides or previous versions on YouTube. Her take is that privilege is about things that haven't happened to you. Many people get defensive if you say they have been privileged, but this way of framing the issue gives a great perspective. Loads of people turned up and listened. Maybe surprising for a serious geek C++ conference, but the presence of https://www.includecpp.org/ ensured there were many like-minded people around. If you are privileged, listen and try to help. And be careful asking intrusive questions if you meet someone different to you.

After quite a heavy, but great talk, I was "in charge" of the lightning talks. Eleven people got slots. More volunteered, but there wasn't time for everyone:

  • Simon Brand; C++ Catastrophes: A Poem
  • Odin Holmes; volatile none of the things
  • Paul Williams; std::pmr
  • Heiko Frederik Bloch; the finer points of parameter packs
  • Barney Dellar; imposter syndrome or mob programming
  • Matt Godbolt; "words of power"
  • Kevlin Henney; list
  • Niels Dekker; noexcept considered harmful???
  • Patricia Aas; C++ is like JavaScript
  • Louise Brown; The Research Software Engineer - A Growing Career Path in Academia
  • Denis Yaroshevskiy; A good use case for if constexpr


My heartfelt thanks to Jim from http://digital-medium.co.uk and Kevlin "obi wan kenobi" Henney for helping me switch between PowerPoint, PowerPoint in presenter mode and the PDFs, and getting them to show on the main screen and my laptop. Nobody knows what was happening with the screen on the stage for the speaker. If you ever attend a conference, do volunteer to give a lightning talk. Sorry to the people we didn't have time for.

Day one done. Day two begun. First up for me, after missing my own talk pitch, was Nico Josuttis. Don't forget his Leanpub C++17 book. It's still growing. He talked about a variety of C++17 features. The standout point for me was the mess you can get into with initialisation. He's using {} everywhere, near enough. Like

for (int i{0}; i<n; ++i)
{
}

Adding an equals sign instead can end up doing horrible things.

Much as I wanted to go see Simon Brand, Vittorio Romeo and Hana Dusikova (with slide progression by Matt Godbolt) next, I had a talk to give myself. I managed to diffuse my way out of a paper bag, while reminding us why C's rand is terrible and how useful property-based testing can be, using some very simple mathematics: adding up and multiplying. This was based on a chapter of my book, and you can download the source code from that page if you want, even if you don't buy the book. I used SFML to draw the diffusing green blobs. Sorry for not putting up a list of resources near the end.

I attempted to go to Guy Davidson's Linear algebra talk next, but the room was packed and I was a bit late. I heard great things about this. In particular, how important it is to design a good interface if you are making libraries.

My final choice was Clare Macrae's "Quickly testing legacy code". This was my unexpected hidden gem of the conference. She talked about approval testing. This compares a generated file to a gold standard file, and bolts straight into googletest or Catch. It's available for several other languages. It generates the file on your first run, allowing you to get almost anything under test, provided it writes out a file you can compare. Which then means you can start writing unit tests, if you need to change the code a bit. Changing legacy code to get it under test, without a safety harness, is dangerous. This keeps you safer. Her world involves Qt and visualisations of chemical molecules. These can be saved as PNGs, so she can check she hasn't broken anything. She showed how you can bolt in custom comparators, so it doesn't complain about different generated dates, and does a closer-than-the-human-eye-could-notice RGB comparison. Her code samples from the talk are here. I've not seen this as a formal framework before. Her slides were really clear and she explained what she was up to step by step. Subsequently twitter has been talking about this a fair bit, including adding support for Python 3.
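
The core mechanism is small enough to sketch. This is not Clare's ApprovalTests framework, just a minimal illustration of the idea it is built on, with a hypothetical verify helper:

```python
# A minimal sketch of the approval-testing mechanism: compare generated
# output against a stored "approved" file, creating it on the first run.
# verify is a hypothetical helper, not part of any real framework.
import os

def verify(received_text: str, approved_path: str) -> bool:
    if not os.path.exists(approved_path):
        # First run: record the current output as the golden copy.
        with open(approved_path, "w") as f:
            f.write(received_text)
        return True
    # Later runs: flag any change from the approved output.
    with open(approved_path) as f:
        return f.read() == received_text
```

Real frameworks add the pluggable comparators mentioned above, so "the output changed" can mean "changed in a way a human would care about" rather than "any byte differs".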

Matt "Compiler Explorer" Godbolt gave the closing keynote. Apparently, he's the first person Phil Nash has met whose first conference talk was a keynote. He also had some AV issues.


If you've not encountered the Compiler Explorer before, try it out. You can choose which compiler you want to point your C++ code at and see what it generates. You need a little knowledge of the "poetry" it generates. More lines doesn't mean slower code. His tl;dr message was that many people spread rumours about what's slow, for example virtual functions. Look and see what your compiler actually does, rather than stating things that were true years ago. Speculative de-virtualisation is a thing. Your compiler might decide you only really have one likely virtual function you'll call, so it checks the address and does not then have the "overhead" it used to have years ago. He also demonstrated what happened to various bit-counting algorithms - most got immediately squashed down to one instruction, no matter how clever they looked. How many times have you been asked to count bits at interview? Spin up Godbolt and explore. This really shows you need to keep your knowledge up to date. Something that was true ten years ago may no longer hold with new compiler versions. Measure, explore, think.
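
For anyone who hasn't met the interview classic: here are two of the bit-counting loops of the sort he showed, in Python for illustration (the talk's point was that a C++ compiler collapses such loops into a single popcnt instruction):

```python
# Two classic bit-counting algorithms. A modern C++ compiler squashes
# both into one instruction; here they just let us compare the shapes.

def naive_popcount(x: int) -> int:
    # Test each bit in turn.
    count = 0
    while x:
        count += x & 1
        x >>= 1
    return count

def kernighan_popcount(x: int) -> int:
    # x & (x - 1) clears the lowest set bit, so this loops
    # once per set bit rather than once per bit position.
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count

# Sanity check: both agree with Python's own bit counting.
assert all(naive_popcount(i) == kernighan_popcount(i) == bin(i).count("1")
           for i in range(1024))
```

The cleverness difference between the two is exactly the kind of thing the compiler makes irrelevant, which was his point.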

There was a lovely supportive atmosphere and a variety of speakers. People were brave enough to ask questions, and only a few people were showing off that they thought they knew something the speaker didn't.

I'll try to back fill links to slides as I get them. Thanks to Phil for arranging this.

Did I mention I wrote a book?


xkcd-style plots in MatPlotLib


Most programmers I know are familiar with xkcd, the webcomic of romance, sarcasm, math, and language. In order to create diagrams for my machine learning book, I wanted a way to create something I could have fun with.

I discovered that Python's MatPlotLib library has an xkcd style, which you simply wrap round a plot. This allowed me to piece together what I needed using line segments, shapes, and labels.

Given a function, f, which draws what you need on some axes ax, use the style like this, and you're done:

with plt.xkcd():
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    f(ax)

    plt.show()

For example, I used it to explain what happens when you fire cannons at different angles.
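
A drawing function f for that cannon picture might look something like this. The function names and numbers here are my illustrative reconstruction, not the book's actual code; the physics is just basic projectile kinematics:

```python
# A hypothetical drawing function f for the snippet above: projectile
# trajectories at different launch angles. Pure-stdlib maths; f just
# plots whatever trajectory() computes onto the axes it is given.
import math

G = 9.81  # m/s^2

def trajectory(angle_deg: float, speed: float = 20.0, steps: int = 50):
    angle = math.radians(angle_deg)
    flight_time = 2 * speed * math.sin(angle) / G
    ts = [flight_time * i / (steps - 1) for i in range(steps)]
    xs = [speed * math.cos(angle) * t for t in ts]
    ys = [speed * math.sin(angle) * t - 0.5 * G * t * t for t in ts]
    return xs, ys

def f(ax):
    # One arc per launch angle; 45 degrees travels furthest.
    for angle in (30, 45, 60):
        xs, ys = trajectory(angle)
        ax.plot(xs, ys, label=f"{angle} degrees")
    ax.legend()
```

Dropped into the with plt.xkcd(): snippet above, this gives a set of hand-drawn-looking arcs.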


I gave a talk at Skillsmatter, called "Visualisation FTW", which was recorded, so you can watch it if you want. I also wrote this up for ACCU's CVu magazine. The ACCU runs an annual best article survey, and I was runner up, which was a pleasant surprise. You need to be a member to view the article, but it covers the same ground as the short talk at Skillsmatter.

Buy a copy of my book, or go play with xkcd style pictures. Have fun; I did.



Does machine learning really involve data?


Many definitions of machine learning start by proclaiming it uses data to learn. I want to challenge this, or at least remind us where the term originally came from and consider why the meaning has shifted.

For a long time machine learning seemed to be a new technology, but I notice we're starting to say AI and machine learning interchangeably. Job postings often sneak the word scientist in there too. What is a data scientist? What do any of these words mean?

Current trends often come with an air of mystery. I suspect a lot of data science roles involve data entry, in order to clean input data. Not as appealing as the headline role suggests. Several day-to-day techniques being described as machine learning could also be described as statistics. In fact, look at the table of contents of a statistics book, such as An Introduction to Statistical Learning. Here is a small selection of the topics:

  • accuracy
  • k-means clustering
  • making predictions
  • cross-validation
  • support vector machines, SVM
  • principal component analysis, PCA


Most, if not all, of these topics are covered in an average machine learning course and included in ML software packages. Yet statistics doesn't sound as exciting as machine learning, to many people.

Wikipedia defines statistics as "a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation." No mention of learning, though each of these activities form an essential part of data science. The article goes on to discuss descriptive and inferential statistics. Inference involves making predictions: many people use the term machine learning to mean the very same. Can you spot patterns in purchases automatically and suggest other items a customer might be interested in? Can you detect unusual or anomalous behaviour, indicating fraud or similar? Again, these are now labelled as AI or machine learning, but usually rely on well established statistical techniques. Admittedly, today's faster machines mean number crunching can happen quickly. This has contributed to the resurgence of machine learning.

Many problem solving algorithms are not about numbers. Some techniques, such as evolutionary computing, including genetic algorithms, don't fit comfortably into a data-driven view of learning. Do these methods count as machine learning? I'll leave that for you to think about. My book explores genetic algorithms and several other areas that do not need numbers to learn.

Arthur Samuel coined the phrase "machine learning", by which he meant something along the lines of a "field of study that gives computers the ability to learn without being explicitly programmed." The abstract of his 1959 paper, "Some studies in machine learning using the game of checkers", states:

Two machine-learning procedures have been investigated in some detail using the game of checkers. Enough work has been done to verify the fact that a computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program. Furthermore, it can learn to do this in a remarkably short period of time (8 or 10 hours of machine-playing time) when given only the rules of the game, a sense of direction, and a redundant and incomplete list of parameters which are thought to have something to do with the game, but whose correct signs and relative weights are unknown and unspecified. The principles of machine learning verified by these experiments are, of course, applicable to many other situations.

AI and machine learning are both very old terms. I think they encompass a much broader field than data analysis. As a final thought, Turing designed an algorithm to play chess. In effect, he was trying to make an artificial brain, before the term AI was invented or computers, in their modern sense, existed.

I think machine learning is much broader than investigating data. Its history involves attempting to get computers to learn, and specifically to learn to play games. Let the games continue.


Read my book and see what you think.


I wrote a book


I've written a book pulling together some of my previous talks showing how to code your way out of a paper bag using a variety of machine learning techniques and models, including genetic algorithms.
It's on pre-order at Amazon and you can download free excerpts from the publishers website.

The sales figures show I've sold over 1,000 copies already. I'm going through the copy edits at the moment. I can't wait to see the actual paper book.

Thank you to everyone at ACCU who helped and encouraged me while I wrote this.

I will be giving some talks at conferences and hopefully some meetups based on ideas in some of the chapters in 2019.

Watch this space.

Gitlab certificates


On Ubuntu, cloning a repo from a machine you don't have a certificate for will give the error:

fatal: unable to access 'https://servername': server certificate verification failed. CAfile: /etc/ssl/certs/your_filename CRLfile: None

You can work around this by telling git not to verify the certificate, e.g.

git config --system http.sslverify false


which is asking for trouble. However, you can install the certificate, so you don't need to keep doing this.

Using an answer from here: https://stackoverflow.com/questions/21181231/server-certificate-verification-failed-cafile-etc-ssl-certs-ca-certificates-c looks to have worked; I tried things one step at a time:

hostname=gitlab.city.ac.uk
port=443
trust_cert_file_location=`curl-config --ca`
sudo bash -c "echo -n | openssl s_client -showcerts -connect $hostname:$port \
    2>/dev/null  | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'  \
    >> $trust_cert_file_location"
I did try this first, so that errors don't end up in /dev/null:

openssl s_client -showcerts -connect $hostname:$port


Also, I first got the error sed: unrecognised option '--ca'. It took a moment to realise the --ca came from curl-config, which I needed to install.

Windows batch files


I've been writing a batch file to run some mathematical models over a set of inputs.
The models are software reliability growth models, described here.

We are using
  • du: Duane
  • go: Goel and Okumto
  • jm: Jelinski and Moranda
  • kl: Keiller and Littlewood
  • lm: Littlewood model
  • lnhpp: Littlewood non-homogeneous Poisson process
  • lv: Littlewood and Verrall
  • mo: Musa and Okumoto
Littlewood appears many times: he founded the group where I currently work. 

So, far too much background. I have one executable for each model, after making a make file; yet another story. And a folder of input files, named as f3[some dataset]du.dat, f3[some dataset]go.dat, ... f3[some dataset]mo.dat. I also have some corresponding output files someone else produced a while ago, so in theory I can check I get the same numbers. I don't, but that's going to be yet another story.

You can also use the original file and generated file to recalibrate, giving yet another file. I have previously generated results from this step too, which also don't match.

I wanted to be able to run this on Ubuntu and Windows, and managed to make a bash script easily enough. Then I tried to make a Windows batch file to do the same thing. I'll just put my final result here, and point out the things I tripped up on several times.


ECHO OFF
setlocal EnableDelayedExpansion
setlocal 


for %%m in (du go jm kl lm lnhpp lv mo) do (
  echo %%m
  for %%f in (*_%%m.dat) do (
    echo %%~nf
    set var=%%~nf
    echo var: !var!
    set var=!var:~2!
    echo var now: !var!

    swrelpred\%%m.exe %%~nf.dat "f4!var!"
    swrelpred\%%mcal.exe %%~nf.dat "f4!var!" "f9!var!"
  )
)


1. First, turn the echo off, because there's way too much noise otherwise.
2. Next, enable delayed expansion, otherwise things in blocks get expanded on sight and therefore don't change in the loop: "Delayed expansion causes variables delimited by exclamation marks (!) to be evaluated on execution", from Stack Exchange's Superuser site.
3. Corollary: use ! around the variables in the block, not %, for delayed expansion.
4. But we're getting ahead of ourselves. The setlocal at the top means I don't set the variables back at my prompt. Without this, as I changed my script to fix mistakes, it did something different between two runs, since a variable I had previously set might end up being empty when I broke stuff.
5. "Echo is off" spewed to the prompt means I was trying to echo empty variables, so the var: etc. tells me which line something is coming from.
6. !var:~2! gives me everything from the second character, so I could drop the f3 at the start of the filename and make f4 and f9 files to try a diff on afterwards. Again, pling for delayed expansion.




I suspect I could improve this, but it's six important things to remember another time.

Writing this in Python might have been easier. Or perhaps I should learn PowerShell one day.
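
For comparison, here is a sketch of what the Python version might look like. The swrelpred executables and the f3/f4/f9 naming come from the description above; the actual runs are wrapped in a function since the tools aren't to hand here:

```python
# A sketch of the batch file's logic in Python. The swrelpred paths and
# the f3[dataset][model].dat naming convention are taken from the post;
# this is an untested translation, not the script I actually ran.
import glob
import subprocess

MODELS = ["du", "go", "jm", "kl", "lm", "lnhpp", "lv", "mo"]

def output_names(input_name):
    # Drop the 'f3' prefix, as !var:~2! does in the batch file,
    # then build the f4 (prediction) and f9 (recalibrated) names.
    stem = input_name[2:]
    return "f4" + stem, "f9" + stem

def run_all():
    for model in MODELS:
        for path in glob.glob(f"*_{model}.dat"):
            name = path[: -len(".dat")]
            f4, f9 = output_names(name)
            subprocess.run([f"swrelpred\\{model}.exe", path, f4])
            subprocess.run([f"swrelpred\\{model}cal.exe", path, f4, f9])
```

No delayed expansion, no plings: ordinary variables just work inside the loops.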


Call a dll from Python


Letting VS2015 make a DLL called SomeDll for me, with these implementations:

// SomeDll.cpp : 
// Defines the exported functions for the DLL application.
//

#include "stdafx.h"
#include "SomeDll.h"


// This is an example of an exported variable
SOMEDLL_API int nSomeDll=0;

// This is an example of an exported function.
SOMEDLL_API int fnSomeDll(void)
{
    return 42;
}

// This is the constructor of a class that has been exported.
// see SomeDll.h for the class definition
CSomeDll::CSomeDll()
{
    return;
}
 

I then make a python script, using ctypes and loading the dll, using os to find it:

import os
import ctypes

os.chdir("C:\\Users\\sbkg525\\src\\SomeDll\\Debug")
SomeDll = ctypes.WinDLL("SomeDll.dll")


I can either use attributes of the library or use prototypes. I tried prototypes first. The function returns an int and takes no parameters:

proto = ctypes.WINFUNCTYPE(ctypes.c_int)
params = ()

answer = proto(("fnSomeDll", SomeDll), params)


Unfortunately this says

AttributeError: function 'fnSomeDll' not found

because the C++ names are mangled. extern "C" FTW another time; for now:

link.exe /dump /exports Debug\SomeDll.dll

          1    0 0001114F ??0CSomeDll@@QAE@XZ = @ILT+330(??0CSomeDll@@QAE@XZ)
          2    1 00011244 ??4CSomeDll@@QAEAAV0@$$QAV0@@Z = @ILT+575(??4CSomeDll@@QAEAAV0@$$QAV0@@Z)
          3    2 0001100A ??4CSomeDll@@QAEAAV0@ABV0@@Z = @ILT+5(??4CSomeDll@@QAEAAV0@ABV0@@Z)
          4    3 0001111D ?fnSomeDll@@YAHXZ = @ILT+280(?fnSomeDll@@YAHXZ)
          5    4 00018138 ?nSomeDll@@3HA = ?nSomeDll@@3HA (int nSomeDll)

It looks like we want number 4:

answer = proto(("?fnSomeDll@@YAHXZ", SomeDll), params)

print answer

>>> <WinFunctionType object at 0x0248A7B0>


Of course: it's a function object. Let's call the function:

print answer()
>>> 42


Done. I'll try attributes next. And functions which take parameters. Later, I'll look at other calling conventions.

First, the docs say, "foreign functions can be accessed as attributes of loaded shared libraries."
Given

extern "C"
{
    SOMEDLL_API void hello_world()
    {
        // no way of telling whether this has worked
    }

}

we call it like this

lib = ctypes.cdll.LoadLibrary('SomeDll.dll')
lib.hello_world()


It seems to assume a return type of int. For example,

extern "C"
{
    SOMEDLL_API double some_number()
    {
        return 1.01;
    }

}

called as follows

lib = ctypes.cdll.LoadLibrary('SomeDll.dll')
lib.hello_world()
val = lib.some_number()
print val, type(val)

Gives

-858993460, <type 'int'>


We need to specify the return type, since it's not an int:



lib.some_number.restype = ctypes.c_double
val = lib.some_number()
print val, type(val)


then we get what we want

1.01 <type 'float'>

We need to do likewise for parameters

extern "C"
{
    SOMEDLL_API double add_numbers(double x, double y)
    {
        return x + y;
    }
}


If we just try to call it with some floats we get an error

    total = lib.add_numbers(10.5, 25.7)
ctypes.ArgumentError: argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1


Stating the parameter type fixes this

lib.add_numbers.restype = ctypes.c_double
lib.add_numbers.argtypes = [ctypes.c_double, ctypes.c_double]
total = lib.add_numbers(10.5, 25.7)
print total, type(total)


36.2 <type 'float'>

Just one starter thought on strings. Returning a const char * seems to give a str in python

SOMEDLL_API const char * speak()
{
    return "Hello";
}

called like this



lib.speak.restype = ctypes.c_char_p
print lib.speak()


says "Hello"; using parameters as strings, and particularly as references that can be changed, needs some investigation.
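
As a starting point for that investigation: ctypes provides create_string_buffer for mutable character buffers, which is what you would pass to a C function that fills in a char* (c_char_p is really for immutable input). The exported fill function mentioned in the comments is hypothetical:

```python
# ctypes.create_string_buffer makes a mutable character buffer - the
# thing to pass when a C function writes into a char* you supply.
import ctypes

buf = ctypes.create_string_buffer(b"Hello", 32)  # 32-byte mutable buffer
assert buf.value == b"Hello"

buf.value = b"Goodbye"  # a C callee could mutate the contents the same way
assert buf.value == b"Goodbye"

# For a hypothetical exported function such as
#   extern "C" SOMEDLL_API void fill(char* out, int len);
# you would declare the argument types, pass the buffer, then read
# buf.value back afterwards:
#   lib.fill.argtypes = [ctypes.c_char_p, ctypes.c_int]
#   lib.fill(buf, len(buf))
```

That still leaves ownership and lifetime questions for strings returned from C, which is where the investigation gets interesting.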

Does a similar approach using attributes of the loaded library work for non-extern "C" functions? Can I use the proto approach on the extern "C" functions?

Let's see...







Codes of Conduct


There's been yet another thread on twitter about codes of conduct at conferences, and I wanted to compare them to the conditions of sale for gig or festival tickets. Let's consider a sample:
1. I can't turn up before it starts? Why even say that? Oh, and 2. I have to leave when it's over. THIS IS PC GONE MAD. OK, not PC. But why do you need to spell this out? It's like stating a precondition that a string has to be null-terminated before you can call strlen.
3. Yeah, rulez. Whatever. Surprised you didn't say it's illegal to break the law. What is this?
Blahblahblah.
9. Yeah, fair enough. But what about flame throwers?
No, no, no...
16. Does that mean I can bring *Real* weapons?
Blahblahblah.

25. There might be swear words?! Why tell me that?

Or maybe some people will bring kids with them and that might need a sane conversation about a time and a place for certain behaviours. I know the bands, I know what kind of music to expect. I know what I'm walking in to.

I also know there's a clearly marked welfare tent on site just in case.

And wardens on the campsite wearing obvious vests.

And hundreds of programmers listening to their favourite rock stars.
(OK, not all the punters are programmers, but many are).

Why did I ever go to a field full of metallers? (Knowing I'd probably be one of the few women there). I went with friends, which made me feel safer. I wanted to listen to bands I knew, and discover new music. If I'd gone by myself, knowing in advance about wardens and the welfare tent would have made me feel OK. Hey, the conditions of sale show the organisers have thought about things that might go wrong or concern people, and that makes me feel safer.

Do they make people think, "This is PC gone mad. If you need to tell me not to set fire to myself and drink myself to death, then you are insulting me and I want nothing to do with this"?

Not by the looks of the number of people who turn up. And there have been more women and kids recently which is great.

What are codes of conduct for? Quite frankly, me. But not just me. They are not there to tell you how to behave because the organisers think white guys don't know how to conduct themselves, or that all such guys are potential rapists or murderers. In some ways, the logic of "stating a CoC is pointless because we know how to be nice", taken to its extreme, could mean laws are pointless. Why say "You aren't allowed to commit murder"? Surely that doesn't need pointing out? And yet most countries have such a law.

Conditions of sale and codes of conduct aren't laws, but they give a shared statement of expectations and make me feel OK about going to conferences/talks/festivals/gigs alone. And then I meet loads of new people and discover new music. And I can't wait til Bloodstock. Or the next tech talk/conference I go to.





Elastic stack – RTFM


I tried to set up the ELK stack (well, just elasticsearch and kibana initially), with a view to monitoring a network.

Having tried to read the documentation for an older version than the one I'd downloaded, and furthermore one for *nix when I'm using Windows, I eventually restarted at the "Learn" pages on https://www.elastic.co/

There are a lot of links in there, and it's easy to get lost, but it is very well written.

This is my executive summary of what I think I did.

First, download the zip of kibana and elasticsearch.

From the bin directory for elasticsearch, run elasticsearch.bat file, or run service install then service run. If you run the batch file it will spew logs to the console, as well as a log file (in the logs folder). You can tail the file if you choose to run it as a service. Either works.

If you then open http://localhost:9200/ in a suitable browser you should see something like this:

{
  "name" : "Barbarus",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "bE-p5dLXQ_69o0FWQqsObw",
  "version" : {
    "number" : "2.4.1",
    "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
    "build_timestamp" : "2016-09-27T18:57:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}
 
The name is a randomly assigned Marvel character. You can configure all of this, but don't need to just to get something up and running to explore. kibana will expect elasticsearch to be on port 9200, but again that is configurable. I am getting ahead of myself though.

Second, unzip kibana, and run the batch file kibana.bat in the bin directory. This will witter to itself. It starts a webserver on port 5601 (again configurable, but this is the default), so open http://localhost:5601 in your browser.

kibana wants an "index" (way to find data), so we need to get some into elasticsearch: the first page will say "Configure an index pattern". This blog has a good walk through of kibana (so do the official docs).

All of the official docs tell you to use curl to add (or CRUD) data in elasticsearch, for example
curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
"name": "John Doe"
}'

NEVER try that from a Windows prompt, even if you have curl installed. You need to escape the quotes, and even then I had trouble. You can put the data (the -d part) in a file instead and use @, but it's not worth it.
Python to the rescue, along with Requests: HTTP for Humans (pip install requests). Now I can run the instructions in Python instead of shouting at a cmd prompt.

import requests
r = requests.get('http://localhost:9200/_cat/health?v')

r.text


Simple. The text shows me the response. There is a status code property too. And other goodies. See the manual. For this simple get command you could just point your browser at localhost:9200/_cat/health?v


Don't worry if the status is yellow - this just means you only have one node, so it can't replicate in case of disaster.

Notice the transport, http:// at the start. If you forget this, you'll get an error like
>>> r = requests.put('localhost:9200/customer/external/1?pretty', json={"name": "John Doe"})
...
    raise InvalidSchema("No connection adapters were found for '%s'" % url) requests.exceptions.InvalidSchema: No connection adapters were found for 'localhost:9200/customer/external/1?pretty'



Now we can put in some data.

First make an index (elastic might add this if you try to put data under a non-existent index). We will then be able to point kibana at that index - I mentioned kibana wanted an index earlier.
r = requests.put('http://localhost:9200/customer?pretty')


Right, now we want some data.
>>> payload = {'name': 'John Doe'}
>>> r = requests.post('http://localhost:9200/customer/external/1?pretty', json=payload)


If you point your browser at localhost:9200/customer/external/1?pretty you (should) then see the data you created. We gave it an id of 1, but it will be automatically assigned a unique id if we left that off.

We can use requests.delete to delete, and requests.post to update:
 >>> r = requests.post('http://localhost:9200/customer/external/1/_update', \
 json={ "doc" : {"name" : "Jane Doe"}})

Now, this small record set won't be much use to us. The docs have a link to some json data. I downloaded some fictitious account data. SO to the rescue for uploading the file:


>>> with open('accounts.json', 'rb') as payload:
...   headers = {'content-type': 'application/x-www-form-urlencoded'}
...   r = requests.post('http://localhost:9200/bank/account/_bulk?pretty', \ 

              data=payload,  verify=False, headers=headers)
...

>>> r = requests.get('http://localhost:9200/bank/_search?q=*&pretty')
>>> r.json()
This is equivalent to using

>>> r = requests.post('http://localhost:9200/bank/_search?pretty', \

      json={"query" : {"match_all": {}}})
 i.e. instead of q=* in the uri we have put it in the rest body.



Either way, you now have some data which you can point kibana at. In kibana, the discover tab allows you to view the data by clicking through fields. The visualise tab allows you to set up graphs. What wasn't immediately apparent was that once you have selected your buckets, fields and so forth, you need to press the green "play" button by the "options" to make it render your visualisation. And finally, I got a pie chart of the data. I now need to point it at some real data.

 
 

 

Pipeline 2016


A write up of my notes: they may or may not make any sense.

Keynote: Jez Humble "What I Learned From Three Years Of Sciencing The Cr*p Out Of Continuous Delivery" or "All about SCIENCE"

Surveys

Surveys are measures looking for latent constructs for feelings and similar - see psychometrics.
Surveys need a hypothesis to test and should be worded carefully.
Consider discriminant and convergent validity.
Test for false positives.

Consider the Westrum typology. With several axes (rows) scaled across three columns - pathological, bureaucratic, generative - you can start spotting connections.

Pathological | Bureaucratic | Generative
Power Oriented | Rule Oriented | Performance Oriented
Low cooperation | Modest cooperation | High cooperation
Messengers shot | Messengers neglected | Messengers trained
Responsibilities shirked | Narrow responsibilities | Risks are shared
Bridging discouraged | Bridging tolerated | Bridging encouraged
Failure leads to scapegoating | Failure leads to justice | Failure leads to inquiry
Novelty crushed | Novelty leads to problems | Novelty implemented

For example, "Failure leads to" has three different options: scapegoating, justice or inquiry. Where does your org come out for each question? If they say "It's all Matt's fault" and sack Matt, that won't stop mistakes happening again. Blameless postmortems are important.
IT and aviation are both high-tempo, high-consequence environments. They are complex adaptive systems: there is frequently not enough information to make a decision. Therefore, reduce the consequences of things going wrong.
In general for surveys, use a Likert-type scale - clearly worded statements rated on a scale, allowing numerical analysis. See if your questions "load together" (or bucket). Maybe spotting what's gone wrong with some software buckets into notification from outside (customers etc.) and notification from inside (alerts etc.).
Consider CMV, CMB - common method variance or bias. Look for early versus late respondents.
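To make "load together" concrete, here is a toy sketch with fabricated numbers (my own, nothing from the actual survey): items measuring the same latent construct should correlate strongly with each other and weakly with unrelated items.

```python
import numpy as np

# Toy Likert responses (1-5): one row per respondent, one column per question.
# Q1 and Q2 are meant to tap the same construct; Q3 is unrelated.
responses = np.array([
    [5, 4, 1],
    [4, 5, 3],
    [2, 2, 5],
    [1, 2, 2],
    [5, 5, 4],
    [2, 1, 1],
])

# 3x3 inter-item correlation matrix (columns are the variables).
corr = np.corrcoef(responses, rowvar=False)

# Q1 and Q2 "load together": their correlation is much higher than with Q3.
print(corr[0, 1], corr[0, 2])
```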
See https://puppetlabs.com/2015-devops-report for the previous devops survey.
In fact take this year's https://puppetlabs.com/blog/2016-state-devops-survey-here

IT performance

How do you measure it? How do you predict it? It seems that "I am satisfied with my job" is the biggest predictor of organisational performance.
Does your company have a culture of "autonomy, mastery, purpose"? What motivates us? [See Pink]

How do we measure IT performance? Consider lead time, release frequency, time to restore, change failure rate...
Going faster doesn't mean you break things, it actually makes you *more* stable, if you look at the data [citation needed]
"Bi-modal IT" is wrong: watch out for Jez's upcoming blog about "fast doesn't compromise safety"
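Those four measures are straightforward to compute once you record your deployments. A minimal sketch with made-up records; the field names are mine, not from the talk:

```python
from datetime import timedelta

# Made-up deployment records over a 30-day window: how long the change took
# to reach production, whether it failed, and how long restoring service took.
deploys = [
    {"lead_time": timedelta(days=2), "failed": False, "restore": None},
    {"lead_time": timedelta(days=1), "failed": True,  "restore": timedelta(hours=3)},
    {"lead_time": timedelta(days=4), "failed": False, "restore": None},
    {"lead_time": timedelta(days=1), "failed": False, "restore": None},
]

days_observed = 30
release_frequency = len(deploys) / days_observed  # deploys per day
mean_lead_time = sum((d["lead_time"] for d in deploys), timedelta()) / len(deploys)
change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)
restores = [d["restore"] for d in deploys if d["restore"] is not None]
mean_time_to_restore = sum(restores, timedelta()) / len(restores)

print(release_frequency, mean_lead_time, change_failure_rate, mean_time_to_restore)
```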

Do we still want to work in the dark-ages of manual config and no test automation?

We claim we are doing continuous integration (CI) by redefining CI. Do devs merge to trunk daily? Do you have tests? Do you fix the build if it goes red?

Aside: "Surveys are a powerful source of confirmation bias"

Question: Can we work together when things go wrong?

Do you have peer reviewed changes? (Mind you, change advisory boards)

Science again (well, stats)

SEM: structural equation modelling: use this to avoid spurious correlations.

Apparently 25% of people do TDD - it's the lost XP practice. TDD forces you to write code in testable ways: it's not about the tests.

How good are your tests? Consider mutation testing e.g. Ivan Moore's Jester

Change advisory boards don't work. They obviously impact throughput but have negligible impact on stability. Jez suggested the phrase "Risk management theatre".


Ian Watson and Chris Covell "Steps closer to awesome"

They work at Call Credit (which used to be part of the Skipton Building Society) and talked about how to change an organisation.

Their hypothesis: "You already have the people you need."
"Metal as a service" sneaked a mention, since some people were playing buzz-word bingo.
Question: what would make this org "nirvana"?
They started broadcasting good (and bad) things to change the culture. e.g. moving away from a fear of failure. Having shared objectives helped.

We are people, not resources. "Matrix management" (cue obvious slides) - not a good thing. Be the "A" team instead. (Or the Goonies.)

The environment matters. They suggested blowing up a red balloon each time you are interrupted for 15 seconds or more, giving a visual aid of the distractions.

They mentioned "Death to manual deployments" being worth reading.

They said devs should never have access to prod.
You need centres of excellence: peer pressure helps.
They have new bottlenecks: "two-speed IT". The security team should be enablers, not the police.
They mentioned the "improvement kata"
They said you need your ducks in a straight line == a backlog of good stories.

Gary Frost "Financial Institutions Carry Too Much Risk, It’s Time To Embrace Continuous Delivery"

of 51zero.com
Sarbanes-Oxley (SOx) was introduced because of risk in finance. Has it worked? No.
It brought about a segregation of duties, lots of change control review, and "runbooks". This is still high risk. There have been lots of breaches from IT departments, e.g. Knight Capital, NatWest (three times).
Why are we still failing, despite these "safety measures"?
We need fully automated testing including security and performance. We need micro-services (and containers), giving us isolation.
An aside on architecture diagrams: are they helpful? Are they even correct? Why not automatically generate these too, so they are at least correct?

What are the blockers? Silos. Move to collaborative environments.

Look out for new FinTech disruption (start-ups I presume)

Gustavo Elias "How To Deal With A Hot Potato"

He was landed with legacy code that was deeply flawed, had multiple responsibilities and high maintenance costs. In fact he calculated these costs and told management: for example, with downtime for deployment and 40 minutes to restart, he calculated the cost at over £500 per day per dev.
How to change this?
  • Re-architect
  • Reach zero downtime
  • Detach from the old release cycle
How?
Re-architect with micro-services and the strangle-vine pattern.
Reach zero downtime with a canary release and blue/green deployment. You need business onside for the extra hardware.
Old release cycle: bamboo plan - but this needs new machines.
In the end, be proud.
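A canary release can be sketched as deterministic bucketing: a small, stable fraction of users is routed to the new version, and the same user always gets the same answer. A toy sketch, with names of my own invention:

```python
import zlib

def routes_to_canary(user_id, percent):
    """Deterministically bucket user_id into [0, 100); low buckets get the canary."""
    bucket = zlib.crc32(user_id.encode()) % 100
    return bucket < percent

# The same user always gets the same answer, so their experience is stable,
# and roughly `percent` of users see the new version.
canary_users = sum(routes_to_canary(f"user-{i}", 5) for i in range(10_000))
print(canary_users)
```

Widening the rollout is then just a config change to `percent`; blue/green is the degenerate case of flipping between 0 and 100.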

Pete Marshall "Achieving Continuous Delivery In A Legacy Environment"

The tech architect at Planday (a shift work app)
C.D. in a legacy environment: and not "chaotic delivery".
Ask the question: "What are your business goals?"
They had DNS load balancing, "interesting stand-ups" (nobody cared), no monitoring.
He started a tech radar: goals to get people on board.
He used a corp screensaver to communicate the pipeline vision.
How easy is your code to build? Do you know what's actually in prod? Can you find the delta?
He changed NAnt to MSBuild.
He became a test mentor, having half hour sessions to increase test coverage.
They had estimation sessions and planning sessions.
Teams started to release on their own schedule with minimal disruption to others. 
Logging, monitoring and alerting helped: look for patterns in the logs. n.b. loggly (though cloud based with no instance in Europe so might be slow)
He mentioned feature toggles (I wondered how he implemented these: please not boolean flags in a database, but enough of my pain), though watch out - you can still get surprises.
He used the strangle pattern.
Don't do loads of things: do a couple of things you can actually measure.
Ask yourself "What's the risk of failure?"
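For what a feature toggle can look like at its most minimal, here is a sketch (and yes, it is morally a boolean flag, just held in memory rather than a database; real implementations add per-user targeting, auditing and kill switches):

```python
class FeatureToggles:
    """A minimal in-memory toggle store: look a feature up before using it."""

    def __init__(self, flags):
        self._flags = dict(flags)

    def is_enabled(self, name):
        # Unknown features default to off: safer than a KeyError in production.
        return self._flags.get(name, False)

toggles = FeatureToggles({"new_checkout": True, "beta_search": False})

if toggles.is_enabled("new_checkout"):
    pass  # take the new code path here
```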

Sally Goble "What do you do if you don't do testing?"

From QA at The Guardian
They previously had a two-week release cycle, with a staging environment and lots of manual testing.
They deployed at 8am on a Wednesday. A big news day delayed the release cycle by a week. 
They couldn't roll back.
They moved to automated tests - perhaps Selenium. They were mainly comparing pixels.
Then they threw them out.
So, what does QA do if it doesn't do testing? They now make sure they are "not wrong long." i.e. they can fix things quickly.
They have feature switching, canary releases and monitoring (but avoid noise).
They are not a testing department but a quality department. They can concentrate on other things - like less data so apps don't blow out users' data plans or similar.

Steve Elliott "Measure everything, not just production"

Laterooms: something about badgers.
Tools: log aggregation: elastic stack. Metrics: kibana, grafana. Alerting: icinga(2) [like nagios only prettier]
Previously dev/test was slow, had no investment. They had flaky tests and it was difficult to spot trends.
They moved to instrumentation and tooling in dev.
"Measure ALL the things"
Be aware that dashboard fatigue is a thing.
He pointed us at github
Have lots of metrics, but don't use them to be Orwellian. Have data-driven retrospectives. (I once made a graph of who was asking whom for code review to reveal cliques in our team - data makes a difference! And pictures more so.) He mentioned that you need to make space for feelings in retrospectives too.
He suggested mixing up the format to keep retrospectives fresh: consider using http://plans-for-retrospectives.com/index.html

He said he was running sentiment analysis on the tweets he got during his talk. 

He mentioned that Devops Manchester is always looking for speakers.

Summary

I'm so glad I went. It's useful to see people talking about their successes (and failures) and to reflect on common themes. "People not resources" struck a deep note for me. I am always inspired when I see people trying to make things better, no matter how hard.
I loved the brief mention of stats in the keynote. The main themes were, of course, about measuring and automating. I will spend time thinking about what else I can measure and how to do stats and present them to non-statisticians in a clear way.
Never under-estimate the power of saying "Prove it" when someone makes a claim.