- My January 2016 C Vu magazine column was Bug Hunting (Part 2), concluding my series on finding and fixing software faults.
- My March 2016 column was Software Development Is..., an investigation of the finer details (the art, craft, science, and, well... gardening) of the programmer's world.
- My May 2016 column was Organised Chaos, a look at how the programmer can stay focused and organised.
The number of ways in which Maven, Surefire, Failsafe, JaCoCo, Selenium and Jetty can be misconfigured is enormous.
I have explored this space and honestly this is the only one which worked!
JaCoCo UnitTest and IntegrationTest Configuration Example on GitHub, with results on a Maven-generated github.io site.
Keynote: Jez Humble "What I Learned From Three Years Of Sciencing The Cr*p Out Of Continuous Delivery" or "All about SCIENCE"
Surveys are measures looking for latent constructs, such as feelings and similar; see psychometrics.
Surveys need a hypothesis to test and should be worded carefully.
Consider discriminant and convergent validity.
Test for false positives.
Consider the Westrum typology.
With 6 axes (rows) scaled across three columns (pathological, bureaucratic, generative), you can start spotting connections.
For example "Failure leads to" has three different options: scapegoating, justice or inquiry. Where does your org come out for each question? If they say "It's all Matt's fault" and sack Matt that won't avoid mistakes happening again. Blameless postmortems are important.
In general for surveys, use a Likert type scale - use clearly worded statements on a scale, allowing numerical analysis. See if your questions "load together" (or bucket). Maybe spotting what's gone wrong with some software buckets into notification from outside (customers etc) and notification from inside (alerts etc).
Consider CMV, CMB - common method variance or bias. Look for early versus late respondents.
In fact, take this year's survey: https://puppetlabs.com/blog/2016-state-devops-survey-here
Does your company have a culture of "autonomy, mastery, purpose"? What motivates us? [See Pink]
How do we measure IT performance? Consider lead time, release frequency, time to restore, change failure rate...
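As a rough illustration of how those four measures could be computed, here is a minimal sketch. The deployment records, the field layout and the `delivery_metrics` helper are all hypothetical and were not part of the talk; real data would come from your own pipeline and incident tooling.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (commit time, deploy time, caused a failure?, time to restore)
deployments = [
    (datetime(2016, 5, 2, 9, 0), datetime(2016, 5, 2, 15, 0), False, None),
    (datetime(2016, 5, 3, 10, 0), datetime(2016, 5, 4, 10, 0), True, timedelta(hours=2)),
    (datetime(2016, 5, 5, 8, 0), datetime(2016, 5, 5, 12, 0), False, None),
    (datetime(2016, 5, 6, 9, 0), datetime(2016, 5, 6, 11, 0), False, None),
]

def delivery_metrics(records, period_days):
    """Summarise lead time, release frequency, change failure rate and time to restore."""
    lead_times = [deployed - committed for committed, deployed, _, _ in records]
    restores = [restore for _, _, failed, restore in records if failed and restore is not None]
    failures = sum(1 for _, _, failed, _ in records if failed)
    return {
        "median_lead_time": median(lead_times),
        "deploys_per_day": len(records) / period_days,
        "change_failure_rate": failures / len(records),
        "mean_time_to_restore": sum(restores, timedelta()) / len(restores) if restores else None,
    }

metrics = delivery_metrics(deployments, period_days=5)
```

Tracking the throughput measures (lead time, frequency) next to the stability measures (failure rate, time to restore) is what lets you check the "faster is also more stable" claim against your own numbers.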
Going faster doesn't mean you break things, it actually makes you *more* stable, if you look at the data 
"Bi-modal IT" is wrong: watch out for Jez's upcoming blog about "fast doesn't compromise safety"
Do we still want to work in the dark-ages of manual config and no test automation?
We claim we are doing continuous integration (CI) by redefining CI. Do devs merge to trunk daily? Do you have tests? Do you fix the build if it goes red?
Aside: "Surveys are a powerful source of confirmation bias"
Question: Can we work together when things go wrong?
Do you have peer reviewed changes? (Mind you, change advisory boards)
Science again (well, stats)
SEM: structural equation modelling: use this to avoid spurious correlations.
Apparently 25% of people do TDD - it's the lost XP practice. TDD forces you to write code in testable ways: it's not about the tests.
How good are your tests? Consider mutation testing e.g. Ivan Moore's Jester
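To make the idea concrete, here is a minimal hand-rolled sketch of mutation testing. It is not Jester, which works on Java source; the `is_adult` function, the two test suites and the mutants are invented for illustration. A mutant "survives" if the tests still pass after the code has been deliberately broken, which points at a gap in the tests.

```python
import operator

def make_is_adult(compare):
    """Build an age check from a comparison operator so it can be mutated."""
    def is_adult(age):
        return compare(age, 18)
    return is_adult

def weak_suite(is_adult):
    # Only checks values far away from the boundary.
    return is_adult(30) and not is_adult(5)

def strong_suite(is_adult):
    # Also pins down the behaviour at exactly 18.
    return weak_suite(is_adult) and is_adult(18)

original = make_is_adult(operator.ge)
mutants = {
    ">= replaced by >": make_is_adult(operator.gt),
    ">= replaced by <": make_is_adult(operator.lt),
}

def surviving_mutants(suite):
    """Return the names of mutants the suite fails to detect."""
    assert suite(original), "suite must pass on the unmutated code"
    return [name for name, mutant in mutants.items() if suite(mutant)]

weak_survivors = surviving_mutants(weak_suite)
strong_survivors = surviving_mutants(strong_suite)
```

The weak suite lets the boundary mutant through, while the suite with the boundary check kills both mutants.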
Change advisory boards don't work. They obviously impact throughput but have negligible impact on stability. Jez suggested the phrase "Risk management theatre".
Ian Watson and Chris Covell "Steps closer to awesome"
They work at Call Credit (used to be part of the Skipton building society) and talked about how to change an organisation.
Their hypothesis: "You already have the people you need."
"Metal as a service" sneaked a mention, since some people were playing buzz-word bingo.
Question: what would make this org "nirvana"?
They started broadcasting good (and bad) things to change the culture. e.g. moving away from a fear of failure. Having shared objectives helped.
We are people, not resources. "Matrix management" (cue obvious slides) - not a good thing. Be the "A" team instead. (Or the Goonies).
The environment matters. They suggested blowing up a red balloon each time you are interrupted for 15 seconds or more, giving a visual aid of the distractions.
They mentioned "Death to manual deployments" being worth reading.
They said devs should never have access to prod.
You need centres of excellence: peer pressure helps.
They have new bottlenecks: "two speed IT" .... the security team should be enablers not the police.
They mentioned the "improvement kata"
They said you need your ducks in a straight line == a backlog of good stories.
Gary Frost "Financial Institutions Carry Too Much Risk, It’s Time To Embrace Continuous Delivery"
It brought about a segregation of duties and lots of change control review. "runbooks" This is still high risk. There have been lots of breaches from IT departments e.g. Knight Capital, NatWest (3 times).
Why are we still failing, despite these "safety measures"?
What are the blockers? Silos. Move to collaborative environments.
Gustavo Elias "How To Deal With A Hot Potato"
He was landed with legacy code that was deeply flawed, had multiple responsibilities and high maintenance costs. In fact he calculated these costs and told management. For example, with downtime for deployment and 40 minutes to restart, he calculated the cost at over £500 per day per dev.
- Reach zero downtime
- Detach from the old release cycle
In the end, be proud.
Pete Marshall "Achieving Continuous Delivery In A Legacy Environment"
They had DNS load balancing, "interesting stand-ups" (nobody cared), no monitoring.
He changed nant to msbuild.
He used the strangle pattern.
Sally Goble "What do you do if you don't do testing?"
From QA at The Guardian
They previously had a two-week release cycle, with a staging environment and lots of manual testing.
Steve Elliott "Measure everything, not just production"
He pointed us at GitHub
Once you have Networking working there is still a long way to go.
yum groupinstall "Development Tools"
yum install kernel-devel
yum install kde-workspace
yum group install "X Window System"
yum groupinstall "Fonts"
yum install gdm
Now we can log in without a GUI, but run startx when one is needed.
Installing Guest Additions
The guest CentOS is a stock distribution; you have to tell it that it is inside VirtualBox.
Make the additions visible to the guest:
In the "Devices" menu in the virtual machine's menu bar, VirtualBox has a handy menu item named "Insert Guest Additions CD image", which mounts the Guest Additions ISO file inside your virtual machine.
yum install dkms
mkdir -p /media/cdrom
# Note change from /dev/scd0 in CentOS6
mount /dev/sr0 /media/cdrom
We are now able to move the mouse seamlessly between our guest and host and window systems understand each other.
Sharing files between the host and guest
In the host (Windows) create C:\vbshared and using the VirtualBox interface share this with the guest. In the guest:
mount -t vboxsf vbshared /vbshared
It will then be visible as /vbshared/ from inside the guest.
The CentOS 7 iso does not enable networking during the installation, unlike Ubuntu. So your shiny new CentOS cannot get to the outside world.
Add the following to /etc/sysconfig/network-scripts/ifcfg-enp0s3
# Note this was set to no
TEST_CASE("simulation starting at 0 remains at 0", "[Property]")
Oh dear. If only we had some random magic to help. We need something that allows us to test that properties hold for a variety of cases. We don’t want to hand roll lots of ad-hoc test cases ourselves. If we generate random test cases we need the results to be clearly reported so we know what went wrong if something fails. We need property-based testing. Good news! Haskell got there long before us.
prop_RevRev xs = reverse (reverse xs) == xs
Main> quickCheck prop_RevRev
OK, passed 100 tests.
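The same idea can be sketched in a few lines of Python using only the standard library. This is a hand-rolled illustration, not QuickCheck and not a port of it: `check_property`, `random_int_list` and the two properties are hypothetical names, and a real library would also shrink failing inputs to a minimal counterexample.

```python
import random

def check_property(prop, generate, runs=100, seed=0):
    """Run prop on `runs` random inputs; return the first failing input, or None."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(runs):
        case = generate(rng)
        if not prop(case):
            return case
    return None

def random_int_list(rng):
    """Generate a list of 0 to 20 random integers."""
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]

def prop_rev_rev(xs):
    # Reversing twice gives back the original list.
    return list(reversed(list(reversed(xs)))) == xs

def prop_rev_is_identity(xs):
    # Deliberately false, to show how a counterexample gets reported.
    return list(reversed(xs)) == xs

ok = check_property(prop_rev_rev, random_int_list)
counterexample = check_property(prop_rev_is_identity, random_int_list)
```

Here `ok` is `None` because the property held for every generated list, while `counterexample` holds a concrete list that breaks the false property, which is the clear reporting we asked for.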
I hope this has sparked some excitement about new ways of testing your code. Next time someone asks “Unit tests or integration tests?” say “Yes, and also property-based tests”.
When starting a new project or joining an existing one there are a number of tools and features which should be in place. I have listed them in order of both importance and the sequence in which the global community learnt the painful lesson that none of these are optional.
This is based upon Project initiation - a recipe.
Google it, ensure it is available as a URL, check Twitter.
If there is no README, create it now!
The only decision is public or private. It will be a git repo.
If any other SCM system is in place convert to git before doing anything else.
Decide on git usage strategy: git flow, release branches, developer forks with feature branches and merge to master.
Do we really want to develop in Fortran under VMS? oh, OK.
Develop on the operating system you are deploying to. If you develop on OS X and deploy to Debian it will bite you. Developing for Red Hat using Windows should be made illegal.
Jenkins of course.
Track the code coverage; anything less than 100% is not acceptable.
For legacy projects Sonar establishes a baseline; for new projects it holds the line throughout the project's life.
The closer to Continuous Deployment the fewer platform types are needed.
Metrics enable blue green deployment and A/B testing.
Issue tracking and work planning
Just you: GitHub, team: Jira
When your CI server is becoming too big to fail
This post was written when I was responsible for a heavily used CI server, for a company which is no longer trading, so the tenses may be mixed.
Once an organisation starts to use Jenkins, and starts to buy into the Continuous Integration methodology, very quickly the Continuous Integration server becomes indispensable.
The success of Jenkins is based upon its plugin based architecture. This has enabled Kohsuke Kawaguchi to keep tight control over the core whilst allowing others to contribute plugins. This has led to rapid growth of the community and a very low bar to contributing (there are currently over 1000 plugins).
Each plugin has the ability to bring your CI server to a halt. Whilst there is a Long Term Support version of Jenkins the plugins, which supply almost all of the functionality, do not have any enforced gate keeping.
A completely resilient CI service is an expensive thing to achieve. The following elements must be applied bearing in mind the proportion of the risk of failure they mitigate.
Split its jobs onto multiple CI servers
This should be a last resort: splitting tasks out across slaves achieves many of the benefits without losing a single reporting point.
Split jobs out to SSH slaves
One disadvantage of using ssh slaves is that the ssh keys must be copied manually from the master server to the slaves.
Because jobs are initiated from master to the slave the master cannot be restarted during a job's execution (this is currently also true for JNLP slaves, but is not necessarily so).
The main disadvantage of ssh slaves is that by referencing real slaves they make the task of creating a staging server more complex, as a simple copy of the master would initiate jobs on the real slaves.
Split jobs out to JNLP slaves
This is the recommended setup, which we used eventually for most jobs.
Minimise Shared Resources
In addition to sharing plugins, and hence sharing faulty plugins, another way in which jobs can adversely interact is by their use of shared resources (disk space, memory, CPUs) and shared services (databases, message queues, mail servers, web application servers, caches and indexes).
Run the LTS version on production CI servers
There are two plugin feeds, one for bleeding edge, the other for LTS.
Strategies for Plugin upgrade
Hope and trust
Up until our recent problem I would have said that the Jenkins community is pretty high quality and that most plugins do not break your server. Your ability to predict which ones will break your installation is small, so brace yourself and be ready to fix and report any problems. I have run three servers for five years and not previously had a problem.
Upgrade plugins one at a time, restart server between each one.
This seems reasonable, but at a release rate of 4.3 per day, seven days a week since 2011-02-21, even your subset of plugins is going to get updated quite frequently.
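A back-of-envelope sketch of what that rate means for one server follows. The 4.3 releases per day comes from the figure above and the round 1000 from the plugin count quoted earlier; the 50 installed plugins and the assumption that releases are spread evenly across all plugins are mine, purely for illustration.

```python
# Figures from the text
releases_per_day = 4.3
total_plugins = 1000  # "over 1000 plugins", rounded down

# Assumption for illustration only: a server with 50 plugins installed,
# and releases spread evenly across every plugin.
installed_plugins = 50

updates_per_day = releases_per_day * installed_plugins / total_plugins
updates_per_week = updates_per_day * 7
days_between_updates = 1 / updates_per_day
```

Under those assumptions an update lands roughly every four to five days, so upgrading one plugin at a time with a restart between each becomes a standing chore and not an occasional task.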
Use a staging CI server, if you can
If your CI server and its slaves are all set up using Puppet, then you can clone it all, including repositories and services, so that any publishing acts do not have any impact on the real world; otherwise you will send emails and publish artefacts which interfere with your live system. Whilst we are using ssh slaves the staging server would either initiate jobs on real slaves or they too would need to be staged.
Use a partial staging CI server
You can prune your jobs down to those which are idempotent, i.e. those which do not publish and do not use ssh slaves, but the non-idempotent jobs cannot be re-run.
Control and monitor the addition of plugins
From the above it is clear that for a production CI server the addition of plugins is not risk or cost free.
Remove unused plugins, after consulting original installer
Plugins build up over time.
Monitor the logs
A log monitor which detects Java exceptions might be used.
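A minimal sketch of such a monitor is shown here. The regular expression, the `find_exceptions` helper and the sample log lines are all hypothetical; a real monitor would tail the Jenkins log file and raise an alert, and the pattern would need tuning to your own log format.

```python
import re

# Matches fully qualified Java class names ending in Exception or Error,
# e.g. java.lang.NullPointerException
EXCEPTION_PATTERN = re.compile(
    r"\b(?:[a-z_][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error)\b"
)

def find_exceptions(lines):
    """Return (line number, exception class) for each line naming a Java exception."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = EXCEPTION_PATTERN.search(line)
        if match:
            hits.append((number, match.group(0)))
    return hits

# Invented sample log, for illustration only
sample_log = [
    "INFO: Started job build-main #42",
    "SEVERE: Failed to load plugin",
    "java.lang.NullPointerException: plugin descriptor missing",
    "\tat hudson.PluginManager.load(PluginManager.java:120)",
    "Caused by: java.io.IOException: disk full",
]
hits = find_exceptions(sample_log)
```

Stack-frame lines starting with "at" are skipped on purpose, so one failure produces one hit per exception and not one per frame.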
Backup the whole machine
Once a month restore from backup to a clean machine.
Store the configuration in Git
This process is only one element of recreating a server. Once a month restore from git to a clean machine.