DevOps star Benjamin Wooton (@benjaminwooton) has published the latest installment of his DevOpsFriday newsletter – Insight from DevOps Thought Leaders – at http://devopsfriday.com/devops120413.pdf, including articles by David Mytton of @serverdensity, Matt Watson of @Stackify, Sandy Walsh (@TheSandyWalsh) and the RethinkDB team (@rethinkdb).
I contributed the following article on software operability and why it is so important for today’s software systems; it takes the form of an interview, with Benjamin Wooton asking the questions.
(Update: devopsfriday.com seems to be down, but Google has an HTML version of the PDF)
What is software operability and why is it important?
Operability is an engineering term concerning the qualities of a system which make it work well over its lifetime, and software operability applies these core engineering principles to software systems.
An operable software system is one which delivers not only reliable end-user functionality, but also works well from the perspective of the operations team. Such software has been built to operate successfully without needing application restarts, server reboots, load-balancer hacks, or any of the countless other fixes and work-arounds which operations teams have to use in order to make many business software systems work in practice on a daily basis.
Software systems which follow software operability good practice will tend to be simpler to operate and maintain, with a reduced cost of ownership, and almost certainly fewer operational problems.
Where did your interest in operability come from?
Early in my career I built software systems for MRI (brain) scanners and oil & gas exploration. Operability for such systems is essential; it’s no use building an MRI scanner which can produce 3D brain images if it needs rebooting after taking every second image. Likewise, it was cheaper to drill a new oil well than to extract a faulty down-hole pressure gauge; these systems had to operate reliably with minimal human intervention. Since then I have too often seen the negative effects of operational features being dropped before go-live, which usually results in significant operational costs and more incidents in Production.
There is no good reason in 2013 why businesses should put up with (and pay for) second-rate software which needs arduous human attention every few hours or days just in order to maintain normal operation. In my experience, most modern business software is simple enough (at a systems level at least) that we can significantly reduce operational cost and downtime by introducing software operability as a key concern for software product delivery teams. Ultimately, it’s about lower cost of ownership, better engineering, and fewer late nights debugging flaky software!
What are some of the low-hanging fruit a software team can tackle to make their software more operable?
The best thing a software team can do to make their software more operable is to write a draft operation manual alongside feature development. The operation manual (aka ‘run book’) eventually contains the full details of how the software system is operated in Production. By writing a draft operation manual, the software team can demonstrate to the operations folks either that all the major operability concerns have been addressed or that some operability criteria are beyond the expertise of the software team; either way, there will be no ‘nasty surprises’ when the software is put into operation.
Having to think about things like backups, time changes, health checks, and clear-down steps in the context of their own software tends to prompt the software team members to implement small but crucial changes which provide ‘hooks’ for monitoring, alerting, backups, failover, etc., and these hooks improve the operability of the software.
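To make the idea of a monitoring ‘hook’ concrete, here is a minimal sketch of a health-check endpoint in Python, using only the standard library. It is an illustration rather than a recommended implementation: the `/health` path, the port, the `health_report` helper, and the two placeholder checks are all assumptions made up for this example, and real checks would probe whatever the software actually depends on.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_report(checks):
    """Run each named check and summarise the results.

    `checks` maps a name to a zero-argument callable that returns True
    when that part of the system is healthy.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception as exc:  # a crashing check counts as a failure
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    return {"status": "ok" if healthy else "degraded", "checks": results}


# Hypothetical placeholder checks; real ones might test a database
# connection, free disk space, or the age of the last backup.
CHECKS = {
    "database": lambda: True,
    "disk_space": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        report = health_report(CHECKS)
        body = json.dumps(report).encode("utf-8")
        # A 503 lets a load balancer or monitor spot trouble
        # without having to parse the response body.
        self.send_response(200 if report["status"] == "ok" else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Assumed address and port, chosen only for this sketch.
    HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```

Keeping the check logic in a plain function, separate from the HTTP handler, means the same report can also be written to a log or exposed through a command-line tool, and it can be exercised without starting a server.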
Beyond that, what would represent a higher level of operability?
Software with a high level of operability is easy to deploy, test, and interrogate in the Production environment. Highly operable software provides the operations team with the right amount of good-quality information about the state of the service being provided, and will exhibit predictable and non-catastrophic failure modes when under high load or abnormal conditions. Systems with good software operability also lend themselves to rapid diagnosis and simple recovery following a problem, because they have been built with operational criteria as first-class concerns.
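One way to picture a predictable, non-catastrophic failure mode under high load is a bounded work queue that rejects new work once it is full, rather than accepting everything until memory runs out. The following Python sketch shows the idea under stated assumptions: the `BoundedWorkQueue` class, its `max_pending` parameter, and the `rejected` counter are hypothetical names invented for this example, not part of any particular system.

```python
import queue


class BoundedWorkQueue:
    """Accept work up to a fixed limit and reject the rest explicitly."""

    def __init__(self, max_pending):
        self._queue = queue.Queue(maxsize=max_pending)
        # Count of refused jobs, so operations can see how often the
        # system is shedding load.
        self.rejected = 0

    def submit(self, job):
        """Return True if the job was queued, False if it was refused."""
        try:
            self._queue.put_nowait(job)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def pending(self):
        """Number of jobs currently waiting (approximate under threads)."""
        return self._queue.qsize()


if __name__ == "__main__":
    work = BoundedWorkQueue(max_pending=2)
    for job in ("job-1", "job-2", "job-3"):
        accepted = work.submit(job)
        print(job, "accepted" if accepted else "rejected")
    print("pending:", work.pending(), "rejected:", work.rejected)
```

The caller learns immediately that a job was refused and can retry later or return a ‘busy’ response, while the `rejected` count gives the operations team a simple figure to monitor and alert on.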
How do you make the case for operability when the main business focus is usually on features?
I think one of the most important changes to make is to stop using the term “non-functional requirements” for things like performance and stability requirements; instead, use the term “operational requirements”, or even better, “operational features”, and include these in the product backlog alongside end-user features. This gets away from the artificial (and unhelpful) contrast of “functional” vs. “non-functional” requirements, and helps to communicate to the business that the operational aspects of the software also require specific features if the business requirements are going to be met.
A useful approach (discussed at the excellent DevOpsDays 2013 event in London) is to make the product owner responsible not only for feature delivery but also for the operational success of the software; after a few early morning Priority 1 call-outs due to the application servers needing a restart, the product owner will probably start to realise the importance of operational features!
Making any operational problems more visible is also crucial. If the operations team needs to restart the app servers every night, make this visible, and include the product owner or business sponsor in the email notifications – every day. Draw analogies with systems familiar to the product owner: if they had to have their car fixed by a mechanic every two days, they’d soon either buy a new car or pay to have the faulty part replaced. So, don’t hide the effort which you’re expending on keeping their software product running; make sure they see the cost (and the pain!).
Where should we look for further information on operability?
A good starting point to learn more about software operability is the excellent book Patterns for Performance and Operability by Ford, Gileadi, et al. (ISBN 978-1420053340), which explains the core concepts and works through several real-world examples. In the 1980s and 90s the US space agency NASA did some really useful work on operability as part of the space shuttle programme, and much of the research is available online; Richard Crowley’s talk on Developing Operability at SuperConf 2012 is also worth reading and understanding (http://rcrowley.org/2012/02/25/superconf.html). I recently began a blog at softwareoperability.com which I plan to turn into a book in late 2013 or early 2014 to help software teams get to grips with software operability.
It’s worth saying that teams with a DevOps approach will generally produce systems with better operability than teams split into the traditional Dev + Ops silos. I’m approaching software operability from this siloed world of Dev + Ops, mainly because this is where most organisations still are today, and in fact I hope that by gaining a better understanding of software operability, many engineering teams will move instinctively towards a DevOps model.
More info: http://softwareoperability.com/, @Operability and #operability on Twitter.
Matthew blogs at http://matthewskelton.net/ and is on Twitter at @matthewpskelton.