The book Release It! by Michael Nygard (@mtnygard) is essential reading for anyone concerned with the operability of software. "What about the tl;dr version?" you ask. There is no tl;dr version; all of Release It! is hugely valuable, so if you're serious about software operability, read the whole book.
Once you've done that, here are some quick page references for the sections that relate to software operability:
- p.212 – Multiple NICs and multiple IP addresses
- p.240 – Keeping configuration out of the application and in version control
- p.252 – The importance of human eyeballs on monitoring systems
- p.261 – Recovery-Oriented Computing (and, by implication, mean time to recovery, MTTR)
- p.263 – Sensing changes in the application
- p.267 – The importance of data trends
- pp.274–281 – Logging, including logging levels, log message formats, and log message semantics
- pp.318–322 – The architecture of the organisation, and how this affects operability
Arguably, the most important theme of Release It! in terms of software operability is that we should treat logging and metrics as first-class, cross-functional aspects of our applications. We can write all the fancy circuit-breaker or exponential-backoff code we like, but if the system operators do not know what is happening, the system as a whole is not operable.
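To make that point concrete, here is a minimal sketch in Python of resilience code that is also observable: a retry helper with exponential backoff that logs every failed attempt at `WARNING`, logs giving up at `ERROR`, and counts each outcome. It assumes the standard `logging` module and uses a `collections.Counter` as a stand-in for a real metrics backend. The function name `call_with_backoff`, the logger name and the metric names are hypothetical and illustrative only; they are not taken from the book.

```python
import logging
import time
from collections import Counter

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
# Hypothetical component name, used only for illustration.
log = logging.getLogger("payments.client")

# Hypothetical in-process metrics store; a real system would export
# these counters to a monitoring backend that operators can watch.
metrics = Counter()


def call_with_backoff(operation, max_attempts=4, base_delay=0.1):
    """Retry `operation` with exponential backoff, logging and counting each outcome."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
            metrics["calls_succeeded"] += 1
            if attempt > 1:
                log.info("operation succeeded after %d attempts", attempt)
            return result
        except Exception as exc:  # broad catch kept simple for the sketch
            metrics["calls_failed_attempts"] += 1
            if attempt == max_attempts:
                # Operators need to know when the retry logic has given up.
                metrics["calls_gave_up"] += 1
                log.error("giving up after %d attempts: %s", attempt, exc)
                raise
            # Delay doubles on each attempt: base_delay, 2*base_delay, 4*base_delay, ...
            delay = base_delay * (2 ** (attempt - 1))
            log.warning("attempt %d failed (%s); retrying in %.2fs", attempt, exc, delay)
            time.sleep(delay)
```

Without the log lines and counters, the backoff would still "work", but a rising rate of `calls_failed_attempts` or any `calls_gave_up` would be invisible to the people running the system.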