Since you are reading this post, I am sure that you can relate to the classic plight of the IT, sysadmin, or Operations team: They are invisible until things go wrong.

For practitioners of DevOps and Site Reliability Engineering, the same can be true, especially on teams that have already addressed the low-hanging fruit.

When the big outage happens, it’s all too common for management to react with knee-jerk questions like...