Launch/Operate Products With Confidence!


Being able to run a reliable product and infrastructure is an essential skill.

Sometimes even large and well-funded organizations don’t get it right:

  • Healthcare.gov cost $1.7B to build, yet it successfully served only 1% of potential signups during its first week in production.
  • Southwest Airlines’ ‘meltdown’ during the 2022 holiday season stranded 100k+ passengers, with revenue losses estimated at between $500M and $1B.
  • Atlassian’s major incident in April 2022 prevented 775 customers from accessing any of their business-critical cloud services for up to two weeks.

In these incidents, the products and the teams supporting them were not prepared to handle the challenges that a product launch or an unplanned event presented. The cost in time, money, and reputation was high, but more importantly, these failures had real impacts on the people who depended on those services.

Major incidents like these are painful for large entities such as government agencies, airlines, and major SaaS vendors, but they can be fatal for smaller businesses.

The good news is that you can learn from their mistakes and build resilient systems supported by simple yet effective processes, especially with an experienced guide to help you and your team along the way.

Reliability Fundamentals is a short-term engagement in which I collaborate directly with your engineering/IT team to establish the basics for running a reliable service.

The engagement draws on years of experience running successful SRE engagements with teams in Big Tech and Enterprise SaaS, and consists of the following activities:

  • Develop the initial version of Service Level Objectives for monitoring the product’s health and identify the engineering work needed to support them.
  • Establish a mature end-to-end incident response process based on effective alerting, sustainable on-call rotations, and blameless postmortems.
  • Create a clear and consistent change management process to enable a fast flow of features to production while minimizing risk and impact of self-inflicted incidents.
  • Perform a Production Readiness Assessment to reveal operational strengths as well as improvement opportunities that are specific to your product and team.

This foundation gives you a proven mechanism for removing the sources of incidents that erode both customer trust and your ability to build and ship the features that create a competitive edge.

Get started building more resilient products for a single flat fee of $5,000!

Contact me to schedule an introduction!