Thoughts On The First SEV0 Conference

Events

As systems grow larger and more complex, mastering incident response isn’t just a necessity; it’s critical for a tech company’s survival.

SEV0, hosted by incident.io in San Francisco a few days ago, tackled this head-on, bringing together thought leaders and practitioners to share best practices, hard-earned lessons, and bold new ideas in the world of incident management.

As you know, I’m pretty obsessive about the end-to-end process of incident response, so of course...

Webinar: Effective SLOs

Events

Let’s get real: Service Level Objectives are hard to get right. They are a transformative technique for making services reliable, but implementing them comes with many potential pitfalls and antipatterns that can lead to frustration. Let’s explore several that I’ve observed in my career!

Common SLO Pitfalls

SLOs can be hard to explain

It can be a challenge to clearly articulate what they are and the value they provide, and quoting...

Webinar: Lean SRE

Events

When we think about Site Reliability Engineering, we tend to associate it with large tech companies that have the budget to build entire departments to improve production. Sadly, I think that misconception leads smaller organizations and startups to avoid adopting these practices.

I argue that SRE can be implemented by much smaller companies and yield significant benefits in reduced operational costs and time savings, freeing them to build a more compelling product.

Nothing is...