Media

DevOps Lessons From a Meta SRE

Media

25 Feb 18:00

I sat down with Better Stack’s podcast to discuss the AI velocity -> operational chaos thesis, my time at Meta, my SRE career in general, and a few tales from my life on the road!

Thanks Better Stack for an action-packed conversation!

YouTube: link
Spotify: link

SRE is Ops With Boundaries

Media

01 Feb 16:00

Join fellow SRE Paige Cruz and yours truly on an exploration of my history of being on call, my use of multiple generations of observability tools, and how to make the experience as painless as possible.

One of the major points of the discussion is how SRE sets boundaries around taking on-call burden on behalf of engineering teams in contrast to classic IT Operations teams.

I also share a funny story about the last page I received while...

New Podcast: Reliability Rebels

Media

10 Nov 10:00

I’m pleased to announce that I’ve launched a new podcast: Reliability Rebels!

Over the past couple of years, I’ve had the privilege of being a guest on several tech podcasts (and will continue to do so), but I decided to create my own.

(And yes, I produced the intro music!)

I wanted to explore how people in tech sometimes have to challenge the status quo to improve their systems, as that was definitely my experience across...

Thoughts On The First SEV0 Conference

Media

26 Sep 14:00

As systems grow larger and more complex, mastering incident response isn’t just a necessity; it’s critical for a tech company’s survival.

SEV0, hosted by incident.io in San Francisco a few days ago, tackled this head-on, bringing together thought leaders and practitioners to share best practices, hard-earned lessons, and bold new ideas in the world of incident management.

As you know, I’m pretty obsessive about the end-to-end process of incident response, so of course...

Effective SLOs Workshop

Media

16 Sep 20:00

A few months back, I presented the Effective SLOs webinar, where we discussed how to select, implement, and iterate on Service Level Objectives (SLOs), a cornerstone of how we ensure the reliability of our systems.

(If you haven’t seen the recording yet, you can access it here.)

Today, I’m excited to announce the release of a companion workshop, which is available for download.

This workshop offers hands-on experience, guiding participants through the...

An Open Letter To Product Management

Media

05 Apr 13:00

Hey, product managers!

I’m an engineer. We need to talk! (I promise not to spout technical jargon at you.)

Let’s be honest: our two groups don’t see eye to eye as much as we should. Perhaps now is a chance to change that!

To start, we (as engineers) understand that your job is to take the product’s vision (informed by customer desire) and bring it into reality. We get that it can be...

Slight Reliability Ep82: CI/CD

Media

13 Feb 14:00

Another appearance on the Slight Reliability Podcast! This time we go over the basics of CI/CD, change management, my experience running a Change Advisory Board (CAB), testing in prod, and how to treat your test/deploy infrastructure!

Slight Reliability Ep70: Meta SRE

Media

08 Oct 16:00

I return to the Slight Reliability Podcast to discuss my experience in Meta’s Production Engineering… and tell a story about how I almost burnt down a server room early in my career! Don’t miss this one!

Video: Beating Big Tech Coding Interviews

Media

23 Aug 16:00

On Aug 19th I presented this talk at the monthly Vegas Programmers Meetup. This is an excellent follow-up to the post “How to Get an SRE Role”, as it goes in-depth on how to prepare for one of the most difficult parts of the process.

(Image Credit: This is Engineering)

Podcast Appearance: All Things Ops

Media

18 Aug 07:00

Another podcast! This week I’m a guest on All Things Ops from CheckMK!

(I used CheckMK years ago as it provided an improved interface and plugin system over stock Nagios.)

Host Elias Voelker and I discussed:

What makes the perfect Site Reliability Engineer?
The reasons for and benefits of a DevOps transformation
The most important tools for modern Site Reliability Engineering
Real behind-the-scenes stories of major outages

One of my most...

Podcast Appearance: Day Two Cloud

Media

20 Jul 07:00

I’m continuing my tour as a guest on tech podcasts! This time I’m on the Day Two Cloud podcast from Packet Pushers, which focuses on the realities of cloud adoption.

I really enjoyed the conversation with hosts Ned Bellavance and Ethan Banks, who were both very insightful and funny!

Don’t miss this one as it was an action-packed discussion! Together, we covered:

What it means to be an SRE
How an SRE differs...

Podcast Appearance: Slight Reliability

Media

12 Jul 20:00

Another podcast guest appearance! This time I’m on the Slight Reliability podcast, which answers “what is site reliability engineering (SRE) really about?”.

(I’m on the road this week! Next week we’ll return to our usually-scheduled articles.)

In this episode, host Stephen Townshend and I cover a lot of ground including making ops work visible, measuring toil, the power of calculating the monetary value of work, getting developers on-call, the embedded model for SRE, SLOs,...

Podcast Appearance: Practical Operations

Media

28 Jun 10:00

This week I’m a guest on the Practical Operations podcast, which focuses on “systems, operations and scaling with a focus on real world use cases and solutions to common problems”.

We discuss my experience in DevOps transformations, running a Site Reliability Engineering team, and my experience as a consultant!

Episode 137 - Amin Astaneh

I highly recommend following this podcast as the hosts are very knowledgeable and are really entertaining to listen to!

...

Video: SRE, Demystified

Media

05 Jun 10:00

On May 30th I presented this talk at the monthly Boston DevOps Meetup. It serves as an excellent introduction to the ideas and practices behind Site Reliability Engineering and provides food for thought when starting your own team. Enjoy!

(Image Credit: Kelvin Augustinus)

