o11y

Assessing SLO Maturity

Events

Jul

10:00

Last year, I shared a framework for defining and using Effective SLOs, helping teams understand the health of their systems to guide real decision-making.

That’s great when you’re an SRE introducing SLOs for the first time. But what if you’re responsible for reliability across an entire organization? How do you assess whether a team’s SLOs actually set them up for success?

On August 20th at 14:00 ET, I’m teaming up with Nobl9 for...

System Call Tracing

Tools

Jun

16:00

I want to introduce one of the most powerful techniques in our arsenal when supporting production systems: system call tracing. But first: what is a system call?

Simply put, system calls are how programs interact with the operating system to request and manage resources like memory, files, network sockets, and hardware devices.

System call tracing allows you to observe the behavior of running processes and how they use those resources in real time.
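To make this concrete, here is a minimal sketch (not taken from the original post) of a Python program that talks to the operating system through thin wrappers around system calls. The function name `write_and_read_back` and the file name are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_and_read_back(path: str, payload: bytes) -> bytes:
    """Write payload to path and read it back using low-level os calls."""
    # os.open asks the kernel for a file descriptor
    # (typically the open/openat system call on Linux).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # os.write hands the bytes to the kernel; for a small payload on a
        # regular file this is normally completed in a single call.
        os.write(fd, payload)
    finally:
        # os.close releases the descriptor back to the kernel.
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        # os.read returns at most len(payload) bytes from the file.
        return os.read(fd, len(payload))
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = write_and_read_back(os.path.join(tmp, "demo.txt"), b"hello, kernel\n")
        print(data)
```

Assuming a Linux host with `strace` installed and the script saved as `demo.py`, running `strace -f -e trace=openat,write,read,close python3 demo.py` should show these requests as they happen, alongside the many calls the interpreter itself makes at startup. The exact system call names vary by platform and libc version.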

Why is...

Incident Management: Monitoring

Best Practices

Apr

09:00

When running a production system, one of the main responsibilities is being able to respond when things go wrong. This is especially important for newly launched or rapidly changing systems where incidents are guaranteed, usually due to defect leakage or performance/scaling challenges.

Incident readiness typically involves the following capabilities:

Monitoring (a computer is aware of your system’s health)
An escalation path (when monitoring doesn’t work)
Alerting (how to notify when something breaks)
An on-call rotation (who...

Hidden Benefits Of SLOs

Best Practices

Feb

15:00

There are many articles online about Service Level Objectives (SLOs), particularly on the value they provide to customers as part of a Service Level Agreement (SLA).

Let’s discuss some of the benefits of SLOs that aren’t apparent at first glance.

Before we do, let’s quickly review the terminology from the source:

SLI: a service level indicator—a carefully defined quantitative measure of some aspect of the level of service that is provided.
SLO: is a service...
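As a rough illustration of the SLI definition above, the sketch below treats an SLI as the fraction of "good" events and compares it with a target. The function names, the choice of availability as the indicator, and the 99.5% target are all assumptions made for this example, not taken from the post or its source.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Return the fraction of events that were good (a simple ratio SLI)."""
    if total_events == 0:
        # Assumption: a window with no traffic is treated as fully healthy.
        return 1.0
    return good_events / total_events


def meets_slo(sli: float, target: float) -> bool:
    """Return True when the measured SLI is at or above the target."""
    return sli >= target


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 requests, 999 of them successful.
    sli = availability_sli(good_events=999, total_events=1000)
    print(sli, meets_slo(sli, target=0.995))
```

The ratio form is only one possible way to define an indicator; latency percentiles or freshness measures would need a different calculation.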


Observability In A Box

Tools

14

Feb

09:00

I believe we’re entering a golden age of observability: we can gather metrics from our applications and infrastructure, better interpret them with query languages and pretty dashboards, and get notifications in chatrooms and our on-call systems. All of this technology is at our fingertips, without any software licensing fees!

The challenge I see with these new tools is that they tend to assume ‘cloud-native’ infrastructure: the happy path for setup and configuration usually requires a container...


CERTO MODO
