
Running Successful Engagements

Best Practices

Previously we discussed several types of engagement models that SRE can use when collaborating with software engineering teams, as well as their tradeoffs. Let’s go over some ways in which SRE managers or team leads can successfully start and run an engagement!

To refresh, an SRE engagement can take the form of: taking on operational ownership of a service from an engineering team, embedding SREs on an engineering team, or providing a set of services to R&D at large. Regardless of approach, its purpose is to improve the reliability and scalability of the service(s) that the engineering team owns.

Identify The Team’s POC

It’s important to establish and maintain a relationship with the team’s point of contact (POC): the person representing the team that you are serving. Depending on the organization, this person can be the engineering manager, the team lead, or the product manager, so long as they are responsible for work prioritization and accountable for the team’s business outcomes.

Do Some Initial Discovery

Before starting an engagement, gather information about the team’s operational posture, either through a structured process like a Production Readiness Review or simply by interviewing members of the team.

You may uncover additional needs or risks that weren’t discussed in previous conversations.

Create an Engagement Document

Clearly articulate the goals of the engagement in a document and get agreement from the team’s POC before it begins. Examples are:

  • All critical services meet production readiness criteria prior to launching to paying customers.
  • Reduce the team’s operational load by 50% within 6 months.
  • Reduce mean time to recover (MTTR) for production incidents from 3 hours to 1 hour within 6 months.
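
A goal like the MTTR one is only useful if both parties agree on how it is measured. As a minimal sketch, assuming incidents are recorded as pairs of detection and recovery timestamps (the sample data and the function names `mean_time_to_recover` and `meets_goal` are hypothetical, not taken from any particular incident tool), the check could look like this:

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (detected_at, recovered_at).
# In practice these would come from your incident tracker.
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 12, 30)),
    (datetime(2024, 2, 11, 22, 15), datetime(2024, 2, 12, 0, 45)),
    (datetime(2024, 3, 2, 14, 0), datetime(2024, 3, 2, 17, 0)),
]


def mean_time_to_recover(records):
    """Return the average recovery duration across all incidents."""
    if not records:
        raise ValueError("need at least one incident to compute MTTR")
    total = sum((end - start for start, end in records), timedelta())
    return total / len(records)


def meets_goal(records, target):
    """Return True if the MTTR is at or below the agreed target."""
    return mean_time_to_recover(records) <= target


mttr = mean_time_to_recover(incidents)
print(f"MTTR: {mttr}")
print("Goal met:", meets_goal(incidents, timedelta(hours=1)))
```

Agreeing on the start and end timestamps up front (for example, detection versus first customer impact) matters more than the arithmetic itself, since that choice changes the number you report.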

Other aspects to consider:

  • Does the team need SRE fundamentals set up (observability, SLOs, etc.)?
  • How many engineers should be allocated?
  • What meetings do SREs need to be added to?
  • Where in the product development life cycle (PDLC) is the team working? Are they preparing for a launch, or trying to address existing issues in a production service? Ensure that the goals align with where the team is trying to go.
  • Make sure that the engagement goals are measurable and feasible. The S.M.A.R.T. framework is useful for this.
  • Defining exit criteria for the engagement may be useful, especially if there are more engineering teams than SREs can serve. Set expectations up front that SRE engagements don’t last forever, and that the team will need to demonstrate a higher level of operational maturity once the engagement is over.

Note that you are responsible for showing the business value that SRE provides. An engagement document helps you articulate the expected return on investment.

Allow Time To Onboard

It will take time for SREs to gain context on the processes, technologies, and code that a particular team is responsible for. Set up a reasonable timeline to fully onboard before expecting them to successfully drive major initiatives.

Similarly, allow time for SREs to join the on-call rotation.

Continuously Poll for Feedback

Hold weekly or biweekly 1:1s with the following people:

  • The POC (is the engagement providing value, in their view?)
  • The SRE(s) (is the engagement sustainable? How is the collaboration with the team?)

Use active listening skills, track the improvements identified in those conversations, and set expectations on how you will follow up on each one. The feedback you receive will be diverse, and acting on it can require soft skills such as people management, negotiation, and conflict management.

Watch for Common Pitfalls

I have seen the following issues come up during SRE engagements. A common cause is usually that the team hasn’t yet cultivated a sense of ownership around operational matters.

  • Discussion of ‘SRE Work vs SWE Work’: Work with the POC to assign some feature work to SREs and operational work to SWEs. On embedded engagements, it’s important that SWEs and SREs become indistinguishable from each other over time by sharing in all types of work.
  • On-call responsibility isn’t shared: Similarly, every member of the team, SWE or SRE, should be in the on-call rotation.
  • Key SRE practices aren’t being followed: Practices like SLO enforcement can fall by the wayside if teams are facing undue pressures such as arbitrary deadlines.
  • SREs aren’t being included in team chats/meetings: For engagements to be successful, SREs need to be included in the team’s communications and in collaborative activities such as standups, retrospectives, design sessions, and roadmap planning.

Evaluate The Engagement Periodically

Ask yourself and the POC the following questions quarterly:

  • Is this engagement still more important than anything else the team can do?
  • Have business priorities changed? Should the engagement goals similarly change?
  • Is this engagement a healthy partnership?
  • Should the engagement model change?
  • Should the number of SREs change?

With that information, work with the POC to adjust the engagement document as needed. Don’t be afraid to end an engagement if it isn’t working out; that can sometimes happen.

Conclusion

These tips should give SRE managers and team leads food for thought on how to conduct a successful SRE engagement.

SRE is a full-contact sport. Need help getting in the game? Reach out for an introduction!

(Image credit: Fauxels)
