
SRE Maturity Assessment

Background of the initiative

Embedding SREs in every product team is not physically feasible, so we were looking for another way to promote SRE across the organization. In addition, we lacked the data and metrics needed to see the overall picture, so we could not allocate resources efficiently as an organization and often fell behind on risk management. To solve these problems, we developed an SRE maturity assessment tool.

What is SRE maturity assessment?

The assessment is based on an integrated capability maturity model and gives a data-driven overview of the entire business unit.
We also compiled the list of assessment items from factors such as service reliability failures, and kept it as simple as possible so that evaluation is easy.
Figure 1. Overview of Maturity Levels

What can be achieved through SRE maturity assessment?

By using the SRE maturity assessment, you can promote SRE (including enablement) across the entire organization.
Knowing your current position also makes it easier to create improvement plans and move each product closer to its ideal state.

SRE Maturity Assessment Process

The SRE maturity assessment is conducted in four main steps.
  1. Preparation
  2. Evaluation and planning
  3. Implementation of improvements
  4. Reflection

1. Preparation

In this step, we explain the concept of the SRE maturity assessment, how it is used, and the Level 3 guidelines.
  • Level 3 Guidelines
    • The guidelines consist of questions designed to guide the consideration of best practices for each item.
    • Ideal state for each product = Lv.3
    • The ideal state differs for each product, so it is not necessary to meet all of the guidelines.

2. Evaluation and Planning

Using Level 3 for each item as a reference, align everyone's understanding of the current maturity level and the ideal state for each item. (Note: Level 3 for each item will be shared around June 2023.)
Once everyone is on the same page, the last part of this step is to create an improvement plan. First, create a quarterly improvement plan, and then organize the action items and their owners. If monitoring, incident response, or post-mortems are at Level 1, we recommend prioritizing improvement plans for those items.
Figure 3. SRE Maturity Assessment Sheet
Figure 4. SRE Maturity Improvement Plan
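
The prioritization recommended in this step can be sketched in a few lines of Python. This is only an illustrative sketch, not the actual assessment sheet: the `ItemAssessment` class, the `plan_order` function, the item names, and the integer level scale are all assumptions made for the example. It puts monitoring, incident response, and post-mortems first when they are at Level 1, then orders the remaining items by how far they are from the target level (Level 3 by default).

```python
from dataclasses import dataclass

# Assumption: these names stand in for the three items the text
# recommends prioritizing; the real sheet may label them differently.
PRIORITY_ITEMS = {"monitoring", "incident response", "post-mortems"}


@dataclass
class ItemAssessment:
    item: str
    current_level: int
    target_level: int = 3  # ideal state for each product = Lv.3


def plan_order(assessments):
    """Return the names of items below target, most urgent first.

    Priority items at Level 1 come first; the rest are ordered by the
    size of the gap to the target level, then alphabetically.
    """
    below_target = [a for a in assessments if a.current_level < a.target_level]

    def sort_key(a):
        urgent = a.item in PRIORITY_ITEMS and a.current_level == 1
        gap = a.target_level - a.current_level
        return (not urgent, -gap, a.item)

    return [a.item for a in sorted(below_target, key=sort_key)]
```

Items that already meet their target are left out of the result, so the returned list can be used directly as the starting order for the quarterly plan's action items.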

3. Implementation of Improvements

Improve the maturity level of each item by drawing on knowledge gained from other services. Ready-to-use templates for post-mortems and incident response are also provided.
Figure 5. Knowledge Database

4. Reflection

After implementing improvements, review the results and revise the improvement plan every quarter or every six months. Start with quarterly reviews; if the operational burden is too high, switch to reviewing every six months.

Results obtained from SRE maturity assessment

  • The data now gives us an overview of the entire business unit.
  • This makes it easier to decide which products and improvement areas to prioritize when allocating resources.
  • We learned about internal practices that we had not been aware of.
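
As a rough illustration of how assessment data could give a business-unit overview, the sketch below aggregates per-product levels by item. The `summarize` function and the nested-dictionary input shape are hypothetical, chosen only for this example; the actual tool's data format is not described here.

```python
from collections import defaultdict
from statistics import mean


def summarize(scores):
    """Aggregate maturity levels across products.

    scores: {product_name: {item_name: level}} (assumed input shape).
    Returns (average level per item, products at Level 1 per item).
    """
    by_item = defaultdict(list)
    for product, items in scores.items():
        for item, level in items.items():
            by_item[item].append((product, level))

    averages = {
        item: mean(level for _, level in rows) for item, rows in by_item.items()
    }
    at_level_1 = {
        item: sorted(product for product, level in rows if level == 1)
        for item, rows in by_item.items()
    }
    return averages, at_level_1
```

A low average for an item suggests an area that needs attention across the business unit, while the Level 1 list points to the specific products where resources might be directed first.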