Reaffirming the importance of SRE through the large-scale betting service "WINTICKET"

This is Hasegawa (@rarirureluis) from the Service Reliability Group (SRG) of the Media Headquarters.
SRG (Service Reliability Group) is a group that mainly provides cross-cutting support for the infrastructure of our media services, improves existing services, launches new ones, and contributes to OSS.
This article will introduce the benefits I gained from participating in WINTICKET as an Embedded SRE.
This article is the 6th day's entry in the CyberAgent Group SRE Advent Calendar 2024.
 

SRE x Sales


For those who are unfamiliar with SRE, I will first explain the benefits of doing SRE.
The adoption of SRE contributes directly and indirectly to increased sales. Specifically, it has a positive impact on sales in the following ways:

Increased Customer Lifetime Value (LTV)


SREs promote customer satisfaction by improving service stability, which can significantly improve LTV (customer lifetime value), especially for subscription-based services like SaaS.

Improved customer satisfaction through system reliability


Stable system operation helps you earn customer trust and increases sales over the long term. In particular, the following factors influence sales:
  • Reduced service downtime
  • Improved system response performance
  • Reduced error rate
  • Improved profitability through cost reduction
SRE is not just a way of operating systems; it is a strategic approach that directly and indirectly contributes to increasing corporate sales.
If you happen to come across this article in a business role, please support any engineers who are trying to introduce SRE.

Correlation between SRE and sales


I think many people want to correlate the effectiveness of SRE with quantitative figures (sales). I thought the same thing, but it was quite difficult and I gave up.
Specifically, there is a confounding correlation: "higher sales = higher server load = worsening SLO."
There are papers overseas that say SRE contributes to sales.
User-Engagement Score and SLIs/SLOs/SLAs Measurements Correlation of E-Business Projects Through Big Data Analysis
The Covid-19 crisis lockdown caused rapid transformation to remote working/learning modes and the need for e-commerce-, web-education-related projects development, and maintenance. However, an increase in internet traffic has a direct impact on infrastructure and software performance. We study the problem of accurate and quick web-project infrastructure issues/bottleneck/overload identification. The research aims to achieve and ensure the reliability and availability of a commerce/educational web project by providing system observability and Site Reliability Engineering (SRE) methods. In this research, we propose methods for technical condition assessment by applying the correlation of user-engagement score and Service Level Indicators (SLIs)/Service Level Objectives (SLOs)/Service Level Agreements (SLAs) measurements to identify user satisfaction types along with the infrastructure state. Our solution helps to improve content quality and, mainly, detect abnormal system behavior and poor infrastructure conditions. A straightforward interpretation of potential performance bottlenecks and vulnerabilities is achieved with the developed contingency table and correlation matrix for that purpose. We identify big data and system logs and metrics as the central sources that have performance issues during web-project usage. Throughout the analysis of an educational platform dataset, we found the main features of web-project content that have high user-engagement and provide value to services’ customers. According to our study, the usage and correlation of SLOs/SLAs with other critical metrics, such as user satisfaction or engagement improves early indication of potential system issues and avoids having users face them. These findings correspond to the concepts of SRE that focus on maintaining high service availability.
I think it would be quite difficult to do the same analysis as this paper.
So what motivates you to do SRE?
"We can proactively address the visibly deteriorating quality of service."is.
It's quite motivating when you can see the service quality visibly deteriorating.
 

SRE benefits gained from WINTICKET


First, I'd like to talk about the benefits of being an SRE at WINTICKET at this stage.

Ease of collaboration with businesses


SLI/SLO is very useful if you want to immediately share whether WINTICKET is affected when an external failure occurs.
Sharing SLI/SLO based on business impact makes it easier to collaborate

Potential deterioration in service quality can be visualized


An alert was triggered for the error budget burn rate, and when we looked at the metrics, we saw that the service quality was gradually deteriorating and the error budget was being consumed rapidly.
One of WINTICKET's strongest engineers, @taba2424, who is working with us on this SLO implementation
This is not limited to WINTICKET: this kind of latent deterioration in service quality would go unnoticed without SLI/SLO, and I think spotting it is one of the benefits of doing SRE.
 

WINTICKET Service Introduction


WINTICKET was released in 2019 as an internet betting service for publicly managed Keirin and Auto Races. The service's features include the ability to bet while watching race footage and an extensive database of WINTICKET's own original data, including AI predictions and EX data.
We also offer functions linked to ABEMA's Keirin and Auto Race channels. WINTICKET became the number one Keirin betting service about two years after its release, and is still growing today.
 

Self-introduction


I am @rarirureluis from the Media Headquarters Service Reliability Group (SRG).
The reason I introduced myself here is that I am not affiliated with WINTICKET.
In this Advent Calendar, as part of SRG, I would like to introduce my work as an Embedded SRE for another team, WINTICKET.
 

Embedded SRE


This is an activity (Enabling SRE) in which Site Reliability Engineers instill the culture and knowledge of SRE (Site Reliability Engineering) within development organizations so that developers themselves can practice SRE.
I'm on the SRG team, so I'm an Embedded SRE.
There is also the term "Enabling SRE," but I think it's roughly synonymous. (By the way, when I searched for "Enabling SRE" overseas, I didn't get any hits.)

Purpose


  • Spreading SRE culture and knowledge to product development teams (culture building)
  • Support developers to independently implement SRE practices (cultivating a culture)
  • Improving Service Reliability (SRE)
The ultimate goal of Enabling SRE is to develop members within each product team who can autonomously practice SRE.
 

To do SRE in another department


When I joined as an SRE, instead of just talking about SRE right away, I did various things to gain the trust of the service team.
Additionally, there is a possibility that the service team may become fatigued before it can reap the benefits of SRE, so the aim is to alleviate this somewhat by building up my own credibility first.
When I join a new team, my tasks are to set up monitoring and alerts, reduce Toil, and communicate.

SRE and Monitoring


The reason for setting up monitoring and alerting first has to do with SRE.
When the error budget runs out, there is no way to investigate without a proper monitoring environment.
That's why Monitoring is the foundation of the pyramid diagram you often see.

Personal monitoring and alerting tools


WINTICKET uses Google Managed Prometheus and Grafana, but I personally found Datadog easier to use.
Before joining WINTICKET, I used Datadog for a service called DotMoney. Its graphical UI and wide range of ways to define SLIs (log-based, SLIs that treat latency and availability equally) are features that Cloud Monitoring does not have.
 
Although I haven't used them in actual operations, I got the impression that SigNoz, which is open source software, and Grafana Cloud are easy to use if you are okay with small-scale deployments or SaaS.
💡
Although the article states that it cannot be used from an SRE perspective, a recent update has made it possible to use range vector selectors like those in PromQL, so I now think it can be used for SRE purposes.
 

Alert maintenance


Initially, the alert environment was in place, but there were several issues. The alert tools were split between Cloud Monitoring and Self-hosted Alertmanager, and the same alerts were defined in both, or unnecessary alerts were included. This meant that alert maintenance was insufficient, reducing the effectiveness of fault detection.
So we worked on the following two points.
  1. Unified alert system
    1. The alert tool was unified with Grafana, which was already being used as a visualization tool.
  2. Review alert definitions with your team
    1. List all alerts and align the necessity and importance of the alerts with team members
These efforts have resulted in a simpler and more effective alert structure and improved monitoring practices.

Unified alert system


Since we are already using Grafana, we decided to unify our alert system with Grafana as well.
Cloud Monitoring's alerting system is less flexible than Grafana's.
There were nearly 400 alerts to migrate, and we had to select which ones to keep and which to discard.
We deleted alert rules that could be covered by SLI/SLO, reduced the number of alerts that page the on-call, and so on.
Grafana's label-based alert rules (Notification Policy) are intuitive, and you can easily make them on-call by simply labeling only the alerts you want to handle.
In addition to the above, we gained four other benefits.
  • WINTICKET also uses AWS, making it easy to monitor using Grafana.
  • You no longer need to set up Alertmanager (GKE) with port forwarding.
  • Client team alerts can now be managed with Grafana
  • Added flexibility to alert notifications
    • Organizing and outputting information with Grafana Notification Templates

Grafana study session


The purpose of these study sessions is to give the team knowledge about Grafana and enable them to easily add alerts via Terraform.
That said, an ulterior motive is to increase my profile as an SRE within my team.
 

Toil reduction


Reducing Toil is a quick way to gain credibility.
For example:
  • Completely unmaintained IaC
  • A long-running deployment flow
Reducing toil allows you to understand the infrastructure configuration and deployment flow of a service, so if you find toil, you're lucky. Even if you don't find any, you can still understand the configuration, so it's worth trying.
 

Cost reduction


I think reducing costs is also a quick way to gain trust.
The advantage over Toil reduction is that it can be easier in some cases and more effective because the cost savings are good news for business teams.
 

communication


Participate in daily server team evening meetings, regular events, drinking parties, etc.
It's small, but it adds up.
 

WINTICKET system configuration


As an overview, this is the WINTICKET system configuration diagram. (In reality, it is multi-regional and quite large-scale.)
 

Monitoring architecture


The monitoring architecture has been unified to Grafana, so Alertmanager has disappeared and it has become simpler.

Before

After

 

Implementing SLI/SLO


When it comes time to actually implement SLI/SLO, we will work with engineers on the team who are interested in SLI/SLO.
You can work alone, but having someone who knows the team's situation well next to you can help things go a little more smoothly, so if possible, it's best to work with a team.
Not only will the process proceed more smoothly, but the members will also gain knowledge about SLI/SLO, which will be more effective when it comes to spreading the knowledge throughout the team.
This time, we proceeded together with @taba2424.

CUJ


WINTICKET was already operating SLI/SLO in its app, so we reflected that on the server side as well.
If you would like to know more about CUJ, please read this article.

Identifying SLIs/SLOs


Once the CUJ is decided, we will begin to identify the SLI.
We will summarize the current latency and error rate in a spreadsheet like the one below and organize them accordingly.
In situations like this, it goes more smoothly if you have someone on your team with you.
Made by @taba2424, one of WINTICKET's strongest engineers

Adding and Implementing SLOs


Once the identification is complete, we will actually implement the SLO.
We use Cloud Monitoring's SLO feature and operate the SLO dashboard with Grafana.
First, the Cloud Monitoring dashboard can only display 100 SLOs per service (more than 100 can actually be registered), so we visualize the SLOs using Grafana.
Server metrics are also visualized using Grafana, which makes it easy to avoid having to switch between tools.
 

Keeping SLIs high quality


When you operate a service, you may encounter access from crawlers or malicious users.
By rejecting such requests in advance, you can measure a higher-quality SLO.
WINTICKET uses Cloud Armor to take various measures to prevent malicious requests from reaching its microservices.
Specifically, we use rate limiting and Google Cloud Armor's preconfigured WAF rules to detect and quickly reject malicious requests. This keeps inappropriate requests out of the system, maintaining its health and further improving the reliability of the SLIs/SLOs.
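As a rough illustration of the idea (not WINTICKET's actual pipeline), here is a minimal Python sketch that excludes traffic rejected at the edge or identified as crawlers before computing an availability SLI. The log record and its fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RequestLog:
    status_code: int
    blocked_by_waf: bool   # e.g. rejected by a Cloud Armor rule (hypothetical field)
    is_crawler: bool       # e.g. matched a known crawler user agent (hypothetical field)

def availability_sli(logs: list[RequestLog]) -> float:
    """Ratio of good (non-5xx) requests over traffic from real users only."""
    valid = [r for r in logs if not (r.blocked_by_waf or r.is_crawler)]
    if not valid:
        return 1.0
    good = sum(1 for r in valid if r.status_code < 500)
    return good / len(valid)

logs = [
    RequestLog(200, False, False),
    RequestLog(500, False, False),
    RequestLog(403, True, False),   # blocked by the WAF: excluded from the SLI
    RequestLog(200, False, True),   # crawler traffic: excluded from the SLI
]
print(f"availability SLI = {availability_sli(logs):.1%}")  # 50.0% over real user traffic
```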

Metrics used for SLI


The metrics used as SLIs at WINTICKET are as follows:
  • Prometheus metrics emitted by microservices
  • GCP metrics collected by Cloud Monitoring
 

Availability and Latency SLO


WINTICKET's SLI/SLO sets latency and availability as independent SLOs.
Personally, I think this is common in the world, but you can also combine latency and availability and measure them as a single SLO.
In fact, DotMoney, which I mentioned above and joined before WINTICKET, uses combined SLOs.
Cloud Monitoring's SLOs do not allow you to combine multiple metrics like Datadog does.

Which is better?


I think you should also take into account the characteristics of the service, but generally it's easier to keep them separate.
Combined: easier to manage.
Separate: availability and latency are distinguished, which makes investigation easier.
Keeping them separate increases the number of SLOs to manage, but because each SLO is narrower in scope, reviewing SLOs becomes much easier.
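To make the difference concrete, here is a minimal Python sketch (with an assumed 300 ms latency threshold and made-up requests) contrasting a combined SLI, where a request counts as good only if it is both successful and fast, with separate availability and latency SLIs.

```python
from dataclasses import dataclass

@dataclass
class Request:
    status_code: int
    latency_ms: float

LATENCY_THRESHOLD_MS = 300.0  # illustrative threshold

def combined_sli(reqs: list[Request]) -> float:
    """Good = successful AND fast, measured as one SLI."""
    good = sum(1 for r in reqs if r.status_code < 500 and r.latency_ms <= LATENCY_THRESHOLD_MS)
    return good / len(reqs)

def availability_sli(reqs: list[Request]) -> float:
    return sum(1 for r in reqs if r.status_code < 500) / len(reqs)

def latency_sli(reqs: list[Request]) -> float:
    return sum(1 for r in reqs if r.latency_ms <= LATENCY_THRESHOLD_MS) / len(reqs)

reqs = [Request(200, 120), Request(200, 450), Request(503, 90), Request(200, 180)]
print(f"combined:     {combined_sli(reqs):.1%}")      # 50.0%
print(f"availability: {availability_sli(reqs):.1%}")  # 75.0%
print(f"latency:      {latency_sli(reqs):.1%}")       # 75.0%
```

With the separate SLIs it is immediately clear whether errors or slowness caused the degradation, at the cost of having twice as many SLOs to track.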
 

Target Window


The Target Window is determined by the frequency of the regular review meetings and the development cycle.
The deployment cycle for WINTICKET itself is approximately once a week, but the Target Window is set to 30 days, and SLO review meetings are held with the entire team once every two weeks.
Ideally, if your service deployments are every Wednesday, you could set up regular SLO meetings every Thursday with a one-week Target Window, allowing you to discuss changes to SLOs and error budgets due to feature releases.
However, with frequent reviews the team is unlikely to feel the effects of the SLOs, so we deliberately set a longer review period so that latent degradation can be identified and members feel that adopting SLOs was worthwhile.
Also, considering how things will behave when the error budget runs out, which I will explain in the next section, I think it might be quite difficult to set the Target Window to one week.
 

Error Budget


An error budget is the acceptable amount of lost reliability, derived from the SLO.
A diagram of the error budget, which fluctuates daily
Servers may temporarily violate their SLO due to high database load, etc.
The error budget expresses how much of that SLO violation you can tolerate.

Error budget operations


Not consuming the error budget may sound praiseworthy, but it can also be read as "are there fewer deployments?" or "is the team not taking on technical challenges?"
The error budget is also the budget for "technical challenges" allocated to the Target SLO.
If you consistently have error budget left over, tighten the SLO so that the budget is used up right at the end of the Target Window (ending at roughly 0%).

How to calculate the error budget


The error budget is calculated from the SLO target values.
For example, if your SLO is 99.9%, your error budget is 0.1%. If your Target Window is 30 days (43,200 minutes), your error budget equates to 43.2 minutes of downtime.
Specific calculation example: If the SLO is 99.9%
  • SLO: 0.999
  • Target Window: 30 days (43,200 minutes)
Error budget = (1 − 0.999) × 43,200 = 0.001 × 43,200 = 43.2 minutes
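The same calculation as a small Python sketch, so you can plug in other SLO targets and Target Windows:

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Error budget, in minutes of full downtime, for a given SLO target and window."""
    window_minutes = window_days * 24 * 60
    return (1 - slo) * window_minutes

print(round(error_budget_minutes(0.999, 30), 1))  # 43.2
print(round(error_budget_minutes(0.99, 30), 1))   # 432.0
```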
 

Error Budget Burn Rate


Burn rate is a term coined by Google that is a unitless value that indicates how quickly your error budget is consumed relative to the target length of your SLO. For example, if your target is 30 days, a burn rate of 1 means that at a constant rate of 1, your error budget will be completely consumed in exactly 30 days. A burn rate of 2 means that at a constant rate, your error budget will be depleted in 15 days, and a burn rate of 3 means that your error budget will be depleted in 10 days.
Datadog's documentation on burn rate is easy to understand.
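As a small sketch of the definition above (plain Python, not any particular monitoring tool's API): burn rate is the observed error rate divided by the error budget fraction (1 − SLO), and at a constant burn rate the budget lasts the Target Window divided by that rate.

```python
def burn_rate(observed_error_rate: float, slo: float) -> float:
    """How fast the budget burns: observed error rate over the allowed error rate."""
    return observed_error_rate / (1 - slo)

def days_until_budget_exhausted(rate: float, window_days: int = 30) -> float:
    """At a constant burn rate, the budget lasts window / rate."""
    return window_days / rate

for observed in (0.001, 0.002, 0.003):  # error rates measured against a 99.9% SLO
    rate = burn_rate(observed, 0.999)
    print(f"burn rate {rate:.0f} -> budget exhausted in {days_until_budget_exhausted(rate):.0f} days")
# burn rate 1 -> budget exhausted in 30 days
# burn rate 2 -> budget exhausted in 15 days
# burn rate 3 -> budget exhausted in 10 days
```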
WINTICKET sets alerts on burn rate for one SLO with 5-minute and 1-hour time windows.
These will be referred to as fast burn rate and slow burn rate, respectively.

fast burn rate


The purpose is to detect cases where this alert fires right after a release, indicating that the changed code has degraded service quality.
Although it is not yet defined in the flow within WINTICKET, we would like to make it possible to use it as a criterion for deciding whether to roll back after a canary release.

slow burn rate


This is an important warning sign that indicates a chronic deterioration in the system's service quality, rather than a sudden failure like a fast burn rate.
It does not require immediate action, but it is an indicator that should be continually monitored.
 

Latency uses slow only; availability uses both, depending on its characteristics


For the burn rate alert for availability SLO, we set the alerts to fast and slow as mentioned above, but for latency we only set slow.
Availability Alerts
Latency Alerts
One reason we don't apply a fast burn rate to latency is that it can easily result in noisy alerts.
If the endpoint being measured depends on an external API, we do not apply a fast burn rate to the latency because it will be affected by the latency of that external API.
Using only slow gives good results here.
On the other hand, availability is set to take advantage of the characteristics of both fast and slow.

Burn rate alert at midnight


With a short time window, simple burn rate alerts often occur during times of low request volume, such as late at night.
For example, if your system receives 10 requests per hour, one failed request results in a 10% error rate for that hour. Against a 99.9% SLO, that is a 100x burn rate, and the single failure consumes 13.9% of your 30-day error budget, immediately triggering an alert.
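A quick sanity check of that arithmetic in Python:

```python
requests_per_hour = 10
slo = 0.999
window_hours = 30 * 24  # 30-day Target Window

hourly_error_rate = 1 / requests_per_hour                        # one failure -> 10%
burn_rate = hourly_error_rate / (1 - slo)                        # ~100x
allowed_failures = (1 - slo) * requests_per_hour * window_hours  # 7.2 failed requests per window
budget_consumed = 1 / allowed_failures                           # ~13.9% from a single failure

print(f"burn rate = {burn_rate:.0f}x, budget consumed by one failure = {budget_consumed:.1%}")
```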
Made by anies1212 of the WINTICKET App team
WINTICKET mitigates this somewhat by setting the burn rate threshold at 3 or 20.
Another approach, introduced on sre.google, is to artificially generate normal requests.
This is a way to mitigate the impact of error requests even during times when there are few requests.
It seems a little unrealistic, but I think it's interesting.
There are many other approaches introduced here.

If you want a more accurate burn rate


Combining multiple windows and multiple burn rates can eliminate false positives.
In this example, the alert will fire when the burn rate reaches 14.4 (14.4: 2% of the error budget consumed) over both the 5 minute and 1 hour intervals.
This will eliminate the benefit of immediacy, but it will allow you to configure essential alerts that indicate deterioration in service quality, excluding noisy alerts that will quickly recover.
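Conceptually, the combined condition looks like this minimal Python sketch; the burn-rate inputs and the 14.4 threshold mirror the example above, and how the per-window burn rates are obtained depends on your monitoring backend.

```python
def should_alert(burn_rate_5m: float, burn_rate_1h: float, threshold: float = 14.4) -> bool:
    """Fire only if the budget is burning fast over BOTH the short and long windows."""
    return burn_rate_5m >= threshold and burn_rate_1h >= threshold

# A spike that has already recovered: the 5-minute window is back to normal,
# so no alert fires even though the 1-hour window still looks bad.
print(should_alert(burn_rate_5m=1.0, burn_rate_1h=20.0))   # False
# A sustained problem: both windows are burning fast, so the alert fires.
print(should_alert(burn_rate_5m=30.0, burn_rate_1h=20.0))  # True
```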
 

When the error budget is depleted


Previously, when I participated in DotMoney as an Embedded SRE, I discussed this with the business manager, and we were able to agree that "when the error budget is exhausted, feature releases are prohibited, except for responses to production outages, fixes to restore reliability, and feature releases involving external companies."
At WINTICKET, when the error budget is depleted, the server team determines whether the cause is external; if it is not, they cut tasks and assign members to restore the error budget, creating a culture unique to the team.
 

Spreading SLI/SLO


Now we've reached the point where the SLIs/SLOs are actually written as code and visualized.
We create Grafana dashboards for each component.
 

What to visualize


What is the purpose of visualization?
The following items are required when reviewing SLOs:
  • Error Budget
  • Current SLO
SLO summary for each component
But this is not enough.
When the error budget is depleted, more information is needed to dig deeper:
  • Error budget time series graph
  • SLI time series graph
  • Latency time series graph for the relevant SLI (latency)
  • Response code time series graph for the relevant SLI (availability)
 
SLO details for each component
By deploying this in the same dashboard, you can enjoy the following benefits when reviewing:
  • You can determine when the condition worsened or improved.
    • If it overlaps with the release, that's the reason
    • External service failure
  • Can determine if the condition is continuing to worsen
  • If the SLO has deteriorated and countermeasures have been taken, you can judge whether it is improving.
  • If there are no other issues and the configured SLI/SLO is simply too strict, you can decide to adjust it.

Conducting study sessions


We hold study sessions for the whole team.
The purpose is to help people understand SLI/SLO at least a little, but there is no way that they can understand it in a one-time study session.
I didn't understand it either, so we shared things like, "This is what we're going to do!" and "These are the benefits!" and did it in a sort of rallying cry kind of atmosphere.
Made by @taba2424

Review SLOs with the whole team every two weeks


The purpose of reviewing SLOs with the entire team is to:
  • Creating a culture of SLI/SLO through team-wide efforts
  • (If service quality actually deteriorates) Help members understand the benefits of SLI/SLO and maintain their motivation

How to review on WINTICKET


At the time of writing, WINTICKET has 103 SLOs.
It would be quite exhausting to review all of this.
Therefore, the review looks at "only the SLOs where the error budget was depleted + what happened to the SLOs where the error budget was depleted at the time of the previous review."
This allows you to start with a simple operation that won't tire you out.
💡
Ultimately, we aim to reach a point where we agree with business roles on how to act when the error budget is depleted, and someone on the server team takes action to restore the error budget when it is depleted.
💡
What should we do about SLOs that don't consume any of the error budget? At WINTICKET, we identify cases where the error budget is hovering around 100% every three or six months, and tighten our SLIs and SLOs to prevent the error budget from becoming excessive.
 
WINTICKET uses Wrike for task management, so we also use Wrike for SLO retrospectives.
We try to keep the number of tools to a minimum and aim to operate in a way that doesn't tire out our members.
SLO Task Management in Wrike
The actual review process is as follows:
  1. Randomly select a facilitator
  2. Create a team for each category
  3. Each team reviews the SLOs for its assigned categories in Grafana
  4. File a task in Wrike for each SLO whose error budget was exhausted
  5. Decide whether or not a response is needed
    1. If the problem is with an external API, no action is required.
  6. Check the status of SLOs already filed in Wrike (error budget exhausted)
    1. Comment and update the error budget field
 

In fact, the quality of service is getting worse every day.


As mentioned earlier, the quality of service is getting worse every day.
For example, latency increases as the number of DB records increases.
Or perhaps a table added as part of a new initiative was missing an index.
Even if there is no problem at first, the impact will become obvious as the number of records increases.
Cases that get worse every day

Why has SLI/SLO become so widespread?


Looking at the current state of WINTICKET adoption, it appears that members other than those who promoted SLI/SLO have begun to look for the causes of deterioration using burn rate alerts and the SLO dashboard, and that the server team is independently implementing SLI/SLO.
In fact, I was the facilitator for the first review meeting, but now someone from the server team is taking over.
We've been able to get this far because we worked on it together, and it was reassuring that @taba2424 is so excellent.
In addition, I think it was also important that the WINTICKET development manager, @akihisasen, had been interested in SLI/SLO from the beginning, understood its benefits, and created a structure that made it easy to move forward.
 

Conclusion: The future and ideals of SLI/SLO


Currently, SLI/SLO has become a common language within the server team, but it has not yet achieved its original goal of becoming a common language with the business side.
The next step for the SRE team is to spread the knowledge to business roles. To achieve this, we plan to take the following approach:
  • Improved comprehensiveness
    • Adding a new SLO
    • Increased confidence in SLOs (e.g., SLI adjustments)
  • Involve someone from the business team in SLO review meetings with the server team
 
We have also created a business dashboard that allows business people to see the current SLOs in an easy-to-understand manner.
We will utilize these to gradually spread the word.
 

The need for adoption on the business side and a culture where those who improve SLOs are praised


I believe that anyone who improves service quality should be praised and recognized not only by engineers, but by everyone involved with the product.
However, WINTICKET currently has no agreement with the business side.
First, we are currently working on improving the reliability and comprehensiveness of SLOs while implementing and promoting SLOs across the entire server team.
 

Aside: Embedded SRE? Enabling SRE?


I've been hearing the term "Enabling SRE" a lot recently, so I looked into it, but I haven't seen it anywhere overseas.
X's community "SRE and observabilityWhen I asked a question, I received a lot of information.
There seems to be a description of "enabling teams" in the book "Team Topologies: Organizing Business and Technology Teams for Fast Flow."
 
SRG is looking for people to work with us. If you're interested, please contact us here.