I entrusted my dreams to Cloud Service Mesh for Cloud Run

I'm Masaya Matsuda (@mm_matsuda816) of the Service Reliability Group (SRG) in the Media Headquarters.
#SRG (Service Reliability Group) mainly provides cross-cutting infrastructure support for our media services: improving existing services, launching new ones, contributing to OSS, and so on.
This article is the Day 13 entry in the QualiArts Advent Calendar 2024. I'm writing this post because I'm involved as an Embedded SRE.
We investigated and tested whether the issues we wanted to solve on Cloud Run could be solved with Cloud Service Mesh for Cloud Run.
The information in this blog is current as of December 1, 2024.
 

What is Cloud Service Mesh for Cloud Run?


At the time of writing, this feature is in preview.
It makes the advanced traffic management capabilities of Cloud Service Mesh available on Cloud Run.
For more information, please refer to the official documentation below.
 

Issues to be resolved


Issue A: Increasing number of backend services in multiple dev environments for gRPC-based applications

In the dev environment, a different version of the application is deployed to Cloud Run for each environment (dev01, dev02, and so on). Each Cloud Run service needs its own Backend Service, as shown in Figure 1.
One issue with this configuration is that if you enable Cloud Armor Enterprise Paygo/Annual *1, you are charged based on the number of Backend Services, even those with no policy attached. This is an obstacle when you want to try out the same Cloud Armor Enterprise settings in the Dev environment (or Stg environment) as in the Prod environment.
Figure 1 Issues with the Dev environment
 
*1 Cloud Armor Enterprise Annual billing
The flat rate is $3,000/month for up to 100 protected resources, plus $30/month for each protected resource beyond 100 (see Figure 2).
Protected resources are Backend Services and Backend Buckets, and they are counted even if no Cloud Armor policy is applied to them. For details, please refer to the documentation.
Figure 2 Cloud Armor Enterprise Annual pricing (as of the time of writing)
For the actual amounts, please refer to https://cloud.google.com/armor/pricing.
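The billing rule quoted in footnote *1 can be worked through as simple arithmetic. The sketch below covers only that flat-rate rule for protected resources; the function name, the baseline of 60 Backend Services, and the figure of 5 Backend Services per dev environment are hypothetical values chosen for illustration, and real charges should be checked against the pricing page.

```python
FLAT_FEE_USD = 3000        # monthly flat rate quoted in footnote *1
INCLUDED_RESOURCES = 100   # protected resources covered by the flat rate
EXTRA_FEE_USD = 30         # monthly fee per protected resource beyond 100


def annual_plan_monthly_cost(protected_resources: int) -> int:
    """Monthly USD cost for protected resources under the quoted rule."""
    if protected_resources < 0:
        raise ValueError("protected_resources must be non-negative")
    extra = max(0, protected_resources - INCLUDED_RESOURCES)
    return FLAT_FEE_USD + extra * EXTRA_FEE_USD


# Hypothetical numbers: 60 Backend Services already exist, and every
# dev environment adds 5 more (one per Cloud Run service).
BASELINE = 60
PER_DEV_ENV = 5

for dev_envs in (0, 8, 20):
    total = BASELINE + dev_envs * PER_DEV_ENV
    print(f"{dev_envs} dev envs, {total} resources: "
          f"${annual_plan_monthly_cost(total)}/month")
```

Under these assumed numbers the cost stays flat until the 100th protected resource, after which every additional dev environment adds a fixed monthly amount whether or not a policy is attached.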
 

Issue B: Blue/Green deployment of gRPC-based applications with immediate all-traffic switchover

When switching revisions using traffic shifting, even if you direct 100% of traffic to the Green revision, requests may still be sent to the Blue revision between the time the change is applied and the time the traffic actually switches. *2
For backward-incompatible releases, this behavior is not acceptable.
 
*2 For REST APIs, session affinity can be used so that users who have reached a newer revision are not routed back to an older one.

The ideal solutions I dreamed of, and the conclusion that they are not possible

Ideal A for Issue A and the results of the investigation

In order to reduce the number of backend services, we wondered if we could aggregate NEGs using Cloud Service Mesh (Figure 4).
Figure 4 is the ideal diagram I had in mind before researching Cloud Service Mesh for Cloud Run.
(In the Dev environment where I want to use this, communication would go through a GRPCRoute.)
Figure 4 Ideal A for Issue A
Figure 5 shows the results of our investigation into Cloud Service Mesh for Cloud Run. Cloud Service Mesh controls the route to the backend service, but does not have the functionality to consolidate NEGs or backend services.
Figure 5 Reality A

Ideal B for Issue B

Figure 6 shows the configuration I came up with for Issue B, following the same approach as for Issue A. Ideally, changing the HTTPRoute in this configuration would make reviews-v2 stop returning responses at the same moment that reviews-v3 starts returning them.
(The API I actually want to use this for communicates over gRPC and would therefore use GRPCRoute, but I used HTTPRoute here in order to test with Bookinfo.)
Figure 6 Ideal B for Issue B
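One way to picture the intended switchover is as a single update that moves all of a route's destination weight from the old version to the new one. The following is a toy model only, not the Cloud Service Mesh API: the `pick_destination` helper and the weight values are hypothetical, and the model says nothing about how quickly a real route change propagates.

```python
import random


def pick_destination(weights, rng):
    """Pick a destination name in proportion to its weight (toy model)."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]


rng = random.Random(0)

# Before the release: all weight is on the old version.
route = {"reviews-v2": 100, "reviews-v3": 0}
before = {pick_destination(route, rng) for _ in range(1000)}

# The "ideal" release: one update moves all weight to the new version.
route = {"reviews-v2": 0, "reviews-v3": 100}
after = {pick_destination(route, rng) for _ in range(1000)}

print("before:", before)
print("after:", after)
```

In this model no request reaches reviews-v2 once the weights are swapped, which is the behavior hoped for from the real configuration.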
Figure 7 shows what we were able to build. In the case of Cloud Service Mesh, we were unable to register a NEG with a revision tag in the Backend Service, so we had to create a separate Cloud Run service for each release version. We concluded that this configuration was unacceptable from an operational perspective.
We have not conducted performance tests to confirm that all traffic is switched as expected.
Figure 7 Results of examining Ideal B

Consideration of Ideal B with Bookinfo

Here is the Terraform we used to test whether this configuration could actually be built, using Bookinfo, the Istio sample application.

Configuration Description

Since we are using Bookinfo this time, we use HTTPRoute, but the same settings can also be configured with GRPCRoute.
google_network_services_http_route.reviews
I will skip the behavior verification here, as it is the same as in translucens' blog post "I tried out Cloud Run's service mesh".
 

Terraform Sample

Here is a sample of the Terraform code used to build Bookinfo.
Please modify the example as appropriate for your environment.

Result of the investigation: if only we could specify Cloud Run tags in the NEG...

INTERNAL_SELF_MANAGED
Figure 8 Backend Service constraints
This constraint required us to create a Cloud Run service for each release version.

Other dreams for Cloud Service Mesh for Cloud Run

I would be happy if a load balancer could reach Cloud Run directly via Cloud Service Mesh. However, the feature seems to have been introduced to improve traffic between Cloud Run services, so this use case is outside its intended role.

Conclusion


In this article, we tested whether Cloud Service Mesh for Cloud Run could solve the following issues and concluded that it could not.
  1. Consolidating Backend Services across multiple dev environments for gRPC-based applications
  2. Blue/Green deployment with instantaneous traffic switching for gRPC-based applications
 
Tomorrow, Day 14 of the QualiArts Advent Calendar features hikyaru-suzuki.
 
SRG is looking for people to work with us. If you are interested, please contact us here.