I entrusted my dreams to Cloud Service Mesh for Cloud Run

I'm Masaya Matsuda (@mm_matsuda816) of the Service Reliability Group (SRG) in the Media Headquarters.
#SRG (Service Reliability Group) is a group that mainly provides cross-functional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article is the Day 13 entry in the QualiArts Advent Calendar 2024. I'm writing this post because I'm involved as an embedded SRE.
We investigated whether Cloud Service Mesh for Cloud Run could solve the problems we wanted to solve on Cloud Run.
The information in this blog is current as of December 1, 2024.
 

What is Cloud Service Mesh for Cloud Run?


At the time of writing, this feature is in preview.
Advanced traffic management capabilities with Cloud Service Mesh are now available on Cloud Run.
For details, please refer to the official documentation.
 

Issues to be solved


Issue A: Backend Services multiply across multiple Dev environments for gRPC-based applications

In the Dev environment, a different version of the application is deployed to Cloud Run for each environment (dev01, dev02, etc.). As shown in Figure 1, a Backend Service must be created for each Cloud Run service.
One problem with this configuration is that if you enable Cloud Armor Enterprise Paygo/Annual (*1), you are charged for each Backend Service even if no policy is set on it. This becomes a problem when you want to try out the same Cloud Armor Enterprise settings in your Dev (or Stg) environment as in Prod.
Figure 1. Dev environment issues
 
*1 Cloud Armor Enterprise Annual billing
The fee is a flat $3,000/month covering up to 100 protected resources, plus $30/month for each protected resource beyond 100 (see Figure 2).
Protected resources are Backend Services and Backend Buckets, and they are counted even if no Cloud Armor policy is applied to them. For details, please refer to the documentation.
Figure 2. Cloud Armor Enterprise Annual pricing (as of the time of writing)
For the actual amounts, please refer to https://cloud.google.com/armor/pricing.
 

Issue B: Blue/Green deployment for gRPC-based applications that switches all traffic immediately

When switching revisions using traffic migration, even if you direct 100% of traffic to the Green revision, requests may still be sent to the Blue revision between the time the change is applied and the time the traffic actually switches. (*2)
For a backward-incompatible release, this behavior is unacceptable.
 
*2 In the case of a REST API, Session Affinity ensures that users who have accessed a newer revision are not directed to an older one.

The ideals I dreamed of, and the conclusion: it can't be done

Ideal A for Issue A and the results of the investigation

To reduce the number of backend services, we wondered whether we could aggregate NEGs using Cloud Service Mesh (Figure 4).
Figure 4 is the ideal diagram I had in mind before researching Cloud Service Mesh for Cloud Run.
(The Dev environment in question communicates over gRPC, so we would use GRPCRoute.)
Figure 4. Ideal A for Issue A
Figure 5 shows what our investigation of Cloud Service Mesh for Cloud Run found. Cloud Service Mesh controls routing to Backend Services, but it has no capability to aggregate NEGs or Backend Services.
Figure 5. Reality A
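
To make Reality A concrete, here is a minimal Terraform sketch of per-environment gRPC routing. It is an illustration under assumptions, not the configuration we actually used: the variables `project_id` and `mesh_id`, the `api-devNN` names, and the `example.internal` hostnames are all hypothetical, and depending on your provider version these resources may require the google-beta provider.

```hcl
# Hypothetical inputs; none of these names come from the original setup.
variable "project_id" {
  type = string
}

variable "mesh_id" {
  type        = string
  description = "ID of an existing Cloud Service Mesh Mesh resource"
}

locals {
  dev_envs = ["dev01", "dev02"]
}

# One GRPCRoute per Dev environment. Each route can only point at a
# Backend Service, so every environment still needs its own Backend
# Service and NEG: the route does not aggregate them.
resource "google_network_services_grpc_route" "api" {
  for_each = toset(local.dev_envs)

  name      = "api-${each.key}"
  hostnames = ["api-${each.key}.example.internal"] # assumed hostname
  meshes    = [var.mesh_id]

  rules {
    action {
      destinations {
        service_name = "projects/${var.project_id}/locations/global/backendServices/api-${each.key}"
      }
    }
  }
}
```

Even in this sketch, the number of Backend Services stays equal to the number of environments, which is why Ideal A does not hold.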

Ideal B for Issue B

Figure 6 shows the configuration I came up with for Issue B in the same way. Ideally, by changing the HTTPRoute in this configuration, reviews-v2 would stop returning responses at the same moment that reviews-v3 starts returning them.
(The API I want to use communicates over gRPC, so I would use GRPCRoute, but I use HTTPRoute here in order to test with Bookinfo.)
Figure 6. Ideal B for Issue B
Figure 7 shows what we actually built. Because Cloud Service Mesh doesn't allow NEGs with revision tags to be registered in the Backend Service, we had to create a separate Cloud Run service for each release version. We concluded that this configuration was unacceptable from an operational perspective.
We have not conducted performance tests to confirm that all traffic is switched as expected.
Figure 7. Results of investigating Ideal B

Examining Ideal B with Bookinfo

To check whether this can actually be built, we tested it with Bookinfo, the Istio sample application. This section covers the Terraform we used for that test.

Configuration Description

Because we are using Bookinfo this time, we use HTTPRoute, but the same settings can also be made with GRPCRoute.
google_network_services_http_route.reviews
The behavior verification is the same as in translucens's blog post "I tried out Cloud Run's service mesh", so I will omit it here.
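
As a rough illustration of the kind of route discussed here, the following is a minimal sketch of a Mesh and an HTTPRoute for the reviews service. It is not the original sample: the variable `project_id`, the mesh name, the hostname, and the Backend Service names are assumptions, and depending on your provider version these resources may require the google-beta provider.

```hcl
# Hypothetical variable; not taken from the original sample.
variable "project_id" {
  type = string
}

# The mesh that client Cloud Run services attach to.
resource "google_network_services_mesh" "bookinfo" {
  name = "bookinfo-mesh" # assumed name
}

# Route for the reviews service. Switching versions means changing
# which Backend Service the single destination points to.
resource "google_network_services_http_route" "reviews" {
  name      = "reviews"
  hostnames = ["reviews.example.internal"] # assumed hostname
  meshes    = [google_network_services_mesh.bookinfo.id]

  rules {
    action {
      destinations {
        # Change "reviews-v2" to "reviews-v3" to move all traffic.
        service_name = "projects/${var.project_id}/locations/global/backendServices/reviews-v2"
        weight       = 100
      }
    }
  }
}
```

With a single destination at weight 100, the intent is that a version switch becomes a one-line change to `service_name`. Whether all traffic really moves at once is exactly what we did not load-test.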
 

Terraform Sample

Here is sample Terraform code for building Bookinfo.
Please modify the example as appropriate.

Result: If only we could specify Cloud Run tags in the NEG...

A Backend Service whose load balancing scheme is INTERNAL_SELF_MANAGED cannot register a NEG that specifies a Cloud Run tag (Figure 8).
Figure 8. Backend Service constraints
This constraint required us to create a Cloud Run service for each release version.
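
The shape of that workaround can be sketched in Terraform as follows. This is a hedged illustration rather than our actual code: the region default, the `reviews-v3` names, and the commented-out `green` tag are assumptions introduced only for this example.

```hcl
variable "region" {
  type    = string
  default = "asia-northeast1" # assumed region
}

# Serverless NEG pointing at a whole Cloud Run service. One Cloud Run
# service exists per release version (reviews-v2, reviews-v3, ...).
resource "google_compute_region_network_endpoint_group" "reviews_v3" {
  name                  = "reviews-v3"
  region                = var.region
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = "reviews-v3" # name of an existing Cloud Run service (assumed)
    # tag = "green" # per this article, a tagged NEG could not be used here
  }
}

# Backend Service used by Cloud Service Mesh routes.
resource "google_compute_backend_service" "reviews_v3" {
  name                  = "reviews-v3"
  load_balancing_scheme = "INTERNAL_SELF_MANAGED"

  backend {
    group = google_compute_region_network_endpoint_group.reviews_v3.id
  }
}
```

If the `tag` attribute were usable with this scheme, a single Cloud Run service with tagged Blue and Green revisions would be enough, and the per-version services would not be needed.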

Other dreams for Cloud Service Mesh for Cloud Run

It would be nice if a load balancer could reach Cloud Run through Cloud Service Mesh. That said, Cloud Service Mesh for Cloud Run appears to have been introduced to improve traffic between Cloud Run services, so this wish is somewhat outside its intended scope.

Conclusion


In this article, we examined whether Cloud Service Mesh for Cloud Run could solve the following issues, and concluded that it could not.
  1. Unifying Backend Services across multiple Dev environments in gRPC-based applications
  2. Blue/Green deployment with instant traffic switching in gRPC-based applications
 
Tomorrow, Day 14 of the QualiArts Advent Calendar, is by hikyaru-suzuki.
 
SRG is looking for people to work with us. If you're interested, please contact us here.