I entrusted my dreams to Cloud Service Mesh for Cloud Run
#SRG (Service Reliability Group) is a group that mainly provides cross-functional support for the infrastructure of our media services: improving existing services, launching new ones, and contributing to OSS.
This article is the day-13 entry in the QualiArts Advent Calendar 2024. I'm writing this post because I'm involved as an Embedded SRE.
We investigated whether Cloud Service Mesh for Cloud Run could solve issues we have been facing with Cloud Run.
The information in this blog is current as of December 1, 2024.
- What is Cloud Service Mesh for Cloud Run?
- Issues to be solved
- Challenge A: Increasing backend services in multiple Dev environments for gRPC-based applications
- Issue B: Blue/Green deployment of gRPC-based applications where all traffic is immediately switched
- The ideal solution I dreamed of, but I can't come to a conclusion
- Ideal A for Task A and the results of the study
- Ideal B for Task B
- Consideration of Ideal B with Bookinfo
- Configuration Description
- Terraform Sample
- Consideration result: If we could specify Cloud Run tags in NEG...
- Other dreams for Cloud Service Mesh for Cloud Run
- Conclusion
What is Cloud Service Mesh for Cloud Run?
At the time of writing, this feature is in preview.
Advanced traffic management capabilities with Cloud Service Mesh are now available on Cloud Run.
For details, please refer to the official documentation.
Issues to be solved
Challenge A: Increasing backend services in multiple Dev environments for gRPC-based applications
In the Dev environment, a different version of the application is deployed to Cloud Run for each environment (dev01, dev02, and so on). As shown in Figure 1, a Backend Service must be created for each Cloud Run service.
One issue with this configuration is that if you enable Cloud Armor Enterprise Paygo/Annual*1, you will be charged for each Backend Service even if you have not set a policy. This can be a problem if you want to try out the same Cloud Armor Enterprise settings in your Dev (or Stg) environment as in your Prod environment.

*1 Cloud Armor Enterprise Annual billing system
The fee is a flat rate of $3,000/month for up to 100 protected resources, and $30/month for each protected resource above 100 (see Figure 2).
The protected resources are Backend Services and Backend Buckets, and they are counted even if no Cloud Armor policy is applied to them. For details, please refer to the documentation.
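As a worked example of the figures quoted above (and only those figures; actual billing terms should be checked against the documentation), the monthly fee for $n$ protected resources under the Annual plan can be written as:

$$
C(n) = 3000 + 30 \cdot \max(0,\ n - 100) \quad \text{[USD/month]}
$$

For instance, a hypothetical project with 130 Backend Services and Backend Buckets in total would come to $C(130) = 3000 + 30 \cdot 30 = 3900$ USD/month, whether or not a policy is attached to each of them.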

Issue B: Blue/Green deployment of gRPC-based applications where all traffic is immediately switched
When switching revisions using traffic migration, even if you configure 100% of the traffic to go to the Green revision, requests may still be sent to the Blue revision between the time the change is applied and the time the traffic is actually switched. *2
For a backwards incompatible release, this behavior is unacceptable.
*2 For REST APIs, session affinity can be used to ensure that users who have accessed the newer revision are not routed back to the older one.
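For reference, a revision switch of this kind can be expressed in Terraform roughly as follows. This is a minimal sketch: the service name, region, container image, and revision name are assumptions made for illustration, not our actual configuration.

```hcl
resource "google_cloud_run_v2_service" "api" {
  name     = "api"             # hypothetical service name
  location = "asia-northeast1" # assumed region

  template {
    revision = "api-green" # hypothetical name of the Green revision

    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello" # placeholder image
    }
  }

  # Send 100% of the traffic to the Green revision.
  traffic {
    type     = "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"
    revision = "api-green"
    percent  = 100
  }
}
```

Even with `percent = 100` on the Green revision, the switch described above is not instantaneous, which is the behavior this issue is about.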
The ideal solution I dreamed of, but I can't come to a conclusion
Ideal A for Task A and the results of the study
To reduce the number of backend services, we wondered whether we could aggregate NEGs using Cloud Service Mesh (Figure 4).
Figure 4 is the ideal diagram I had in mind before researching Cloud Service Mesh for Cloud Run.
Bookinfo, a sample application provided by Istio, was used as the example.
(In the Dev environment we want to use, we communicate via gRPC, so we will use gRPCRoute.)

Figure 5 shows the results of our investigation into Cloud Service Mesh for Cloud Run. Cloud Service Mesh controls the route to the backend service, but does not have the functionality to aggregate NEGs or backend services.

Ideal B for Task B
Figure 6 shows the configuration I came up with for Issue B, following the same approach. Ideally, changing the HTTPRoute in this configuration would make reviews-v2 stop returning responses at the same moment that reviews-v3 starts returning them.
(The API we actually target communicates via gRPC, so we would use gRPCRoute, but HTTPRoute is used here because we are testing with Bookinfo.)

Figure 7 shows what we actually built. Because Cloud Service Mesh doesn't allow NEGs with revision tags to be registered in the Backend Service, we had to create a separate Cloud Run service for each release version. We concluded that this configuration was unacceptable from an operational perspective.
We have not conducted performance tests to confirm that all traffic is switched as expected.

Consideration of Ideal B with Bookinfo
To check whether this can actually be built, here is the Terraform we used when testing with Bookinfo, the Istio sample application.
Configuration Description
Since we are using Bookinfo this time, we use HTTPRoute, but the same settings can also be made with gRPCRoute.
google_network_services_http_route.reviews
I will omit the details of the behavior verification, since they are the same as in translucens's blog post "I tried out Cloud Run's service mesh".
Terraform Sample
Here is sample Terraform code for building Bookinfo:
Please modify the example as appropriate.
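As a rough illustration of the part that matters for Issue B, the following is a minimal sketch (not the full Bookinfo configuration) of one reviews version wired through a serverless NEG, a Backend Service, and the HTTPRoute. The resource names, region, hostname, and mesh name are assumptions made for illustration, and the Cloud Run service itself is assumed to exist already.

```hcl
# Assumed: the mesh that the calling Cloud Run services are attached to.
resource "google_network_services_mesh" "bookinfo" {
  name = "bookinfo-mesh" # hypothetical name
}

# One serverless NEG per Cloud Run service (one service per release version).
resource "google_compute_region_network_endpoint_group" "reviews_v3" {
  name                  = "reviews-v3-neg"
  region                = "asia-northeast1" # assumed region
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = "reviews-v3" # assumed name of an existing Cloud Run service
  }
}

# Backend Service referenced by the route.
resource "google_compute_backend_service" "reviews_v3" {
  name                  = "reviews-v3"
  load_balancing_scheme = "INTERNAL_SELF_MANAGED"

  backend {
    group = google_compute_region_network_endpoint_group.reviews_v3.id
  }
}

# The route that decides which Backend Service receives the requests.
resource "google_network_services_http_route" "reviews" {
  name      = "reviews"
  hostnames = ["reviews.example.internal"] # hypothetical hostname
  meshes    = [google_network_services_mesh.bookinfo.id]

  rules {
    action {
      destinations {
        service_name = google_compute_backend_service.reviews_v3.id
        weight       = 100
      }
    }
  }
}
```

In this layout, moving from reviews-v2 to reviews-v3 amounts to changing the destination in google_network_services_http_route.reviews, which is why every release version needs its own NEG and Backend Service.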
Consideration result: If we could specify Cloud Run tags in NEG...
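With the INTERNAL_SELF_MANAGED load balancing scheme used by Cloud Service Mesh, a NEG that specifies a Cloud Run revision tag cannot be registered in the Backend Service.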

This constraint required us to create a Cloud Run service for each release version.
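For reference, the NEG definition we would have liked to use looks roughly like the sketch below, where a single Cloud Run service is targeted by revision tag. The service name, tag, and region are hypothetical. The `tag` argument itself exists on serverless NEGs, but as noted for Figure 7, a NEG with a revision tag could not be registered in the Backend Service for Cloud Service Mesh at the time of writing.

```hcl
resource "google_compute_region_network_endpoint_group" "reviews_tagged" {
  name                  = "reviews-v3-tag-neg"
  region                = "asia-northeast1" # assumed region
  network_endpoint_type = "SERVERLESS"

  cloud_run {
    service = "reviews" # hypothetical single Cloud Run service
    tag     = "v3"      # hypothetical revision tag for the release
  }
}
```

If this were accepted, one Cloud Run service with tagged revisions would be enough, instead of one Cloud Run service per release version.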
Other dreams for Cloud Service Mesh for Cloud Run
I would be happy if a load balancer could reach Cloud Run directly through Cloud Service Mesh. That said, Cloud Service Mesh for Cloud Run seems to have been introduced to improve traffic between Cloud Run services, so this wish is somewhat outside its intended scope.
Conclusion
In this article, we tried to see if Cloud Service Mesh for Cloud Run could solve the following issues, and concluded that it could not.
- Consolidating Backend Services across multiple Dev environments for gRPC-based applications
- Blue/Green deployment with an instant traffic switch for gRPC-based applications
SRG is looking for people to work with us.
If you're interested, please contact us here.