Multi-tenant design and reflections on Ameba Platform
#SRG (Service Reliability Group) mainly provides cross-sectional infrastructure support for our media services: improving existing services, launching new ones, contributing to OSS, and more.
This article summarizes the multi-tenant design we carried out on Ameba Platform from 2023 to 2024 and looks back on it.
- Background and Issues
  - Reasons why migration was not possible
  - Time to rethink the design
  - Clarification of absolute requirements
  - Design Policy
- Security Isolation Perspective
  - Security Level
  - Exception handling
- Approach and details
  - 1. Authentication and Authorization Integration Strategy
  - 2. Implementing Network Security
  - 3. Protecting shared resources, volumes and backups
  - 4. Monitoring and APM Security
- Looking back
  - Fine-grained is not available
  - The operational reality of each tenant is more complicated
  - Network Policy is very useful
- Conclusion
Background and Issues
Ameba Platform is a platform centered on the infrastructure (mainly EKS) for AmebaBlog and its peripheral services. It was launched around 2020 to unify the development and deployment flow and simplify the technology stack; the central goal of the project was to consolidate many services onto a more efficient, more manageable infrastructure.
Although the core parts were migrated successfully in the early stages of the project, authentication services and other services with high security levels could not be moved to Ameba Platform because of security issues with EKS and other components. These services had to keep running on the existing infrastructure or in a separate EKS environment.
Reasons why migration was not possible
The main reasons were the following:
- Difficulty integrating authentication and authorization systems: the ideal design would have used our in-house authentication and authorization infrastructure to fully integrate Kubernetes RBAC, AWS IAM, and all developer, monitoring, and operations tools, but in 2020-2021, the early stages of platform development, we did not have the capacity to do so.
- Istio and EKS compatibility issues: the communication-control mechanisms under consideration, Istio Authorization Policy and Security Groups for Pods (SGP), could not yet be relied on to work together on EKS.
Time to rethink the design
In 2023, about three years after the start of the platformization, we had more human resources and more technical options, so we had the opportunity to fundamentally reconsider the multi-tenant design and restart the migration process.
First, VPC CNI and SGP had proven stable in operation, so SGP became a viable option. In addition, the VPC CNI gained native NetworkPolicy support in 2023, which reduced our dependency on third-party vendor products.
In 2023, I joined this project immediately after joining CyberAgent, and designed authentication integration and multi-tenancy on AWS and EKS.
Clarification of absolute requirements
The following were set as non-negotiable requirements for the redesign:
- Complete communication isolation according to security level (Pod ↔ Pod, Pod ↔ AWS resource)
- Centralized authentication of all access through a common authentication infrastructure (AWS, EKS, Datadog, ArgoCD, GitHub Teams)
- Strict access control for AWS and Kubernetes resources, using RBAC, ABAC, etc.
Design Policy
In addition, the following principles were used as design guidelines:
- Minimize dependency on specific vendor products
- Use built-in Kubernetes and AWS features wherever possible
- Keep authentication and authorization processes simple and maintainable
Security Isolation Perspective
Security Level
Services on AWS are independent of one another, and none is inherently ranked above another. To separate them by security, we need to identify each service's security level and control communication based on it, which is what IAM ABAC (attribute-based access control) is for. Resource tags are one of the best ways to record a service's security level.
On Ameba Platform, we took the characteristics of microservices into account and classified services as follows:
- Protected services: services with high security requirements that need strict management
- Non-Protected services: services with relatively low security requirements
The following principles were established for communication control:
- Services with a high security level (Protected)
  - Inbound communication is strictly restricted
  - Outbound communication is relatively free
- Services with a low security level (Non-Protected)
  - Relatively loose restrictions on both inbound and outbound
Exception handling
In a traditional multi-tenant model, it is common to restrict each tenant strictly to its own resources. Real-world operations, however, have more complex requirements. For example, the team that manages authentication services works with services of different security levels every day.
Even high-security services sometimes need to accept inbound communication as an exception, for example when they publish some APIs. When we allow such exceptional communication, we lower the security level of the affected targets.
The following communication restrictions apply on Ameba Platform:
- Non-Protected services cannot access Protected services
- Protected services can access Non-Protected services
- Protected services that expose specific endpoints are demoted to Non-Protected services
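To make the three rules concrete, here is a minimal Python sketch of them as a single decision function. It is only a model: on Ameba Platform the rules are enforced through IAM, NetworkPolicy, and SGP, not through code like this. The `Service` type and `can_access` function are hypothetical, and the sketch assumes that traffic between two services of the same level is allowed, which the rules above do not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical model of a service and its security-related tags."""
    name: str
    protected: bool = False  # corresponds to ameba.jp/protected=true
    exposed: bool = False    # corresponds to ameba.jp/exposed=true

    @property
    def effective_protected(self) -> bool:
        # A Protected service that exposes specific endpoints is demoted
        # to Non-Protected.
        return self.protected and not self.exposed


def can_access(source: Service, target: Service) -> bool:
    """Return True if `source` may open a connection to `target`."""
    # Non-Protected -> Protected is the only denied direction.
    # Protected -> Non-Protected is allowed; same-level traffic is
    # assumed to be allowed.
    return not (target.effective_protected and not source.effective_protected)
```

With this model, a blog service cannot reach an authentication service tagged Protected, but it can reach a Protected API that has been marked as exposed, because that API has been demoted.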
Approach and details
The concrete implementation of the multi-tenant design revolved around four key areas:
1. Authentication and Authorization Integration Strategy
Unification of authentication infrastructure
Ameba Platform adopted the following integrated approach to achieve centralized management of authentication and authorization:
- AWS, Datadog, and GitHub authentication: our in-house SAML infrastructure
- Node SSH access: the company's LDAP infrastructure
- ArgoCD: OAuth2 / OIDC via GitHub Teams
Authentication integration was one of the most complex parts of this project because of the limitations of our in-house authentication infrastructure. Some of it could have been centralized with OIDC, but because the in-house infrastructure lacks OIDC support, we had to combine a wide variety of methods.
ArgoCD in particular could not be integrated directly with the in-house authentication infrastructure because of security concerns with SAML in Dex, so we integrated it via OIDC with GitHub Teams. Since GitHub Teams is already integrated with SAML, there is no need to take a separate inventory of users.

Please see our previous article regarding issues with ArgoCD SAML integration.
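Once users sign in through GitHub Teams, Argo CD sees their team memberships as groups, so tenant roles can be bound in Argo CD's RBAC policy (`policy.csv` in the `argocd-rbac-cm` ConfigMap). The sketch below generates the group-binding `g` lines, assuming Dex's GitHub connector, which reports memberships as `<org>:<team>`. The function, the organization name, and the Argo CD role names are hypothetical, and the `p` lines that define each role's permissions are left out.

```python
def argocd_policy_csv(org: str, team_roles: dict[str, str]) -> str:
    """Build Argo CD RBAC group bindings ("g" lines) for policy.csv.

    team_roles maps a GitHub team name to an Argo CD role, e.g.
    {"ameba-A-developer": "role:ameba-A-developer"}. The "p" lines that
    define what each role may do are intentionally omitted.
    """
    # Dex's GitHub connector reports memberships as "<org>:<team>" groups.
    # Sorting keeps the generated file stable between runs.
    return "".join(
        f"g, {org}:{team}, {role}\n" for team, role in sorted(team_roles.items())
    )
```

Generating the bindings from one mapping keeps the GitHub team names and the `<product>-<tenant>-<role>` naming in a single place instead of hand-editing the ConfigMap.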
ABAC: Role
Each tenant has two basic roles, developer and admin, named according to the convention <product>-<tenant>-<role>. Examples:
- ameba-A-developer
- ameba-A-secure
We have also created a system that makes it possible to add other roles and corresponding attributes depending on operational circumstances.
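Because role names encode product, tenant, and role, tooling can derive attributes directly from the name. Here is a minimal Python sketch of building and parsing names in the <product>-<tenant>-<role> format. The `TenantRole` class is hypothetical, and it assumes the product segment never contains a hyphen, so that everything between the first and last segments can be treated as the tenant.

```python
from dataclasses import dataclass

# The two base roles described above; other roles can be added per tenant.
BASE_ROLES = frozenset({"developer", "admin"})


@dataclass(frozen=True)
class TenantRole:
    product: str
    tenant: str
    role: str

    def name(self) -> str:
        return f"{self.product}-{self.tenant}-{self.role}"

    @classmethod
    def parse(cls, name: str) -> "TenantRole":
        # The product is the first segment and the role is the last;
        # anything in between is treated as the tenant (an assumption
        # that lets tenant names contain hyphens).
        product, _, rest = name.partition("-")
        tenant, _, role = rest.rpartition("-")
        if not (product and tenant and role):
            raise ValueError(f"not a <product>-<tenant>-<role> name: {name!r}")
        return cls(product, tenant, role)

    @property
    def is_base_role(self) -> bool:
        return self.role in BASE_ROLES
```

For example, `TenantRole.parse("ameba-A-developer")` yields product `ameba`, tenant `A`, and role `developer`, and a name with only two segments is rejected.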
ABAC: Policy
To achieve fine-grained access control, we introduced attribute management based on resource tags, using tags such as the following:
- ameba.jp/protected=true
- ameba.jp/sensitive=true
- ameba.jp/exposed=true
The policies combine the following IAM elements:
- StringNotEquals conditions on resource tags
- NotAction, to distinguish between admin and developer
- StringEquals conditions on resource tags
Although ABAC can control most AWS services, some services and APIs cannot be controlled with resource tags. These have to be handled individually with whichever condition keys they support.
For example, the following services and APIs:

More information can be found in the AWS Service Authorization Reference.
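As an illustration of how these elements fit together, here is a minimal Python sketch that generates an identity-based policy for one tenant role. It is an assumption-laden example, not the policy used on Ameba Platform. The `ameba.jp/tenant` tag key is hypothetical and does not appear in the tag list above, and the NotAction list is modeled on the AWS managed PowerUserAccess policy. The `Null` check exists because a negated operator such as StringNotEquals evaluates to true when the tag is missing from the request.

```python
import json

PROTECTED_KEY = "aws:ResourceTag/ameba.jp/protected"
# Hypothetical tag key carrying the owning tenant (not in the tag list above).
TENANT_KEY = "ameba.jp/tenant"


def tenant_policy(role: str, protected_principal: bool) -> dict:
    """Sketch of an identity-based ABAC policy for one tenant role."""
    if role == "admin":
        allow = {"Sid": "AllowAll", "Effect": "Allow", "Action": "*", "Resource": "*"}
    elif role == "developer":
        # NotAction separates developer from admin: everything except
        # identity and account management (the exact list is an assumption).
        allow = {
            "Sid": "AllowAllExceptAdmin",
            "Effect": "Allow",
            "NotAction": ["iam:*", "organizations:*", "account:*"],
            "Resource": "*",
        }
    else:
        raise ValueError(f"unknown role: {role!r}")

    statements = [
        allow,
        {
            # Deny resources owned by another tenant. Both conditions must
            # hold: the tenant tag exists (Null is false) and differs from
            # the principal's own tenant tag.
            "Sid": "DenyOtherTenants",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "Null": {f"aws:ResourceTag/{TENANT_KEY}": "false"},
                "StringNotEquals": {
                    f"aws:ResourceTag/{TENANT_KEY}": f"${{aws:PrincipalTag/{TENANT_KEY}}}"
                },
            },
        },
    ]
    if not protected_principal:
        # Non-Protected roles must never touch resources tagged Protected.
        statements.append({
            "Sid": "DenyProtectedResources",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"StringEquals": {PROTECTED_KEY: "true"}},
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(tenant_policy("developer", protected_principal=False), indent=2))
```

Explicit Deny statements always win over Allow in IAM evaluation, so the tenant and Protected boundaries hold regardless of what the Allow statement grants. Actions that do not support resource-tag condition keys still need the individual handling described above.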
EKS RBAC
Kubernetes permissions follow the same developer/admin split and are granted with ClusterRoleBinding and RoleBinding.
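Here is a minimal Python sketch of the two binding kinds as manifest dictionaries. It assumes each IAM role is mapped to a Kubernetes group of the same name (for example through EKS access entries or aws-auth) and uses the built-in ClusterRoles `edit` and `view`. Which role gets which binding is a hypothetical example, not the actual Ameba Platform mapping. Object names are lowercased because Kubernetes object names must be lowercase, while group names may keep uppercase letters.

```python
RBAC_API = "rbac.authorization.k8s.io"


def group_subject(group: str) -> dict:
    return {"kind": "Group", "name": group, "apiGroup": RBAC_API}


def role_binding(group: str, namespace: str, cluster_role: str) -> dict:
    """Namespace-scoped grant: a RoleBinding that references a ClusterRole."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        # Object names must be lowercase RFC 1123 names.
        "metadata": {"name": f"{group}-{cluster_role}".lower(), "namespace": namespace},
        "subjects": [group_subject(group)],
        "roleRef": {"kind": "ClusterRole", "name": cluster_role, "apiGroup": RBAC_API},
    }


def cluster_role_binding(group: str, cluster_role: str) -> dict:
    """Cluster-wide grant; ClusterRoleBindings have no namespace."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{group}-{cluster_role}".lower()},
        "subjects": [group_subject(group)],
        "roleRef": {"kind": "ClusterRole", "name": cluster_role, "apiGroup": RBAC_API},
    }
```

Binding a shared ClusterRole through per-namespace RoleBindings keeps one definition of "developer permissions" while still confining each tenant to its own namespaces.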
2. Implementing Network Security
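Pod ↔ Pod communication
Pod-to-Pod communication is controlled with NetworkPolicy through the VPC CNI's native support, which is enabled through the add-on's ConfigurationValues; policies select Pods with PodSelector.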
When applying a tagging strategy similar to IAM ABAC, note the following:
- Protected Namespaces are labeled ameba.jp/protected=true
- An exposed label is required to expose a Pod in a Protected Namespace to the outside world
We also use Hierarchical Namespaces, and NetworkPolicy rules match namespaces with a namespaceSelector on the label ameba.jp/exposed: "true". Thanks to the inheritance feature of Hierarchical Namespaces, this configuration no longer has to be created in each child namespace.
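Here is a minimal Python sketch of one possible way to encode the label rules as a NetworkPolicy. The `ingress_policy` function and the policy name are hypothetical. It assumes that Protected namespaces accept ingress only from other Protected namespaces, and that a namespace labeled exposed is treated as demoted, so no restriction is generated for it. Because NetworkPolicies are additive allow-lists, anything not matched by the rule, including Non-Protected namespaces, is denied once the policy selects a Pod.

```python
from typing import Optional

PROTECTED = "ameba.jp/protected"
EXPOSED = "ameba.jp/exposed"


def ingress_policy(namespace: str, labels: dict[str, str]) -> Optional[dict]:
    """Build the ingress NetworkPolicy for a namespace from its labels.

    Returns None when the namespace is Non-Protected, or Protected but
    exposed (and therefore demoted), since no restriction is added then.
    """
    if labels.get(PROTECTED) != "true" or labels.get(EXPOSED) == "true":
        return None
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-from-protected", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every Pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # Only Pods in namespaces labeled ameba.jp/protected=true may connect.
            "ingress": [
                {"from": [{"namespaceSelector": {"matchLabels": {PROTECTED: "true"}}}]}
            ],
        },
    }
```

In practice, shared components such as ingress controllers or monitoring agents would also need to be allowed explicitly, or they would be blocked by this policy.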
Communication between Pods and AWS resources
Communication between Pods and AWS resources is controlled with Security Groups for Pods (SGP).
There are two steps to using SGP:
- Change the VPC CNI (vpc-cni) settings to enable SGP
- Create SGP's custom resource, SecurityGroupPolicy
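Here is a minimal Python sketch of both steps. For the first step, SGP is enabled by setting `ENABLE_POD_ENI=true` on the `aws-node` DaemonSet (for example with `kubectl set env daemonset aws-node -n kube-system ENABLE_POD_ENI=true`); other settings may also be needed depending on the cluster. For the second step, a `SecurityGroupPolicy` attaches security groups to matching Pods. The function name, labels, and security group ID are hypothetical.

```python
# Step 1: environment variable set on the aws-node DaemonSet to enable SGP.
VPC_CNI_ENV = {"ENABLE_POD_ENI": "true"}


def security_group_policy(
    name: str, namespace: str, pod_labels: dict[str, str], group_ids: list[str]
) -> dict:
    """Step 2: attach the given security groups to Pods matching pod_labels."""
    return {
        "apiVersion": "vpcresources.k8s.aws/v1beta1",
        "kind": "SecurityGroupPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": dict(pod_labels)},
            "securityGroups": {"groupIds": list(group_ids)},
        },
    }
```

The AWS-side rules then live in the security groups themselves, for example allowing only the Pod's security group into an RDS instance's security group.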
SGP carries several risks:
- There is a limit to the number of Pods it can be applied to. Each SGP Pod gets its own branch ENI, attached through the node's trunk ENI, and the number of branch ENIs a node can hold is limited by its instance type.
- Pod startup is slower, because a branch ENI has to be assigned to each Pod when it starts.
- There is potential for conflicts with other network vendors. This has since been resolved, but in the past there was reportedly a conflict with Istio (unconfirmed). AWS Support recommends IAM authentication instead, so SGP should be considered a last resort.
If you are interested in the details of SGP, please refer to our previous article.
3. Protecting shared resources, volumes and backups
Shared Resources
For shared resources such as ECR and S3, which are managed centrally in a Shared account, we implemented access control with resource tags in that account. For services where tag-based control is difficult, such as S3, we also identify resources by a prefix in the resource name.
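Here is a minimal Python sketch of prefix-based identification for S3. The policy allows a tenant role to use only buckets whose names start with its prefix. The function and the `ameba-a-` prefix are hypothetical, and the action lists are illustrative rather than the permissions actually granted on Ameba Platform.

```python
def shared_s3_policy(prefix: str) -> dict:
    """Allow a tenant role to use only shared buckets named '<prefix>*'."""
    if not prefix or "*" in prefix:
        raise ValueError("prefix must be a non-empty literal string")
    bucket_arn = f"arn:aws:s3:::{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions are authorized against the bucket ARN.
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions are authorized against object ARNs.
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }
```

The scheme only holds if bucket names are strictly governed, since any bucket created with another tenant's prefix would fall under that tenant's policy.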
Storage Tier
All EBS volume operations can be controlled by resource tags, with one exception when used with EKS: Kubernetes PersistentVolumes (PVs).
Backup
All AWS Backup APIs such as create/copy are controlled by Resource Tags.
4. Monitoring and APM Security
For monitoring tools, especially Datadog, we integrated with the authentication infrastructure, but ran into problems with permission control in APM.
APM itself has restrictions, and we found fine-grained permission control difficult. At granularities like the one in the figure below, the only options are to block everyone without a specific permission from viewing APM, or to let everyone view it.

Therefore, we adopted an approach that involves masking sensitive data before it enters APM.
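Here is a minimal Python sketch of masking before data reaches APM, assuming span tags are available as a plain string dictionary before they are sent. The patterns and key names are hypothetical, and this is a generic illustration, not Datadog's own scrubbing features or the exact method used on Ameba Platform.

```python
import re

# Hypothetical patterns; the real set depends on what each service sends.
SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),       # e-mail addresses
    re.compile(r"(?i)bearer\s+[a-z0-9._~+/-]+=*"),  # bearer tokens
]
# Tag keys whose values are always replaced, compared case-insensitively.
SENSITIVE_KEYS = frozenset({"password", "authorization", "cookie"})
MASK = "[REDACTED]"


def mask_tags(tags: dict[str, str]) -> dict[str, str]:
    """Return a copy of span tags with sensitive values replaced by MASK."""
    masked = {}
    for key, value in tags.items():
        if key.lower() in SENSITIVE_KEYS:
            masked[key] = MASK
            continue
        for pattern in SENSITIVE_PATTERNS:
            value = pattern.sub(MASK, value)
        masked[key] = value
    return masked
```

Masking at the source means the APM permission model no longer has to protect the data, which sidesteps the all-or-nothing access problem described above.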
Looking back
About half a year has passed since the entire Ameba Platform environment was updated, but due to a lack of people, the migration itself has not yet begun. Here I summarize what I felt from our trials so far and what I thought after looking at examples from other companies.
Fine-grained is not available
With only two roles, developer and admin, we cannot set detailed permissions for users who should access only certain services and resources. I still wonder whether it is really okay to give such powerful roles to those users.
Because IAM roles map one-to-one to roles in the company's authentication infrastructure, adding roles as needed is difficult in practice. I am still thinking about what to do; if you have any good ideas, please let me know.
The operational reality of each tenant is more complicated
In practice, tenants did not always fit the <product>-<tenant>-<role> model.
For example, some tenants used their authentication-infrastructure roles for multiple purposes, and each tenant handled both member management and collaboration management itself. Because such roles could not be integrated into Ameba Platform, the result would likely be an incomplete multi-tenant system.
Furthermore, our authentication infrastructure limits references between roles, so even if we wanted to move member management into Ameba Platform, it is currently unclear how far that would be possible.
Network Policy is very useful
Cloudflare Tunnel
Conclusion
I wrote this article while recalling the whole process of multi-tenant support at Ameba. Looking back, my memory is hazy in many places, and the carefully written documentation at the time saved me many times.
This article is just one example within CyberAgent, but we hope it will be of use to you.
SRG is looking for people to work with us. If you are interested, please contact us here.