Multi-tenant design and reflections on Ameba Platform
SRG (Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services: improving existing services, launching new ones, and contributing to OSS.
This article summarizes the design for multi-tenant support carried out on Ameba Platform from 2023 to 2024 and a review of the work.
- Background and Issues
- Reasons why migration was not possible
- Time to rethink the design
- Clarification of absolute requirements
- Design Policy
- Security isolation perspective
- Security Level
- Exception handling
- Approach and details
- 1. Authentication and Authorization Integration Strategy
- 2. Implementing Network Security
- 3. Protecting shared resources, volumes and backups
- 4. Monitoring and APM Security
- Looking back
- Fine-Grained is not available
- The operational reality of each tenant is more complicated
- Network Policy is very useful
- Conclusion
Background and Issues
Ameba Platform is a platform with the infrastructure (mainly EKS) for AmebaBlog and related services at its core. It was launched around 2020 with the aim of unifying the development and deployment flow and simplifying the technology stack, and the central goal of the project was to integrate many services into a more efficient and manageable infrastructure.
Although the core components were successfully migrated in the early stages of the project, authentication services and other services with high security requirements could not be migrated to Ameba Platform because of security issues in EKS and related components. These services had to keep running on the existing infrastructure or in a separate EKS environment.
Reasons why migration was not possible
Istio Authorization Policy
- The challenge of integrating authentication and authorization systems: the ideal design would have been to use our in-house authentication and authorization infrastructure to fully integrate Kubernetes RBAC, AWS IAM, and all developer, monitoring, and operations tools, but we did not have the capacity to do so in 2020-2021, when the platform was still in its early stages.
- Compatibility issues between Istio and EKS
Security Groups for Pods(SGP)
Time to rethink the design
In 2023, about three years after the platform effort began, we had more staff capacity and a wider range of technical options, which gave us the opportunity to completely reassess the multi-tenant design and restart the migration.
First, VPC CNI and SGP had proven stable in operation, which made SGP a viable option. In addition, VPC CNI gained native NetworkPolicy support in 2023, which minimized vendor dependency.
I joined this project in 2023, shortly after joining CyberAgent, and worked on authentication integration and multi-tenant design on AWS and EKS.
Clarification of absolute requirements
The following requirements were established as non-negotiable in the redesign:
- Complete communication isolation according to security level
  - Between Pods, and between Pods and AWS resources
- Centralized authentication of all communications through a common authentication infrastructure
  - AWS, EKS, Datadog, ArgoCD, GitHub Teams
- Strict access control for AWS and Kubernetes resources
  - Utilizing RBAC, ABAC, etc.
Design Policy
Furthermore, the following policies were used as design guidelines:
- Minimize dependency on specific vendor products
- Aim to achieve this using the default functions of Kubernetes and AWS as much as possible
- Keep authentication and authorization processes simple and maintainable
Security isolation perspective
Security Level
Services on AWS are independent of each other, yet have an equal relationship. To achieve security separation, it is important to use IAM's ABAC to identify the security level of each service and control communication based on that. Resource tags are one of the best ways to identify the security level of a service.
On Ameba Platform, we took the characteristics of microservices into account and categorized services as follows:
- Protected Services: Services with high security requirements and strict management
- Non-Protected Services: Services with relatively low security requirements
The following principles have been established for communication control:
- Services with a high security level (Protected)
  - Inbound communication is strictly restricted
  - Outbound communication is relatively free
- Services with a low security level (Non-Protected)
  - Relatively loose restrictions for both inbound and outbound
Exception handling
In a traditional multi-tenant model, it is common for each tenant to have strict restrictions and limit access to only their own resources. However, in real-world operations, more complex requirements exist. For example, a team managing authentication services must deal with services with different security levels on a daily basis.
Even for services with a high security level, some exceptional inbound communication must be allowed, for example to publish certain APIs. When we allow such exceptions, we adjust the security level of the affected targets.
The following communication restrictions are in place on Ameba Platform:
- Non-Protected services cannot access Protected services.
- Protected services can access Non-Protected services.
- Protected services that expose specific endpoints can be demoted to Non-Protected services.
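The three rules above can be modeled in a few lines. This is only an illustrative sketch: the `Service` class and both function names are hypothetical, and two points are assumptions the article does not state, namely that Protected-to-Protected traffic is allowed and that a demoted service is treated as Non-Protected in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical model of a service and its security attributes."""
    name: str
    protected: bool = False  # mirrors a tag such as ameba.jp/protected=true
    exposed: bool = False    # mirrors a tag such as ameba.jp/exposed=true


def effective_protected(svc: Service) -> bool:
    """A Protected service that exposes endpoints is demoted to Non-Protected."""
    return svc.protected and not svc.exposed


def is_allowed(src: Service, dst: Service) -> bool:
    """Return True if src may open a connection to dst."""
    if not effective_protected(dst):
        # Anyone may reach Non-Protected (or demoted) services.
        return True
    # Assumption: only services that are still Protected may reach Protected ones.
    return effective_protected(src)
```

For example, `is_allowed(Service("blog"), Service("auth", protected=True))` evaluates to `False`, while the reverse direction evaluates to `True`.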
Approach and details
The specific implementation approach for the multi-tenant design was developed by focusing on four main areas:
1. Authentication and Authorization Integration Strategy
Unification of authentication infrastructure
Ameba Platform has adopted the following integrated approach to achieve centralized management of authentication and authorization:
- AWS, Datadog, and GitHub authentication: Leveraging our internal SAML infrastructure
- Node SSH access: Using our internal LDAP infrastructure
- ArgoCD: Using OAuth2 and OIDC via GitHub Teams
Integrating authentication was one of the most complex parts of this project. OIDC could have centralized much of it, but our internal authentication infrastructure does not provide OIDC, so we had to combine several approaches.
In the case of ArgoCD in particular, security concerns about SAML support in dex kept us from integrating directly with the in-house authentication infrastructure, so we integrated over OIDC via GitHub Teams. Since GitHub Teams is already SAML-integrated, users do not need to be inventoried separately.

Please refer to our previous article regarding the issue of ArgoCD SAML integration.
ABAC: Role
developer, admin
<product>-<tenant>-<role>
ameba-A-developer
ameba-A-secure
We have created a system that allows us to add attributes corresponding to other roles depending on the actual operational situation.
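As a small illustration of the `<product>-<tenant>-<role>` naming format, the following sketch splits a role name into its parts. The `RoleName` type and `parse_role_name` function are hypothetical helpers, and the sketch assumes that no component contains a hyphen, which the article does not confirm.

```python
from typing import NamedTuple


class RoleName(NamedTuple):
    product: str
    tenant: str
    role: str


def parse_role_name(name: str) -> RoleName:
    """Split '<product>-<tenant>-<role>' into its three components.

    Assumption: none of the components contains a hyphen.
    """
    parts = name.split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <product>-<tenant>-<role>, got {name!r}")
    return RoleName(*parts)
```

With the examples from the text, `parse_role_name("ameba-A-developer")` yields product `ameba`, tenant `A`, and role `developer`.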
ABAC: Policy
Resource Tag-based attribute management has been introduced to achieve advanced access control.
ameba.jp/protected=true
ameba.jp/sensitive=true
ameba.jp/exposed=true
StringNotEquals
- Use NotAction to distinguish between Admin and Developer
StringNotEquals Condition
StringEquals Condition
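To make the tag-based approach concrete, here is a minimal sketch of what a developer-side IAM policy might look like, written as a Python dictionary that serializes to policy JSON. It is not the policy actually used on Ameba Platform: the statement IDs and the actions excluded through `NotAction` are assumptions for illustration. Only the tag key `ameba.jp/protected` comes from the text, combined with the standard `aws:ResourceTag/<key>` condition key.

```python
import json

# Standard IAM condition key form: aws:ResourceTag/<tag-key>
PROTECTED_TAG = "aws:ResourceTag/ameba.jp/protected"


def developer_policy() -> dict:
    """Build a hypothetical ABAC policy document for a developer role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # NotAction: allow everything except the listed actions.
                # The excluded actions are an assumption for this example.
                "Sid": "AllowExceptIam",
                "Effect": "Allow",
                "NotAction": ["iam:*"],
                "Resource": "*",
            },
            {
                # Explicit deny on any resource tagged as Protected.
                "Sid": "DenyProtectedResources",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {"StringEquals": {PROTECTED_TAG: "true"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(developer_policy(), indent=2))
```

An explicit `Deny` takes precedence over the `Allow`, so the second statement removes access to tagged resources regardless of what the first one grants.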
While ABAC can control most AWS services, there are some services that cannot be controlled with Resource Tags. In such cases, you will need to handle them individually using the corresponding Condition.
For example, the following services and APIs:

More information can be found in the Service Authorization documentation.
EKS RBAC
developer/admin
ClusterRoleBinding
RoleBinding
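The article does not show how the bindings are laid out, so the following is only one plausible arrangement, sketched as Python dictionaries in Kubernetes manifest form: a namespaced RoleBinding for a developer group and a ClusterRoleBinding for an admin group. The group names, the namespace, and the choice of the built-in `edit` and `cluster-admin` ClusterRoles are all assumptions.

```python
RBAC_API_GROUP = "rbac.authorization.k8s.io"


def _subject(group: str) -> dict:
    return {"kind": "Group", "name": group, "apiGroup": RBAC_API_GROUP}


def _role_ref(cluster_role: str) -> dict:
    return {"kind": "ClusterRole", "name": cluster_role, "apiGroup": RBAC_API_GROUP}


def role_binding(namespace: str, group: str, cluster_role: str = "edit") -> dict:
    """Bind a group to a ClusterRole inside a single namespace."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": f"{group}-{cluster_role}".lower(),
            "namespace": namespace,
        },
        "subjects": [_subject(group)],
        "roleRef": _role_ref(cluster_role),
    }


def cluster_role_binding(group: str, cluster_role: str = "cluster-admin") -> dict:
    """Bind a group to a ClusterRole across the whole cluster."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{group}-{cluster_role}".lower()},
        "subjects": [_subject(group)],
        "roleRef": _role_ref(cluster_role),
    }
```

A call such as `role_binding("tenant-a", "ameba-A-developer")` would scope that group's permissions to the hypothetical `tenant-a` namespace only.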
2. Implementing Network Security
Pod ↔ Pod communication
ConfigurationValues
PodSelector
When applying a tagging strategy similar to IAM ABAC, there are a few things to keep in mind:
ameba.jp/protected=true
- An Expose Tag is required when exposing a Pod in a Protected Namespace to the outside world.
Hierarchical Namespace
namespaceSelector
ameba.jp/exposed: "true"
Also, thanks to the inheritance feature of Hierarchical Namespace, the policy no longer has to be created in each child Namespace.
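As a rough sketch of how label-based isolation could be expressed with the standard NetworkPolicy API, the two functions below build manifests as Python dictionaries. The policy names and the exact rule layout are assumptions, not the policies used on Ameba Platform. Only the label keys `ameba.jp/protected` and `ameba.jp/exposed` come from the text.

```python
import json

PROTECTED_LABEL = {"ameba.jp/protected": "true"}
EXPOSED_LABEL = {"ameba.jp/exposed": "true"}


def protected_ingress_policy(namespace: str) -> dict:
    """All Pods in the namespace accept ingress only from Protected namespaces."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-from-protected", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector = every Pod in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [
                {"from": [{"namespaceSelector": {"matchLabels": PROTECTED_LABEL}}]}
            ],
        },
    }


def exposed_ingress_policy(namespace: str) -> dict:
    """Pods carrying the exposed label additionally accept ingress from any source."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-exposed", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": EXPOSED_LABEL},
            "policyTypes": ["Ingress"],
            "ingress": [{}],  # an empty rule matches all sources
        },
    }


if __name__ == "__main__":
    print(json.dumps(protected_ingress_policy("auth"), indent=2))
```

NetworkPolicies are additive, so a Pod with the exposed label is covered by both policies and accepts the union of what they allow.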
Communication between Pods and AWS resources
SecurityGroupsForPod(SGP)
There are two steps to using SGP.
- Change the following settings in VPC CNI:
- Use SGP CustomResource
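For the second step, the custom resource is a `SecurityGroupPolicy` in the `vpcresources.k8s.aws/v1beta1` API group. The sketch below builds one as a Python dictionary. The resource name, namespace, Pod labels, and security group ID in the usage example are placeholders, not values from the article.

```python
from typing import Mapping, Sequence


def security_group_policy(
    name: str,
    namespace: str,
    pod_labels: Mapping[str, str],
    security_group_ids: Sequence[str],
) -> dict:
    """Build a SecurityGroupPolicy that attaches security groups to matching Pods."""
    if not security_group_ids:
        raise ValueError("at least one security group ID is required")
    return {
        "apiVersion": "vpcresources.k8s.aws/v1beta1",
        "kind": "SecurityGroupPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": dict(pod_labels)},
            "securityGroups": {"groupIds": list(security_group_ids)},
        },
    }
```

For example, `security_group_policy("auth-db-access", "auth", {"app": "auth"}, ["sg-0123456789abcdef0"])` describes a policy that would give Pods labeled `app=auth` the placeholder security group.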
SGP carries several risks.
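- There is a limit to the number of Pods that SGP can be applied to. Each such Pod uses a Branch ENI attached to the node's Trunk ENI, and the number of Branch ENIs per node is limited by instance type.
- Pod startup will be slower, because a Branch ENI has to be allocated for the Pod.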
- Potential conflict with other network vendors
There used to be a conflict with Istio, which has now been resolved (though we have not confirmed this). AWS Support recommends IAM authentication, so SGP should be treated as a last resort.
If you are interested in the details of SGP, please refer to our previous article.
3. Protecting shared resources, volumes and backups
Shared Resources
Shared resources such as ECR and S3 are centrally managed in a Shared account, where we control access using Resource Tags. For services where tag-based control is difficult (such as S3), we also identify resources by name prefix.
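Prefix-based identification can be sketched as an IAM statement whose `Resource` ARNs are limited to a tenant's bucket-name prefix. The `<product>-<tenant>-` bucket naming and the selected actions are assumptions for illustration. The article states only that resource name prefixes are used.

```python
def s3_prefix_statement(product: str, tenant: str) -> dict:
    """Allow basic S3 access only to buckets whose names start with the tenant prefix."""
    # S3 bucket names must be lowercase, so the prefix is normalized.
    prefix = f"{product}-{tenant}-".lower()
    return {
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
        "Resource": [
            f"arn:aws:s3:::{prefix}*",    # bucket-level actions (ListBucket)
            f"arn:aws:s3:::{prefix}*/*",  # object-level actions (GetObject, PutObject)
        ],
    }
```

With the hypothetical naming above, `s3_prefix_statement("ameba", "A")` covers buckets such as `ameba-a-logs` but nothing outside the `ameba-a-` prefix.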
Storage Tier
All EBS Volume operations can be controlled by Resource Tags, with one exception when used with EKS:
Kubernetes PersistentVolume (PV)
Backup
All AWS Backup APIs, such as create and copy, can be controlled with Resource Tags.
4. Monitoring and APM Security
When integrating monitoring tools, particularly Datadog, with the authentication infrastructure, we ran into problems with APM permission control.
APM does offer access restrictions, but we found fine-grained permission control difficult. With the granularity shown in the diagram below, the only choices are to hide APM from everyone who lacks a specific permission or to let everyone see it.

Therefore, we adopted an approach that masks sensitive data before it enters APM.
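The article does not describe how the masking is implemented. As a generic illustration of the idea, the sketch below redacts values whose keys look sensitive before a set of span tags would be handed to any tracing client. It does not use the Datadog API. The key pattern and the `mask_tags` helper are hypothetical.

```python
import re
from typing import Any, Mapping

# Hypothetical pattern for keys that should never reach APM in clear text.
SENSITIVE_KEY = re.compile(r"password|token|secret|authorization|cookie", re.IGNORECASE)
MASK = "[REDACTED]"


def mask_tags(tags: Mapping[str, Any]) -> dict:
    """Return a copy of tags with sensitive values replaced by a fixed mask."""
    masked: dict = {}
    for key, value in tags.items():
        if SENSITIVE_KEY.search(key):
            masked[key] = MASK
        elif isinstance(value, Mapping):
            masked[key] = mask_tags(value)  # recurse into nested structures
        else:
            masked[key] = value
    return masked
```

Masking by key name is simple but coarse. It misses sensitive values stored under innocuous keys, so a real deployment would likely pair it with value-based rules.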
Looking back
It has been about six months since the entire Ameba Platform environment was updated, but due to a lack of staff, the migration has not yet begun. Here I summarize what I have felt during trial operations so far and what I thought after looking at examples from other companies.
Fine-Grained is not available
With only the two roles, developer and admin, we cannot configure detailed permissions for users who should see only certain services and resources. I wonder whether it is acceptable to give them such a powerful role.
Because there is a one-to-one relationship between IAM roles and roles in the company's authentication infrastructure, it is practically difficult to increase the number as needed. I'm still thinking about what to do. If anyone has any good ideas, please let me know.
The operational reality of each tenant is more complicated
<product>-<tenant>-<role>
For example, some tenants use authentication infrastructure roles for multiple purposes, and each tenant handles member management and collaborator management independently. Since these roles cannot be integrated into Ameba Platform, the multi-tenancy system is likely to remain incomplete.
Furthermore, our authentication infrastructure has a limit on the number of roles that can be referenced, so even if we wanted to integrate member management into the Ameba Platform, it is unclear at this time to what extent this would be possible.
Network Policy is very useful
Cloudflare Tunnel
Cloudflare Tunnel
Conclusion
I wrote this article while recalling the entire process of supporting multi-tenancy at Ameba. Looking back, my memory is hazy in many places, and the carefully written documentation at the time has saved me many times.
This article is just one example from within CyberAgent, but we hope it will be of some use to you.
SRG is looking for people to work with us.
If you're interested, please contact us here.