EKS Auto Mode Verification

This is Suzuki (@sZma5a) of SRG.
SRG (Service Reliability Group) is a group that mainly provides cross-functional support for the infrastructure of our media services: improving existing services, launching new ones, and contributing to OSS.
This article shares the results of technical testing of EKS's operational automation feature, "EKS Auto Mode," and specifically explains its benefits and the technical constraints that led us to decide not to implement it.
 

Introduction


Amazon EKS, a managed Kubernetes service, simplifies the deployment and management of containerized applications, but still incurs significant operational costs, such as add-on updates and node management. To further reduce this operational burden and create an environment where developers can focus on more essential tasks, we considered implementing "EKS Auto Mode," a new feature that significantly automates EKS operations. In this article, we share the results of our technical testing, the benefits we gained, and the reasons we decided not to implement it.

What is EKS Auto Mode?


EKS Auto Mode is a feature designed to reduce the operational burden of Amazon EKS. You can delegate node management, such as node provisioning, scaling, and OS patching, as well as middleware management, such as updating certain official add-ons, to AWS. This abstracts the complexity of cluster operation and is expected to significantly reduce the effort required for infrastructure management. With EKS Auto Mode, the following components are primarily managed by AWS:
  • Amazon VPC CNI (Container Network Interface)
  • AWS Load Balancer Controller
  • CoreDNS (Cluster DNS)
  • kube-proxy (network proxy)
  • Karpenter (node provisioner)
  • Amazon EBS CSI driver (Container Storage Interface driver for EBS)

Reduction of operational man-hours through implementation


The biggest benefit of introducing EKS Auto Mode is the significant reduction in operational effort. It eliminates the need to manually update official EKS add-ons. In our environment, updating each add-on took approximately two weeks of effort, including research and verification.
Adding up this work across all add-ons and factoring in how often each one is updated, the total comes to roughly 26 to 30 person-days per month. With EKS Auto Mode, AWS manages and verifies these add-ons before updating them, which greatly reduces the verification effort on our side. Another major benefit is that security fixes are applied automatically, which reduces the work of responding to vulnerabilities.
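The estimate above can be reproduced as simple arithmetic. The sketch below assumes that "two weeks" means 10 working days per update; the add-on names and the yearly update counts are hypothetical placeholders, not our measured values.

```python
# Assumption: "two weeks of effort" is treated as 10 working days per update.
EFFORT_DAYS_PER_UPDATE = 10


def monthly_effort_days(updates_per_year: dict[str, int],
                        days_per_update: float = EFFORT_DAYS_PER_UPDATE) -> float:
    """Average person-days per month spent on add-on updates."""
    total_updates = sum(updates_per_year.values())
    return total_updates * days_per_update / 12


# Hypothetical update frequencies (updates per year), for illustration only.
example_frequencies = {
    "vpc-cni": 6,
    "aws-load-balancer-controller": 6,
    "coredns": 4,
    "kube-proxy": 4,
    "karpenter": 8,
    "ebs-csi-driver": 6,
}

if __name__ == "__main__":
    days = monthly_effort_days(example_frequencies)
    print(f"{days:.1f} person-days per month")
```

Replacing the placeholder frequencies with the real release cadence of each add-on gives an estimate for your own environment.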

Cost Trade-Offs


While it reduces operational workload, using EKS Auto Mode incurs additional charges. Based on calculations in our environment, adoption was estimated to increase overall AWS usage fees by approximately 3.5%.
You need to weigh this cost increase against the labor-hour reduction mentioned above to decide whether to implement it. In our environment, we determined that the cost increase was within an acceptable range compared to the labor hours required.
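This trade-off can be framed as a small calculation. The following is a minimal sketch: only the 3.5% rate comes from our estimate, while the monthly bill, the number of saved person-days, and the cost per person-day are hypothetical inputs you would replace with your own figures.

```python
def monthly_net_benefit(aws_monthly_bill: float,
                        surcharge_rate: float,
                        saved_person_days: float,
                        cost_per_person_day: float) -> float:
    """Labor cost saved minus the extra AWS charge, per month."""
    extra_cost = aws_monthly_bill * surcharge_rate
    labor_saved = saved_person_days * cost_per_person_day
    return labor_saved - extra_cost


def break_even_bill(surcharge_rate: float,
                    saved_person_days: float,
                    cost_per_person_day: float) -> float:
    """Monthly AWS bill at which the extra charge equals the labor saved."""
    return saved_person_days * cost_per_person_day / surcharge_rate


if __name__ == "__main__":
    # Hypothetical inputs; 0.035 is the ~3.5% increase estimated in the text.
    benefit = monthly_net_benefit(
        aws_monthly_bill=100_000,
        surcharge_rate=0.035,
        saved_person_days=26,
        cost_per_person_day=500,
    )
    print(f"net monthly benefit: {benefit:.2f}")
```

A positive result means the labor saved outweighs the additional charge. The sketch ignores indirect effects such as migration effort.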

How to apply EKS Auto Mode


When you enable EKS Auto Mode from the dashboard, managed add-on controllers such as Karpenter and AWS Load Balancer Controller are first added to the EKS control plane managed by AWS.
In addition, a NodeClass is provided for creating nodes that support Auto Mode. If you use this to place a Node Pool under the management of managed Karpenter, the nodes belonging to that pool will be automatically provisioned in a way that allows them to use the EKS Auto Mode functions.
The important thing is that although custom resources and controllers are prepared at the stage of just enabling it, no functional differences in actual operation will become apparent unless you switch or add node pools. Billing is also incurred on a per-node basis, and applies to nodes under managed Karpenter.
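As an illustration of adding such a node pool, the sketch below builds a NodePool manifest that references an Auto Mode NodeClass and prints it as JSON, which `kubectl apply -f` accepts. The field names follow the Karpenter v1 NodePool schema as we understand it; the pool name, the NodeClass name `default`, and the single capacity-type requirement are assumptions to check against the current AWS documentation.

```python
import json


def auto_mode_node_pool(name: str, node_class: str = "default") -> dict:
    """Build a NodePool manifest that points at an EKS Auto Mode NodeClass."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Referencing the Auto Mode NodeClass is what places the
                    # pool under managed Karpenter (assumed group and kind).
                    "nodeClassRef": {
                        "group": "eks.amazonaws.com",
                        "kind": "NodeClass",
                        "name": node_class,
                    },
                    # Illustrative requirement only; adjust to your workloads.
                    "requirements": [
                        {
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": ["on-demand"],
                        }
                    ],
                }
            }
        },
    }


if __name__ == "__main__":
    # "auto-general" is a hypothetical pool name.
    print(json.dumps(auto_mode_node_pool("auto-general"), indent=2))
```

Only workloads scheduled onto nodes from a pool like this would run on Auto Mode nodes; existing node pools are left as they are.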
Furthermore, because the add-ons provided as managed components use different API groups (the group part of `apiVersion`) than their self-managed counterparts, the two can coexist within the same cluster. This makes flexible rollout patterns easier, such as gradual migration or moving only some workloads.
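One way to see this coexistence is to look at the `apiVersion` of each custom resource. The sketch below assumes that Auto Mode resources live under the `eks.amazonaws.com` group while self-managed ones use groups such as `karpenter.k8s.aws` or `elbv2.k8s.aws`; treat these group names, and the sample manifests, as assumptions to verify in your cluster.

```python
# Assumed API group for resources handled by EKS Auto Mode.
AUTO_MODE_API_GROUP = "eks.amazonaws.com"


def api_group(manifest: dict) -> str:
    """Return the group part of apiVersion ('' for the core group)."""
    api_version = manifest.get("apiVersion", "")
    return api_version.split("/")[0] if "/" in api_version else ""


def split_by_management(manifests: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split manifests into (Auto Mode managed, everything else)."""
    managed, other = [], []
    for manifest in manifests:
        if api_group(manifest) == AUTO_MODE_API_GROUP:
            managed.append(manifest)
        else:
            other.append(manifest)
    return managed, other


if __name__ == "__main__":
    # Hypothetical sample resources.
    samples = [
        {"apiVersion": "eks.amazonaws.com/v1", "kind": "NodeClass"},
        {"apiVersion": "karpenter.k8s.aws/v1", "kind": "EC2NodeClass"},
        {"apiVersion": "elbv2.k8s.aws/v1beta1", "kind": "TargetGroupBinding"},
    ]
    managed, other = split_by_management(samples)
    print("auto mode:", [m["kind"] for m in managed])
    print("self-managed or other:", [m["kind"] for m in other])
```

Resources whose group is shared by both setups are not distinguished by this check and would need another signal.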
 

Technical limitations and challenges


During our testing, we discovered that EKS Auto Mode has some important technical limitations.

Add-ons cannot be downgraded

Managed add-ons are updated in conjunction with the cluster version, so you cannot arbitrarily downgrade the version.

The network interface is fixed to the VPC CNI

The container network interface (CNI) is locked to the Amazon VPC CNI. We were already using the Amazon VPC CNI in our environment, so this was not a problem for us, but if you use a different CNI you will need to migrate.

TargetGroupBinding Migration Notes

When migrating from an existing self-managed AWS Load Balancer Controller to EKS Auto Mode, reusing the same target group can cause conflicts and lead to downtime. To migrate safely, you will need to create a new target group and transfer traffic to it.
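How traffic is transferred is up to you. One possible approach, sketched below, is a weighted forward action on the load balancer listener that shifts traffic from the old target group to the new one in steps. The action structure mirrors the Elastic Load Balancing `forward` action as we understand it; the ARNs and the step percentages are hypothetical, and the code only builds the payloads without calling AWS.

```python
def weighted_forward_action(old_tg_arn: str, new_tg_arn: str,
                            new_weight: int) -> dict:
    """Build a forward action sending new_weight% of traffic to the new group."""
    if not 0 <= new_weight <= 100:
        raise ValueError("new_weight must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": old_tg_arn, "Weight": 100 - new_weight},
                {"TargetGroupArn": new_tg_arn, "Weight": new_weight},
            ]
        },
    }


def shift_plan(old_tg_arn: str, new_tg_arn: str,
               steps: tuple[int, ...] = (10, 50, 100)) -> list[dict]:
    """Return one forward action per step of the gradual traffic shift."""
    return [weighted_forward_action(old_tg_arn, new_tg_arn, w) for w in steps]


if __name__ == "__main__":
    # Hypothetical, truncated ARNs used purely as placeholders.
    old_arn = "arn:aws:elasticloadbalancing:...:targetgroup/old-tg/..."
    new_arn = "arn:aws:elasticloadbalancing:...:targetgroup/new-tg/..."
    for action in shift_plan(old_arn, new_arn):
        weights = [tg["Weight"] for tg in action["ForwardConfig"]["TargetGroups"]]
        print("old/new weights:", weights)
```

If the listener itself is managed by a controller, the same weights would need to be expressed through that controller's configuration rather than applied to the listener directly.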

Node Lifecycle

terminationGracePeriodSeconds

The node OS is fixed to Bottlerocket

The node OS is fixed to "Bottlerocket," a Linux distribution optimized for running containers. If you rely on OS-level customizations or kernel-level security tools, you may not be able to carry them over.

.local domains are unavailable

.local domains cannot be used under EKS Auto Mode.

Security Groups for Pods (SGP) is unavailable

The biggest issue we encountered during testing was the inability to use the "Security Groups for Pods" feature, which allows you to assign individual security groups to individual pods. In our environment, which operates as a multi-tenant environment, we make extensive use of this feature, and we came to the conclusion that it would be difficult to find an alternative means to meet our security requirements.
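Before evaluating a migration, it can help to check how heavily a cluster depends on this feature. The sketch below scans already-loaded manifests for `SecurityGroupPolicy` resources in the `vpcresources.k8s.aws` group, which is the custom resource used by Security Groups for Pods as we understand it; how the manifests are exported from the cluster is left out, and the sample data is hypothetical.

```python
def find_security_group_policies(manifests: list[dict]) -> list[str]:
    """Return 'namespace/name' for every SecurityGroupPolicy found."""
    found = []
    for manifest in manifests:
        is_sgp = (
            manifest.get("kind") == "SecurityGroupPolicy"
            and manifest.get("apiVersion", "").startswith("vpcresources.k8s.aws/")
        )
        if is_sgp:
            meta = manifest.get("metadata", {})
            namespace = meta.get("namespace", "default")
            found.append(f"{namespace}/{meta.get('name', '<unnamed>')}")
    return found


if __name__ == "__main__":
    # Hypothetical manifests, e.g. parsed from exported cluster resources.
    samples = [
        {
            "apiVersion": "vpcresources.k8s.aws/v1beta1",
            "kind": "SecurityGroupPolicy",
            "metadata": {"name": "tenant-a-db-access", "namespace": "tenant-a"},
        },
        {"apiVersion": "v1", "kind": "Service",
         "metadata": {"name": "web", "namespace": "tenant-a"}},
    ]
    blockers = find_security_group_policies(samples)
    print("SecurityGroupPolicy resources:", blockers)
```

A non-empty result indicates workloads that would need an alternative isolation mechanism before moving to Auto Mode nodes.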

Summary of verification results and future prospects


EKS Auto Mode is a very attractive feature, with the potential to reduce add-on management effort by up to 30 days per month. However, it also became clear that it has technical constraints we cannot compromise on in our environment, such as the inability to use .local domains and the lack of support for Security Groups for Pods. The latter, which bears directly on our security requirements, was the decisive factor in postponing adoption. EKS Auto Mode itself is still new, and we expect these issues to be resolved in the future. Although migrating is difficult for our team at the moment, we will keep a close eye on support for Security Groups for Pods in particular and move forward with adoption once these constraints are resolved.

SRG is looking for people to work with us.
If you are interested, please contact us here.