Deep Dive into VPC CNI: An in-depth look at IPAMD and Security Groups For Pods

This is Kumo Ishikawa (@ishikawa_kumo) of the Service Reliability Group (SRG) in the Technology Headquarters.
#SRG (Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
add cmd: failed to assign an IP address to container
 

AWS VPC Add-on


The AWS VPC CNI plays a central role in the networking configuration of EKS, and its native integration with AWS VPC streamlines network management on EKS.
The AWS VPC CNI consists of the following components:
  • CNI Plugin
    • Following the CNI specification, it communicates with IPAMD over gRPC to obtain an IP address and configures the Pod network.
  • IPAMD Plugin
    • Manages the AWS ENIs on each node and pre-assigns IP addresses so that Pods can start quickly.
The relationship between the components is shown in the Proposal (the names are slightly different from the actual names).

Structure

The VPC CNI mainly provides the functions of Pod IP allocation and network routing.
  • Pod IP allocation
    • IP addresses are assigned to Pods using EC2 ENIs and IPAMD.
  • Network Routing
    • Network routing between Pods and from Pods to the outside world is configured using multiple routing tables and iptables. They are used as follows:
    • Pod ↔ External network: iptables
      • As with regular SNAT, this is applied as packets traverse the virtual network interfaces (veth) and the EC2 ENIs (eth, etc.).
    • Pod ↔ Pod: Routing table
      • Traffic between Pods and from Pods to the outside world mainly flows through veth and eth, and various routing tables are applied implicitly.
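The SNAT decision for Pod ↔ external traffic can be illustrated with a small model. This is only a sketch, not the plugin's code: the real behavior is implemented as iptables rules on the node, the function name `needs_snat` is hypothetical, and the `external_snat` parameter is assumed to mirror the AWS_VPC_K8S_CNI_EXTERNALSNAT setting that appears later in this article.

```python
import ipaddress


def needs_snat(dst_ip: str, vpc_cidrs: list[str], external_snat: bool = False) -> bool:
    """Return True if a packet to dst_ip would be source-NATed on the node.

    Simplified model: traffic that stays inside the VPC keeps the Pod IP,
    traffic leaving the VPC is SNATed unless SNAT is handled externally.
    """
    if external_snat:
        # SNAT is delegated to something outside the node (e.g. a NAT gateway).
        return False
    dst = ipaddress.ip_address(dst_ip)
    in_vpc = any(dst in ipaddress.ip_network(cidr) for cidr in vpc_cidrs)
    return not in_vpc


if __name__ == "__main__":
    vpc = ["10.0.0.0/16"]  # example VPC CIDR, chosen for illustration
    for dst in ("10.0.12.34", "203.0.113.10"):
        print(dst, "SNAT" if needs_snat(dst, vpc) else "no SNAT")
```

Pod ↔ Pod traffic inside the VPC falls into the "no SNAT" branch, which is why it is handled by routing tables alone.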

Life Cycle

There are seven main VPC CNI binaries:
| Binary | Explanation |
| aws-k8s-agent | IPAMD gRPC server |
| aws-cni | Implements the CNI Plugin; mainly manages settings such as ENIs/IPs within the VPC and Linux networking |
| grpc-health-probe | Health check probe tool for IPAMD |
| cni-metrics-helper | Support tool that collects metrics and puts them into CloudWatch |
| aws-vpc-cni-init | Initialization component that ensures prerequisites are in place, such as system parameters, IPv6, and copying each binary file |
| aws-vpc-cni | Tool that launches IPAMD and manages its lifecycle, replacing what was used in the past |
| egress-cni | Implements the CNI Plugin; primarily uses SNAT to manage specific routing rules and policies for outbound traffic |
entrypoint.sh
Execution Order
  1. /host/opt/cni/bin
  1. Start IPAMD
  1. grpc-health-probe
  1. Run: currently it is commented out because it is used as an InitContainer.
    1. InitContainer
  1. Copy the CNIConfig file to the specified directory on the kubelet.
  1. Wait for IPAMD to start
grpc-health-probe

CNI Plugin Details

aws-cni
plugins/routed-eni/cni.go
File structure
Registering the CNI Plugin
cmdDel
  • PluginMainWithError
Add Command
  1. Loading the configuration
    1. POD_SECURITY_GROUP_ENFORCING_MODE
      • add cmd: error loading config from args
  1. Reading Kubernetes arguments
    1. cniTypes.LoadArgs
  1. Setting up a gRPC connection to IPAMD
  1. Add Network API Request
    1. AddNetwork
      AddNetwork
  1. Implementing network settings
    1. driverClient.SetupBranchENIPodNetwork
      PodVlanId
Del command
  1. Loading the configuration
    1. Same as Add command
  1. Reading Kubernetes arguments
    1. Same as Add command
  1. Attempting deletion using the previous result
    1. tryDelWithPrevResult
  1. Setting up a gRPC connection to IPAMD
  1. Delete Network API Request
    1. DelNetwork
  1. Cleaning up network resources
    1. Releases the Pod IP and related resources. If SGP is used, the Branch ENI is also deleted.
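The Add and Del flows above can be sketched as follows. This is a rough model under stated assumptions, not the code in plugins/routed-eni/cni.go: `FakeIPAMD`, `load_config`, `cmd_add`, and `cmd_del` are hypothetical names, the gRPC call is replaced by an in-memory object, and the network setup step is reduced to returning a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FakeIPAMD:
    """In-memory stand-in for the IPAMD gRPC server (hypothetical)."""
    free_ips: list = field(default_factory=lambda: ["10.0.1.10", "10.0.1.11"])
    assigned: dict = field(default_factory=dict)

    def add_network(self, container_id: str) -> str:
        if not self.free_ips:
            raise RuntimeError("failed to assign an IP address to container")
        ip = self.free_ips.pop(0)
        self.assigned[container_id] = ip
        return ip

    def del_network(self, container_id: str) -> bool:
        ip = self.assigned.pop(container_id, None)
        if ip is None:
            return False
        self.free_ips.append(ip)
        return True


def load_config(conf: dict) -> dict:
    # Step 1 of both commands: fail early on an unusable configuration.
    if "name" not in conf:
        raise ValueError("error loading config from args")
    return conf


def cmd_add(ipamd: FakeIPAMD, container_id: str, conf: dict) -> dict:
    load_config(conf)
    ip = ipamd.add_network(container_id)   # "Add Network API Request"
    return {"interface": "eth0", "ip": ip}  # stand-in for the network setup step


def cmd_del(ipamd: FakeIPAMD, container_id: str, conf: dict) -> bool:
    load_config(conf)
    # The real plugin first tries to clean up from the previous result;
    # that step is omitted in this sketch.
    return ipamd.del_network(container_id)  # "Delete Network API Request"


if __name__ == "__main__":
    ipamd = FakeIPAMD()
    print(cmd_add(ipamd, "c1", {"name": "aws-cni"}))
    print(cmd_del(ipamd, "c1", {"name": "aws-cni"}))
```

When the fake pool is empty, `cmd_add` raises the same kind of failure this article is concerned with, which shows that the CNI plugin side can only report what IPAMD returns.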

More about IPAMD Plugin

pkg/ipamd/*
File structure
IPAMD processing overview
  1. The initialization process involves:
      • Checking the connection to the k8s API Server
      • k8s API Client initialization
      • Initializing the Recorder to emit k8s Events
      • Initializing the IPAMD Client
  1. NodeIPPoolManager
    1. ipamd.log
  1. Run Prometheus Metrics API Server in a separate goroutine
    1. /metrics:61678
  1. Run the introspection API Server in a separate goroutine
    1. It is started unless disabled by an environment variable. The introspection API is used to monitor and diagnose the operation of IPAMD. It currently provides four main functions:
      • Get ENI information from IPAMD Context
      • Get a specific ENI setting name from node information
      • Get network configuration debug information
      • Get IPAMD configuration debug information
  1. Starting the gRPC Server

Big Picture

failed to assign an IP address to container

AddNetwork
AddNetwork processing overview
  1. Processing when PodENI is enabled (using SGP)
    1. vpc.amazonaws.com/pod-eni
      1. Pods that are SGP-targeted will be modified by VPCCNI in the following two places:
    2. "vpc.amazonaws.com/pod-eni"
    1. If no IP exists in the annotation, get an IP from the EC2 ENI Datastore.
    1. If an IP exists in the annotation, add that IP to the existing VPC IP Pool.
    1. Respond to the CNI Plugin with the added IP.
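The branch on the Pod annotation can be sketched roughly as below. This is a simplified model and not IPAMD's implementation: the function `pick_pod_ip` is hypothetical, the annotation value is assumed to be a JSON list of objects, and the `privateIp` field name is an assumption made for illustration.

```python
import json

POD_ENI_ANNOTATION = "vpc.amazonaws.com/pod-eni"


def pick_pod_ip(annotations: dict, warm_pool: list) -> tuple:
    """Return (ip, source) for a Pod, preferring the pod-eni annotation."""
    raw = annotations.get(POD_ENI_ANNOTATION)
    if raw:
        # Assumed format: a JSON list with one object per branch ENI.
        # Malformed JSON raises ValueError (annotation parsing failure).
        branch_enis = json.loads(raw)
        return branch_enis[0]["privateIp"], "branch-eni"
    if not warm_pool:
        raise RuntimeError("failed to assign an IP address to container")
    # No annotation: take a pre-assigned IP from the node's datastore.
    return warm_pool.pop(0), "datastore"


if __name__ == "__main__":
    sgp_pod = {POD_ENI_ANNOTATION: json.dumps([{"privateIp": "10.0.2.7"}])}
    print(pick_pod_ip(sgp_pod, ["10.0.1.10"]))
    print(pick_pod_ip({}, ["10.0.1.10"]))
```

Each failure path in the sketch (missing data, unparsable annotation, empty pool) ends in an exception, which corresponds to the situations where Pod startup is delayed.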
    If an error occurs at any point in the above flow, Pod startup is delayed and the error may appear repeatedly.
    WARN
    | Log message | Explanation | Related to enabling SGP | Probable cause |
    |  | The version information held by the CNI side and the IPAMD side does not match. | No | DaemonSet update timing issue |
    |  | The Pod with the name received by IPAMD does not exist, or annotation information acquisition failed. | Possibly | Bugs in Annotation and Resource Profile management |
    |  | There is no Trunk ENI installed in the node, or LinkIndex information cannot be obtained. | No | Some kind of bug |
    |  | Annotation information parsing failure | Possibly | Parser and usage bugs |
    More than 50 similar issues have been reported from 2020 to 2022. The main cause is believed to be a bug in the AWS VPC CNI itself.
    There are many other reasons as well:
    • Using an EC2 Instance Type that does not support SGP (e.g. using a t-series instance)

    Security Groups For Pods


    This feature allows you to apply Security Groups at the Pod level instead of the Node level, which enhances the security of communication between EKS Pods and other services in your VPC.
    A typical use case is EKS Pod ↔ RDS/ElastiCache communication control.

    Structure

    aws-node

    Setting method and ENV explanation

    For VPC CNI v1.14 and later, enabling SGP requires setting the following environment variables on the target container of the DaemonSet:
    • ENABLE_POD_ENI=true
    • POD_SECURITY_GROUP_ENFORCING_MODE=standard
    Prior to 1.10, if you use Liveness/Readiness Probes, you also need to set the following on the initContainers:
    • DISABLE_TCP_EARLY_DEMUX=true
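As an illustration, the settings above could be assembled into a patch document for the DaemonSet. This is a sketch under assumptions: `build_sgp_patch` is a hypothetical helper, and the container names `aws-node` and `aws-vpc-cni-init` are assumed here and should be checked against the manifest actually deployed in your cluster.

```python
import json


def build_sgp_patch(enforcing_mode: str = "standard",
                    disable_tcp_early_demux: bool = False) -> dict:
    """Build a patch body that sets the SGP-related environment variables."""
    if enforcing_mode not in ("standard", "strict"):
        raise ValueError(f"unknown enforcing mode: {enforcing_mode}")
    pod_spec = {
        "containers": [{
            "name": "aws-node",  # assumed container name
            "env": [
                {"name": "ENABLE_POD_ENI", "value": "true"},
                {"name": "POD_SECURITY_GROUP_ENFORCING_MODE", "value": enforcing_mode},
            ],
        }],
    }
    if disable_tcp_early_demux:
        pod_spec["initContainers"] = [{
            "name": "aws-vpc-cni-init",  # assumed init container name
            "env": [{"name": "DISABLE_TCP_EARLY_DEMUX", "value": "true"}],
        }]
    return {"spec": {"template": {"spec": pod_spec}}}


if __name__ == "__main__":
    print(json.dumps(build_sgp_patch(disable_tcp_early_demux=True), indent=2))
```

The printed JSON is the kind of body that can be passed to `kubectl patch daemonset` in the kube-system namespace. Review the output before applying it, since changing these values restarts the DaemonSet Pods.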
    POD_SECURITY_GROUP_ENFORCING_MODE
    About ENFORCING_MODE
    • Strict Mode
      • All inbound/outbound traffic of a Pod with an SG is controlled only by the SG of the Branch ENI, and all traffic between Pods goes through the VPC.
    • Standard Mode
      • (Within the VPC) All communications are subject to the SGs of both the Primary ENI and the Branch ENI.
      • All inbound/outbound traffic to Pods with an SG is controlled only by the Branch ENI SG. However, inbound/outbound traffic from the kubelet is controlled by the Node SG, and the Branch ENI SG rules do not apply to it. When using SGP, you must attach both the Node SG and the Branch ENI SG to the Pod.
      • For outbound traffic to outside the VPC (External VPN/Direct Connection/External VPC), the following conditions apply:
        • AWS_VPC_K8S_CNI_EXTERNALSNAT
        • AWS_VPC_K8S_CNI_EXTERNALSNAT
    About WARM_* and IP_COOLDOWN_PERIOD
    IP_COOLDOWN_PERIOD
    Any of the WARM targets do not impact the scale of the branch ENI pods so you will have to set the WARM_{ENI/IP/PREFIX}_TARGET based on the number of non-branch ENI pods. If you are having the cluster mostly using pods with a security group consider setting WARM_IP_TARGET to a very low value instead of default WARM_ENI_TARGET or WARM_PREFIX_TARGET to reduce wastage of IPs/ENIs.
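The point of the note above is that only Pods without a Branch ENI draw from the warm pool. A toy calculation can make this concrete. It is a deliberately simplified model, not IPAMD's actual reconciliation logic, and `warm_ip_shortfall` and the `uses_branch_eni` key are hypothetical names.

```python
def warm_ip_shortfall(pods: list, pool_size: int, warm_ip_target: int) -> int:
    """How many more IPs the node would need to keep warm_ip_target free IPs.

    Pods that use a Branch ENI (SGP) get their IP from that ENI, so they
    are not counted against the node's IP pool in this model.
    """
    in_use = sum(1 for pod in pods if not pod["uses_branch_eni"])
    available = pool_size - in_use
    return max(warm_ip_target - available, 0)


if __name__ == "__main__":
    pods = [{"uses_branch_eni": False}] * 3 + [{"uses_branch_eni": True}] * 5
    print(warm_ip_shortfall(pods, pool_size=4, warm_ip_target=2))
```

In this model, adding more SGP Pods never changes the result, which is why a low WARM_IP_TARGET is suggested for clusters that mostly run Pods with security groups.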

    Constraints

    • As mentioned above, when creating a SecurityGroupPolicy, the SecurityGroup set for the Pod must include both the Branch ENI SG and the Node SG.
    • Since trunk ENIs are included in the number of ENIs that a node can have, if the number of ENIs on a node reaches the number of ENIs supported by the instance type being used, trunk ENIs will not be created and Pods with SG applied on the node will not be created.
    • There is a limit to the number of branch ENIs (i.e. the number of Pods) that can be created with a trunk ENI. This is not listed in the documentation, so please refer to the source code and calculate the actual number from the map there. The calculation method is as follows.
      • IsTrunkingCompatible: true
    • Since restarting a Pod after applying SGP takes longer than usual, please use the most appropriate Deployment Update Strategy.
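The trunk ENI and branch ENI constraints above can be combined into a small lookup. This is a sketch with placeholder data: the instance type names and numbers in `LIMITS` are invented for illustration, and apart from IsTrunkingCompatible the field names are assumptions. Take the real values from the map in the source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceLimits:
    interface: int                # max ENIs on the instance (assumed field)
    is_trunking_compatible: bool  # corresponds to IsTrunkingCompatible
    branch_interface: int         # max branch ENIs per trunk ENI (assumed field)


# Placeholder entries, NOT real instance types or real limits.
LIMITS = {
    "example.large": InstanceLimits(3, True, 9),
    "example.burst": InstanceLimits(3, False, 0),
}


def max_sgp_pods(instance_type: str, attached_enis: int, has_trunk: bool) -> int:
    """Upper bound on Pods with security groups that the node can run."""
    limits = LIMITS[instance_type]
    if not limits.is_trunking_compatible:
        return 0  # instance type does not support SGP
    if not has_trunk and attached_enis >= limits.interface:
        return 0  # no ENI slot left, so the trunk ENI cannot be created
    return limits.branch_interface


if __name__ == "__main__":
    print(max_sgp_pods("example.large", attached_enis=2, has_trunk=False))
    print(max_sgp_pods("example.large", attached_enis=3, has_trunk=False))
    print(max_sgp_pods("example.burst", attached_enis=1, has_trunk=False))
```

The second call illustrates the case where the node has already reached its ENI limit before the trunk ENI exists, so no Pod with an SG can be scheduled there.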

      Conclusion


      VPC CNI has recently become available as a Network Policy Engine as well. It has continued to evolve since 2019 and has recently become more user-friendly. I look forward to the future of VPC CNI.
      In the official SGP tutorial document, the red-framed "Important" boxes contain important information and constraints that are worth reviewing.
      Understanding the source code related to IPAMD and SGP has greatly improved the accuracy of my troubleshooting. I would like to continue to try this method of learning in the future.
       
      SRG is looking for people to work with us. If you are interested, please contact us here.
       
      SRG runs a podcast where we chat about the latest hot topics in IT and books. We hope you will enjoy listening to it while you work.