Switching Amazon EC2 to Arm has been nothing but happiness
SRG (Service Reliability Group) mainly provides cross-sectional support for the infrastructure of our media services: improving existing services, launching new ones, and contributing to OSS.
- Introduction
- What's great about AWS Graviton?
- Starting to support Arm
- From x86_64 to AArch64 Ansible
- Nginx
- td-agent
- About entropy
- haveged
- Result
- Performance
- ALB Response Time
- Load Average
Introduction
Many people have heard the word "Arm" in connection with the Apple M1 and have been surprised by its performance.
Amazon EC2 (hereinafter EC2) also offers Arm-based instances.
https://aws.amazon.com/jp/ec2/graviton/
We migrated every EC2 instance in the development environment of one of our services from m5.large to t4g.medium, and I was very happy with the result. I hope this article conveys at least some of the benefits of Arm instances.
What's great about AWS Graviton?
Arm-equipped EC2 instances are cheaper and more powerful than traditional Intel/AMD-equipped instances, making them very cost-effective.
Currently (as of November 19, 2020), Graviton is available not only on EC2 but also on Amazon RDS and Amazon ElastiCache.
https://aws.amazon.com/jp/about-aws/whats-new/2020/10/achieve-up-to-52-percent-better-price-performance-with-amazon-rds-using-new-graviton2-instances/
https://aws.amazon.com/jp/about-aws/whats-new/2020/10/amazon-elasticache-now-supports-m6g-and-r6g-graviton2-based-instances/
Six months ago we also wrote an article comparing the Arm-based m6g.large with the conventional m5.large.
https://blog.luispc.com/entry/2020/05/18/190613
Starting to support Arm
The service we made Arm-compatible this time is managed with Ansible and Terraform, so we started by making the Ansible configuration Arm-compatible.
From x86_64 to AArch64 Ansible
First of all, the key is whether your middleware supports AArch64. Please note that some middleware has different versions for x86_64 and AArch64.
The middleware that needed to be changed in this case was the following:
- Nginx
- td-agent
The Ansible configuration for this service was well maintained, so we were able to make it Arm-compatible in less than one business day. I am incredibly grateful to the previous person in charge.
Nginx
The official Nginx repository provides AArch64 packages only from CentOS 8 onward. Since we are on CentOS 7, we used the EPEL package instead.
td-agent
The official td-agent repository provides packages for AArch64 on CentOS 7 (starting from v4.x).
We were able to support AArch64 without any configuration changes to the middleware.
About entropy
On Arm-based EC2 instances, the entropy available to the random devices is very low, which can sometimes cause errors or significant performance degradation.
In our case, the performance of the random number generation in Apache Tomcat (Java) dropped significantly. This is because /dev/random is used for random number generation, and very little entropy is available to feed it.
When we checked the available entropy on a t4g.medium, it was only 11.
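On Linux, the kernel's estimate of available entropy can be read from /proc/sys/kernel/random/entropy_avail. The following is a minimal sketch of such a check, not the command used in the original measurement. The function names and the threshold of 200 are assumptions chosen for illustration.

```python
from pathlib import Path

# Standard Linux location of the kernel's available-entropy estimate.
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def read_entropy(path: str = ENTROPY_PATH) -> int:
    """Return the available-entropy estimate stored in the given file."""
    return int(Path(path).read_text().strip())


def is_entropy_low(value: int, threshold: int = 200) -> bool:
    """Flag values below a threshold (200 is an arbitrary, assumed cutoff)."""
    return value < threshold


if __name__ == "__main__" and Path(ENTROPY_PATH).exists():
    value = read_entropy()
    print(f"entropy_avail={value} low={is_entropy_low(value)}")
```

With this cutoff, the value of 11 seen on the t4g.medium would be flagged as low.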
haveged
haveged solves this problem.
https://wiki.archlinux.jp/index.php/Haveged
It is easy to install with yum, and once installed it is started through systemd.
After starting it, the available entropy rose to 1401.
Once Ansible is ready, simply create an Arm EC2 instance, run Ansible to check that it works, and add it to the target group. If there are no problems, you can simply remove the x86_64 EC2 instance and switch over without any maintenance.
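The switchover described above could be scripted roughly as follows. This is only a sketch under assumptions: the article does not say how the target group was updated, the function name `swap_target` and its arguments are hypothetical, and `elbv2` is assumed to be a boto3 Elastic Load Balancing v2 client.

```python
def swap_target(elbv2, target_group_arn, new_instance_id, old_instance_id):
    """Add the Arm instance, wait until it is healthy, then remove the x86_64 one.

    `elbv2` is assumed to be a boto3 ELBv2 client, e.g. boto3.client("elbv2").
    """
    new_target = [{"Id": new_instance_id}]
    old_target = [{"Id": old_instance_id}]

    # 1. Register the new Arm instance with the target group.
    elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=new_target)

    # 2. Block until the load balancer reports the new target as in service.
    waiter = elbv2.get_waiter("target_in_service")
    waiter.wait(TargetGroupArn=target_group_arn, Targets=new_target)

    # 3. Only then deregister the old x86_64 instance.
    elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=old_target)
```

Waiting for the new target to become healthy before deregistering the old one is what keeps the service reachable throughout the switch.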
Result
We have not yet migrated the production environment, but we have switched all of our development environments to Arm, moving from m5.large to t4g.medium.
In terms of price per hour
- m5.large: $0.124/h
- t4g.medium: $0.0432/h
In other words, m5.large costs roughly three times as much as t4g.medium (about 2.9 times).
Performance
In the graphs, the red vertical line marks the point of the switch.
ALB Response Time
The grey line is m5.large

Load Average
Blue: 1 minute / Purple: 5 minutes / Yellow: 15 minutes

Although m5.large costs about three times as much as t4g.medium, it is safe to say that there is no difference in performance. The development environment has 26 EC2 servers, so a simple calculation of the monthly cost before and after switching to Arm gives:
0.124 * 720 * 26 = $2321.28/month
0.0432 * 720 * 26 = $808.704/month
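The arithmetic can be reproduced with a short script. This is a minimal sketch that uses the article's own figures (hourly prices, 720 hours per month, 26 servers). The function and variable names are chosen for illustration.

```python
from decimal import Decimal

HOURS_PER_MONTH = 720  # 30 days x 24 hours, as used in the article
SERVERS = 26           # number of EC2 servers in the development environment


def monthly_cost(price_per_hour: str, servers: int = SERVERS) -> Decimal:
    """Monthly cost in USD for `servers` instances at the given hourly price."""
    return Decimal(price_per_hour) * HOURS_PER_MONTH * servers


m5 = monthly_cost("0.124")    # m5.large
t4g = monthly_cost("0.0432")  # t4g.medium

monthly_saving = m5 - t4g
yearly_saving = monthly_saving * 12
price_ratio = m5 / t4g        # how many times more m5.large costs

print(f"m5.large:   ${m5}/month")
print(f"t4g.medium: ${t4g}/month")
print(f"saving:     ${monthly_saving}/month, ${yearly_saving}/year")
print(f"ratio:      {price_ratio:.2f}x")
```

The yearly saving comes to a little over $18,000. At an exchange rate of roughly 100 yen per dollar (a rate assumed here, not stated in the article), that is consistent with a figure of over 1.8 million yen.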
By switching only the development environment, with no performance degradation, we save over 1.8 million yen per year.
In the production environment we will be switching from m5.(x)large to m6g.(x)large, so the price difference will be smaller. However, as the following article explains, the performance gain is even greater, so I think it is well worth it.
https://blog.luispc.com/entry/2020/05/18/190613