Switching to ARM on Amazon EC2 has brought nothing but happiness.

This is Hasegawa (@rarirureluis) from the Service Reliability Group (SRG) of the Technology Division.
SRG primarily provides comprehensive support for the infrastructure surrounding our media services, focusing on improving existing services, launching new ones, and contributing to open-source software (OSS).
 

Introduction


Many people have often heard the term "Arm" in connection with the Apple M1 and were surprised by its performance. Benchmark results for Macs equipped with the Apple M1 are available here.
Amazon EC2 (hereinafter EC2) also offers Arm-based instances, powered by AWS Graviton: https://aws.amazon.com/jp/ec2/graviton/
I recently migrated the entire development environment of one of our services from m5.large to t4g.medium EC2 instances, and the results made me very happy. I hope this article helps you understand, at least a little, the advantages of Arm instances.

What's so great about AWS Graviton?


EC2 instances powered by Arm processors offer better performance at a lower price than instances powered by traditional Intel/AMD processors, making them very cost-effective.
Six months ago I also wrote an article comparing the Arm-based m6g.large with the older m5.large: https://blog.luispc.com/entry/2020/05/18/190613

Making the service Arm-compatible


The service we migrated is managed with Ansible and Terraform, so we started by making the Ansible setup Arm-compatible.

Migrating Ansible from x86_64 to AArch64


The first thing to check is whether the middleware and other components support AArch64. Note that for some middleware, the versions available for AArch64 differ from those for x86_64.
In this project, the following middleware needed changes:
  • Nginx
  • td-agent
The service already had a well-maintained Ansible setup, so we finished making it Arm-compatible in less than one business day. I'm incredibly grateful to the previous person in charge.

Nginx

The official Nginx repository provides AArch64 packages only for CentOS 8 and later; since this service runs on CentOS 7, I used the EPEL package instead.

td-agent

The official td-agent repository includes an AArch64 package for CentOS 7 (from v4.x onwards).
We were able to implement AArch64 support without making any configuration changes to the middleware.
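The architecture-dependent choices above can be summarized as a small decision function. This is a minimal Python sketch, not the actual Ansible roles used here: the function names and the "nginx.org"/"epel" labels are hypothetical, and it assumes the x86_64 hosts used the official Nginx repository.

```python
def normalize_arch(machine: str) -> str:
    """Map a kernel machine name (e.g. from platform.machine()) to "x86_64" or "aarch64"."""
    m = machine.lower()
    if m in ("x86_64", "amd64"):
        return "x86_64"
    if m in ("aarch64", "arm64"):
        return "aarch64"
    raise ValueError(f"unsupported architecture: {machine}")


def nginx_source(arch: str, centos_major: int) -> str:
    """Hypothetical helper: decide where to install Nginx from."""
    # nginx.org ships AArch64 packages only from CentOS 8 onward,
    # so CentOS 7 on AArch64 falls back to EPEL.
    if arch == "aarch64" and centos_major < 8:
        return "epel"
    return "nginx.org"  # assumption: x86_64 hosts use the official repository


def td_agent_supported(arch: str, td_agent_major: int) -> bool:
    """Hypothetical helper: is this td-agent major version installable on CentOS 7?"""
    # The official repository has CentOS 7 AArch64 packages from v4.x onward.
    if arch == "aarch64":
        return td_agent_major >= 4
    return True  # this sketch does not restrict x86_64
```

For example, `nginx_source(normalize_arch(platform.machine()), 7)` returns `"epel"` on a CentOS 7 Graviton host. In Ansible itself, the same branch would typically key off the `ansible_architecture` fact.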

About entropy


On EC2 Arm instances, the entropy available to the kernel's random devices can be very low, which can sometimes cause errors or significant performance degradation.
In our case, random number generation in Apache Tomcat (Java) slowed down significantly. This is because the random number generator reads from `/dev/random`, and there was not enough entropy available, so reads blocked.
When we checked the available entropy on a t4g.medium instance, it was only 11.
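To check this on your own instances, the Linux kernel exposes its current entropy estimate in `/proc/sys/kernel/random/entropy_avail`. Below is a minimal Python sketch for reading it; the threshold of 1000 used to call the value "low" is an illustrative assumption, not a kernel or AWS value.

```python
from pathlib import Path

# Standard Linux procfs file holding the kernel's current entropy estimate.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")

# Illustrative assumption: treat anything below this as "low".
LOW_ENTROPY_THRESHOLD = 1000


def parse_entropy(text: str) -> int:
    """Parse the file contents (a single integer followed by a newline)."""
    return int(text.strip())


def read_entropy(path: Path = ENTROPY_AVAIL) -> int:
    """Read the current entropy estimate from procfs."""
    return parse_entropy(path.read_text())


def is_low(entropy: int, threshold: int = LOW_ENTROPY_THRESHOLD) -> bool:
    """Return True when the entropy estimate is below the chosen threshold."""
    return entropy < threshold
```

On a Linux host, `is_low(read_entropy())` gives a quick yes/no; the value of 11 seen on the t4g.medium would clearly count as low.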

haveged

The solution to this is `haveged`: https://wiki.archlinux.jp/index.php/Haveged
It is easy to install with yum, and once installed it can be started and enabled via systemd.
After starting it, the available entropy rose to 1401.
Once the Ansible setup is ready, simply create an Arm-based EC2 instance, run Ansible against it, verify that it works, and add it to the target group. If there are no problems, you can remove the x86_64 EC2 instance and complete the switchover without any maintenance downtime.
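The target group switchover can also be scripted. Here is a minimal sketch assuming an ALB target group and a boto3 ELBv2 client (`boto3.client("elbv2")`). `register_targets`, `deregister_targets` and the `target_in_service` / `target_deregistered` waiters are boto3 APIs; the `swap_targets` helper and its ordering are just one possible approach, not the procedure used for this migration.

```python
from typing import Iterable


def swap_targets(elbv2, target_group_arn: str,
                 new_instance_ids: Iterable[str],
                 old_instance_ids: Iterable[str]) -> None:
    """Hypothetical helper: add Arm instances, wait until healthy, then remove x86_64 ones.

    `elbv2` is expected to be a boto3 ELBv2 client, e.g. boto3.client("elbv2").
    """
    new_targets = [{"Id": i} for i in new_instance_ids]
    old_targets = [{"Id": i} for i in old_instance_ids]

    # 1. Register the Arm instances with the target group.
    elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=new_targets)

    # 2. Block until the ALB health checks report them as healthy.
    elbv2.get_waiter("target_in_service").wait(
        TargetGroupArn=target_group_arn, Targets=new_targets)

    # 3. Only then deregister the x86_64 instances and wait for draining to finish.
    elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=old_targets)
    elbv2.get_waiter("target_deregistered").wait(
        TargetGroupArn=target_group_arn, Targets=old_targets)
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable without AWS credentials. A call would look like `swap_targets(boto3.client("elbv2"), tg_arn, [arm_instance_id], [x86_instance_id])`, where the ARN and instance IDs are your own.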

Results


We haven't migrated the production environment yet, but our entire development environment now runs on Arm: we switched from m5.large to t4g.medium.
The hourly prices are:
  • m5.large: $0.124/h
  • t4g.medium: $0.0432/h
The price differs by a factor of about 2.9, almost three times.

Performance


In each graph, the red vertical line marks the point of the switchover.

ALB Response Time

[Graph: ALB response time; the gray line is m5.large]

Load Average

[Graph: load average; blue = 1 minute, purple = 5 minutes, yellow = 15 minutes]
Even though m5.large costs about three times as much as t4g.medium, I think it's safe to say there is no performance difference. Our development environment has 26 EC2 instances, so a simple monthly cost calculation (720 hours per month) before and after the Arm migration gives:
  • m5.large: 0.124 × 720 × 26 = $2,321.28/month
  • t4g.medium: 0.0432 × 720 × 26 = $808.704/month
By switching only the development environment, and without any performance degradation, we cut costs by over 1.8 million yen per year. In production we plan to switch from m5.(x)large to m6g.(x)large, so the price difference will be smaller, but as I explained in an earlier article (https://blog.luispc.com/entry/2020/05/18/190613), the performance difference is even greater, so I think it is well worth doing.
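The cost arithmetic above is easy to reproduce. Here is a minimal Python sketch using the hourly prices, the 720-hour month and the 26 instances from this article; the dictionary and function names are illustrative.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 h, as in the calculation above
INSTANCES = 26         # EC2 instances in the development environment

# On-demand USD per hour, as quoted in this article.
PRICE_PER_HOUR = {
    "m5.large": 0.124,
    "t4g.medium": 0.0432,
}


def monthly_cost(price_per_hour: float, instances: int = INSTANCES,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Monthly cost in USD for `instances` machines running all month."""
    return price_per_hour * hours * instances


def annual_savings_usd(old: str, new: str) -> float:
    """Yearly USD saved by replacing every `old` instance with `new`."""
    old_cost = monthly_cost(PRICE_PER_HOUR[old])
    new_cost = monthly_cost(PRICE_PER_HOUR[new])
    return 12 * (old_cost - new_cost)
```

`annual_savings_usd("m5.large", "t4g.medium")` comes to about $18,150.91 per year. The article does not state which exchange rate it used; at an assumed rate of 100 JPY/USD or more, that is over 1.8 million yen, consistent with the figure above.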
SRG is looking for new team members! If you're interested, please contact us here!