Terraform: 84 tfstates and 5 repositories turned into a monorepo, and how to operate Terraform with Atlantis

This is Hasegawa (@rarirureluis).
The Service Reliability Group (SRG) mainly provides cross-sectional support for the infrastructure of our media services: improving existing services, launching new ones, contributing to OSS, and so on.
This article is an attempt to reconsider Terraform operations.
 
 

Before becoming a monorepo


The Terraform for this service has a completely separated structure for each environment and resource, as shown below.
 
There are five repositories with the directory structure above, and five AWS accounts.
We won't go into the merits and demerits of this directory structure here, but our goal is to reduce toil by consolidating all of this into a single repository and managing CI in a single place.
 

How do you operate Terraform?


This service operated Terraform using GitHub Actions or CodeBuild.
In this workflow, Plan runs when a PR is created; if there are no problems, the PR is merged and Apply runs.

Patterns where Apply fails

It is possible for Plan to succeed but Apply to fail.
For example, Apply fails when it tries to delete an ECR repository that is not empty, or a resource such as EC2 or ALB that has deletion protection enabled.
Whenever this happened, we had to go through the whole cycle again (create a PR → Plan → Approve → Merge (Apply)) just to re-run Plan/Apply.
This is toil that SREs cannot ignore.
 

Atlantis should have been introduced sooner...


Atlantis is a CI tool for Terraform.
 

The problem Atlantis solves

  • Prevents patterns where Apply fails
    • With IssueOps and branch deployment, you can guarantee that whatever has been merged has already been applied correctly.
  • Prevents conflicting state updates
    • You can see a list of state changes being made from currently open PRs.

Introducing Atlantis can solve many problems.
 

Prevents patterns where Apply fails

In Atlantis, all operations are done through comments on the PR. For example, posting the following comment on a PR runs Apply from the PR branch:
atlantis apply
Because Apply runs before the merge, you no longer run into Apply failing after a merge and having to create another PR just to Plan/Apply again.

This means that only changes that have been applied correctly get merged; this approach is called "branch deployment."
GitHub recently introduced it on its official blog.
 

Prevent conflicting state updates

Atlantis takes a lock on the state that a PR changes.
If another PR then tries to change the same state, Atlantis returns an error saying that it is already locked.
Current locks can be viewed in the Atlantis web GUI, where you can also view the Terraform execution logs.
 

How to build Atlantis

This time we used the official Atlantis Terraform module.
 

How Atlantis works


Here's how Atlantis works.
Atlantis is triggered by webhooks from GitHub (it also supports other services), and communication between Atlantis and GitHub is carried out via an API.
Therefore, you will need a GitHub App or a Personal Access Token to build Atlantis.
 
In the Git flow, Plan runs when you create a PR.
At least one Approve is required before a PR can be applied, and once the Apply is complete, the PR will be automatically closed and the branch will be automatically deleted.
These policies can be specified in detail in atlantis.yaml.
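
As a rough illustration of the policies described above, a repo-level atlantis.yaml might look like the sketch below. This is not the configuration actually used for this service: the project name and directory are hypothetical, and treating "automatically closed" as Atlantis's automerge setting is an assumption. Setting apply_requirements from a repo-level file also has to be allowed on the server side (allowed_overrides in the server-side repo config).

```yaml
# atlantis.yaml at the repository root (illustrative sketch only)
version: 3
# Assumption: merge the PR automatically once every plan has been applied.
automerge: true
# Delete the source branch after the automatic merge.
delete_source_branch_on_merge: true
projects:
  - name: example-prod-vpc        # hypothetical project name
    dir: envs/prod/vpc            # hypothetical directory
    autoplan:
      enabled: true               # run Plan when a PR touches this project
      when_modified: ["*.tf", "*.tfvars"]
    # Require at least one approval before "atlantis apply" is accepted.
    apply_requirements: [approved]
```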
 

One Atlantis to multiple AWS environments

Terraform supports Assume Role, so a single Atlantis instance assumes an IAM role in each AWS account to run Plan and Apply there.
 

Tips


Here are some other good points besides those already mentioned.

You can specify the Terraform version per directory.

This is a very good feature.
Before the monorepo, everything was completely separated, Terraform versions included, but thanks to this feature there was no need to unify the Terraform versions when migrating to the monorepo.
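
A minimal sketch of per-directory version pinning in atlantis.yaml, assuming two projects that were originally written against different Terraform versions. The project names, directories, and version numbers are hypothetical and only show the shape of the setting.

```yaml
# atlantis.yaml (illustrative sketch only)
version: 3
projects:
  - name: example-legacy          # hypothetical project
    dir: envs/dev/legacy          # hypothetical directory
    terraform_version: v0.13.7    # older code keeps its original version
  - name: example-current         # hypothetical project
    dir: envs/dev/current         # hypothetical directory
    terraform_version: v1.5.7     # newer code uses a newer version
```

With this layout each directory can be upgraded on its own schedule after the migration.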

parallel_plan and parallel_apply

Atlantis automatically detects which directories have changed and runs Plan for them, but if you add 40 directories at once, as we did, Plan and Apply take a considerable amount of time.
Enabling the options to run them in parallel shortens the wait considerably.
💡 If you try to run 40 or more at once, a timeout will occur. In this case, try increasing the vCPU or memory of the Atlantis task.
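
For reference, both options are top-level keys in the repo-level atlantis.yaml. The fragment below is a minimal sketch that omits project entries; it is not the full configuration used here.

```yaml
# atlantis.yaml (illustrative sketch only)
version: 3
# Run Plan for all changed projects in a PR concurrently.
parallel_plan: true
# Run Apply for all planned projects in a PR concurrently.
parallel_apply: true
```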
 

Conclusion


Atlantis is open source, the Terraform module used to build it is flexible, and Atlantis itself has a simple design, so it can easily be rebuilt even while it is in production use.
We already had a Terraform CI environment in place before migrating to Atlantis, and we found almost nothing but advantages. The only disadvantages were:
  • Monthly costs are slightly higher than with GitHub Actions/CodeBuild
  • The process of migrating to a monorepo was hellish
As for infrastructure costs, a Fargate task with 0.5 vCPU / 1 GB is perfectly adequate, so I think the cost is negligible.
We adopted Atlantis rather late ourselves, but why not take this opportunity to consider introducing it in your own work?