This is the story of consolidating five Terraform repositories with 84 tfstate files into a monorepo, along with my thoughts on operating Terraform with Atlantis.
SRG (Service Reliability Group) primarily provides comprehensive support for the infrastructure surrounding our media services, focusing on improving existing services, launching new ones, and contributing to open-source software (OSS).
This article is a reflection on how to use Terraform effectively.
Before we adopted Atlantis
In this service, Terraform has a completely isolated structure for each environment and resource, as shown below.
There are five repositories with the directory structure described above, and five AWS accounts.
I won't go into the pros and cons of this directory structure, but the goal this time is to reduce toil by consolidating all of these into a single repository and managing them with a single CI.
How do you manage your Terraform deployments?
This service was running Terraform using either GitHub Actions or CodeBuild.
In this workflow, when a pull request (PR) is created, a plan is run, and if there are no problems, it is merged and applied.
Patterns where Apply fails
The Plan may succeed, but the Apply operation may still fail.
For example, this can happen when deleting an ECR repository that is not empty, or when deletion protection is enabled on EC2, ALB, etc.
Each time this happened, we had to create another pull request (PR), plan, get approval, and merge (apply) all over again.
This is toil that SREs cannot ignore.
We should have implemented Atlantis much sooner...
Atlantis is a CI tool for Terraform.
Problems that Atlantis solves
- This prevents patterns in which Apply will fail.
- It enables IssueOps and branch deployment, so what gets merged has already been applied correctly.
- This prevents conflicts in updating the state.
- You can see a list of states that are currently being modified by open pull requests.
Many problems can be solved by introducing Atlantis.
This prevents patterns in which Apply will fail.
In Atlantis, all operations are performed through comments on Issues (PRs).
Apply is run by commenting "atlantis apply" on the PR. In other words, this eliminates the problem of Apply failing after a merge, which would then require creating a new pull request for Plan/Apply.
Only changes that "atlantis apply" has applied successfully get merged, and this technique is called "branch deployment."
GitHub recently featured it in an official blog post.
This prevents conflicts in updating the state.
Atlantis locks the state that a pull request changes.
If another pull request then tries to change the same state, it gets an error saying the state is already locked.
The locked state can be viewed on the Atlantis web interface, and the Terraform execution logs can also be viewed through this web GUI.
How to build Atlantis
This time, I used the official Terraform module from Atlantis.
How Atlantis works
Here's an explanation of how Atlantis works.
Atlantis is triggered by webhooks from GitHub (and other services), and communication between Atlantis and GitHub takes place via API.
Therefore, when building Atlantis, you will need either a GitHub App or a Personal Access Token.
In the Git flow, the Plan is run when a pull request is created.
We've made it so that at least one "Approve" is required to apply a change, and once the apply is complete, the PR is automatically closed and the branch is automatically deleted.
You can define these policies in detail in atlantis.yaml.
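As a rough illustration of the policies described above, a repo-level atlantis.yaml could look something like the sketch below. This is not our actual file: the project name and directory are hypothetical, and setting apply_requirements or delete_source_branch_on_merge from atlantis.yaml assumes the server-side repo config lists them in allowed_overrides.

```yaml
# Sketch only: names and paths are hypothetical, not our real layout.
version: 3

# Merge the PR automatically once every planned project has been applied.
automerge: true

# Delete the source branch after the automatic merge.
# Assumption: the server-side config allows this override.
delete_source_branch_on_merge: true

projects:
  - name: example-dev-network    # hypothetical project name
    dir: example/dev/network     # hypothetical directory
    autoplan:
      enabled: true              # run Plan when the PR is opened or updated
    # Require at least one approval before "atlantis apply" is accepted.
    # Assumption: the server-side config allows this override.
    apply_requirements: [approved]
```

With automerge, Atlantis merges the PR after a successful apply, which is one way to get the close-and-delete behavior without any manual step.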
From one Atlantis instance to multiple AWS environments
Since Terraform supports Assume Role, I configured the single Atlantis instance to assume a role in each target AWS account.
Tips
In addition to what I've already mentioned, here are some other good points.
You can specify the Terraform version for each directory.
This is a very good feature.
Even before the monorepo, the systems were completely separated, but thanks to this feature, there's no need to unify the Terraform versions when migrating to a monorepo.
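A minimal sketch of pinning versions per directory in atlantis.yaml might look like this. The directories and version numbers are made up for illustration and are not the ones we actually use.

```yaml
# Sketch only: directories and versions are hypothetical.
version: 3
projects:
  - dir: example/service-a/prod
    terraform_version: v1.3.9    # this directory stays on an older release
  - dir: example/service-b/prod
    terraform_version: v1.5.7    # this directory uses a newer release
```

Each project entry is planned and applied with its own Terraform version, so directories can be upgraded one at a time after the migration.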
parallel_plan and parallel_apply
Atlantis automatically runs Plan for the directories that changed, but when you add 40 directories at once like in this case, the Plan and Apply processes take a considerable amount of time.
However, the availability of parallel execution options significantly reduces waiting times.
When I tried to run all 40 at once, a timeout occurred.
If that happens, try increasing the vCPU or memory allocated to the Atlantis task.
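For reference, both options are top-level keys in the repo-level atlantis.yaml. The fragment below is a minimal sketch that omits the projects section a real file would normally carry.

```yaml
# Sketch only: enables parallel execution for every project in this repo.
version: 3
parallel_plan: true     # run Plan for multiple projects concurrently
parallel_apply: true    # run Apply for multiple projects concurrently
```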
In conclusion
Atlantis is open-source software, it can be deployed flexibly with Terraform, and Atlantis itself has a simple design, so it is easy to rebuild even in production environments.
We migrated to Atlantis from an already established Terraform CI environment, and it was almost all upside. The only downsides were that the monthly cost was slightly higher than GitHub Actions/CodeBuild, and that the migration to a monorepo was a nightmare.
In terms of infrastructure costs, a Fargate task with 0.5 vCPU and 1 GB of memory is perfectly adequate, so I think the cost is negligible.
We were a bit late to adopt Atlantis ourselves, but perhaps this is a good opportunity for you to consider adopting it?
SRG is looking for new team members.
If you are interested, please contact us here.
SRG runs a podcast where we chat about the latest hot IT technologies and books. We hope you'll enjoy listening to it while you work.
