Adding an index to Aurora MySQL failed due to local storage exhaustion
This is Onkai Yuta (@fat47) from the Service Reliability Group (SRG) of the Media Headquarters.
#SRG(Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article summarizes an issue where adding an index using ALTER online DDL (INPLACE) in an Aurora MySQL Version 2 environment failed due to the RDS instance's local storage being depleted.
Adding an index using online DDLHowever, ALTER execution failed due to an error.Why it failsCountermeasuresIncrease the instance size to increase the local storage capacitySpecify ALGORITHM=COPY as the algorithm when executing ALTER TABLEUse the RDS B/G Deployments feature to add an index with ALGORITHM=COPY only to the Green cluster.Conclusion
Adding an index using online DDL
In an environment running Aurora MySQL Version 2, I added an index to a large table.
Adding a secondary index can be performed using online DDL (ALGORITHM=INPLACE), so ALTER can be performed without locking writes.
However, ALTER execution failed due to an error.
However, after waiting for a while, the following error message was displayed without completing the process.
When I check the error log of the RDS instance, I see the following output:
I am getting an error saying there is not enough space.
Why it fails
The cluster volume used to store Aurora DB data will automatically expand in size up to 128 TiB.
However, this time the disk space shortage is not in the cluster volume, but in the area called "local storage" that the RDS instance has.
This is a storage area with a fixed capacity for each RDS instance size.
Here is a table from the above document:
DB instance class | Maximum temporary/local storage available (GiB) |
db.r6g.8xlarge | 640 |
db.r6g.4xlarge | 320 |
db.r6g.2xlarge | 160 |
db.r6g.xlarge | 80 |
db.r6g.large | 32 |
The environment used this time is db.r6g.large, so the local storage size is32GiBIt will be.
When you run the command, check the "FreeLocalStorage" metrics in CloudWatch.

We then discovered that an error occurred when the value approached 0 to a certain extent.
*Since ALTER was executed twice, this metric shows two times when it approaches 0.
When creating a secondary index with ALGORITHM=INPLACE in online DDL, a log file is created that records the DML that was performed during the DDL execution,
Temporary files such as sort files are created. These temporary files use local storage, and it seems that the query error occurred when there was insufficient space.
Countermeasures
There are three possible ways to deal with this:
- Increase the instance size to increase the local storage capacity
- Specify ALGORITHM=COPY as the algorithm when executing ALTER TABLE
- Use the RDS B/G Deployments feature to add an index with ALGORITHM=COPY only to the Green cluster.
Increase the instance size to increase the local storage capacity
This time I dealt with it this way.
By increasing the size from db.r6g.large to db.r6g.xlarge, the local storage capacity80GiBIt has become.
When I ran the same ALTER again in that environment, it completed without running out of local storage.

The algorithm for ALTER TABLE execution isALGORITHM=COPYDesignated as
Running ALGORITHM=COPY will help prevent local storage exhaustion.
However, since a table lock is required until execution is complete, update DML cannot be accepted at the same time.
I decided to postpone this this time because I needed to put the system into maintenance mode, etc., so that it would be okay if other update queries were stopped.
Use the RDS B/G Deployments feature to deploy only to the Green cluster.ALGORITHM=COPYAdd INDEX
This method involves applying the above method to a Green cluster created using the RDS Blue/Green Deployments function, and then switching between Blue and Green.
In this case, there is no data reference from the service in the Green cluster, so it is not a problem if the table is locked until execution is complete.
However, please note that there will be a downtime of about one minute when switching between B/G.
Conclusion
Online DDL may fail due to limitations imposed by the Aurora MySQL specifications.
It is important to be familiar with the limitations of local storage and learn to live with them.
SRG is looking for people to work with us.
If you're interested, please contact us here.