[Additional Note] Pitfalls of Online DDL in Amazon Aurora MySQL Version 3 and How to Avoid Them
This is Yuta Onkai (@fat47) from the Service Reliability Group (SRG) of the Media Headquarters.
#SRG(Service Reliability Group) mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, contributing to OSS, etc.
This article explains the behavior of Amazon Aurora MySQL Version 3 where, when using Online DDL (INPLACE) to add a column, for example, the table becomes invisible from the reader endpoint while the DDL is being executed, and how to avoid this behavior.
Summary and conclusion firstAbout the incidentWhen DDL is being executed on a table, the corresponding table becomes invisible from the reader instance.What we know so farReproduction verificationReproduction confirmed with Aurora MySQL Version 3.04.1 (compatible with MySQL 8.0.28)Workaround 1: Upgrade to Aurora MySQL 3.05 or later.Workaround 2: Specify ALGORITHM = COPY when executing ALTER.Workaround 3: Use pt-online-schema-changeConclusionReference URL
Summary and conclusion first
・Aurora MySQL Version 33.04.1In the following environment:When you execute DDL such as ALTER TABLE、The table may not be visible from the reader instance until execution is complete.
- There are three ways to avoid this.
A. Upgrade to Aurora MySQL Version 3.04.2 or higher.
- Or upgrade to Version 3.05 or higher.B. When executing DDLSpecify
C. Use Percona's pt-online-schema-change
[Update] Aurora MySQL Version 3.04.2 will be released on March 15, 2024.
The behavior of the above online DDL isThis has been fixed by the release of 3.04.2.。
About the incident
When DDL is being executed on a table, the corresponding table becomes invisible from the reader instance.
ALTER TABLE
Once execution of ALTER or other operations on the writer side is completed, the table will be visible from the reader endpoint again.
I tried the same query with Aurora MySQL Version 2.11.4, and there were no problems with SELECTing from the reader endpoint even while DDL was being executed.
Similar incidents have been observed, as reported on GMO's blog.
According to this blog post, this issue occurred in an Aurora MySQL 3.02.2 environment.
What we know so far
- Aurora MySQL Version 3 series 3.04.1 series or lowerOccurs in some versions of
- The occurrence cannot be confirmed in Version 2.11.4.
- This has been resolved in Version 3.05.
- Occurs during INPLACE operations such as OPTIMIZE TABLE and ALTER TABLE.
- For DDL executed INPLACE, please refer to the official document below.
- https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html
Reproduction verification
Reproduction confirmed with Aurora MySQL Version 3.04.1 (compatible with MySQL 8.0.28)
Create a suitable table
Insert about 100,000 to 1 million rows into the table.
OPTIMIZE TALBE
While the above is running, start another terminalReader EndpointWhen I run a SLECT query on this, I get an error that the table doesn't exist.
If you cancel an ongoing OPTIMIZE TABLE or wait for it to complete before executing it on the reader endpoint,
Again the SELECT query succeeds.
This can also be caused by OnlineDDL for ALTER TABLE.
In this queryis specified.
As before, while the ALTER is in progress, if you run a SELECT on the reader endpoint, the table will not be visible.
Workaround 1: Upgrade to Aurora MySQL 3.05 or later.
First, as a workaround, use the latest version of Aurora MySQL.3.05 or higherThere is a way to upgrade to.
The release notes for Aurora MySQL 3.05 (compatible with MySQL 8.0.32) state that this issue has been fixed.
INPLACE
Fixed an issue where a reader instance was unable to open a table resulting in ERROR 1146. This issue occurs when executing certain types of online Data Definition Language (DDL) when the INPLACE algorithm is used on a writer instance.
The actual environment where the problem occurredv3.05.1After upgrading to
OPTIMIZE TABLE
However, care must be taken when using v3.05.
The standard support period for v3.05 isAbout 1 year laterofJanuary 2025Therefore, you will need to upgrade to the next version (not yet available) within one year.
The standard support period for v3.04, which is provided as the LTS for Aurora MySQL, isOctober 2026Therefore, I would have liked to choose this option if possible, but if you cannot accept the current behavior, v3.05 may be an option.

Workaround 2: When executing ALTERSpecify
Another workaround is to execute ALTER. We confirmed that by specifying this and executing the command, the INPLACE method is not used, thereby avoiding the issue.
There are three methods for ALTER.
- INSTANT
- INPLACE
- COPY
Each method has its own characteristics as shown in the table below.
method | An example of possible operations | rock |
---|---|---|
INSTANT | Add Column | Exclusive metadata lock only. Only metadata updates, so execution is immediate |
INPLACE | Create secondary index, rename index | Exclusive metadata lock, momentary only at start and completion |
COPY | Changing the data type of a column | Exclusive metadata lock from start to finish |
If ALGORITHM is not specified for the ALTER statement, the available execution method will be automatically selected in the following order: INSTANT → INPLACE → COPY.
Please refer to the official documentation here to see which methods are available for ALTER statement operations.
OPTIMIZE TABLE is executed INPLACE except for temporary conditions.
In this verification example, we verify by adding a column, but in reality, adding a column can be avoided by executing it with INSTANT except for some conditions.
However, if you want to execute a type of DDL that cannot be executed in INSTANTMust be specified.
As shown in the table above,The fatal drawback of this is that a lock is placed on the table from the start of ALTER execution until its completion.
This means that during that time, a write lock will be placed on the relevant table, and unless you stop write queries to MySQL by putting the system into maintenance mode, users will be affected.
If you absolutely cannot upgrade Aurora MySQL from your current environment, please use thisI think I'll have to use the following method.
In addition,However, depending on the load of the environment, problems may arise due to the acquisition of an exclusive metadata lock when deleting old tables.
Workaround 3: Use pt-online-schema-change
「pt-online-schema-change(hereinafter referred to as pt-osc).
A detailed explanation of this tool was provided by me a few years ago.Write for Tech BlogSo, please refer to that.
When you execute ALTER on table A in pt-osc, the behavior is as follows:
- Create a working table "_tableA_new" with the same structure as "tableA".
- Execute ALTER TABLE on "_tableA_new"
- Create a trigger and insert/delete/update operations in "Table A" are reflected in "_Table A_new"
- Copy all records from "Table A" to "_Table A_new"
- Rename a table to replace it
- "Table A" to "_Table A_old"
- "_tableA_new" to "tableA"
- Delete the created trigger and "_table A_old"
Below is the execution log that is output when you actually run pt-osc.
You can see that the process is exactly as explained.
The reason why this issue can be avoided by using pt-osc is because of the mechanism of pt-osc.
1. Create a working table "_tableA_new" with the same structure as "tableA". 2. Execute ALTER TABLE on "_tableA_new".
The process will be as follows:Because ALTER is executed on a different table called "_tableA_new" instead of "tableA".is.
Therefore, even if INPLACE processing is running in the background, it will not affect the existing table A.
Also, executing ALTER will not work with table definitions that have no data._table A_new", and is completed instantly.
Because pt-osc is a third-party tool, I don’t think AWS will recommend it as a workaround, but it’s good to keep it in mind as a possible solution if you are unable to upgrade your Aurora MySQL version.
However, even when using this pt-osc, a metadata lock will be generated on the original table when creating a trigger or swapping tables.
This issue may occur depending on the load on your environment.
Conclusion
Until recently, I had only been using the Aurora MySQL Version 2 environment, so I was very surprised by the behavior of Online DDL in Version 3.
I think there will be some people who will be upgrading from Version 2, so I think it's important to pay close attention to this behavior.
Hopefully, an LTS version of Aurora MySQL will come with an improvement to this behavior!
AWS, please take care of this!!!
SRG is looking for people to work with us. If you are interested, please contact us here.