[Addendum] Pitfalls of Online DDL in Amazon Aurora MySQL Version 3 and How to Avoid Them

This is Onkai Yuta (@fat47) from the Service Reliability Group (SRG) of the Media Headquarters.
#SRG(Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article explains the behavior of Amazon Aurora MySQL Version 3 where, when using online DDL (INPLACE) to add a column, for example, the table becomes invisible from the reader endpoint while the DDL is being executed, and how to avoid this issue.
 
 

Summary and conclusion first


💡
・Aurora MySQL Version 33.04.1In the following environment:When you execute DDL such as ALTER TABLEThe reader instance may not be able to see the table until execution is complete. - There are three workarounds. A. Upgrade to Aurora MySQL Version 3.04.2 or later.
- Or upgrade to Version 3.05 or higher. B. When executing DDLSpecify C. Use Percona's pt-online-schema-change
 
[Update] Aurora MySQL Version 3.04.2 will be released on March 15, 2024
The behavior of the above online DDL isThis has been fixed with the release of 3.04.2

About the incident


When DDL is being executed on a table, the table becomes invisible to the reader instance.

ALTER TABLE
Once the ALTER or other operations on the writer side are completed, the table will again be visible from the reader endpoint.
 
I also tried the same query with Aurora MySQL Version 2.11.4, and there were no problems with SELECTs to the reader endpoint even while DDL was being executed.
 
Similar incidents have been observed, as reported on GMO's blog.
According to this blog post, this issue occurred in an Aurora MySQL 3.02.2 environment.

What we currently know

  • Aurora MySQL Version 3 series 3.04.1 series or belowOccurs in some versions of
    • The occurrence cannot be confirmed in Version 2.11.4
    • This has been resolved in Version 3.05.
 

Reproduction verification


Reproducible with Aurora MySQL Version 3.04.1 (compatible with MySQL 8.0.28)

Create a suitable table
 
Insert approximately 100,000 to 1 million rows into the table
 
OPTIMIZE TALBE
 
While the above is running, open another terminalReader EndpointWhen I run a SELECT query on the table, I get an error that the table does not exist.
 
If you stop the execution of an ongoing OPTIMIZE TABLE or wait for it to complete before executing it on the reader endpoint,
Again the SELECT query is successful.
 
This can also occur due to OnlineDDL for ALTER TABLE.
In this queryis specified.
 
As before, while the ALTER is being executed, the table will not be visible when you execute a SELECT on the reader endpoint.

Workaround 1: Upgrade to Aurora MySQL 3.05 or later


First, as a workaround, use the latest version of Aurora MySQL.3.05 or higherThere is a way to upgrade to.
The release notes for Aurora MySQL 3.05 (compatible with MySQL 8.0.32) state that this issue has been fixed.
INPLACE
Fixed an issue where a reader instance could not open a table, resulting in ERROR 1146. This issue occurs when executing certain types of online Data Definition Language (DDL) when the writer instance uses the INPLACE algorithm.
 
The actual environment where the problem occurredv3.05.1After upgrading to
OPTIMIZE TABLE
 
However, you need to be careful when using v3.05.
The standard support period for v3.05 isAbout 1 year laterofJanuary 2025Therefore, you will need to upgrade to the next version (not currently available) within one year.
The standard support period for v3.04, which is provided as an LTS for Aurora MySQL, isOctober 2026So, I would have preferred to choose this option if possible, but if you cannot accept the current behavior, v3.05 may be an option.
AWS公式ドキュメント Aurora Version Policyより引用
AWS official documentation Aurora Version PolicyQuoted from

Workaround 2: When executing ALTERSpecify


Another workaround is to execute ALTER. We confirmed that by specifying this and executing it, the issue can be avoided because the INPLACE method is not used.
 
There are three methods for ALTER.
  1. INSTANT
  1. INPLACE
  1. COPY
Each method has the following characteristics as shown in the table below.
methodAn example of possible operationsrock
INSTANTAdd columnExclusive metadata lock only. Since it is only a metadata update, execution is completed immediately.
INPLACESecondary index creation, index renamingExclusive metadata lock, momentary only at start and completion
COPYChanging the data type of a columnExclusive metadata lock from start to finish
If no ALGORITHM is specified for the ALTER statement, the executable method will be automatically selected in the order INSTANT → INPLACE → COPY.
Please refer to the official documentation here to see which methods are available for ALTER statement operations.
 
OPTIMIZE TABLE is performed inplace except for temporary conditions.
 
In this verification example, we are verifying the addition of a column, but in reality, when adding a column, this can be avoided by running it as INSTANT, excluding some conditions.
However, if you want to execute a type of DDL that cannot be executed in INSTANT,must be specified.
As shown in the table above,The fatal drawback of this is that a lock is placed on the table from the start of ALTER execution until it is completed.
This means that a write lock will be placed on the relevant table during that time, so unless you stop write queries to MySQL by putting the system into maintenance mode, users will be affected.
If you absolutely cannot upgrade Aurora MySQL from your current environment, please try thisI think I'll have to use the method below.
 
In addition,However, depending on the load on the environment, problems may arise due to the acquisition of an exclusive metadata lock when deleting the old table.

Workaround 3: Use pt-online-schema-change


The next workaround provided by Percona isPercona ToolkitIncluded in
pt-online-schema-change(hereinafter referred to as pt-osc) is used.
A detailed explanation of this tool was given by me a few years ago.Write on a tech blogSo, please refer to that.
 
When you execute ALTER on table A in pt-osc, the behavior is as follows:
  1. Create a work table "_tableA_new" with the same structure as "tableA".
  1. Execute ALTER TABLE on "_tableA_new"
  1. Create a trigger so that inserts, deletes, and updates to "Table A" are reflected in "_Table A_new"
  1. Copy all records from "Table A" to "_Table A_new"
  1. Rename a table to replace it
    1. "Table A" to "_Table A_old"
    2. "_TableA_new" to "TableA"
  1. Delete the created triggers and "_tableA_old"
 
Below is the execution log that is output when pt-osc is actually executed.
You can see that the process is performed exactly as described.
The reason why this issue can be avoided by using pt-osc is that the pt-osc mechanism
1. Create a work table "_tableA_new" with the same structure as "tableA." 2. Execute ALTER TABLE on "_tableA_new."
The processing flow is as follows:Because ALTER is executed on a different table called "_tableA_new" instead of "table A".is.
Therefore, even if INPLACE processing is running in the background, it will not affect the existing table A.
Also, executing ALTER will return "_tableA_new", and is completed instantly.
 
Since pt-osc is a third-party tool, AWS probably won't recommend it as a workaround, but it might be a good idea to keep it in mind as a workaround if you can't upgrade your Aurora MySQL version.
 
However, even when using this pt-osc, a metadata lock will be placed on the original table when creating a trigger or swapping tables.
This issue may occur depending on the load on the environment.

Conclusion


Until recently, I had only been using the Aurora MySQL Version 2 environment, so I was very surprised by the behavior of Online DDL in Version 3.
I think there will be some people who will be upgrading from Version 2 in the future, so I think it is important to be very careful about this behavior.
 
Hopefully, an LTS version of Aurora MySQL will come with an improvement to this behavior!
Thank you AWS!!!
 
SRG is looking for people to work with us. If you're interested, please contact us here.

Reference URL