A memo on changing the character encoding of Aurora MySQL from utf8 to utf8mb4.
This is Yuta Kikai (@fat47) from the Service Reliability Group (SRG) of the Media Management Division.
#SRGThe Service Reliability Group primarily provides comprehensive support for the infrastructure surrounding our media services, focusing on improving existing services, launching new ones, and contributing to open-source software (OSS).
This article describes how to convert character encoding from utf8 to utf8mb4 using the BlueGreen Deployments feature of RDS.
I hope this is of some help.
utf8 ( utf8mb3 )Identify tables and columns that are set to utf8.Using ALTER TABLE in the usual way will block the table.Let's use the RDS Blue/Green Deployment feature.slave_type_conversions that allow type conversionsThe official documentation contains some unsettling statements...In any case, I will verify whether enabling this setting will stop the replication.What happens if I change the character encoding of Green without configuring any settings?Reference URLIn conclusion
utf8 ( utf8mb3 )
Aurora MySQL Version 3 (MySQL 8.0 compatible) uses utf8mb4 as the default character encoding, while Version 2 (MySQL 5.7 compatible) uses latin1.
I don't think many places use latin1, but I suspect there are quite a few places that specify utf8 (an alias for utf8mb3).
In MySQL 8.0, the use of utf8 aliases is deprecated.
The alias for utf8 may be changed to utf8mb4 in the future.
Identify tables and columns that are set to utf8.
If any of the above queries are set to utf8_**, those will be the targets.
Using ALTER TABLE in the usual way will block the table.
When converting character encoding using ALTER TABLE, you would use a query like the one below.
However, this query will lock the table, so caution is advised.
For tables with a huge number of records, the process can sometimes take more than a full day.
During this time, updates to the table will be locked, so it will be necessary to put the service into maintenance mode or similar to stop writing.
Let's use the RDS Blue/Green Deployment feature.
I've covered the Blue/Green Deployment feature several times on this blog, so please refer to those posts for more details.
Simply put, as shown in the diagram below, this feature creates a Green cluster based on the currently running cluster (Blue), and manages the establishment of replication between Blue and Green, as well as the rewriting of endpoints through switchover operations.

By performing a character encoding change on the Green cluster generated using this feature, it is possible to make changes without affecting the original Blue environment.
However, there is a setting here that you need to be careful about.
If we proceed with changing the character encoding for Green, the character encoding of the tables will end up being different for Blue and Green.
ROW The data types of the columns on the Blue and Green sides must be exactly the same.
*AWS official documentation recommends ROW
If replication is performed with the default settings, where the column data types are different, a replication error like the one below will occur, and replication will stop.
*This error occurred in the case of Aurora, but in the case of regular MySQL replication, it should stop due to a type conversion error.
slave_type_conversions that allow type conversions
In Aurora MySQL version 2, slave_type_conversions,
In version 3, replica_type_conversions
The value is specified in the cluster parameter group to control whether type conversions are allowed.
The following are the modes of type conversion:
| mode | Acceptable content | example |
|---|---|---|
| ALL_LOSSY | Allow irreversible conversions (conversions that result in smaller data types). | INT → TINYINT |
| ALL_NON_LOSSY | Allow irreversible conversions (conversions that increase the type). | TINYINT → INT |
| ALL_LOSSY、ALL_NON_LOSSY | All type conversions are allowed. | - |
| ALL_SIGNED | Treat promoted integer types as signed values (default). | - |
| ALL_UNSIGNED | Treat promoted integer types as unsigned values. | - |
| ALL_SIGNED、ALL_UNSIGNED | Treat promoted integer types as signed if possible, and unsigned otherwise. | - |
The official documentation contains some unsettling statements...
The official document mentioned earlier contains the following statement:
Supported conversions. Supported conversions between different but similar data types are shown in the following list:
- Between any of the string types , , and , including conversions between different widths.
TEXT
ImportantReplication between columns using different character sets is not supported.Supported conversions. The following list shows the supported conversions between different but similar data types:Conversion between any of the string types CHAR, VARCHAR, or TEXT, including conversions between different widths.Converting from CHAR, VARCHAR, or TEXT to CHAR, VARCHAR, or TEXT columns of the same or greater size is never irreversible. The irreversible conversion is handled by inserting only the first N characters of the string on the replica.
importantReplication between columns using different character sets is not supported.
Replication between columns using different character sets is not supported...? Is that not possible...?
In any case, I will verify whether enabling this setting will stop the replication.
ALL_NON_LOSSY

- Create a test table using UTF-8 character encoding.
- I'll try inserting some random data.
- Create a Green cluster using the B/G Row function.
- Execute ALTER TABLE on the generated Green endpoint.
- Let's check the Blue and Green tables.
- Blue
- Green (it has been correctly changed to utf8mb4)
- Let's try running an INSERT statement in the Blue environment.
- Let's check that the replication status in the Green environment is set to Yes for both.
- Check if the data is reflected in the Green environment.
This confirms that replication has not stopped.
All that's left is to switch between Blue and Green, and the process will be complete.
What specific patterns are not allowed according to the single sentence in the official MySQL documentation?
As an extra test, I tried setting the table character encoding on the Blue side to latin1 and the Green side to utf8mb4, and the replication worked without any problems.
What happens if I change the character encoding of Green without configuring any settings?
replica_type_conversions
- Send an INSERT to Blue.
- Checking the Green replication status shows that it has stopped due to an error.
Reference URL
In conclusion
There was a worrying sentence in the official MySQL documentation, but I was able to successfully convert the character encoding.
I tried changing the character encoding by specifying the column, and also changed the type size at the same time as changing the character encoding, and then tested replication, but...
All patterns were successful without any replication errors.
Once the upgrade to Aurora MySQL Version 3 is complete, I would like to implement this character encoding support as well.
SRG is looking for new team members.
If you are interested, please contact us here.
