A memo on changing the character code of Aurora MySQL from utf8 to utf8mb4

This is Onkai Yuta (@fat47) from the Service Reliability Group (SRG) of the Media Headquarters.
#SRG(Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article describes how to convert the character code from utf8 to utf8mb4 using the RDS BlueGreen Deployments function.
I hope this helps in some way.
 

utf8 ( utf8mb3 )


The default character code for Aurora MySQL Version 3 (compatible with MySQL 8.0) is utf8mb4, but for Version 2 (compatible with MySQL 5.7) it is latin1.
I don't think there are many places that use latin1, but I think there are actually quite a lot of places that specify utf8 (an alias for utf8mb3).
 
In MySQL 8.0, the use of utf8 aliases is deprecated.
In the future, the alias for utf8 may be changed to utf8mb4.
 

Identify tables and columns that are set to utf8


If there is anything set as utf8_** in the above query, it will be targeted.

If you use ALTER TABLE normally, it will take a table block.


If you want to convert the character code using ALTER TABLE, you will need to use a query like the one below.
However, be careful with this query as it will lock the table.
For tables with a huge number of records, it may take more than a whole day to run.
During this time, updates to the table will be locked, so you will need to put the service into maintenance mode or similar to stop writes.

Use the RDS Blue/Green Deployment feature


We have covered the Blue/Green Deployment feature several times on this blog, so please refer to those for more details.
 
To put it simply, this function creates a Green cluster based on the currently running cluster (Blue) as shown in the diagram below, establishes replication between Blue and Green, and manages the rewriting of endpoints through switchover operations.
AWS公式ドキュメントより引用
Quoted from the official AWS documentation
 
By changing the character code for the Green cluster generated with this function, you can make the change without affecting the original Blue environment.
However, there are some settings you need to be careful of here.
 
If you change the character code for Green as is, the character codes for the tables in Blue and Green will be different.
ROW
The data types of the columns on the Blue and Green sides must match exactly.
*The official AWS documentation recommends ROW
 
If replication is performed with the default settings and column data types are different, the following replication error will occur and replication will be stopped.
*This error occurred in the case of Aurora, but in the case of normal MySQL replication, it should stop due to a type conversion error.
 

slave_type_conversions allows type conversion


In Aurora MySQL version 2, slave_type_conversions,
replica_type_conversions in version 3
You can specify the value in your cluster parameter group to control whether type conversions are allowed.
 
The following types of conversion modes are available:
modeAcceptable contentexample
ALL_LOSSYAllow lossy conversions (smaller types)INT → TINYINT
ALL_NON_LOSSYAllow lossy conversions (growth conversions)TINYINT → INT
ALL_LOSSY、ALL_NON_LOSSYAll type conversions are allowed-
ALL_SIGNEDTreat promoted integer types as signed values (default)-
ALL_UNSIGNEDTreat promoted integer types as unsigned values-
ALL_SIGNED、ALL_UNSIGNEDTreat promoted integer types as signed if possible, unsigned otherwise-
 

The official documentation contains some disturbing statements...


The official documentation states the following:
Supported conversions.  Supported conversions between different but similar data types are shown in the following list:
  • Between any of the string types , and , including conversions between different widths.
    • TEXT

      Important
      Replication between columns using different character sets is not supported.
Supported Conversions. The following list shows the supported conversions between different but similar data types:
Between CHAR, VARCHAR, or TEXT string types, including conversions between different widths.
Conversions from CHAR, VARCHAR, or TEXT to CHAR, VARCHAR, or TEXT columns of the same or larger size are never lossy. Lossy conversions are handled by inserting only the first N characters of the string on the replica.

important Replication between columns using different character sets is not supported.
Replication between columns that use different character sets is not supported...? No...?

Anyway, verify whether replication stops when this setting is enabled.


  1. ALL_NON_LOSSY
  1. Create a test table with utf8 character encoding.
    1. I'll try to insert it properly.
      1. Create a Green cluster using the B/G Deployment feature
      1. Execute ALTER TABLE on the generated Green endpoint.
        1. Let's check the tables for Blue and Green.
          1. Blue
            1. Green (properly changed to utf8mb4)
            1. Let's try inserting in the Blue environment.
              1. Verify that the replication status for the Green environment is Yes for both instances.
                1. Check if the data is reflected in the Green environment
                  This confirms that replication is not stopped.
                  All you have to do is switch between Blue and Green and the solution will be complete.
                   
                  What specific patterns are not allowed according to the sentence in the official MySQL documentation?
                  As a side test, I also tried changing the table character code on the Blue side to latin1 and the Green side to utf8mb4, and this also replicated without any problems.
                   

                  What happens if you change the character code of Green without entering any settings?


                  1. replica_type_conversions
                  1. Insert into Blue
                    1. When I check the replication status of Green, it has stopped with an error.

                      Reference URL


                      Conclusion


                      There was a disturbing sentence in the official MySQL documentation, but I was able to successfully convert the character code.
                      I tried changing the character code by specifying the column, and also changed the type size at the same time as changing the character code, and then tested replication, but
                      All patterns were successful with no replication errors.
                      Once the upgrade to Aurora MySQL Version 3 is complete, we would like to support this character code as well.
                       
                      SRG is looking for people to work with us. If you're interested, please contact us here.