Exploring rollback options after upgrading an Aurora cluster version with RDS Blue/Green Deployments

This is Yuta Kikai (@fat47) from the Service Reliability Group (SRG) of the Media Management Division.
The Service Reliability Group (#SRG) primarily provides cross-functional support for the infrastructure behind our media services, improving existing services, launching new ones, and contributing to open-source software (OSS).
This article investigates whether it's possible to revert to a previous version of Aurora after upgrading using RDS Blue/Green Deployments.
I hope this is of some help.
 
 

Summary first


  • Is there a way to revert to a previous version even after upgrading with Blue/Green Deployments?
    • In this test, we were able to roll back successfully with 2 of the 3 patterns we tried.
  • Regardless of the rollback method, a service maintenance window that stops writes to the database is required.
    • [September 2024 Update] Following a feature update, some methods no longer require maintenance.
  • In practice, fixing the problem so that the service works on the new version is more realistic than rolling back to the earlier version.
 

[Added September 2024]


Our article was featured on Timee's tech blog.
 
According to that article, an AWS blog post about RDS Blue/Green Deployments published in August 2024 revealed that the binlog position at the time of the B/G switchover is now output.

Version upgrade via Blue/Green Deployments


This is a managed feature that creates a copy of the cluster, keeps the copy in sync through replication, and lets you switch between the two clusters with a single click.
For more details, please refer to the following blog post from the time of the Blue/Green Deployments release.
 
This feature allows you to create a Green cluster (Aurora MySQL 3) from an existing cluster (Aurora MySQL 2) and switch over to enable version upgrades.
 
However, if you use this feature to upgrade, it will be difficult to revert to the original Aurora Version 2 later if a critical issue is discovered.
 
Nevertheless, we investigated whether there was any way to roll back.

We will test three rollback patterns.


  • How to perform reverse replication from Green (MySQL 8.0) to Blue (MySQL 5.7)
  • How to take a logical backup from Green (MySQL 8.0) and apply it to Blue (MySQL 5.7)
  • How to generate the difference from the binlog of Green (MySQL 8.0) at a quiescent point and apply it to Blue (MySQL 5.7)
 

Creating a verification environment


  • Create a cluster on Aurora Version 2.
  • Create a table for verification and add records.
    • Create the database and table
    • Add records
  • Create a Green (MySQL 8.0) cluster using the Blue/Green Deployments feature.
  • Extend the binlog retention period to 7 days on both the Green (MySQL 8.0) and Blue (MySQL 5.7) clusters.
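
As a rough illustration of the retention step: on RDS and Aurora MySQL, binlog retention is configured in hours through the stored procedure mysql.rds_set_configuration, so 7 days becomes 168 hours. The helper below only builds the statement text; the function name is hypothetical, and this is a sketch rather than the exact command used in the test.

```python
def binlog_retention_statement(days: int) -> str:
    """Build the RDS/Aurora MySQL call that sets binlog retention.

    The procedure takes hours, so the number of days is converted first.
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    hours = days * 24
    return f"CALL mysql.rds_set_configuration('binlog retention hours', {hours});"


# Statement to run on both the Blue and the Green cluster for a 7-day window.
statement = binlog_retention_statement(7)
```

The same statement would be run once on each cluster, since the setting is per cluster.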

How to perform reverse replication from Green (MySQL 8.0) to Blue (MySQL 5.7)


Overview

Even after the B/G switchover, the old Blue (MySQL 5.7) cluster is not deleted automatically.
We will use the old Blue (MySQL 5.7) cluster as the rollback environment.
In this method, replication is set up manually after the switchover, with Green (MySQL 8.0) as the source.

Verification Procedure

  • Stop writes to the database by putting the service into maintenance mode.
    • [Update] Due to a feature update, stopping database writes is no longer necessary.
  • Execute the B/G switchover.
  • [Update] Check the binlog position at the time of the switchover.
    • Select the B/G deployment.
    • Sort the recent events section by time to check the binlog position.
  • Create a replication user on Green (MySQL 8.0).
  • On the old Blue (MySQL 5.7), start replication with Green (MySQL 8.0) as the source.
  • Check the replication status on the old Blue (MySQL 5.7).
  • Add a test record on Green (MySQL 8.0).
  • Check the replication status on the old Blue (MySQL 5.7) again.
  • Check the records on the old Blue (MySQL 5.7) to confirm that replication is working correctly.
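
The replication steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the commands that were actually run: it assumes the switchover event text contains a phrase of the form "file <name> and position <n>", and it assumes replication on the old Blue cluster is configured with the RDS stored procedures mysql.rds_set_external_master and mysql.rds_start_replication. The helper names, host name, user, and password are placeholders.

```python
import re
from typing import List, NamedTuple


class BinlogCoordinates(NamedTuple):
    file: str
    position: int


# Assumed wording of the switchover event; adjust the pattern to the real text.
_COORD_PATTERN = re.compile(r"file\s+(?P<file>\S+)\s+and\s+position\s+(?P<pos>\d+)")


def parse_coordinates(event_message: str) -> BinlogCoordinates:
    """Extract the binlog file name and position from an event message."""
    match = _COORD_PATTERN.search(event_message)
    if match is None:
        raise ValueError("no binlog coordinates found in event message")
    return BinlogCoordinates(match.group("file"), int(match.group("pos")))


def sql_literal(value: str) -> str:
    """Quote a string as a MySQL literal (escapes backslashes and single quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def build_reverse_replication_sql(
    green_host: str,
    repl_user: str,
    repl_password: str,
    coords: BinlogCoordinates,
    port: int = 3306,
) -> List[str]:
    """Statements to run on the old Blue cluster so that it replicates from Green."""
    set_source = (
        "CALL mysql.rds_set_external_master("
        f"{sql_literal(green_host)}, {port}, {sql_literal(repl_user)}, "
        f"{sql_literal(repl_password)}, {sql_literal(coords.file)}, "
        f"{coords.position}, 0);"  # final 0 = no SSL for the replication link
    )
    return [
        set_source,
        "CALL mysql.rds_start_replication;",
        "SHOW SLAVE STATUS;",  # check the replication threads and lag
    ]
```

Starting from the coordinates recorded at switchover is what allows the old Blue cluster to pick up exactly the changes written to Green afterwards.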

How to take a logical backup from Green (MySQL 8.0) and apply it to Blue (MySQL 5.7)


Overview

This method takes a logical backup from Green (MySQL 8.0) and restores it to the old Blue (MySQL 5.7) cluster. It is the most time-consuming method, but also the most reliable.

Verification Procedure

  • Execute the B/G switchover.
  • Delete the B/G roles.
  • After the switchover, add test records to Green (MySQL 8.0).
  • Stop writes to the database by putting the service into maintenance mode.
  • Take a logical backup from Green (MySQL 8.0) and restore it to the old Blue (MySQL 5.7).
    • Check the records on the old Blue (MySQL 5.7). We confirmed that they include the content added to Green (MySQL 8.0).
  • Delete the Green (MySQL 8.0) instances and cluster.
  • Remove the -old1 suffix from the names of the old Blue (MySQL 5.7) cluster and instances.
  • Take the service out of maintenance mode and resume service.
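
The backup and restore step above might look roughly like the following. This is a sketch that only builds the command lines: the host names, user, database names, and dump path are placeholders, and the option set (mysqldump with --single-transaction, --databases, and --result-file) is an assumption rather than the options used in the test. Whether a dump taken from MySQL 8.0 loads cleanly into MySQL 5.7 depends on the schema, so it should be rehearsed beforehand.

```python
import shlex
from typing import List, Sequence


def dump_command(green_host: str, user: str, databases: Sequence[str], dump_path: str) -> List[str]:
    """mysqldump invocation that writes a logical backup of Green to dump_path."""
    return [
        "mysqldump",
        f"--host={green_host}",
        f"--user={user}",
        "--password",            # no value given: the client prompts for it
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={dump_path}",
        "--databases",
        *databases,
    ]


def restore_command(blue_host: str, user: str) -> List[str]:
    """mysql client invocation; the dump file is fed to it on standard input."""
    return ["mysql", f"--host={blue_host}", f"--user={user}", "--password"]


# Printable form of the dump command, for review before running it.
printable_dump = shlex.join(
    dump_command("green.example.internal", "admin", ["testdb"], "/tmp/green_dump.sql")
)
```

The restore command would be run with the dump file redirected to standard input, for example through subprocess.run with its stdin argument set to the opened file.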

How to generate the difference from the binlog of Green (MySQL 8.0) at a quiescent point and apply it to Blue (MySQL 5.7)


⚠️ Please note that this method is unlikely to be feasible in practice.

Overview

During a maintenance window in which database writes are stopped, we perform the B/G switchover and then note the binlog position on the Green (MySQL 8.0) side.
For a rollback, we generate a SQL file containing the updates from that noted position up to the latest position and apply it to the Blue (MySQL 5.7) cluster.
In practice, this method turned out to be difficult.

Verification Procedure

  • Stop writes to the database by putting the service into maintenance mode.
  • Execute the B/G switchover.
  • Delete the B/G roles.
  • Check and note the current binlog position on Green (MySQL 8.0).
  • Run some update queries, simulating the service after maintenance has been lifted.
  • To roll back, stop writes to the database again by putting the service into maintenance mode.
  • Check the current position on Green (MySQL 8.0) after writes have stopped.
  • Fetch the Green (MySQL 8.0) binlog, working on an operations server where the MySQL client is installed.
    • The binlog files are written to /tmp on the operations server.
  • Use the mysqlbinlog command to generate a recovery SQL file.
  • Check the state of the table on the old Blue (MySQL 5.7).
    • As expected, the three INSERT statements are not reflected yet.
  • Applying recovery.sql to the old Blue (MySQL 5.7) fails with an error saying that the SUPER privilege is required.
    • Line 9 of recovery.sql, where the error occurs, is a row-based (ROW format) update event.
    • Generate the SQL again with the mysqlbinlog option --base64-output=DECODE-ROWS.
      • If there are only a few changes, you can edit or search-and-replace the differential SQL to uncomment the commented-out INSERT and UPDATE statements. With the number of differences in a production environment, however, using this for a rollback seems difficult.
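
The binlog fetch and the recovery SQL generation described above can be sketched as two mysqlbinlog invocations. This is only an illustration that builds the argument lists: the host, user, file name, and positions are placeholders, and the exact options used in the test are not known. As the side note in this article observes, the fetch has to be done with a MySQL 8.0 client.

```python
from typing import List, Optional


def fetch_binlog_command(green_host: str, user: str, binlog_file: str,
                         out_dir: str = "/tmp/") -> List[str]:
    """Copy a binlog file from Green to the operations server in raw form."""
    return [
        "mysqlbinlog",
        "--read-from-remote-server",
        f"--host={green_host}",
        f"--user={user}",
        "--password",                 # no value given: the client prompts for it
        "--raw",                      # keep the binary format
        f"--result-file={out_dir}",   # with --raw this is a prefix for the file name
        binlog_file,
    ]


def recovery_sql_command(local_binlog: str, start_position: int,
                         stop_position: Optional[int] = None,
                         output: str = "recovery.sql") -> List[str]:
    """Convert the events after the noted position into a SQL file."""
    cmd = ["mysqlbinlog", f"--start-position={start_position}"]
    if stop_position is not None:
        cmd.append(f"--stop-position={stop_position}")
    cmd.append(f"--result-file={output}")
    cmd.append(local_binlog)
    return cmd
```

The start position is the one noted right after the switchover, and the stop position is the one checked after writes were stopped again.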
                                       

Side note

  • Fetching the binlog of an Aurora MySQL 3 (MySQL 8.0) cluster with a MySQL 5.7 client results in an error.
  • Even if the binlog of the Aurora MySQL 3 (MySQL 8.0) cluster is fetched with a MySQL 8.0 client, forcing SQL generation with the mysqlbinlog of a MySQL 5.7 client also results in an error.
  • When restoring to Aurora Version 2 (MySQL 5.7), this differential SQL file required the SUPER privilege in three places.
  • In Aurora Version 3 (MySQL 8.0), the SUPER privilege is deprecated, and the required privileges have changed.

      Command                                 Required permissions
      BINLOG 'xxxxxxxxxx';                    SUPER, BINLOG_ADMIN or REPLICATION_APPLIER
      SET @@SESSION.GTID_NEXT='AUTOMATIC'     SUPER, SYSTEM_VARIABLES_ADMIN, SESSION_VARIABLES_ADMIN or REPLICATION_APPLIER
      SET @@session.pseudo_thread_id=xxxxx    SUPER, SYSTEM_VARIABLES_ADMIN or SESSION_VARIABLES_ADMIN

    • SESSION_VARIABLES_ADMIN
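
To get a feel for how many statements in a generated SQL file would hit these privilege checks, a small scan like the one below can help. It is a sketch only: it looks for the three statement kinds listed in the table above at the start of a line, and the function and dictionary names are hypothetical.

```python
import re
from typing import Dict, List

# One pattern per statement kind from the table above (matched at line start).
PRIVILEGED_PATTERNS = {
    "BINLOG": re.compile(r"^BINLOG\s+'", re.IGNORECASE),
    "GTID_NEXT": re.compile(r"^SET\s+@@SESSION\.GTID_NEXT\b", re.IGNORECASE),
    "pseudo_thread_id": re.compile(r"^SET\s+@@SESSION\.PSEUDO_THREAD_ID\b", re.IGNORECASE),
}


def find_privileged_statements(sql_text: str) -> Dict[str, List[int]]:
    """Return, per statement kind, the 1-based line numbers where it starts."""
    hits: Dict[str, List[int]] = {name: [] for name in PRIVILEGED_PATTERNS}
    for line_number, line in enumerate(sql_text.splitlines(), start=1):
        stripped = line.lstrip()
        for name, pattern in PRIVILEGED_PATTERNS.items():
            if pattern.match(stripped):
                hits[name].append(line_number)
    return hits
```

Running it on the text of recovery.sql gives a quick count of the places that would need attention before the file could be applied to the old Blue cluster.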
                                       

In conclusion


We found that, as long as the service can be put into maintenance and writes to the database stopped, there are ways to roll back.
We would like to propose upgrade plans suited to each service's requirements and other factors.

SRG is looking for new team members. If you are interested, please contact us here.