Considering rollback after upgrading an Aurora cluster using the RDS B/G Deployments feature

This is Onkai Yuta (@fat47) from the Service Reliability Group (SRG) of the Media Headquarters.
SRG (Service Reliability Group) is a group that mainly provides cross-functional support for the infrastructure of our media services: improving existing services, launching new ones, and contributing to OSS.
This article examines whether it is possible to roll back an Aurora upgrade using RDS Blue/Green Deployments.
I hope this helps in some way.
 
 

Summary first


  • Is there a way to roll back an upgrade performed with Blue/Green Deployments?
    • In our tests, two of the three patterns rolled back successfully.
  • Regardless of the rollback method, a service maintenance window that stops DB writes is required.
    • [Updated September 2024] Following a feature update, some methods no longer require maintenance.
  • Rather than rolling back the version, it is more practical to fix the problem so that the service works on the new version.
 

[Updated September 2024]


Our article was featured on Timee's tech blog.
 
Through that article, we learned of an August 2024 AWS blog post on RDS Blue/Green Deployments which reports that the binlog position at the time of the B/G switchover is now output.

Version upgrades via Blue/Green Deployments


This is a managed feature that allows you to create a copy of a cluster, replicate it, and switch between clusters with the push of a button.
For more information, please refer to the blog below, which was released at the time of the Blue/Green Deployments release.
 
Using this function, you can create a Green cluster (Aurora MySQL 3 series) from an operating cluster (Aurora MySQL 2 series) and switch over to it, enabling you to upgrade.
 
However, if you upgrade with this feature and later discover a critical issue, it is difficult to revert to the original Aurora Version 2.
 
Even so, we tested whether a rollback could be achieved somehow.

Three rollback patterns were tested


  • How to reverse replicate from Green (MySQL 8.0) to Blue (MySQL 5.7)
  • How to take a logical backup from Green (MySQL 8.0) and apply it to Blue (MySQL 5.7)
  • How to generate differentials from the binlog at the quiescent point of Green (MySQL 8.0) and apply them to Blue (MySQL 5.7)
 

Creating a verification environment


  • Create an Aurora Version 2 cluster
  • Create a verification table and add records
    • Create the database and table
    • Add records
  • Create a Green (MySQL 8.0) cluster using the Blue/Green Deployments feature
  • Increase the binlog retention period to 7 days on both the Green (MySQL 8.0) and Blue (MySQL 5.7) clusters.
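
The commands used for the retention change are not preserved in this article, so the following is only a minimal sketch. It assumes the retention period is set through the `mysql.rds_set_configuration` stored procedure that RDS and Aurora MySQL provide, which takes the value in hours. The helper name `binlog_retention_call` is hypothetical, and the upper limit the procedure accepts depends on the engine version.

```python
def binlog_retention_call(days: int) -> str:
    """Build the stored-procedure call that sets binlog retention.

    Assumption: retention is configured via mysql.rds_set_configuration,
    which expects the value in hours rather than days.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    hours = days * 24  # 7 days -> 168 hours
    return f"CALL mysql.rds_set_configuration('binlog retention hours', {hours});"


if __name__ == "__main__":
    # Run the printed statement on both the Blue and the Green cluster.
    print(binlog_retention_call(7))
```

The resulting statement would be executed once on each cluster's writer endpoint with any MySQL client.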

How to reverse replicate from Green (MySQL 8.0) to Blue (MySQL 5.7)


Overview

Even after a B/G switchover, the old Blue (MySQL 5.7) cluster is not deleted automatically.
This method keeps the old Blue (MySQL 5.7) cluster as the rollback environment and manually sets up replication to it, using the switched-over Green (MySQL 8.0) cluster as the source.

Verification Procedure

• Stop writes to the database, for example by putting the service into maintenance mode.
  • [Additional note] Following a feature update, it is no longer necessary to stop writes to the database.
• Execute the B/G switchover
• [Additional note] Check the binlog position at the time of the switchover.
  • Select the B/G deployment
  • Sort the Recent Events column by time to find the binlog position
• Create a replication user on Green (MySQL 8.0)
• Start replication on old Blue (MySQL 5.7) with Green (MySQL 8.0) as the source
• Check the replication status on old Blue (MySQL 5.7)
• Add a record to Green (MySQL 8.0) to check operation
• Check the replication status on old Blue (MySQL 5.7) again
• Check the records on old Blue (MySQL 5.7) and confirm that replication is working properly
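
As a rough illustration of the steps above, the sketch below extracts the binlog file and position from a switchover event message and builds the statements that would be run on old Blue (MySQL 5.7) to start replicating from Green (MySQL 8.0). Several things here are assumptions rather than the article's actual commands: the wording of `SAMPLE_EVENT`, the host name, user name and password in the usage example, and the use of the RDS-provided procedures `mysql.rds_set_external_master` and `mysql.rds_start_replication`.

```python
import re
from typing import List, Tuple

# Assumed wording of the switchover event; adjust the pattern to the real message.
SAMPLE_EVENT = (
    "Binary log coordinates in green environment after switchover: "
    "file mysql-bin-changelog.000003 and position 804"
)

_COORDS = re.compile(r"file\s+(?P<file>\S+)\s+and\s+position\s+(?P<pos>\d+)")


def parse_binlog_coordinates(message: str) -> Tuple[str, int]:
    """Return (binlog file name, position) found in an event message."""
    match = _COORDS.search(message)
    if match is None:
        raise ValueError("no binlog coordinates found in event message")
    return match.group("file"), int(match.group("pos"))


def replication_setup_sql(host: str, user: str, password: str,
                          binlog_file: str, binlog_pos: int,
                          port: int = 3306) -> List[str]:
    """Build the statements to run on old Blue, with Green as the source."""
    for value in (host, user, password, binlog_file):
        if "'" in value:
            # Keep the sketch simple: no quoting logic, so reject quotes.
            raise ValueError("values must not contain single quotes")
    set_master = (
        "CALL mysql.rds_set_external_master("
        f"'{host}', {port}, '{user}', '{password}', "
        f"'{binlog_file}', {binlog_pos}, 0);"  # final 0 = no SSL
    )
    return [set_master, "CALL mysql.rds_start_replication;"]


if __name__ == "__main__":
    binlog_file, binlog_pos = parse_binlog_coordinates(SAMPLE_EVENT)
    # Hypothetical endpoint and credentials, for illustration only.
    for stmt in replication_setup_sql("green.example.internal", "repl_user",
                                      "secret", binlog_file, binlog_pos):
        print(stmt)
```

After running the generated statements on old Blue, `SHOW SLAVE STATUS` on that cluster is the usual way to check the replication state on MySQL 5.7.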

How to take a logical backup from Green (MySQL 8.0) and apply it to Blue (MySQL 5.7)


Overview

This is the most time-consuming but most reliable method: take a logical backup of the Green (MySQL 8.0) cluster and load it back into the old Blue (MySQL 5.7) cluster.

Verification Procedure

• Execute the B/G switchover
• Delete the B/G deployment
• After the switchover, add test records to Green (MySQL 8.0)
• Stop writes to the database, for example by putting the service into maintenance mode.
• Restore the logical backup taken from Green (MySQL 8.0) into old Blue (MySQL 5.7)
• Check the records on old Blue (MySQL 5.7) and confirm that the rows added to Green (MySQL 8.0) are included.
• Delete the Green (MySQL 8.0) instance and cluster.
• Remove the -old1 suffix that the switchover appended to the old Blue (MySQL 5.7) cluster and instance names
• Release service maintenance mode and resume the service
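
The backup and restore commands themselves are not shown in this article, so here is a minimal sketch of what they might look like. It assumes `mysqldump` is used for the logical backup and the `mysql` client for the restore; the helper names, endpoints, user, and database name are all hypothetical.

```python
from typing import List


def dump_command(host: str, user: str, databases: List[str]) -> List[str]:
    """Argument list for dumping the given databases from Green."""
    if not databases:
        raise ValueError("at least one database name is required")
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--password",             # prompt for the password interactively
        "--single-transaction",   # consistent snapshot for InnoDB tables
        "--set-gtid-purged=OFF",  # do not emit GTID statements in the dump
        "--routines",
        "--triggers",
        "--databases", *databases,
    ]


def restore_command(host: str, user: str) -> List[str]:
    """Argument list for loading a dump (read from stdin) into old Blue."""
    return ["mysql", f"--host={host}", f"--user={user}", "--password"]


if __name__ == "__main__":
    # Hypothetical endpoints and names, for illustration only.
    print(" ".join(dump_command("green.example.internal", "admin", ["testdb"])))
    print(" ".join(restore_command("blue-old1.example.internal", "admin")))
```

In practice the dump output would be redirected to a file and then fed to the restore command's standard input while writes remain stopped.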

How to generate differentials from the binlog at the quiescent point of Green (MySQL 8.0) and apply them to Blue (MySQL 5.7)


⚠️
↓↓ Please note that this method is unlikely to be practical ↓↓

Overview

During a maintenance window in which DB writes are stopped, perform the B/G switchover and then note the binlog position on the Green (MySQL 8.0) side.
To roll back, generate a SQL file of the updates from that position up to the latest position and apply it to the Blue (MySQL 5.7) cluster. In practice, this method appears difficult.

Verification Procedure

• Stop writes to the database, for example by putting the service into maintenance mode.
• Execute the B/G switchover
• Delete the B/G deployment
• Check the current binlog position of Green (MySQL 8.0) and make a note of it.
• Assuming the service maintenance has been lifted, run some update queries.
• To perform the rollback, stop DB writes again, for example with service maintenance mode.
• Check the current binlog position of Green (MySQL 8.0) after writes have stopped
• Obtain the Green binlog (run on an ope server with the MySQL client installed)
  • The script used for this step outputs the binlog file to /tmp on the ope server.
• Generate a recovery SQL file using the mysqlbinlog command
• Check the status of the tables on old Blue (MySQL 5.7)
  • As expected, the three INSERTs were not yet reflected.
• Applying recovery.sql to old Blue (MySQL 5.7) failed with an error saying that the SUPER privilege is required.
  • Line 9 of recovery.sql, where the error occurred, is the ROW update part.
  • Option tried next: --base64-output=DECODE-ROWS
    • If there are only a few changes, you can uncomment the commented-out INSERT and UPDATE statements by editing the differential SQL or using search and replace. With many changes, however, rolling back a production environment this way seems difficult.
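
The binlog retrieval and SQL generation steps above can be sketched roughly as follows. This assumes the standard `mysqlbinlog` options `--read-from-remote-server`, `--raw`, `--result-file`, `--start-position`, `--stop-position`, `--base64-output=DECODE-ROWS` and `--verbose`; the helper names, endpoint, user, file name, and positions are hypothetical and are not the script the article actually used.

```python
from typing import List


def fetch_binlog_command(host: str, user: str, binlog_file: str,
                         out_prefix: str = "/tmp/") -> List[str]:
    """Argument list for copying a binlog file from Green to the ope server."""
    return [
        "mysqlbinlog",
        "--read-from-remote-server",
        f"--host={host}",
        f"--user={user}",
        "--password",                   # prompt for the password
        "--raw",                        # keep the file in binary binlog format
        f"--result-file={out_prefix}",  # with --raw this is a file name prefix
        binlog_file,
    ]


def decode_binlog_command(local_file: str, start_pos: int, stop_pos: int,
                          decode_rows: bool = False) -> List[str]:
    """Argument list for turning a position range into SQL on stdout."""
    if start_pos > stop_pos:
        raise ValueError("start_pos must not exceed stop_pos")
    cmd = [
        "mysqlbinlog",
        f"--start-position={start_pos}",
        f"--stop-position={stop_pos}",
    ]
    if decode_rows:
        # Show row events as commented pseudo-SQL instead of BINLOG statements.
        cmd += ["--base64-output=DECODE-ROWS", "--verbose"]
    cmd.append(local_file)
    return cmd


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(fetch_binlog_command("green.example.internal", "admin",
                                        "mysql-bin-changelog.000010")))
    print(" ".join(decode_binlog_command("/tmp/mysql-bin-changelog.000010",
                                         1000, 2500, decode_rows=True)))
```

Redirecting the second command's output to a file would produce something like the recovery.sql described above. The two positions correspond to the ones noted right after the switchover and after writes were stopped again.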
                                       

A side note

• Trying to obtain the binlog of an Aurora MySQL 3 series (MySQL 8.0) cluster with a MySQL 5.7 client results in an error.
 
• Even if you obtain the binlog of an Aurora MySQL 3 series (MySQL 8.0) cluster with a MySQL 8.0 client and then try to force SQL generation from it with the mysqlbinlog of a MySQL 5.7 client, an error occurs.
 
• The update differential SQL file contains three places that require the SUPER privilege when restoring to Aurora Version 2 (MySQL 5.7).
 
• In Aurora Version 3 (MySQL 8.0), the SUPER privilege has been deprecated and the required privileges have changed.

  | Command | Required privilege |
  | BINLOG 'xxxxxxxxxx'; | SUPER, BINLOG_ADMIN or REPLICATION_APPLIER |
  | SET @@SESSION.GTID_NEXT='AUTOMATIC' | SUPER, SYSTEM_VARIABLES_ADMIN, SESSION_VARIABLES_ADMIN or REPLICATION_APPLIER |
  | SET @@session.pseudo_thread_id=xxxxx | SUPER, SYSTEM_VARIABLES_ADMIN or SESSION_VARIABLES_ADMIN |

• SESSION_VARIABLES_ADMIN
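
To get a feel for how much hand-editing a differential SQL file would need, the sketch below scans the text for the three statement types listed above and reports where they occur. The function name `find_super_statements` and the `SAMPLE` text are made up for illustration. Real mysqlbinlog output is longer and may be formatted differently, so the patterns are an assumption to adapt.

```python
import re
from typing import List, Tuple

# One pattern per statement type that needs SUPER on Aurora Version 2.
PATTERNS = [
    ("BINLOG", re.compile(r"^\s*BINLOG\s+'", re.IGNORECASE)),
    ("GTID_NEXT", re.compile(r"^\s*SET\s+@@SESSION\.GTID_NEXT\s*=", re.IGNORECASE)),
    ("pseudo_thread_id",
     re.compile(r"^\s*SET\s+@@session\.pseudo_thread_id\s*=", re.IGNORECASE)),
]


def find_super_statements(sql_text: str) -> List[Tuple[int, str]]:
    """Return (line number, statement type) for each matching line."""
    hits = []
    for lineno, line in enumerate(sql_text.splitlines(), start=1):
        for label, pattern in PATTERNS:
            if pattern.match(line):
                hits.append((lineno, label))
                break
    return hits


# Made-up excerpt in the general shape of mysqlbinlog output.
SAMPLE = """\
SET @@session.pseudo_thread_id=12345/*!*/;
BEGIN
/*!*/;
BINLOG '
xxxxxxxxxx
'/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /*!*/;
"""

if __name__ == "__main__":
    for lineno, label in find_super_statements(SAMPLE):
        print(f"line {lineno}: {label}")
```

A large number of hits is a sign that editing the file by hand is not realistic, which matches the conclusion reached in the verification above.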
                                       

Conclusion


We found that, as long as the service can be put into maintenance and database writes stopped, there are ways to roll back.
We would like to propose upgrade plans that fit each service's requirements.
 
SRG is looking for people to work with us. If you're interested, please contact us here.