Struggling with timestamp column mismatch during DB migration using DMS

My name is Ohara and I work in the Service Reliability Group (SRG) of the Media Headquarters.
#SRG(Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article explains the timestamp-type time discrepancy problem that occurred during database migration using DMS and how to deal with it.
 

Time Difference - Origin


When migrating a certain on-premises MySQL database to AWS Aurora using DMS, we discovered that the time in a table containing the Timestamp data type was incorrect.
The configuration is as follows: a new replica server is created on the on-premises side to meet the DMS requirements, and it is used as the DMS source server.
All servers use row-based replication and the timezone is set to Asia/Tokyo (UTC+9).
The time difference was that the data registered on the source server plus 9 hours was being replicated to Aurora.
serverTime data
on-prem Source10:00:00
on-prem New Replica10:00:00
Aurora19:00:00
serverTimezone
Here too, I standardized it to Asia/Tokyo without giving it much thought.
 

Remove serverTimeZone setting


serverTimeZone=Asia/Tokyo
At a glance, it looked like the time difference had disappeared.
But the problem still persists.
 

Time Difference - Full Load Edition


Since the DMS settings have been changed, we will start over with a full load.
Once the sync is complete, you will notice something.
The time is off again!!!
This time, data was found on the Aurora side that was 9 hours behind compared to the source server.
After investigating the situation, it seemed that the records being fully loaded were out of sync, but the replication data updated during synchronization was not out of sync.
serverFully loaded time data (with discrepancies)Replicated time data (no lag)
on-prem Source10:00:0010:05:00
on-prem New Replica10:00:0010:05:00
Aurora01:00:0010:05:00
This doesn't seem like an issue that can be solved by changing the serverTimeZone value.
 

ServerTimeZone setting and DMS behavior


Based on the results so far, let's summarize the behavior of DMS depending on whether serverTimeZone is set or not.
with serverTimeZoneserverTimeZone
Fully loadedNo misalignmentThere is a misalignment
ReplicationThere is a misalignmentNo misalignment
The behavior of time skew occurring during full load and CDC replication is reversed depending on whether serverTimezone is enabled or not.
In other words, if you perform full load + continuous replication as a DMS task, a timestamp-type time lag will always occur during one of the synchronization processes.
I imagine this difference occurs because during full load, SQL Read is used, and during replication, the actual updated values are referenced using ROW-based Binlog Read.
When reading from SQL, the session timezone depends on the DMS settings and is UTC by default, so if serverTimeZone is set, the difference calculation will result in the correct value.
During replication, the value including the timezone recorded in the Binlog on the replica server is referenced, and by default the data is referenced and reflected as UTC+9, but if serverTimeZone is set, it means that an additional 9 hours are added.
 
I've come up with a few ways to solve this situation.
  • Change Aurora's timezone server variable to UTC (currently UTC+9)
  • Tasks are divided into full load (with serverTimeZone) and continuous replication (without serverTimeZone)
  • Change the type from Timestamp to Datetime
 
We considered and verified each method and adopted a method to convert it to Datetime.
meritDisadvantages
Change Aurora's timezone variable to UTCNo impact on existing databasesWe would like to eventually return to UTC+9, but the timing for doing so is limited to when we switch over (it is assumed that the switch will be made while the service is running, making it difficult to implement).
Split the taskNo impact on existing databasesReplication tasks seem to require definition for each table, which is not realistic.
Convert to DatetimeDMS synchronization of date and time literals does not cause time skew, regardless of full load or replication. This solves the 2038 problem for timestamps.I can't read the time for Alter execution
The reasons for adopting this format include the fact that the Datetime format does not cause discrepancies due to replication, and by converting only the on-premises replica server, which is the DMS source, to Datetime, it was possible to eliminate any impact on running services.
Also, although it is impossible to predict the execution time of Alter to convert to Datetime, we predicted that it would not take several days since the scale of the DB data is not that large.
One concern is that the Timestamp type is stored internally in UTC and converted to the session's time zone when retrieved and displayed, so it was necessary to check what would happen to the value when converted to Datetime.
As a result of the verification, we found that if the timezone of the session when Alter is executed is UTC+9, existing records can be correctly converted to Datetime from UTC+9 to UTC+9, and the time data replicated from the source server during the conversion is also reflected as UTC+9.
 

Final configuration


By converting to Datetime type, we were able to successfully perform full load + replication without any inconsistencies.
 

Conclusion


It's difficult to explain replication inconsistencies to a third party.
When migrating a database using DMS, please check whether there are any tables that contain the Timestamp data type.
Please let me know if there is another easy way to solve this
SRG is looking for people to work with us. If you're interested, please contact us here.