A story about struggling with time skew in Timestamp columns during DB migration using DMS

My name is Ohara, and I work in the Service Reliability Group (SRG) of the Media Headquarters.
#SRG(Service Reliability Group) mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, contributing to OSS, etc.
This article explains the timestamp-type time discrepancy problem that occurred during database migration using DMS and how to deal with it.
 

Time Difference - Beginning


When migrating an on-premises MySQL database to AWS Aurora using DMS, we discovered that the time in a table containing a Timestamp data type was incorrect.
The configuration is as follows, with a new replica server being created on the on-premises side to meet the DMS requirements, and used as the DMS source server.
All servers use row-based replication and the timezone is set to Asia/Tokyo (UTC+9).
The time difference was that the data registered on the source server plus 9 hours was being replicated to the Aurora side.
serverTime Data
on-prem Source10:00:00
on-prem New Replica10:00:00
Aurora19:00:00
serverTimezone
Here too, I standardized it as Asia/Tokyo without giving it much thought.
 

Remove serverTimeZone setting


serverTimeZone=Asia/Tokyo
At a glance, it looked like the time discrepancy had disappeared.
But the problem still persists.
 

Time Difference - Full Load Edition


Since the DMS settings have been changed, we will start over with a full load.
Once the sync is complete, you will suddenly notice something.
The time's wrong again!!!
This time, data was found on the Aurora side that was 9 hours behind compared to the source server.
After investigating the situation, it seemed that the records to be fully loaded were out of sync. Also, the replication data updated during synchronization was not out of sync.
serverFully loaded time data (with discrepancies)Replicated time data (no discrepancy)
on-prem Source10:00:0010:05:00
on-prem New Replica10:00:0010:05:00
Aurora01:00:0010:05:00
This doesn't seem like an issue that can be solved by changing the serverTimeZone value.
 

About the serverTimeZone setting and DMS behavior


Based on the results so far, let's summarize the behavior of DMS depending on whether serverTimeZone is set or not.
with serverTimeZoneserverTimeZoneNo
Fully loadedNo misalignmentThere is a misalignment
ReplicationThere is a misalignmentNo misalignment
The behavior of time skew during full load and CDC replication is reversed depending on whether serverTimezone is enabled or not.
In other words, if you perform full load + continuous replication as a DMS task, a timestamp-type time skew will always occur during one of the synchronization processes.
I imagine this difference occurs because during full load, SQL Read refers to the actual updated values based on row, while during replication, Binlog Read refers to the actual updated values based on row.
When reading from SQL, the session Timezone depends on the DMS settings and is UTC by default, so if serverTimeZone is set, the value will be correct if you calculate the difference.
During replication, the value including the timezone recorded in the binlog on the replica server is referenced, and by default the data is referenced and reflected as UTC+9, but if serverTimeZone is set, an additional 9 hours are added.
 
I've come up with a few ways to solve this situation.
  • Change Aurora's timezone server variable to UTC (currently UTC+9)
  • Split the tasks into full load (with serverTimeZone) and continuous replication (without serverTimeZone)
  • Change the type from Timestamp to Datetime
 
We considered and verified each method and adopted a method of converting to Datetime.
meritDisadvantages
Change Aurora's timezone variable to UTCNo impact on existing databaseWe want to eventually return to UTC+9, but the timing is limited to when we can do so, and it will only be possible during the transition (this is difficult to implement since we plan to switch over while the service is running).
Split the taskNo impact on existing databaseReplication tasks require definition for each table, which is not realistic.
Convert to DatetimeDMS synchronization of date and time literals does not cause time skew regardless of full load or replication It can solve the 2038 problem of timestampsI can't read the time for Alter execution
The reasons for adopting this format include the fact that with the Datetime format no discrepancies occur due to replication, and by converting only the on-premises replica server, which is the DMS source, to Datetime, it was possible to eliminate any impact on running services.
Also, although it is impossible to predict the execution time of Alter to convert to Datetime, we estimated that it would not take many days since the scale of the database data is not that large.
One concern is that the Timestamp type is stored internally in UTC and converted to the session's timezone when retrieved and displayed, so it was necessary to check what would happen to the value when it was converted to Datetime.
As a result of the verification, we found that if the timezone of the session when Alter is executed is UTC+9, the existing records can be correctly converted to Datetime from UTC+9 to UTC+9, and the time data replicated from the source server during the conversion is also reflected as UTC+9.
 

Final Configuration


By converting to a Datetime type, we were able to successfully perform a full load and replication without any inconsistencies.
 

Conclusion


It's difficult to explain replication inconsistencies to a third party.
When migrating a database using DMS, please check that there are no tables that contain the Timestamp data type.
Please let me know if there is another easy way to solve this
SRG is looking for people to work with us. If you are interested, please contact us here.