Create a secure load testing environment with AWS Database Migration Service (DMS)'s new Data Masking feature

I'm Yuta Kikai (@fat47) of the Service Reliability Group (SRG) in the Media Headquarters.
SRG (Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article summarizes how to build a MySQL load testing environment that masks personal information using the new data masking feature of AWS DMS.
I hope this helps in some way.
 

AWS Database Migration Service Adds New Data Masking Feature


Data masking is a new feature of Database Migration Service (DMS) released on November 25, 2024.
 
The "Transformation rules" of a DMS "Database migration task" have long let you convert data during migration so that it fits the target database.
 
With this release, the following actions have been added:
  • Number masking
  • Number randomization
  • Hash masking
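
To make the three actions concrete, here is a minimal local sketch in Python of what each one does to a value. It is an illustration, not DMS's implementation: the function names are hypothetical, and SHA-256 for the hash action is an assumption on my part.

```python
import hashlib
import random
import string


def mask_digits(value: str, mask_char: str = "x") -> str:
    """Number masking: replace every ASCII digit with mask_char."""
    return "".join(mask_char if ch in string.digits else ch for ch in value)


def randomize_digits(value: str, rng: random.Random) -> str:
    """Number randomization: replace every ASCII digit with a random digit."""
    return "".join(
        rng.choice(string.digits) if ch in string.digits else ch for ch in value
    )


def hash_mask(value: str) -> str:
    """Hash masking: replace the whole value with a hex digest.

    SHA-256 is assumed here; the digest is always 64 characters long.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


tel = "090-1234-5678"  # made-up sample value
print(mask_digits(tel))                         # xxx-xxxx-xxxx
print(randomize_digits(tel, random.Random()))   # same layout, random digits
print(hash_mask("Taro Yamada"))                 # 64 hex characters
```

Note that number masking and number randomization keep the length and punctuation of the original value, while hash masking replaces the value entirely.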
 

Try masking in practice


Preparing the databases

First, create two Aurora clusters from the RDS console: cluster A as the source and cluster B as the target.
 
Connect to cluster A, create a person table for verification, and add a record to it.
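
As a rough sketch of this preparation step, the statements could look like the following. The table definition is hypothetical: only the schema name (test), the table name (person), and the columns name, address, and tel come from this article. The id column, the column sizes, and the sample row are assumptions for illustration. The name and address columns are sized at 64 characters on the assumption that the hashed values are 64 characters long.

```python
# Hypothetical DDL for the verification table on cluster A (MySQL syntax).
CREATE_DATABASE = "CREATE DATABASE IF NOT EXISTS test"

CREATE_PERSON_TABLE = """
CREATE TABLE test.person (
    id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- assumed surrogate key
    name    VARCHAR(64) NOT NULL,   -- wide enough for a 64-character hash
    address VARCHAR(64) NOT NULL,   -- wide enough for a 64-character hash
    tel     VARCHAR(20) NOT NULL
)
"""

# Parameterized insert; %s is the placeholder style used by common MySQL drivers.
INSERT_PERSON = "INSERT INTO test.person (name, address, tel) VALUES (%s, %s, %s)"

# Made-up sample record, not real personal data.
SAMPLE_ROW = ("Taro Yamada", "1-2-3 Shibuya, Tokyo", "090-1234-5678")

for statement in (CREATE_DATABASE, CREATE_PERSON_TABLE, INSERT_PERSON):
    print(statement.strip())
print(SAMPLE_ROW)
```

The statements can be pasted into a MySQL client as they are, or passed to a driver's cursor.execute() together with SAMPLE_ROW.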
 

Creating the DMS resources

Next, create a replication instance, which relays the data, from the DMS console.
Go to "Replication Instance" → "Create a Replication Instance".
 
For the engine version, select 3.5.4.
Please note that as of January 2025, the default selection is 3.5.3.
 
Next, select "Database Migration Tasks" → "Create Data Migration Task".
 
Give the task identifier a suitable name.
For the replication instance, select the relay instance you created earlier.
Select the writer on cluster A for the source database and the writer on cluster B for the target database.
 
Scroll down and click "Add New Selection Rule".
Then click "Add Transformation Rule", which appears below it.
Enter the rule as follows:
Rule Target: Column
Source name: Enter schema (test)
Source table name: person
Column name: tel
Action: Number Masking
 
For the name and address columns, add rules in the same way with Action: Hash Masking.
Once you finish creating the task, it automatically starts the full load from cluster A to cluster B, applying the masking along the way.
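
The rules entered in the console are stored as a JSON table mapping on the task. The sketch below builds what that mapping might look like for this walkthrough. The rule-action values `data-masking-digits-mask` and `data-masking-hash-mask` and the `value` key are given as I understand them from the DMS documentation, so please check them against the JSON the console shows for your task. The helper name `masking_rule` and the rule names are hypothetical.

```python
import json


def masking_rule(rule_id: int, column: str, action: str, value=None) -> dict:
    """Build one column-level transformation rule for test.person."""
    rule = {
        "rule-type": "transformation",
        "rule-id": str(rule_id),
        "rule-name": f"mask-{column}",  # hypothetical naming scheme
        "rule-target": "column",
        "object-locator": {
            "schema-name": "test",
            "table-name": "person",
            "column-name": column,
        },
        "rule-action": action,
    }
    if value is not None:
        rule["value"] = value  # assumed: the character that replaces each digit
    return rule


table_mappings = {
    "rules": [
        {   # selection rule: migrate only test.person
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-person",
            "object-locator": {"schema-name": "test", "table-name": "person"},
            "rule-action": "include",
        },
        masking_rule(2, "tel", "data-masking-digits-mask", value="x"),
        masking_rule(3, "name", "data-masking-hash-mask"),
        masking_rule(4, "address", "data-masking-hash-mask"),
    ]
}

print(json.dumps(table_mappings, indent=2))
```

Keeping the mapping as JSON makes it easy to review the masking rules and to recreate the same task in another environment.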
 

Check the data

First, connect to Cluster A and look at the record list to see the original data.
 
Next, connect to Cluster B and check the record list.
As specified in the rules, the name and address were hashed and the digits in tel were replaced with x.
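
If you want to automate this check instead of eyeballing the records, a small validator along these lines could be run against rows fetched from cluster B. It is a sketch under assumptions: rows are represented as dictionaries keyed by column name, the function name `looks_masked` is hypothetical, and the hashed values are assumed to be 64 hexadecimal characters.

```python
import re

# Assumed shape of a hashed value: exactly 64 hexadecimal characters.
HEX64 = re.compile(r"[0-9a-fA-F]{64}")


def looks_masked(row: dict) -> bool:
    """Return True if tel has no digits left and name/address look hashed."""
    tel_has_no_digits = not any(ch.isdigit() for ch in row["tel"])
    hashed = all(HEX64.fullmatch(row[col]) for col in ("name", "address"))
    return tel_has_no_digits and hashed


# Made-up rows for illustration.
original = {"name": "Taro Yamada", "address": "1-2-3 Shibuya, Tokyo", "tel": "090-1234-5678"}
masked = {"name": "a" * 64, "address": "0f" * 32, "tel": "xxx-xxxx-xxxx"}

print(looks_masked(original))  # False
print(looks_masked(masked))    # True
```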
 
By connecting from the load testing application to the endpoint of Cluster B, it becomes possible to handle data volumes equivalent to those used in production while still protecting personal information.

Conclusion


Until now, data masking processes had to be implemented using Lambda or similar, but this new DMS feature makes it much easier to perform the conversion process.
 
This is extremely helpful as it allows us to quickly build a safe load testing environment!
 
SRG is looking for people to work with us. If you're interested, please contact us here.