Be aware of network IP ranges before creating a MongoDB Atlas cluster
My name is Ohara and I work in the Service Reliability Group (SRG) of the Media Headquarters.
SRG (Service Reliability Group) mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article explains an IP address range conflict we hit in MongoDB Atlas VPC peering and how we migrated to a new cluster with minimal downtime.
- Introduction
- IP address conflict error in VPC peering configuration
- Why did the IP address get duplicated?
  - Atlas default CIDR
  - VPC peering constraints
- Solutions
  - 1. Delete and recreate the shared cluster
  - 2. Remove the conflicting VPC peering with stg and compromise on public access
  - 3. Set the appropriate CIDR in the new project and migrate data from shared to shared_v2 (adopted)
- Live Migration for data migration and VPC peering reconfiguration
- The importance of design
Introduction
MongoDB Atlas is a fully managed cloud database service that automates database management, allowing developers to focus on building applications. With VPC peering, applications running in cloud environments such as GCP and AWS can connect to Atlas securely over private IP addresses.
This article describes a network IP address range conflict that occurred when we set up VPC peering between MongoDB Atlas and GCP, and how we resolved it.
IP address conflict error in VPC peering configuration
stg
common vpc
Why did the IP address get duplicated?
After investigation, we found that the cause lay in how MongoDB Atlas behaves when a cluster is created.
Atlas default CIDR
192.168.0.0/16
shared
My first thought was to change the CIDR defined in this network container, but it turned out that Atlas does not allow the VPC CIDR to be changed once a cluster has been created; changing it requires recreating the cluster.
M10
VPC peering constraints
When peering VPCs, the IP ranges of the subnets on both sides must not overlap.
192.168.0.0/16
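The overlap constraint can be checked before a peering request is ever made. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `find_overlaps` and the `10.0.0.0/20` range are hypothetical, chosen only for illustration, and the real list of ranges would come from your own VPC and Atlas network container settings.

```python
import ipaddress


def find_overlaps(new_cidr, existing_cidrs):
    """Return the CIDR blocks in existing_cidrs that overlap with new_cidr."""
    new_net = ipaddress.ip_network(new_cidr)
    return [
        cidr
        for cidr in existing_cidrs
        if new_net.overlaps(ipaddress.ip_network(cidr))
    ]


# Hypothetical ranges already reachable from the VPC (illustrative only).
already_peered = ["192.168.0.0/16", "10.0.0.0/20"]

# A second network that also uses 192.168.0.0/16 collides with the first.
print(find_overlaps("192.168.0.0/16", already_peered))  # ['192.168.0.0/16']

# A range from a different private block does not collide.
print(find_overlaps("172.16.0.0/18", already_peered))  # []
```

Running a check like this at design time, before the first cluster is created, is cheaper than discovering the conflict when the peering request is rejected.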

Solutions
The following three proposals were considered:
Reconfiguring the Atlas network requires downtime, so we looked for less disruptive methods where possible.
1. Delete and recreate the shared cluster
mongorestore
Furthermore, Atlas requires existing snapshots to be deleted as well when the cluster is recreated, which limits the means for quick recovery.
2. Remove the conflicting VPC peering with stg and compromise on public access
Access to stg is less frequent than access to shared, and little of that access is critical to the product, so removing its peering was a candidate.
shared
3. Set the appropriate CIDR in the new project and migrate data from shared to shared_v2 (adopted)
We used Atlas' Live Migration feature for data migration. This feature allows us to synchronize two databases and run them in parallel while the application is running, enabling us to switch over with almost no downtime. We can also verify the validity of the infrastructure by checking whether CIDR changes and peering settings can be made as expected without affecting the existing system.
This plan was adopted as it was deemed the safest and least impactful.
Live Migration for data migration and VPC peering reconfiguration
The specific steps of the solution adopted are as follows:
shared_v2
172.16.0.0/18
- Check the CIDR of the new network
common vpc
shared_v2
shared_v2
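When choosing the CIDR for a new network, it can help to pick the range mechanically rather than by eye. The sketch below, again using only the standard `ipaddress` module, walks a private address pool and returns the first block of a given size that does not overlap any range already in use. The function name `first_free_block`, the `172.16.0.0/12` pool, and the list of used ranges are assumptions for illustration; the /18 size simply mirrors the 172.16.0.0/18 value above.

```python
import ipaddress


def first_free_block(pool, prefix, in_use):
    """Return the first subnet of `pool` with the given prefix length that
    overlaps none of the CIDR blocks in `in_use`, or None if none is free."""
    pool_net = ipaddress.ip_network(pool)
    used = [ipaddress.ip_network(cidr) for cidr in in_use]
    for candidate in pool_net.subnets(new_prefix=prefix):
        if not any(candidate.overlaps(u) for u in used):
            return str(candidate)
    return None


# Hypothetical inventory: only the default 192.168.0.0/16 range is taken.
print(first_free_block("172.16.0.0/12", 18, ["192.168.0.0/16"]))
```

Keeping the list of used ranges in one place and checking every new project against it avoids repeating the same conflict the next time a network is added.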

The importance of design
While managed services are very easy to build and manage, there are many things to consider when designing them, such as provider-specific specifications and limitations.
By carefully designing not only the network but also the entire system at the early stages of construction, you can prevent large-scale rework and complex data migration work that may occur in the future.