Pre-splitting: A must-do before launching a Spanner-based service

This is Masaya Matsuda (@mm_matsuda816) from the Service Reliability Group (SRG) of the Media Headquarters.
#SRG (Service Reliability Group) is a group that mainly provides cross-cutting support for the infrastructure of our media services, improves existing services, launches new ones, and contributes to OSS.
 
This article is the 16th-day entry in the CyberAgent Group SRE Advent Calendar 2025.
This article introduces the "Pre-splitting" feature of Spanner, which prevents latency degradation immediately after launch. We provide a detailed explanation of the design guidelines and configuration methods, and summarize key points for smoothly launching services even during traffic surges.

Spanner "clogging" shortly after launch


Google Cloud Spanner is a powerful database that combines the consistency of a relational database with the horizontal scalability of NoSQL.
However, immediately after a large-scale service launch or the start of a campaign, expected performance may not be achieved and latency can degrade, leaving the database "clogged."
Much of this problem stems from the behavior of Split, Spanner's data distribution mechanism.
Spanner typically detects the amount of data (size) and load and automatically splits and distributes the data across multiple nodes.
In this article we call this automatic splitting. The catch is that there is a time lag between detecting load and completing the split.
In other words, in a scenario where traffic suddenly jumps from 0 to 100, as occurs immediately after launch, automatic splitting cannot keep up, and access becomes concentrated on a specific node, creating a hotspot.
Pre-splitting (split points), the feature introduced to solve this problem, became generally available on April 28, 2025.

Pre-splitting (split points) mechanism and design


Pre-splitting is a feature that allows you to manually create split points in advance before traffic arrives and distribute data across multiple nodes.
This enables multiple nodes to handle the access spike immediately after launch, right from the start.

How much should you split?

The official documentation recommends roughly 10 split points per node.
For example, if you are running a five-node instance in production, create approximately 5 × 10 = 50 split points.
For small instances, automatic splitting alone may be sufficient, so pre-splitting is not strictly required, but it is worth doing whenever heavy traffic is expected.

Where to split

The location of the split (key range) depends on the key design of your data.
  1. When keys are evenly distributed (e.g., UUIDs or hash values)
Divide the entire key space evenly.
  2. When a specific range becomes hot
If you know that certain user IDs or categories receive a high volume of traffic, split that range more finely.
It is also recommended to create split points not only for tables but also for heavily accessed indexes.
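For the evenly distributed case, one simple way to generate split points is to step through the leading hex digit of the key space. A minimal sketch follows; the `Users` table name and the splits-file line format are illustrative assumptions, so check the official documentation for the exact format expected by gcloud:

```shell
# Generate 15 evenly spaced split keys for a UUID-keyed table by stepping
# through the first hex digit of the key space (0x1... through 0xf...).
# "Users" and the line format here are illustrative, not verbatim from the docs.
for prefix in 1 2 3 4 5 6 7 8 9 a b c d e f; do
  echo "Users ('${prefix}0000000-0000-0000-0000-000000000000')"
done > splits.txt

head -n 1 splits.txt
wc -l < splits.txt
```

Fifteen boundaries divide the key space into 16 even ranges; scale the digit stepping (two leading hex digits gives 255 boundaries) to hit your target of roughly 10 split points per node.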

Creating and Validating Split Points


We will explain the steps to actually create split points and check whether they have been applied correctly.
The most common way is to use the Google Cloud CLI (gcloud).

1. Creating Split Points

Use `gcloud spanner databases splits add`, either preparing a file that lists the split points or specifying them on the command line.
Note the API limit: at most 100 split points can be added per request.
Split points are not permanent; use the `--split-expiration-date` flag so they revert to Spanner's automatic management once traffic stabilizes.
The default expiration is 10 days, and it can be set to a maximum of 30 days.
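Putting the pieces together, a command sketch might look like the following. The instance, database, and splits file are placeholders, and the flag names should be verified against `gcloud spanner databases splits add --help`:

```shell
# Sketch only: example-instance, example-db, and splits.txt are placeholders.
# splits.txt lists one split point per line in the format described in the
# official docs. The expiration date bounds how long the manual split points
# live before Spanner resumes automatic management (default 10 days, max 30).
gcloud spanner databases splits add example-db \
  --instance=example-instance \
  --splits-file=splits.txt \
  --split-expiration-date=2025-04-01T00:00:00Z
```

Remember the 100-points-per-request limit: with more split points than that, run the command repeatedly over chunks of the file.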

Counting the number of splits

Check that the split points you created were registered correctly. You can list them with gcloud and count them with `jq`.
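A sketch of the verification step, again with placeholder resource names; confirm the exact subcommand with `gcloud spanner databases splits --help`:

```shell
# Sketch: list the registered split points as JSON and count them with jq.
# example-db and example-instance are placeholders.
gcloud spanner databases splits list example-db \
  --instance=example-instance \
  --format=json | jq 'length'
```

If the count matches what you submitted (e.g., 50 for a five-node instance), the split points are in place.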

Operational Notes


Impact on latency

More divisions are not necessarily better.
Splitting too finely can increase the number of nodes (transaction participants) that a single transaction spans, which can chronically worsen read/write latency.
It may also increase the query resource usage (Compute/Query usage).

Monitoring and Expiration

After launch, monitor the Latency Profile and Key Visualizer in the Cloud Console.
If excessive splitting is causing performance degradation (for example, overall increased latency rather than hotspots), you may want to manually expire (invalidate) the split and return to Spanner's automatic management.
Once expired, split points will no longer be displayed and may be automatically merged depending on traffic conditions.
This is normal behavior and means that Spanner has transitioned to an autonomous split.

Pre-Launch Checklist


Finally, we've summarized the items you should check before launching a service that uses Spanner.
  1. Ensure sufficient nodes
Pre-splitting is not a capacity-increasing feature.
First, make sure you have provisioned enough nodes to handle the expected traffic.
  2. Is the number of split points appropriate?
Check that the setting is based on "number of nodes × 10".
  3. Is the timing of pre-splitting appropriate?
The official recommendation is 7 days to 12 hours before launch.
If you create split points too early, they risk expiring before launch; too close to launch, and they may not take effect in time.
  4. Check the restrictions
Search indexes and vector indexes are not covered by pre-splitting.
Workloads that rely on them may need additional load management.
  5. Monitor during launch
Use Key Visualizer to watch for hotspots and for latency degradation caused by over-splitting, and be prepared to expire the split points if necessary.
To maximize Spanner performance and ensure a smooth launch, make effective use of pre-splitting.
 
SRG is looking for people to work with us.
If you are interested, please contact us here.