Spanner Region Partitioning Primer

I'm Masaya Matsuda (@mm_matsuda816) of the Service Reliability Group (SRG) in the Media Headquarters.
#SRG (Service Reliability Group) is a group that mainly provides cross-functional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This is the day 2 article of the CyberAgent Group SRE Advent Calendar 2024.
We will explain how to use Spanner's regional partitioning and its costs.
I hope this will be helpful to those who are considering implementing regional partitioning.
 

What is geographic partitioning?


This feature was released on 2024/07/16 and is a preview feature as of 2024/12/02.
Until now, Spanner could handle data transparently across geographically separated locations through multi-region instance configurations. With regional partitioning, individual rows of a database table can additionally be stored in instance partitions located in different regions.
 
Please refer to this page for the release details.

Key Benefits

  • By storing data in a partition geographically close to where your queries run, you can expect lower write latency and significantly lower read latency than with a traditional multi-region configuration.
  • Capacity is easier to tune than with a traditional multi-region configuration when the workload differs between regions.
    • If the load ratio between asia-northeast1 and us-east1 is 10:2, you can create a 10-node partition in asia-northeast1 and a 2-node partition in us-east1.
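
The sizing idea in the last bullet can be sketched in a few lines of Python. This is only an illustration: the helper name `split_nodes` and its rounding rule (round up, never below one node) are assumptions made for this example, not something Spanner provides.

```python
import math

def split_nodes(load_by_region, nodes_per_load_unit=1.0):
    """Return a node count per region proportional to its load.

    Hypothetical helper: every partition gets at least 1 node
    (1,000 PUs), the minimum size of a partition.
    """
    return {
        region: max(1, math.ceil(load * nodes_per_load_unit))
        for region, load in load_by_region.items()
    }

# Load ratio 10:2 between the two regions, as in the example above.
plan = split_nodes({"asia-northeast1": 10, "us-east1": 2})
print(plan)  # {'asia-northeast1': 10, 'us-east1': 2}
```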

Main limitations

These are the limitations of the preview as of December 2, 2024. They may change in the future.
  • Available in Spanner Enterprise Plus edition.
  • You can create up to 10 partitions per instance.
  • PostgreSQL-dialect databases are not supported.
  • You cannot create a dual-region instance partition.
  • Within one instance, you cannot create different partitions with the same base instance configuration.
  • A maximum of 20 million placement rows can be placed per node in a partition.
    • The English documentation states it is 100 million, but I have not confirmed the actual behavior.
  • When adding placement rows, you can move about 10 per second per node in the partition.
  • Partitions cannot be created on instances with fewer than 1 node (1,000 processing units).
  • You cannot create a backup of an instance that has partitions.
  • Customer-managed encryption keys cannot be used for instances with partitions.
  • You cannot create partitions on instances that have Managed Autoscaler enabled.
  • An instance with a partition cannot be moved. Individual rows can be moved to different partitions.
  • The use of partitions does not guarantee compliance with regulatory requirements.
  • Change streams do not support partitioned data.
  • To use geographic partitioning, you need to create a new empty database and enable the `opt_in_dataplacement_preview` option.
  • If you use an INSERT or DELETE DML statement on a table with a placement key, that statement must be the only statement in the transaction.
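
To get a feel for the two numeric limits above (placement rows per node and the move rate), here is a rough back-of-the-envelope sketch. The constants are simply the figures quoted in the list, and the function names are hypothetical; actual behavior during the preview may differ.

```python
# Figures quoted in the limitation list above (preview, 2024/12/02).
MAX_PLACEMENT_ROWS_PER_NODE = 20_000_000  # the English docs say 100 million
MOVE_RATE_PER_NODE_PER_SEC = 10           # approximate

def min_nodes_for_rows(placement_rows):
    """Smallest node count whose placement-row limit covers the rows."""
    # Ceiling division, with a floor of 1 node per partition.
    return max(1, -(-placement_rows // MAX_PLACEMENT_ROWS_PER_NODE))

def estimated_move_seconds(rows_to_move, nodes):
    """Rough time to move rows at the quoted per-node rate."""
    return rows_to_move / (MOVE_RATE_PER_NODE_PER_SEC * nodes)

print(min_nodes_for_rows(50_000_000))        # 3
print(estimated_move_seconds(1_000_000, 2))  # 50000.0
```

At the quoted rate, moving one million placement rows on a 2-node partition would take roughly 14 hours, so bulk relocation is worth planning ahead.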
 
When considering using it in a production environment, the inability to create backups may be the biggest obstacle. I hope that it will be possible to create backups by the time of GA.

How to use


Let's create a table with a placement key and operate on it using regional partitioning.

Create an instance

Create a database
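
As noted in the limitations, the database must be new and empty, with the `opt_in_dataplacement_preview` option enabled. A minimal sketch of the DDL for that step follows. The database name `example-db` is a placeholder, and the statement assumes the option is set through the usual `ALTER DATABASE ... SET OPTIONS` form, so please check it against the official documentation before use.

```python
def enable_placement_preview_ddl(database_id):
    """Build the DDL that opts a database into the data placement preview.

    Assumption: the option is set with ALTER DATABASE ... SET OPTIONS.
    """
    return (
        f"ALTER DATABASE `{database_id}` "
        "SET OPTIONS (opt_in_dataplacement_preview = true)"
    )

# "example-db" is a placeholder database name.
print(enable_placement_preview_ddl("example-db"))
```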

Create Partition

Create one partition located only in us-east1 and another located only in asia-northeast1.
As explained later, creating partitions incurs additional costs because you must allocate a minimum of 1,000 PUs per partition.
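
The two partitions could be created with commands along the following lines. This sketch only builds and prints the command lines; it does not run them. The partition IDs, the instance ID `test-instance`, and the use of the `gcloud beta spanner instance-partitions create` command with these flags are assumptions based on the preview-era CLI, so verify them against the current gcloud reference.

```python
import shlex

def partition_create_command(partition_id, instance_id, config, nodes=1):
    """Build (but do not run) a gcloud command for one instance partition."""
    return [
        "gcloud", "beta", "spanner", "instance-partitions", "create",
        partition_id,
        f"--instance={instance_id}",
        f"--config={config}",
        f"--description={partition_id}",
        f"--nodes={nodes}",  # 1 node = 1,000 PUs, the minimum per partition
    ]

# Hypothetical partition IDs; the configs are the single-region ones.
for pid, config in [
    ("us-partition", "regional-us-east1"),
    ("asia-partition", "regional-asia-northeast1"),
]:
    print(shlex.join(partition_create_command(pid, "test-instance", config)))
```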

Create a placement

Once created, placements cannot be deleted individually; they are deleted when the database is deleted.
Set instance_partition to the name of the partition created in the previous section.
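
A minimal sketch of the placement DDL, generated as strings. The placement names (`usplacement`, `asiaplacement`) and partition IDs are hypothetical, and the statement shape follows the `CREATE PLACEMENT ... OPTIONS (instance_partition=...)` form as I understand it from the official documentation.

```python
def create_placement_ddl(placement_name, partition_id):
    """Build a CREATE PLACEMENT statement bound to one instance partition."""
    return (
        f"CREATE PLACEMENT `{placement_name}` "
        f'OPTIONS (instance_partition="{partition_id}")'
    )

# Hypothetical names: one placement per partition.
for name, partition in [
    ("usplacement", "us-partition"),
    ("asiaplacement", "asia-partition"),
]:
    print(create_placement_ddl(name, partition))
```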
 

Create a table with a placement key

Create a table similar to the example in the official documentation.
The placement key column cannot be dropped, nor can it be added to an already created table.
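
As a sketch, a table modeled on the Singers example in the official documentation might look like this. The column set is trimmed for illustration, and the `PLACEMENT KEY` column attribute is written as I understand it from the documentation.

```python
# DDL kept as a string so it can be passed to any client or CLI.
# The Location column is the placement key; because it cannot be added
# or dropped later, it has to be part of the initial table definition.
CREATE_SINGERS_DDL = """\
CREATE TABLE Singers (
  SingerId   INT64 NOT NULL,
  SingerName STRING(MAX) NOT NULL,
  Location   STRING(MAX) NOT NULL PLACEMENT KEY
) PRIMARY KEY (SingerId)"""

print(CREATE_SINGERS_DDL)
```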

INSERT

Apart from specifying the location, this is the same as a normal INSERT.
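
A minimal sketch of how such inserts could be prepared, assuming a Singers table whose placement key column is `Location` and assuming that the value written to that column is the placement name (`asiaplacement` and `usplacement` are hypothetical). Because an INSERT on a table with a placement key must be the only statement in its transaction, each transaction below holds exactly one statement.

```python
def insert_singer(singer_id, name, placement):
    """Return (sql, params) for one parameterized INSERT."""
    sql = (
        "INSERT INTO Singers (SingerId, SingerName, Location) "
        "VALUES (@id, @name, @location)"
    )
    return sql, {"id": singer_id, "name": name, "location": placement}

# One statement per transaction (see the limitation on INSERT/DELETE DML).
transactions = [
    [insert_singer(1, "Alice", "asiaplacement")],
    [insert_singer(2, "Bob", "usplacement")],
]

for statements in transactions:
    assert len(statements) == 1  # a single DML statement per transaction
    sql, params = statements[0]
    print(sql, params)
```

Each inner list would be handed to your Spanner client as its own read-write transaction.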

SELECT

UPDATE

The query must target only one location.
If you want to operate on data from multiple locations, you will need to separate the queries for each.
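
One way to respect this rule is to generate a separate statement for each location. The sketch below assumes a Singers table with a `Location` placement key and assumes that filtering on that column is how a query is limited to one location; the helper name and placement names are hypothetical.

```python
def update_statements(new_name_by_location):
    """Return one (sql, params) pair per location, never mixing locations."""
    sql = "UPDATE Singers SET SingerName = @name WHERE Location = @location"
    return [
        (sql, {"name": name, "location": location})
        for location, name in sorted(new_name_by_location.items())
    ]

# Two locations, so two separate UPDATE statements are produced.
for sql, params in update_statements(
    {"usplacement": "US Singer", "asiaplacement": "Asia Singer"}
):
    print(sql, params)
```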

DELETE

As with UPDATE, the query must target only one location. To operate on data in multiple locations, issue a separate query for each.

Cost

Enterprise Plus Edition is required.
As of 12/02/2024, it's $5.13 per hour.
Excerpt from https://cloud.google.com/spanner/pricing?hl=ja

About Partitions

When you create a database, a default partition is created and it has the same multi-region configuration as the instance.
From the Google Cloud Console
Adding a partition incurs additional cost because dedicated nodes are created for it.
From the Google Cloud Console
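
To illustrate how partitions add to the bill, here is a rough sketch. It assumes that the $5.13 figure quoted above is a price per node (1,000 PUs) per hour, and, purely for illustration, that the nodes of each added partition are billed at the same rate. Neither assumption is confirmed here, so substitute the rates from the pricing page for your own configuration.

```python
def hourly_cost(components):
    """Sum hourly cost over (nodes, usd_per_node_hour) pairs."""
    return round(sum(nodes * price for nodes, price in components), 2)

ASSUMED_PRICE = 5.13  # assumption: USD per node per hour

# Default partition (1 node) plus two added partitions (1 node each).
print(hourly_cost([(1, ASSUMED_PRICE), (1, ASSUMED_PRICE), (1, ASSUMED_PRICE)]))
# 15.39
```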
 

About placing read-only replicas

The criterion for deciding whether to place a read-only replica is how much staleness you can tolerate: that is, whether it is acceptable for reads issued from outside the partition's region to be served as stale reads that return data as of a timestamp n seconds in the past.
For details, please refer to the following page.
 
If you use geo-partitioning for GDPR compliance, you should have replicas only in the appropriate regions.
Please refer to the following pages when deciding on a policy.
 

Conclusion


We explained how to use regional partitioning in Spanner and what it costs.
 
Tomorrow, day 3 of the CyberAgent Group SRE Advent Calendar 2024 will be written by Tsuge.
 
SRG is looking for people to work with us. If you're interested, please contact us here.