Getting started with Spanner region partitioning

I am Masaya Matsuda (@mm_matsuda816) of the Service Reliability Group (SRG) in the Media Headquarters.
SRG (Service Reliability Group) is a team that mainly provides cross-functional support for the infrastructure of our media services: improving existing services, launching new ones, contributing to OSS, and so on.
This article is the day 2 entry in the CyberAgent Group SRE Advent Calendar 2024.
We will explain how to use Spanner's regional partitioning and the costs involved.
I hope this will be helpful to those who are considering implementing regional partitioning.
 

What is geographic partitioning?


This feature was released on 2024/07/16 and is a preview feature as of 2024/12/02.
Until now, Spanner has been able to handle data transparently across geographically separated locations through multi-region instances. With regional partitioning, individual rows within a database table can additionally be stored in instance partitions located in different regions.
 
Please refer to this page for the release details.

Key Benefits

  • By storing data in partitions geographically close to the region where queries are executed, you can expect lower latency for writes and strong reads compared to traditional multi-region configurations.
  • Because capacity can be sized per partition, it is easier to optimize when workload trends vary by region than with traditional multi-region configurations.
    • If the load ratio between asia-northeast1 and us-east1 is 10:2, you can create a 10-node partition in asia-northeast1 and a 2-node partition in us-east1.

Main limitations

These are the limitations of the preview as of December 2, 2024. They may change in the future.
  • Available in Spanner Enterprise Plus edition.
  • A maximum of 10 partitions can be created per instance.
  • PostgreSQL-dialect databases are not supported.
  • You cannot create a dual-region instance partition.
  • Within one instance, you cannot create different partitions with the same base instance configuration.
  • A maximum of 20 million placement rows can be placed per node in a partition.
    • The English documentation states it is 100 million, but I have not confirmed the actual behavior.
  • When adding placement rows, you can move roughly 10 per second per node in a partition.
  • Partitions cannot be created on instances with fewer than 1 node (1,000 Processing Units).
  • You cannot create a backup of an instance that has partitions.
  • Customer-managed encryption keys cannot be used for instances with partitions.
  • You cannot create partitions on instances that have Managed Autoscaler enabled.
  • An instance with a partition cannot be moved. Individual rows can be moved to a different partition.
  • The use of partitions does not ensure compliance with regulatory requirements.
  • Change Streams does not support partitioned data.
  • To use geographic partitioning, you need to create a new empty database and enable the `opt_in_dataplacement_preview` option.
  • If you use an INSERT or DELETE DML statement on a table with a placement key, that statement must be the only statement in the transaction.
 
When considering using it in a production environment, the biggest obstacle may be the inability to create backups. I hope that it will be possible to create backups by the time of GA.

How to use


Let's create a table with a placement key that uses regional partitioning and run some operations against it.

Create an instance
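
As a minimal sketch of this step, the Python snippet below assembles a `gcloud spanner instances create` command and prints it instead of running it. The instance ID `test-instance`, the multi-region configuration `nam-eur-asia1`, and the node count are placeholder assumptions for illustration, not the values used in this article. Only the Enterprise Plus edition comes from the limitations listed above.

```python
import shlex

# Placeholder values (assumptions for illustration, not from the article).
INSTANCE_ID = "test-instance"
INSTANCE_CONFIG = "nam-eur-asia1"  # any multi-region configuration


def build_create_instance_command(instance_id, config, nodes=1):
    """Return the gcloud argument list that creates an Enterprise Plus instance."""
    return [
        "gcloud", "spanner", "instances", "create", instance_id,
        f"--config={config}",
        f"--description={instance_id}",
        "--edition=ENTERPRISE_PLUS",  # regional partitioning requires Enterprise Plus
        f"--nodes={nodes}",
    ]


command = build_create_instance_command(INSTANCE_ID, INSTANCE_CONFIG)
# Print the command for review; pass the list to subprocess.run to execute it.
print(shlex.join(command))
```

Keeping the command as a list makes it easy to review before passing it to `subprocess.run`.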

Create a database
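
The following is a rough sketch of this step, assuming the database is created empty and then opted in to the preview with the `opt_in_dataplacement_preview` option mentioned in the limitations. The IDs `test-instance` and `example-db` are placeholders, and the commands are only printed, not executed.

```python
import shlex

# Placeholder IDs (assumptions for illustration).
INSTANCE_ID = "test-instance"
DATABASE_ID = "example-db"

# DDL that opts the still-empty database in to the data placement preview.
OPT_IN_DDL = (
    f"ALTER DATABASE `{DATABASE_ID}` "
    "SET OPTIONS (opt_in_dataplacement_preview = true)"
)

create_database = [
    "gcloud", "spanner", "databases", "create", DATABASE_ID,
    f"--instance={INSTANCE_ID}",
]
enable_preview = [
    "gcloud", "spanner", "databases", "ddl", "update", DATABASE_ID,
    f"--instance={INSTANCE_ID}",
    f"--ddl={OPT_IN_DDL}",
]

# Order matters: the option is set right after creating the empty database.
for command in (create_database, enable_preview):
    print(shlex.join(command))
```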

Create partition

Create a partition for us-east1 only and a partition for asia-northeast1 only.
As we will explain later, creating partitions incurs additional costs because you must allocate a minimum of 1,000 PUs per partition.
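
The two partitions could be created along the lines of the sketch below. It assumes the `gcloud beta spanner instance-partitions create` command as I understand it from the preview documentation, so please check the flags against the current reference before use. The partition IDs `us-partition` and `asia-partition` and the instance ID are placeholders. One node (1,000 PUs) is the minimum size per partition.

```python
import shlex

INSTANCE_ID = "test-instance"  # placeholder

# Hypothetical partition IDs mapped to single-region base configurations.
PARTITIONS = {
    "us-partition": "regional-us-east1",
    "asia-partition": "regional-asia-northeast1",
}


def build_create_partition_command(partition_id, config, instance_id, nodes=1):
    """Return the gcloud argument list that creates one instance partition."""
    return [
        "gcloud", "beta", "spanner", "instance-partitions", "create", partition_id,
        f"--config={config}",
        f"--description={partition_id}",
        f"--instance={instance_id}",
        f"--nodes={nodes}",  # 1 node = 1,000 PUs, the minimum per partition
    ]


commands = [
    build_create_partition_command(partition_id, config, INSTANCE_ID)
    for partition_id, config in PARTITIONS.items()
]
for command in commands:
    print(shlex.join(command))
```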

Create a placement

Created placements cannot be deleted individually; they will be deleted when the database is deleted.
Set instance_partition to the name of the partition created in the previous section.
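
As an illustration, the DDL for the placements might look like the sketch below, which generates one `CREATE PLACEMENT` statement per partition. The placement names `usplacement` and `asiaplacement` and the partition IDs are hypothetical. The statement shape follows my reading of the preview documentation and should be verified there.

```python
# Hypothetical placement names mapped to the partitions they live in.
PLACEMENTS = {
    "usplacement": "us-partition",
    "asiaplacement": "asia-partition",
}


def create_placement_ddl(placement, partition_id):
    """Return the DDL that binds a placement to an instance partition."""
    return (
        f"CREATE PLACEMENT `{placement}` "
        f'OPTIONS (instance_partition="{partition_id}")'
    )


ddl_statements = [
    create_placement_ddl(placement, partition_id)
    for placement, partition_id in PLACEMENTS.items()
]
for statement in ddl_statements:
    print(statement)
```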
 

Create a table with a placement key

Create a table similar to the example in the official documentation.
The placement key column cannot be dropped, nor can it be added to a table that has already been created.
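
A table definition in the spirit of the official example could look like the sketch below. The table and column names (`Singers`, `SingerId`, `SingerName`, `Location`) are assumptions for illustration. The important part is the `PLACEMENT KEY` attribute, which has to be declared when the table is created.

```python
# DDL for a table whose Location column decides where each row is stored.
CREATE_TABLE_DDL = """\
CREATE TABLE Singers (
  SingerId   INT64 NOT NULL,
  SingerName STRING(MAX) NOT NULL,
  Location   STRING(MAX) NOT NULL PLACEMENT KEY
) PRIMARY KEY (SingerId)"""

# The placement key cannot be added or dropped later, so sanity-check the
# definition before applying it: exactly one column carries the attribute.
placement_key_columns = [
    line.split()[0]
    for line in CREATE_TABLE_DDL.splitlines()
    if "PLACEMENT KEY" in line
]
print(CREATE_TABLE_DDL)
print("placement key column:", placement_key_columns)
```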

INSERT

Apart from specifying the Location, the statement is the same as a regular INSERT.
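
A minimal sketch of such an INSERT, assuming a hypothetical `Singers` table with a `Location` placement key and placements named `usplacement` and `asiaplacement`. The client-side check against `KNOWN_PLACEMENTS` is my own addition for illustration, not something Spanner requires you to write.

```python
# Placement names are placeholders; use the ones created in your database.
KNOWN_PLACEMENTS = {"usplacement", "asiaplacement"}

INSERT_SQL = (
    "INSERT INTO Singers (SingerId, SingerName, Location) "
    "VALUES (@singer_id, @singer_name, @location)"
)


def build_insert(singer_id, singer_name, location):
    """Return (sql, params) for inserting one row into the given placement."""
    if location not in KNOWN_PLACEMENTS:
        raise ValueError(f"unknown placement: {location}")
    params = {
        "singer_id": singer_id,
        "singer_name": singer_name,
        "location": location,  # the only part that differs from a normal INSERT
    }
    return INSERT_SQL, params


sql, params = build_insert(1, "Marc", "usplacement")
print(sql)
print(params)
```

Because of the limitation listed earlier, this statement would have to be the only statement in its transaction.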

SELECT
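
As a small sketch, a read against the same hypothetical `Singers` table could filter on the placement key. Filtering by `Location` here is my assumption about a typical access pattern, not a documented requirement for SELECT.

```python
SELECT_SQL = (
    "SELECT SingerId, SingerName, Location "
    "FROM Singers "
    "WHERE Location = @location"
)


def build_select(location):
    """Return (sql, params) that reads the rows stored in one placement."""
    return SELECT_SQL, {"location": location}


sql, params = build_select("asiaplacement")  # placeholder placement name
print(sql)
print(params)
```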

UPDATE

The query must target only one Location.
If you want to operate on data from multiple locations, you will need to split the query for each one.
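
One way to do that split is sketched below: the target rows are grouped by Location, and one UPDATE is produced per group. The table, columns, and placement names are hypothetical, and `UPPER(SingerName)` is just a stand-in for whatever change you actually need.

```python
from collections import defaultdict

UPDATE_SQL = (
    "UPDATE Singers SET SingerName = UPPER(SingerName) "
    "WHERE Location = @location AND SingerId IN UNNEST(@singer_ids)"
)


def split_by_location(rows):
    """Group (singer_id, location) pairs so each statement targets one Location."""
    grouped = defaultdict(list)
    for singer_id, location in rows:
        grouped[location].append(singer_id)
    return [
        (UPDATE_SQL, {"location": location, "singer_ids": sorted(ids)})
        for location, ids in sorted(grouped.items())
    ]


targets = [(1, "usplacement"), (2, "asiaplacement"), (3, "usplacement")]
statements = split_by_location(targets)
for sql, params in statements:
    print(sql, params)
```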

DELETE

As with UPDATE, the query must target only one Location; if you want to operate on data from multiple Locations, you will need to create separate queries for each.
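
The sketch below plans deletes under two constraints from this article: each statement targets a single Location, and a DELETE on a table with a placement key must be the only statement in its transaction. Each inner list stands for one transaction. The table, columns, and placement names are hypothetical.

```python
DELETE_SQL = (
    "DELETE FROM Singers "
    "WHERE Location = @location AND SingerId = @singer_id"
)


def plan_delete_transactions(targets):
    """Return one single-statement transaction per (singer_id, location) target."""
    return [
        [(DELETE_SQL, {"singer_id": singer_id, "location": location})]
        for singer_id, location in targets
    ]


transactions = plan_delete_transactions(
    [(1, "usplacement"), (2, "asiaplacement")]
)
for number, statements in enumerate(transactions, start=1):
    # Each transaction carries exactly one DELETE statement.
    print(f"transaction {number}: {statements}")
```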

Cost

Enterprise Plus Edition is required.
As of 12/02/2024, it is $5.13 per hour.
Excerpt from https://cloud.google.com/spanner/pricing?hl=ja

About Partitions

When you create a database, a default partition is created and has the same multi-region configuration as the instance.
From the Google Cloud Console
Adding a partition will incur additional costs as a dedicated node will be created.
From the Google Cloud Console
 

About placement of read-only replicas

Whether to create read-only replicas depends on how much of your data can tolerate stale reads, that is, reads at a timestamp n seconds in the past, when it is accessed from outside the region where the partition is located.
For details, please refer to the following page.
 
If you use geo-partitioning for GDPR compliance, you will need to keep replicas only in the appropriate regions.
Please refer to the following pages when deciding on your policy.
 

Conclusion


This article explained how to use Spanner's regional partitioning and what it costs.
 
Tomorrow, day 3 of the CyberAgent Group SRE Advent Calendar 2024, will again be written by Tsuge-san, who also wrote day 1.
 
SRG is looking for people to work with us. If you are interested, please contact us here.