About the OpenSearch OR1 Instance Family Announced at re:Invent

This is Matsuda (@mm_matsuda816).
SRG (Service Reliability Group) mainly provides cross-cutting support for the infrastructure of our media services: improving existing services, launching new ones, contributing to OSS, and so on.
This article summarizes the advantages of the OpenSearch OR1 instance family over existing instance families. In writing this article, we received information from AWS SA. Thank you for your cooperation.
This is the day 6 article of the CyberAgent Group SRE Advent Calendar 2023.
 

What is the OR1 instance family?


OR1 is a new instance family for Amazon OpenSearch Service, announced at re:Invent.
Its key point is a design that differs significantly from conventional instance families. By using S3 as primary storage, it achieves eleven 9s of durability, and benchmark results show a 30% improvement in cost efficiency over existing instance families.
It supports OpenSearch 2.11 and above.
 

Cost comparison with existing instance families


First, let's compare vCPU count and memory.
OR1 has a vCPU:memory ratio of 1:8, the same as the R series, so we compare it with the R series. OR1 is priced at about 1.25 times the R series.
OR1 instance family pricing (ap-northeast-1, as of December 18), quoted from https://aws.amazon.com/opensearch-service/pricing/?nc1=h_ls
 
R6g instance family pricing (ap-northeast-1, as of December 18), quoted from https://aws.amazon.com/opensearch-service/pricing/?nc1=h_ls
 
In addition, OR1 incurs a separate charge for S3 backup storage.
 
As this shows, simply replacing existing instances with OR1 will not reduce costs.
You need to tune the cluster to take advantage of OR1's characteristics.
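As a rough illustration of why tuning is needed, the sketch below estimates how many OR1 nodes can replace a given number of same-size R-series nodes before the instance cost actually drops. It uses only the roughly 1.25x price ratio mentioned above. The function name is hypothetical, and the S3 storage charge and EBS costs are deliberately ignored, so treat the result as an optimistic upper bound rather than a real estimate.

```python
# Assumption: OR1 costs about 1.25x a same-size R-series node (ratio from this article),
# expressed as the fraction 5/4 so the arithmetic stays in integers.
PRICE_RATIO_NUM = 5
PRICE_RATIO_DEN = 4


def max_or1_nodes_for_savings(r_nodes: int) -> int:
    """Largest OR1 node count whose instance cost is strictly below
    the cost of `r_nodes` same-size R-series nodes.

    Solves or1_nodes * 5/4 < r_nodes for the largest integer or1_nodes.
    S3 and EBS charges are ignored (assumption).
    """
    if r_nodes < 0:
        raise ValueError("r_nodes must be non-negative")
    return max((r_nodes * PRICE_RATIO_DEN - 1) // PRICE_RATIO_NUM, 0)


if __name__ == "__main__":
    for r in (3, 6, 10):
        print(f"{r} R-series nodes -> at most {max_or1_nodes_for_savings(r)} OR1 nodes")
```

Under this assumption, a 10-node R-series cluster has to shrink to 7 OR1 nodes or fewer before instance cost falls, which is why the replica and CPU reductions described next matter.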

Tuning that utilizes the features of OR1


Reducing the number of replicas

Because S3 guarantees data durability, the number of replicas can be reduced to 1.
Reducing the number of replicas provides the following benefits:
  • Lower indexing load
  • Smaller EBS volumes
  • Fewer nodes, in cases where extra nodes were added for data redundancy
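The effect on EBS size can be sketched with simple arithmetic: every replica holds a full copy of the primary data, so the stored volume scales with (1 + replicas). The function name and the 500 GB figure below are hypothetical, and index overhead and free-space headroom are ignored.

```python
def total_storage_gb(primary_data_gb: float, replicas: int) -> float:
    """Data held across the cluster: one primary copy plus one copy per replica.

    Ignores index overhead and free-space headroom (assumption).
    """
    if primary_data_gb < 0 or replicas < 0:
        raise ValueError("inputs must be non-negative")
    return primary_data_gb * (1 + replicas)


if __name__ == "__main__":
    before = total_storage_gb(500, replicas=2)  # hypothetical current setup
    after = total_storage_gb(500, replicas=1)   # replicas reduced to 1
    print(f"before: {before} GB, after: {after} GB, saved: {before - after} GB")
```

In this hypothetical case, going from 2 replicas to 1 removes one third of the stored data.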

Reducing the number of CPUs

As mentioned above, the indexing load goes down, so you can reduce the total number of vCPUs by reducing the number of nodes and scaling down the instance size.

Points to note about OR1


You cannot change to a different instance family.

An OpenSearch domain built on OR1 cannot be changed to another instance family later.
Likewise, a domain on any other instance family cannot be changed to OR1.

The size of a shard must be less than 100GB.

Shard size is capped by a quota of 100GB, and once a shard exceeds this limit, writes to it are no longer possible.
It appears that you can request a limit increase through AWS Support.
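To stay under the quota, the number of primary shards can be derived from the expected index size. The sketch below is one way to do that calculation. The function name and the 50GB target size are assumptions chosen for illustration, not values from AWS.

```python
import math

SHARD_QUOTA_GB = 100  # OR1 shard size quota described in this article


def min_primary_shards(index_size_gb: float, target_shard_gb: float = 50.0) -> int:
    """Smallest primary shard count that keeps each shard at or below target_shard_gb.

    target_shard_gb is a hypothetical safety margin and must stay below the quota.
    Assumes documents are spread evenly across shards.
    """
    if not 0 < target_shard_gb < SHARD_QUOTA_GB:
        raise ValueError("target_shard_gb must be between 0 and the 100GB quota")
    if index_size_gb < 0:
        raise ValueError("index_size_gb must be non-negative")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    print(min_primary_shards(420))  # 420 / 50 = 8.4, rounded up to 9
```

Because the primary shard count of an existing index cannot be changed without reindexing or using the split/shrink APIs, it is worth doing this estimate before the index is created.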

Since synchronization to S3 is performed on a segment-by-segment basis, inconsistencies in search results may occur for a certain period of time after the update.

Replication has changed from conventional document-based replication to segment-based replication, so a replication delay occurs for the time it takes to create a segment. This interval is set with the refresh_interval value, whose minimum is 10 seconds.
For this reason, consistency between written data and read data cannot be guaranteed for a certain period after a write.
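As a small sketch of how this setting might be applied, the helper below builds the JSON body for the index settings API (`PUT /<index>/_settings`) and rejects values below the 10-second minimum stated above. The function name is hypothetical, and the code only builds the request body; sending it to a domain is left out.

```python
import json

MIN_REFRESH_SECONDS = 10  # minimum refresh_interval on OR1, as stated in this article


def refresh_settings_body(seconds: int) -> str:
    """Return the JSON body for `PUT /<index>/_settings` that sets refresh_interval.

    Raises ValueError for values below the OR1 minimum.
    """
    if seconds < MIN_REFRESH_SECONDS:
        raise ValueError(
            f"refresh_interval must be at least {MIN_REFRESH_SECONDS}s on OR1"
        )
    return json.dumps({"index": {"refresh_interval": f"{seconds}s"}})


if __name__ == "__main__":
    print(refresh_settings_body(30))
```

A longer interval produces fewer, larger segments at the cost of a longer window in which freshly written documents are not yet searchable.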
Segment replication, quoted from https://opensearch.org/blog/segment-replication/

The number of replicas is not automatically reduced

The advantage of OR1 is that the number of replicas can be reduced to 1 because S3 guarantees durability, but this does not happen automatically. You have to reduce the number of replicas yourself.
Set the number of replicas to 1 in your index template.
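A minimal sketch of such a template body follows, built for the composable index template API (`PUT _index_template/<template-name>`). The function name and the `logs-*` pattern are hypothetical; adapt the pattern to your own index names. The code only builds the body and does not contact a domain.

```python
import json


def replica_template_body(index_pattern: str, replicas: int = 1) -> dict:
    """Build an index template body that sets number_of_replicas
    for every new index matching index_pattern."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return {
        "index_patterns": [index_pattern],
        "template": {
            "settings": {
                "index": {"number_of_replicas": replicas}
            }
        },
    }


if __name__ == "__main__":
    # "logs-*" is a hypothetical pattern used only for illustration.
    print(json.dumps(replica_template_body("logs-*"), indent=2))
```

A template only affects indexes created after it is registered. For existing indexes, number_of_replicas is a dynamic setting and can be changed through the index settings API.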
 

Workloads that OR1 is suited for


The major benefit of OR1 is that you can reduce the number of replicas and, with it, the CPU load of indexing, which makes it well suited to write-heavy workloads.
I think it is a good fit for use cases such as log search.
 
On the other hand, because there are fewer replicas, read-heavy workloads may see slightly slower response times than before.
 
Also, as mentioned above, search results may be temporarily inconsistent, so OR1 is best avoided where strict read/write consistency is required.

Conclusion


Some of our services have had to add nodes and CPUs because of high OpenSearch indexing load, so we plan to use the OR1 instance family to reduce their costs.
 
SRG is looking for people to work with us. If you are interested, please contact us here.
 
SRG runs a podcast where we chat about the latest hot topics in IT and books. We hope you will enjoy listening to it while you work.