About the OpenSearch OR1 instance family announced at re:Invent

I'm Matsuda (@mm_matsuda816) from the Service Reliability Group (SRG) of the Technology Division.
SRG (Service Reliability Group) primarily provides cross-cutting support for the infrastructure around our media services. We improve existing services, launch new ones, and contribute to open-source software (OSS).
This article summarizes the advantages of the OpenSearch OR1 instance family over existing instance families. AWS Solutions Architects (SAs) provided information while I was writing it. Thank you for your cooperation.
This is the day 6 article of the CyberAgent Group SRE Advent Calendar 2023.
 

What is the OR1 instance family?


OR1 is a new OpenSearch instance family announced at re:Invent.
The key point is that its design differs significantly from conventional instance families. By using S3 as its primary storage, it achieves eleven nines (99.999999999%) of durability, and benchmark results show a 30% improvement in price-performance compared with existing instance families.
It supports OpenSearch 2.11 and later.
 

Cost comparison with existing instance families


First, let's compare prices for the same vCPU and memory sizes.
OR1 has a vCPU-to-memory ratio of 1:8, the same as the r series, so we compare it with r6g. At the same size, OR1 costs roughly 1.25 times as much as r6g.
OR1 instance family pricing (ap-northeast-1, as of 12/18). Source: https://aws.amazon.com/opensearch-service/pricing/?nc1=h_ls
 
r6g instance family pricing (ap-northeast-1, as of 12/18). Source: https://aws.amazon.com/opensearch-service/pricing/?nc1=h_ls
 
In addition, the S3 storage is billed separately.
 
Simply swapping existing instances for OR1 instances of the same size will therefore not reduce costs.
You need to tune the cluster to take advantage of OR1's characteristics.
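As a rough illustration of why a like-for-like swap does not pay off, here is a minimal sketch that compares compute cost only. It assumes same-size instances, cost that scales linearly with node count, and the approximate 1.25 price ratio from the pricing tables; it ignores the separate S3 storage charge, and the node counts are hypothetical.

```python
# Approximate OR1 : r6g price ratio for the same instance size (from the pricing tables).
PRICE_RATIO = 1.25


def relative_compute_cost(r6g_nodes: int, or1_nodes: int, price_ratio: float = PRICE_RATIO) -> float:
    """OR1 compute cost as a fraction of r6g compute cost for same-size instances.

    The separately billed S3 storage is excluded, so the real ratio is somewhat higher.
    """
    return (or1_nodes * price_ratio) / r6g_nodes


def max_or1_nodes_for_savings(r6g_nodes: int, price_ratio: float = PRICE_RATIO) -> int:
    """Largest OR1 node count (up to r6g_nodes) whose compute cost is strictly lower."""
    n = r6g_nodes
    # Step down until the OR1 cluster is cheaper than the r6g cluster.
    while n > 0 and n * price_ratio >= r6g_nodes:
        n -= 1
    return n


if __name__ == "__main__":
    # Same node count: OR1 compute costs 25% more.
    print(relative_compute_cost(6, 6))
    # Hypothetical 6-node r6g cluster replaced by 4 OR1 nodes: about 83% of the cost.
    print(round(relative_compute_cost(6, 4), 3))
    print(max_or1_nodes_for_savings(6))
```

Under these assumptions, an OR1 cluster has to run on fewer than 80% of the original node count before compute costs drop, which is why the tuning below matters.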

Tuning that utilizes the characteristics of OR1


Reducing the number of replicas

Because data durability is guaranteed by S3, the number of replicas can be reduced to 1.
Reducing the number of replicas offers the following benefits:
  • Lower indexing load
  • Smaller EBS volumes
  • Fewer nodes, in cases where extra nodes were added only for data redundancy
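To make the EBS point concrete, here is a minimal sketch that assumes every shard copy (the primary and each replica) is stored in full on a data node's EBS volume. The 500 GB of primary data and the move from 2 replicas to 1 are hypothetical, and overhead such as free-space headroom is ignored.

```python
def total_copy_storage_gb(primary_data_gb: float, replicas: int) -> float:
    """Storage held across all shard copies: one primary plus `replicas` copies."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return primary_data_gb * (1 + replicas)


if __name__ == "__main__":
    primary_gb = 500  # hypothetical amount of primary data
    before = total_copy_storage_gb(primary_gb, replicas=2)  # 1500
    after = total_copy_storage_gb(primary_gb, replicas=1)   # 1000
    print(before, after, f"{1 - after / before:.0%} less")
```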

Reducing the number of CPUs

As mentioned above, fewer replicas means a lower indexing load, so you can reduce the total number of CPUs by running fewer nodes or scaling down the instance size.

Points to note about OR1


Cannot change to another instance family

An OpenSearch domain built with OR1 cannot later be changed to another instance family.
Likewise, a domain on another instance family cannot be changed to OR1.

Each shard must be 100GB or less in size.

A quota limits each shard to 100 GB; once a shard exceeds this limit, writes to it are blocked.
It appears the limit can be raised by submitting a request to AWS Support.
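Because the number of primary shards is set when an index is created, one way to stay under the limit is to size it from the expected index size. A minimal sketch, assuming data spreads evenly across primary shards; the 450 GB example and the `headroom` parameter are hypothetical.

```python
import math

SHARD_LIMIT_GB = 100  # default per-shard quota on OR1 described above


def min_primary_shards(index_size_gb: float, headroom: float = 1.0) -> int:
    """Smallest primary shard count that keeps every shard within the limit.

    `headroom` (hypothetical) lowers the per-shard target, e.g. 0.8 aims
    for shards of at most 80 GB to leave room for growth.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    target_gb = SHARD_LIMIT_GB * headroom
    return max(1, math.ceil(index_size_gb / target_gb))


if __name__ == "__main__":
    print(min_primary_shards(450))                # 5 shards of about 90 GB
    print(min_primary_shards(450, headroom=0.8))  # 6 shards of about 75 GB
```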

Because synchronization to S3 is done on a segment-by-segment basis, inconsistencies in search results may occur for a certain period after an update.

Replication changes from conventional document replication to segment replication, so replicas lag behind the primary by the time it takes to create a segment. This lag is governed by the `refresh_interval` setting, but its minimum value is 10 seconds.
As a result, a search issued shortly after a write is not guaranteed to return the written data.
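`refresh_interval` is a dynamic index setting, so it can be changed with the update index settings API (`PUT /<index>/_settings`). The sketch below only builds the request body and rejects values under the 10-second minimum described above; the index name is hypothetical, and sending the request (with authentication) is left out.

```python
import json

OR1_MIN_REFRESH_SECONDS = 10  # minimum refresh_interval on OR1, per the text above


def refresh_settings_body(seconds: int) -> dict:
    """Body for PUT /<index>/_settings that sets refresh_interval."""
    if seconds < OR1_MIN_REFRESH_SECONDS:
        raise ValueError(f"refresh_interval must be at least {OR1_MIN_REFRESH_SECONDS}s on OR1")
    return {"index": {"refresh_interval": f"{seconds}s"}}


if __name__ == "__main__":
    # "app-logs" is a hypothetical index name.
    print("PUT /app-logs/_settings")
    print(json.dumps(refresh_settings_body(10)))
```

Applications that read their own writes should allow for at least this interval before expecting new documents in search results.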
Segment replication. Source: https://opensearch.org/blog/segment-replication/

The number of replicas is not automatically reduced.

Because S3 guarantees durability, you can reduce the number of replicas to 1, but this does not happen automatically; you have to lower the replica count yourself.
For example, set the number of replicas to 1 in your index template.
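OpenSearch composable index templates (`PUT _index_template/<name>`) take index settings under `template.settings`. A minimal sketch that builds such a body with `number_of_replicas` set to 1; the template name `or1-logs` and the `logs-*` pattern are hypothetical. A template only applies to indices created after it exists, so existing indices would need their `number_of_replicas` updated through `PUT /<index>/_settings`.

```python
import json


def replica_template_body(index_patterns: list[str], replicas: int = 1) -> dict:
    """Body for PUT _index_template/<name> that sets the replica count for new indices."""
    return {
        "index_patterns": index_patterns,
        "template": {
            "settings": {
                "index": {"number_of_replicas": replicas},
            },
        },
    }


if __name__ == "__main__":
    # Template name and index pattern are hypothetical examples.
    print("PUT _index_template/or1-logs")
    print(json.dumps(replica_template_body(["logs-*"]), indent=2))
```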
 

Workloads suited to OR1


OR1's major advantage is that fewer replicas lower the CPU load during indexing, which makes it a good fit for write-heavy workloads.
Log search is probably a good example.
 
Conversely, for read-heavy workloads, fewer replicas may slightly worsen response times.
 
Furthermore, as mentioned earlier, search results may be temporarily inconsistent, so OR1 is probably best avoided when strict read/write consistency is required.



In conclusion


Because of the heavy indexing load on OpenSearch, some of our services have had to add instances and CPUs. We therefore plan to use the OR1 instance family to reduce those costs.
 
SRG is looking for new team members. If you are interested, please contact us.
 
SRG runs a podcast where we chat about the latest hot IT technologies and books. We hope you'll enjoy listening to it while you work.