About the OpenSearch OR1 instance family announced at re:Invent
SRG (Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article summarizes the advantages of the OpenSearch OR1 instance family over existing instance families. We received information from AWS SAs during the writing of this article. Thank you for your cooperation.
This is the article for day 6 of the CyberAgent Group SRE Advent Calendar 2023.
- What is the OR1 instance family?
- Cost comparison with existing instance families
- Tuning that takes advantage of OR1's features
  - Reducing the number of replicas
  - Reducing the number of CPUs
- Points to note about OR1
  - Cannot change to a different instance family
  - The size of a single shard must be less than 100GB.
  - Since synchronization to S3 is performed on a segment-by-segment basis, inconsistencies in search results may occur for a certain period of time after the update.
  - The number of replicas is not automatically reduced
- Workloads that OR1 is suited to
- Reference materials
- Conclusion
What is the OR1 instance family?
OR1 is a new OpenSearch instance family announced at re:Invent.
The key point is that its design differs significantly from previous instance families. By using S3 as primary storage, it achieves eleven nines (99.999999999%) of durability, and benchmark results have confirmed a 30% improvement in cost efficiency over existing instance families.
OR1 supports OpenSearch 2.11 and later.
Cost comparison with existing instance families
First, let's compare the number of vCPUs/Memory.
The vCPU:memory ratio is 1:8, so the R series is the closest comparison. OR1 is priced at approximately 1.25 times the R series.


In addition, S3 backup storage is charged separately.
See Pricing Example 4: https://aws.amazon.com/opensearch-service/pricing/?nc1=h_ls
Simply replacing existing instances with OR1 will therefore not reduce costs.
The cluster must be tuned to take advantage of OR1's characteristics.
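To make the break-even point concrete, here is a rough sketch of the arithmetic. It assumes the OR1 nodes are the same size as the R-series nodes they replace and uses only the approximate 1.25x price ratio mentioned above; the S3 storage charge and any EBS savings are ignored, and the function name is made up for illustration.

```python
# Break-even sketch: OR1 costs about 1.25x an R-series instance of the same size.
PRICE_RATIO = 1.25  # approximate ratio from the comparison above; storage charges ignored


def max_or1_nodes_for_break_even(current_nodes: int, price_ratio: float = PRICE_RATIO) -> int:
    """Largest OR1 node count whose instance cost does not exceed the current cluster's."""
    # Work in integer hundredths to avoid floating-point rounding surprises.
    ratio_hundredths = round(price_ratio * 100)
    return (current_nodes * 100) // ratio_hundredths


if __name__ == "__main__":
    for nodes in (3, 5, 10):
        print(nodes, "R-series nodes ->", max_or1_nodes_for_break_even(nodes), "OR1 nodes")
```

Under these assumptions, a 10-node cluster has to shrink to 8 nodes or fewer before the instance cost drops below the current one, which is why the tuning described next matters.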
Tuning that takes advantage of OR1's features
Reducing the number of replicas
Since data durability is guaranteed by S3, the number of replicas can be reduced to 1.
Reducing the number of replicas provides the following benefits:
- Reduced indexing load
- Smaller EBS volumes
- Fewer nodes, in cases where extra nodes were added only for data redundancy
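The effect on EBS size follows from the fact that every replica is a full copy of its primary shard. A minimal sketch, using hypothetical sizes that are not taken from any real cluster:

```python
def total_index_storage_gb(primary_gb: float, replicas: int) -> float:
    """Storage held across the cluster: one primary copy plus one copy per replica."""
    return primary_gb * (1 + replicas)


if __name__ == "__main__":
    primary_gb = 500.0  # hypothetical total size of all primary shards
    before = total_index_storage_gb(primary_gb, replicas=2)
    after = total_index_storage_gb(primary_gb, replicas=1)
    print(f"replicas=2: {before} GB, replicas=1: {after} GB, saved: {before - after} GB")
```

Going from two replicas to one removes one full copy of the data, so the saving equals the total primary size.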
Reducing the number of CPUs
Because the indexing load drops as described above, the total number of CPUs can be reduced by running fewer nodes or by scaling down the instance size.
Points to note about OR1
Cannot change to a different instance family
OpenSearch instances built with OR1 cannot be changed to other instance families later.
Similarly, you cannot change from other instance families to OR1.
The size of a single shard must be less than 100GB.
Shard size is subject to a quota of 100GB, and once a shard exceeds this limit it can no longer be written to.
It seems possible to request an increase in the limit through AWS Support.
Reference: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/or1.html#or1-limitations
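Since writes stop once a shard passes the quota, it may be worth watching shard sizes ahead of time. The sketch below assumes you have already fetched `GET _cat/shards?format=json&bytes=b` and parsed the JSON into a list of dicts. The 80% warning threshold, the interpretation of 100GB as 100 GiB, and the sample rows are all assumptions for illustration.

```python
SHARD_LIMIT_BYTES = 100 * 1024**3  # assumption: "100GB" read as 100 GiB


def shards_near_limit(cat_shards_rows, threshold=0.8, limit_bytes=SHARD_LIMIT_BYTES):
    """Return (index, shard, prirep, size_bytes) for shards at or above threshold * limit."""
    flagged = []
    for row in cat_shards_rows:
        store = row.get("store")
        if store is None:  # unassigned shards report no store size
            continue
        size = int(store)  # with bytes=b the size arrives as a numeric string
        if size >= threshold * limit_bytes:
            flagged.append((row["index"], row["shard"], row["prirep"], size))
    return flagged


if __name__ == "__main__":
    # Hypothetical rows in the shape returned by _cat/shards?format=json&bytes=b
    sample = [
        {"index": "logs-2023.12", "shard": "0", "prirep": "p", "store": str(90 * 1024**3)},
        {"index": "logs-2023.12", "shard": "1", "prirep": "p", "store": str(10 * 1024**3)},
        {"index": "logs-2023.12", "shard": "1", "prirep": "r", "store": None},
    ]
    for index, shard, prirep, size in shards_near_limit(sample):
        print(f"{index} shard {shard} ({prirep}) is at {size / 1024**3:.0f} GiB")
```

Shards that show up here are candidates for a rollover or for a higher primary shard count on the next index.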
Since synchronization to S3 is performed on a segment-by-segment basis, inconsistencies in search results may occur for a certain period of time after the update.
Replication has changed from the conventional document-based replication to segment-based replication, so replicas lag behind by the time it takes to create a segment. That interval is controlled by the refresh_interval setting, whose minimum value is 10 seconds.
As a result, data that has just been written is not guaranteed to be visible to reads for a certain period of time.
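As a rough sketch, the helper below builds the request body for `PUT /<index>/_settings` and rejects values under the 10-second minimum described above. The helper name is hypothetical, and for simplicity it only understands the `s`, `m`, and `h` units, not every time unit OpenSearch accepts.

```python
import json
import re

MIN_REFRESH_SECONDS = 10  # minimum for OR1 as described above
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}  # simplified: only these units are handled


def refresh_settings_body(interval: str) -> str:
    """Return a JSON settings body that sets index.refresh_interval to `interval`."""
    match = re.fullmatch(r"(\d+)([smh])", interval)
    if not match:
        raise ValueError(f"unsupported interval format: {interval!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    if seconds < MIN_REFRESH_SECONDS:
        raise ValueError(f"refresh_interval must be at least {MIN_REFRESH_SECONDS}s on OR1")
    return json.dumps({"index": {"refresh_interval": interval}})


if __name__ == "__main__":
    # Body to send with PUT /<index>/_settings
    print(refresh_settings_body("30s"))
```

A longer interval produces fewer, larger segments at the cost of a longer window in which new documents are not yet searchable.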

The number of replicas is not automatically reduced
OR1 makes it possible to reduce the number of replicas to 1 because S3 guarantees durability, but this does not happen automatically. Users must reduce the number of replicas themselves.
Set the number of replicas to 1 in the index template.
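For illustration, a template body along these lines could be sent with `PUT _index_template/<template-name>`. The `logs-*` pattern and the helper name are hypothetical; adjust them to your own indexes.

```python
import json


def replica_template_body(index_patterns, replicas=1):
    """Build a composable index template body that fixes number_of_replicas."""
    return {
        "index_patterns": list(index_patterns),
        "template": {
            "settings": {
                "index": {"number_of_replicas": replicas},
            },
        },
    }


if __name__ == "__main__":
    # "logs-*" is a placeholder pattern, not one taken from a real cluster
    print(json.dumps(replica_template_body(["logs-*"]), indent=2))
```

Note that an index template only affects indexes created after it is registered. For existing indexes, `number_of_replicas` has to be changed through `PUT /<index>/_settings`.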
Workloads that OR1 is suited to
The major benefit of OR1 is that fewer replicas means less CPU load from indexing, which makes it suitable for write-heavy workloads.
Log search is one likely fit.
On the other hand, because there are fewer replicas to serve queries, read-heavy workloads may see slightly slower response times.
Also, as mentioned above, search results can be temporarily inconsistent, so OR1 should probably be avoided where strict read-after-write consistency is required.
Reference materials
Conclusion
We have a service where we have been adding nodes and CPUs because of the high indexing load on OpenSearch, so we plan to use the OR1 instance family there to reduce costs.
SRG is looking for people to work with us.
If you're interested, please contact us here.
SRG runs a podcast where we chat about the latest hot topics in IT technology and books. We hope you'll listen to it while you work.