Elasticsearch vs. OpenSearch: Uncovering the Performance Gap

Authors: George Kobar, Ugo Sangiorgi

 

A fast, efficient search engine is vital for any organization that relies on quick, accurate searches of its data. For developers and architects, choosing the right search platform can greatly affect an organization's ability to deliver fast and relevant results. In our comprehensive performance tests, Elasticsearch emerged as the clear choice: it was 40% to 140% faster than OpenSearch while using fewer compute resources.

In this article, we compare the performance of Elasticsearch 8.7 and OpenSearch 2.7 (the latest versions of each at the time of testing) across six main areas: text queries, sorting, date histograms, range queries, term queries, and resource utilization. Our goal is to provide fair, practical technical insight to help you make informed decisions, whether you're optimizing an existing system or designing a new one. The comparison also highlights how much Elasticsearch and OpenSearch differ in performance, showing that they are fundamentally different products.

We will first review the results of the performance comparison, followed by our testing methodology and testing environment.

Results

The performance comparison focuses on the 90th percentile (p90) of request latency. The results were cross-validated with t-tests to confirm that the latency differences between the two products are statistically significant, and the relative change (as a percentage) was calculated for each query type. We also show the latency distribution of all requests as box plots with the minimum, maximum, median, mean, and outliers; each box spans from the lower quartile (25th percentile) to the upper quartile (75th percentile). This gives a sense of how the values are actually distributed.
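To make these statistics concrete, here is a minimal standard-library Python sketch of the three computations. It assumes latencies in milliseconds, reads "X% faster" as how much higher OpenSearch's p90 is relative to Elasticsearch's (the exact formula is not stated in this article), and uses Welch's unequal-variance t statistic as one plausible t-test variant (the variant actually used is not stated). Without SciPy it returns the t statistic and degrees of freedom rather than a p-value.

```python
import math
import statistics


def p90(latencies_ms):
    """90th-percentile latency: quantiles(n=10) returns 9 decile cut points."""
    return statistics.quantiles(latencies_ms, n=10)[-1]


def relative_change_pct(es_ms, os_ms):
    """How much higher the OpenSearch value is, as a percentage of Elasticsearch's.

    Assumption: "X% faster" is read as (os - es) / es * 100.
    """
    return (os_ms - es_ms) * 100.0 / es_ms


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical latency samples in ms, not benchmark data
    es = [10.0, 11.0, 12.0, 13.0, 14.0]
    opensearch = [20.0, 21.0, 22.0, 23.0, 24.0]
    print(relative_change_pct(p90(es), p90(opensearch)))
    print(welch_t(es, opensearch))
```

With this reading, a 140% improvement means OpenSearch's p90 latency was 2.4 times Elasticsearch's.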

Text Queries — 76% faster

Show me all data that has [email protected].

 

Elasticsearch showed a significant lead, executing text queries 76% faster than OpenSearch.

Text queries are the foundation of full-text search, which is Elasticsearch's core capability. Queries on text fields let users search for specific phrases, single words, or even parts of words in text data. Being able to run complex searches over text improves the overall search experience and supports a wide range of applications and solutions.
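For readers less familiar with the query DSL, the Python sketch below builds the kind of request body a basic full-text search sends to the `_search` API, using a standard `match` query. The field `message`, the search term, and the target `logs-benchmarks-day3` (inferred from the `data_stream.*` fields in the dataset table below) are illustrative assumptions; the exact benchmark queries are in the repository.

```python
import json


def text_query(field, text, size=10):
    """Build a full-text `match` query body for the _search API."""
    return {
        "size": size,
        "query": {
            # `match` analyzes `text` with the field's analyzer and then
            # searches for the resulting terms in the inverted index
            "match": {field: text}
        },
    }


if __name__ == "__main__":
    body = text_query("message", "systemd")
    # Would be sent as POST /logs-benchmarks-day3/_search (hypothetical target)
    print(json.dumps(body))
```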

Sorting

"Show me the most expensive products first."

Elasticsearch outperformed OpenSearch by a whopping 140% when sorting the results of simple text queries. It was also 24%, 97%, and 53% faster on timestamp, keyword, and numeric sort queries, respectively.

Sorting arranges data in a specific order, such as alphabetical, numerical, or chronological. Sorting search results by specific criteria helps ensure that the most relevant results are presented to customers. It is an important feature that improves the user experience and the overall efficiency of the search process.
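As a sketch of what a sorted text query looks like, the Python function below adds a `sort` clause to a `match` query. Because the benchmark dataset has no price field, the numeric field `metrics.size` stands in for "price" here; the field names and default order are assumptions for illustration only.

```python
def sorted_text_query(field, text, sort_field, order="desc", size=10):
    """Full-text query whose hits are ordered by `sort_field` instead of relevance."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "size": size,
        "query": {"match": {field: text}},
        # Sort on a keyword, numeric, or date field (e.g. @timestamp);
        # these are read from doc values rather than the inverted index
        "sort": [{sort_field: {"order": order}}],
    }


# "Most expensive first" analogue: largest metrics.size first (hypothetical choice)
query = sorted_text_query("message", "systemd", "metrics.size")
```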

Date Histograms

"Show me a bar chart of all the data over time."

For date histogram aggregations, Elasticsearch was 81% faster than OpenSearch. Faster processing speeds up the generation of time-ordered bar charts from time series data.

Date histogram aggregation can be used to aggregate and analyze data by dividing time-based data into intervals or buckets. This feature enables users to visualize and better understand trends, patterns, and anomalies over time.
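The Python sketch below builds a `date_histogram` aggregation that buckets events by `@timestamp` into fixed intervals, which is the data behind a time-ordered bar chart. The one-hour interval and the aggregation name `events_over_time` are arbitrary assumptions.

```python
def date_histogram(field="@timestamp", interval="1h"):
    """Aggregation body that counts documents per fixed time interval."""
    return {
        "size": 0,  # only the buckets are needed, not the matching hits
        "aggs": {
            "events_over_time": {  # hypothetical aggregation name
                "date_histogram": {
                    "field": field,
                    # fixed_interval takes units such as 30s, 1h, or 1d
                    "fixed_interval": interval,
                }
            }
        },
    }
```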

Range Queries

"Show me only the products priced between 0 and 25."

Elasticsearch is 40% faster for range queries and 68% faster for range aggregations.

Range queries on text or keyword fields are another core measure of performance and scalability. They filter search results to a specific range of values in a given field, letting users narrow down results and quickly find the most relevant information.
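A minimal Python sketch of a range query body follows. It wraps the `range` clause in a `bool` filter, which skips relevance scoring. The field `metrics.size` stands in for the example's price, and treating "between 0 and 25" as inclusive on both ends is an assumption.

```python
def range_filter(field, gte=None, lte=None):
    """Filter-context range query; bounds that are None are omitted."""
    bounds = {}
    # Compare against None explicitly so that a bound of 0 is kept
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("at least one bound is required")
    return {"query": {"bool": {"filter": [{"range": {field: bounds}}]}}}


# "Products priced between 0 and 25", with metrics.size as a stand-in field
query = range_filter("metrics.size", gte=0, lte=25)
```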

Faster facet creation matters because faceting groups data into buckets (facets) based on specific attributes and then runs aggregations within each group. This gives a structured view of the data that makes analysis, filtering, and visualization easier, and it is used heavily in e-commerce applications.
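Price facets such as "0-25, 25-50, 50+" are typically built with a `range` aggregation. The Python sketch below turns a list of bucket edges into that aggregation; the edges, the stand-in field `metrics.size`, and the aggregation name `size_facets` are hypothetical.

```python
def range_facets(field, edges):
    """Range-aggregation body from sorted bucket edges such as [0, 25, 50].

    In a `range` aggregation each bucket includes `from` and excludes `to`.
    """
    if len(edges) < 2:
        raise ValueError("need at least two edges")
    ranges = [{"from": lo, "to": hi} for lo, hi in zip(edges, edges[1:])]
    ranges.append({"from": edges[-1]})  # open-ended top bucket, e.g. "50+"
    return {
        "size": 0,
        "aggs": {"size_facets": {"range": {"field": field, "ranges": ranges}}},
    }
```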

Term Queries

"Group the data by which products were bought together."

 

Elasticsearch demonstrated its superiority with 108% faster term lookup and 103% faster composite term aggregation compared to OpenSearch. These advantages make Elasticsearch an even more attractive choice for tasks involving data grouping and filtering.
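To show the two operations measured here, the Python sketch below builds a `term` query (exact match on a keyword value) and a `composite` aggregation that groups documents by combinations of keyword fields and pages through the groups with `after`. The fields `cloud.region` and `process.name` come from the dataset table; choosing them, and replacing dots with underscores in source names, are assumptions.

```python
def term_filter(field, value):
    """Exact-match query: `term` does not analyze `value`, so use keyword fields."""
    return {"query": {"term": {field: {"value": value}}}}


def composite_terms(fields, page_size=100, after=None):
    """Composite aggregation grouping by every combination of `fields`."""
    sources = [
        # Source names are hypothetical; dots are swapped for underscores
        {f.replace(".", "_"): {"terms": {"field": f}}}
        for f in fields
    ]
    composite = {"size": page_size, "sources": sources}
    if after is not None:
        # Resume from the `after_key` returned with the previous page
        composite["after"] = after
    return {"size": 0, "aggs": {"groups": {"composite": composite}}}
```

Paging with `after` lets a client walk through all groups without one very large response.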

The Significant Terms aggregation in Elasticsearch surfaces terms that are unusually frequent in a subset of documents compared with the index as a whole. As a result, common or uninteresting terms, such as stop words ("and", "the", "a") or terms that occur frequently throughout the index, are automatically left out of the results. This is based on a statistical analysis of term frequency and distribution in the indexed data.
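A minimal Python sketch of a `significant_terms` request: the query defines the foreground set, and by default the whole index serves as the background. Using `process.name` as the filter and `agent.name` as the analyzed field is a hypothetical choice.

```python
def significant_terms(filter_field, filter_value, terms_field, size=10):
    """Find terms unusually common in docs matching the filter vs. the index."""
    return {
        "size": 0,
        # Foreground set: documents matching this exact-value filter
        "query": {"term": {filter_field: filter_value}},
        "aggs": {
            "unusual": {  # hypothetical aggregation name
                # Scores each term by how much more frequent it is in the
                # foreground set than in the background (the whole index by default)
                "significant_terms": {"field": terms_field, "size": size}
            }
        },
    }
```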

Resource Utilization

Not only does Elasticsearch outperform OpenSearch in various search-related tasks, but it is also proven to be more resource-efficient. By default, OpenSearch uses the best_speed codec for data streams (prioritizing query speed over storage efficiency), while Elasticsearch uses best_compression. Using the default out-of-the-box settings, Elasticsearch uses 37% less disk space, and when using best_compression (the codec used for this benchmark) on both, Elasticsearch is still 13% more space efficient.
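In Elasticsearch, the codec is the static index setting `index.codec` (`default` or `best_compression`). It has to be in place when a backing index is created, so for data streams it is usually set through an index template. The Python sketch below builds such a template body; the index pattern and priority value are assumptions, and it targets the Elasticsearch template API rather than OpenSearch.

```python
def index_template(pattern, codec="best_compression", priority=200):
    """Body for PUT _index_template/<name> that sets the codec for a data stream."""
    if codec not in ("default", "best_compression"):
        raise ValueError("codec must be 'default' or 'best_compression'")
    return {
        "index_patterns": [pattern],  # hypothetical pattern, e.g. logs-benchmarks-*
        "data_stream": {},  # matching names become data streams
        # Assumed high enough to take precedence over built-in templates
        # that match the same pattern
        "priority": priority,
        "template": {"settings": {"index.codec": codec}},
    }
```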

Time Series Data Streams (TSDS)

We went a step further by reindexing the data into a time series data stream (TSDS), which compressed it further: the average document size dropped from 218 kb to 124 kb, 54.8% smaller than on OpenSearch, as shown in the table below.

Data stream                  Average document size   Difference from OpenSearch
OpenSearch data stream       249 kb                  -
Elasticsearch data stream    218 kb                  13%
Elasticsearch TSDS           124 kb                  54.8%
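For context on what reindexing into a TSDS involves, the Python sketch below builds an Elasticsearch index template that enables time series mode: `index.mode: time_series`, an `index.routing_path`, and keyword fields marked `time_series_dimension`. Choosing `agent.id` and `cloud.region` as dimensions, the index pattern, and the priority are assumptions; the article does not say how the benchmark's TSDS was configured.

```python
def tsds_template(pattern, dimensions, priority=200):
    """Index template body for a time series data stream (TSDS)."""
    if not dimensions:
        raise ValueError("a TSDS needs at least one dimension field")
    properties = {"@timestamp": {"type": "date"}}
    for field in dimensions:
        # Dimensions identify each time series; they must be keyword fields here
        properties[field] = {"type": "keyword", "time_series_dimension": True}
    return {
        "index_patterns": [pattern],  # hypothetical pattern
        "data_stream": {},
        "priority": priority,  # assumed to win over overlapping templates
        "template": {
            "settings": {
                "index.mode": "time_series",
                # Documents are routed to shards by their dimension values
                "index.routing_path": list(dimensions),
            },
            "mappings": {"properties": properties},
        },
    }
```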

Third-Party Verification

Our performance testing methodology and results have been independently verified by TechTarget's Enterprise Strategy Group (ESG), a respected third-party firm. ESG's validation adds credibility and impartiality to our findings and confirms that the testing methodology and results meet the highest standards of accuracy and integrity. Their endorsement reaffirms the robustness and reliability of our comparison, so you can make informed decisions based on our benchmark results.

Test Methodology

How We Came to These Results

In the spirit of a fair and accurate comparison of Elasticsearch and OpenSearch, we created two equivalent 5-node clusters, with 32 GB of memory, 8 CPU cores, and 300 GB of disk per node. Into each product we ingested the same randomly generated 1 TB log dataset, which contains 22 fields (more details below).

Testing was done on separate Kubernetes node pools, so each product had dedicated resources. We followed best practices for both Elasticsearch and OpenSearch, including force-merging indices before issuing queries and preventing request caching from affecting results, to ensure the integrity of the test results.
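The two precautions map onto standard REST calls: the force merge API (`POST /<target>/_forcemerge?max_num_segments=1`) and the `request_cache=false` query parameter on `_search`, which bypasses the shard request cache for that request. The Python sketch below only builds the method and path for each call; the target name is hypothetical, and the benchmark repository may apply these settings differently.

```python
from urllib.parse import urlencode


def force_merge_request(target, max_num_segments=1):
    """(method, path) that merges each shard of `target` down to N segments."""
    query = urlencode({"max_num_segments": max_num_segments})
    return "POST", f"/{target}/_forcemerge?{query}"


def uncached_search_request(target):
    """(method, path) for a search that skips the shard request cache."""
    query = urlencode({"request_cache": "false"})
    return "POST", f"/{target}/_search?{query}"


# Hypothetical target derived from the dataset's data_stream.* fields
merge = force_merge_request("logs-benchmarks-day3")
search = uncached_search_request("logs-benchmarks-day3")
```

Force merging before measuring means both engines search the same small number of segments per shard, so differences in background merge progress do not skew latency.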

To keep the Elasticsearch and OpenSearch comparison transparent, we provide the full benchmarking pipeline as an open source project. The repository, accessible here, includes the Terraform configuration for provisioning the Kubernetes clusters and the Kubernetes manifests for creating the Elasticsearch and OpenSearch clusters. The queries used in the benchmarks are also available in the repository.

You can use this repository not only to reproduce our tests, but also to do your own research and improve the performance of your Elasticsearch projects.

What We Tested

We tested Elasticsearch and OpenSearch across key use cases, including:

  • Search - eCommerce use case with a typical search bar
  • Observability - Extensive system telemetry data such as logs, metrics, and application traces
  • Security - Real-time analysis of security events

The comparison provides an in-depth analysis of each platform's performance in these areas, covering text queries, sorting, date histograms, ranges, and terms.

Datasets and Ingestion

A 1 TB dataset was generated using this open source tool and uploaded to a GCP bucket. Logstash® was used to ingest the dataset from the GCP bucket into both Elasticsearch and OpenSearch. Instructions for generating a similar dataset are also included in the repository, in case you want to replicate the benchmark.
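The benchmark used Logstash for ingestion. For anyone replicating it without Logstash, the Python sketch below shows the `_bulk` request format instead: gzipped NDJSON events, like the files named in `meta.file`, become alternating action and document lines. Data streams accept only the `create` operation, so each action line is a bare `{"create": {}}` and the target goes in the URL. The function names are hypothetical, and a real loader would send the data in batches rather than as one body.

```python
import gzip
import json


def bulk_body(ndjson_lines):
    """Convert raw NDJSON event lines into a _bulk request body."""
    out = []
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        doc = json.loads(raw)  # fails fast on malformed JSON
        # Data streams accept only `create`; the target is given in the URL
        # (POST /<data-stream>/_bulk), so the action line has no metadata
        out.append(json.dumps({"create": {}}))
        out.append(json.dumps(doc))
    if not out:
        return ""
    return "\n".join(out) + "\n"  # a _bulk body must end with a newline


def bulk_body_from_file(path):
    """Read one gzipped NDJSON file into a single _bulk body (no batching)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return bulk_body(fh)
```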

The table below lists every field in each log event, with an example value. All values are random except @timestamp, which is sequential and unique per event.

Field                           Example value
@timestamp                      Jan 3, 2023 @ 18:59:58.000
agent.id                        baac7358-a449-4c36-bf0f-befb211f1d38
agent.name                      fernswisher
agent.type                      filebeat
agent.version                   8.8.0
aws.cloudwatch.ingestion_time   2023-05-01T20:49:30.820Z
aws.cloudwatch.log_group        /var/log/messages
aws.cloudwatch.log_stream       northcurtain
cloud.region                    ap-southeast-3
data_stream.dataset             benchmarks
data_stream.namespace           day3
data_stream.type                logs
event.dataset                   generic
event.id                        seriously
input.type                      aws-cloudwatch
log.file.path                   /var/log/messages/northcurtain
message                         2023-05-01T20:49:30.820Z May 01 20:49:30 ip-106...
meta.file                       2023-01-03/1682974095-gotext.ndjson.gz
metrics.size                    408
metrics.tmin                    238
process.name                    systemd
tags                            preserve_original_event

Benchmarks

A total of 35 query types were considered across five key domains, totaling 387,000 requests. After 100 warmup queries, each query type is executed 100 times, and the process is repeated 50 times per query.
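The Python sketch below shows the measurement schedule as we read it: warmup queries whose timings are discarded, then repeated rounds of timed executions. Running the warmup once per query type (rather than once per repetition) is an assumption, and `run_query` is a hypothetical callable that sends one request and waits for the response.

```python
import time


def measure(run_query, warmup=100, iterations=100, repeats=50):
    """Return per-request latencies (ms) after discarding warmup runs."""
    for _ in range(warmup):
        run_query()  # warms caches and JIT; timing is not recorded
    latencies_ms = []
    for _ in range(repeats):
        for _ in range(iterations):
            start = time.perf_counter()
            run_query()
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

The returned list can then be summarized as a p90 and compared across products with a t-test.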

Rally is an open source tool developed by Elastic® for benchmarking and performance testing of Elasticsearch and other components of the Elastic Stack. It allows users to simulate various types of workloads, such as indexing and searching, against an Elasticsearch cluster and measure their performance in a reproducible manner. Although Rally was developed by Elastic and designed primarily for benchmarking Elasticsearch, it is a flexible tool that can be adapted for use with OpenSearch.
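In a Rally track, a search is declared as an operation (`operation-type: search` with an `index` and a `body`), and a schedule task controls how often it runs through `warmup-iterations`, `iterations`, and `clients`. The Python sketch below builds one such operation/task pair as plain dicts for a track file; the names and values are illustrative, and the actual track used in the benchmark is in the repository.

```python
def rally_search_task(name, index, body, warmup_iterations=100, iterations=100, clients=1):
    """Return (operation, schedule task) dicts for a Rally track's JSON."""
    operation = {
        "name": name,  # hypothetical operation name
        "operation-type": "search",
        "index": index,
        "body": body,  # query DSL body, e.g. a match query
    }
    task = {
        "operation": name,  # refers to the operation defined above by name
        "warmup-iterations": warmup_iterations,  # runs excluded from results
        "iterations": iterations,  # measured runs
        "clients": clients,
    }
    return operation, task
```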

Elastic runs benchmarks daily to ensure that new code in Elasticsearch performs as well as or better than the previous day's. We also use our own machine learning to identify performance anomalies and inefficient resource utilization. We publish performance and sizing tests transparently and openly to benefit everyone who uses our products. Notably, other vendors do not offer this, and it helps users track the changes that matter to them over time.

Conclusion: Elasticsearch — the clear winner

Considering the results of various tests, it is clear that Elasticsearch is consistently better than OpenSearch. Whether it's handling simple queries, sorting data, generating histograms, handling term or range queries, or even resource optimization, Elasticsearch leads the pack.

When choosing a search engine platform, businesses should prioritize speed, efficiency, and low resource utilization—attributes that Elasticsearch excels at. This makes it a compelling choice for organizations that depend on fast and accurate search results. Whether you're an e-commerce platform sorting search results, a security analyst identifying threats, or simply need to effectively observe critical applications, Elasticsearch emerges as the clear leader in this comparison.

Ready to test Elasticsearch yourself?

Start a free 14-day trial of Elastic Cloud and see how Elasticsearch performance can help you with your projects.

The release and timing of any features or functionality described in this article is at the sole discretion of Elastic. Any features or functionality not currently available may not be delivered on time or at all.


Origin blog.csdn.net/UbuntuTouch/article/details/132208255