How many shards should an Elasticsearch cluster have?

0. Preface

This article is translated from a post on the official Elasticsearch blog (2017-09-18); original author: Christian Dahlqvist. If the shard settings are unreasonable when an Elasticsearch cluster is first built, performance problems may surface in the middle and later stages of the project.

Elasticsearch is a very versatile platform that supports a wide variety of use cases and offers great flexibility in data organization and replication strategies. This flexibility, however, makes it difficult for newcomers to the ELK stack to organize their data into indexes and shards. While poor choices are not necessarily a problem at first startup, they can cause performance issues as data volumes grow over time. The more data the cluster holds, the harder the problem is to correct, and fixing it may even require reindexing large amounts of data.

When we encounter users experiencing performance issues, it is not uncommon to trace them back to how data is indexed and to the number of shards in the cluster. This is especially true for users doing multi-tenancy or using time-based indexes. When discussing this with users (at conferences, on forums), the most common questions that come up are:

1) "How many shards should I have?"
2) "How large should my shards be?"

This blog post aims to help you answer these questions and provides practical guidance for use cases that rely on time-based indexes, such as logging or security analytics.

1. What is sharding?

Before we start, let's agree on the concepts and terminology used in this article.
Data in Elasticsearch is organized into indexes. Each index consists of one or more shards. Each shard is an instance of a Lucene index; you can think of such an instance as a self-contained search engine that indexes a portion of the data and handles queries for the Elasticsearch cluster.
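As a concrete illustration, here is a minimal sketch that creates an index with an explicit shard layout through the REST API. The index name `my-index`, the host address, and the shard counts are placeholder assumptions, not recommendations.

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Create an index with 3 primary shards, each with 1 replica.
# The number of primary shards is fixed at index creation time;
# the replica count can still be changed later.
resp = requests.put(
    f"{ES}/my-index",
    json={
        "settings": {
            "number_of_shards": 3,
            "number_of_replicas": 1,
        }
    },
)
print(resp.json())
```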

[Refresh] As data is written to a shard, it is periodically published into new immutable Lucene segments on disk, at which point it becomes available for querying. This is known as a refresh. For a more detailed explanation, see:
http://t.cn/R05e3YR
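To make refresh behavior tangible, the sketch below forces an immediate refresh and lengthens the refresh interval. The index name and the 30s interval are illustrative assumptions.

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Force an immediate refresh so recently indexed documents become searchable.
requests.post(f"{ES}/my-index/_refresh")

# Lengthen the refresh interval (the default is 1s) to produce fewer,
# larger segments when near-real-time visibility is not required.
requests.put(
    f"{ES}/my-index/_settings",
    json={"index": {"refresh_interval": "30s"}},
)
```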

[Merging] As the number of segments grows, they are periodically merged into larger segments. This process is called merging.

Since all segments are immutable, new merged segments are created and the old ones are deleted, which means the disk space in use often fluctuates during indexing. Merging can be quite resource intensive, especially with respect to disk I/O.
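You can observe this in practice by listing an index's segments through the cat API. A minimal sketch, with the index name assumed:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# List the Lucene segments backing each shard of the index, including
# per-segment size and document counts, to see how merging progresses.
resp = requests.get(f"{ES}/_cat/segments/my-index", params={"v": "true"})
print(resp.text)
```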

A shard is the unit by which Elasticsearch distributes data around a cluster. The speed at which Elasticsearch moves shards when rebalancing data (such as after a failure) depends on the size and number of shards, as well as network and disk performance.

Tip: Avoid having very large shards, as they can negatively impact the cluster's ability to recover from failures. There is no fixed limit on how large a shard can be, but a shard size of 50GB is often quoted as a limit that works across a variety of use cases.
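One way to spot oversized shards is the cat shards API, sketched below against an assumed local cluster:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# List every shard with its index, primary/replica flag, and on-disk
# size, sorted largest first, so oversized shards stand out.
resp = requests.get(
    f"{ES}/_cat/shards",
    params={"v": "true", "h": "index,shard,prirep,store", "s": "store:desc"},
)
print(resp.text)
```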

2. Index retention period

Since segments are immutable, updating a document requires Elasticsearch to first find the existing document, mark it as deleted, and then add the new version. Deleting a document also requires locating it and marking it as deleted. Deleted documents therefore continue to occupy disk space and some system resources until they are merged out, and that merging consumes a lot of system resources.

Elasticsearch allows the full index to be deleted directly from the filesystem without explicitly having to delete all records individually. This is by far the most efficient way to delete data from Elasticsearch.

Tip: Use time-based indexes to manage data whenever possible, grouping data into indexes by retention period. Time-based indexes also make it easy to change the number of primary and replica shards over time (for the next index to be generated), which simplifies adapting to changing data volumes and requirements.
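A sketch of how this plays out: the snippet below drops daily indexes that have aged out of an assumed 30-day retention window. The `logs-YYYY.MM.DD` naming scheme is an illustrative assumption, and tools like Elasticsearch Curator automate the same job.

```python
from datetime import datetime, timedelta

import requests

ES = "http://localhost:9200"   # assumed local cluster address
RETENTION_DAYS = 30            # assumed retention period

cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)

# List all daily log indexes, then drop each whole index once it has
# aged out. This is far cheaper than deleting documents one by one.
names = requests.get(f"{ES}/_cat/indices/logs-*", params={"h": "index"}).text.split()
for name in names:
    day = datetime.strptime(name, "logs-%Y.%m.%d")  # assumed naming scheme
    if day < cutoff:
        print("deleting", name, requests.delete(f"{ES}/{name}").status_code)
```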

3. Indexes and shards are not free

[Cluster state] For each Elasticsearch index, mapping and state information is stored in the cluster state. The cluster state is kept in memory for fast access, so having a large number of indexes in the cluster can lead to a large cluster state (especially if the mappings are large). All updates to the cluster state are performed by a single thread to ensure consistency across the cluster, so a large cluster state makes updates slower.

Tip: To reduce the number of indexes and avoid large or even huge mappings, consider storing data with the same index structure in the same index, rather than splitting the data into separate indexes based on where the data comes from. It is important to strike a good balance between the number of indexes and the size of the mapping for each index.
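A minimal sketch of this consolidation: write documents from different sources into one shared index and record the origin in a regular field. The index name, field names, and document layout are assumptions, and the `_doc` endpoint assumes Elasticsearch 6.x or later.

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Instead of one index per data source (logs-app1, logs-app2, ...),
# write everything with the same structure into a shared daily index
# and keep the origin in a field that queries can filter on.
doc = {
    "@timestamp": "2017-09-18T12:00:00Z",
    "source": "app1",            # assumed field identifying the origin
    "message": "user logged in",
}
resp = requests.post(f"{ES}/logs-2017.09.18/_doc", json=doc)
print(resp.json())
```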

Each shard has data that needs to be kept in memory and uses heap space. This includes data structures that hold information at the shard level, but also data structures at the segment level to define where data resides on disk. The size of these data structures is not fixed and will vary depending on the use case.

However, an important characteristic of segment-related overhead is that it is not proportional to segment size. This means that larger segments carry less overhead per unit of data than smaller segments, and the difference can be substantial.

[Importance of heap memory] To store as much data as possible per node, it is important to manage heap memory usage and keep its overhead down. The more heap space a node has, the more data and shards it can handle.

Therefore, from the cluster's perspective, indexes and shards are not free: each index and each shard carries some level of resource overhead.

Tip 1: Small shards lead to small segments, which increases overhead. Aim to keep the average shard size between a few gigabytes and a few tens of gigabytes. For use cases with time-based data, shards between 20GB and 40GB are common.

Tip 2: Since the per-shard overhead depends on the number and size of its segments, forcing smaller segments to merge into larger ones can reduce overhead and improve query performance. This is ideally done once no more data is written to the index. Note that this is an expensive, resource-intensive operation and should be performed during off-peak hours.
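This is what the force merge API does. A minimal sketch follows, with the index name assumed; remember this is I/O heavy, so run it off-peak and only on indexes that no longer receive writes.

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Merge each shard of a read-only index down to a single segment.
# This reduces per-segment overhead and can speed up queries.
resp = requests.post(
    f"{ES}/logs-2017.09.18/_forcemerge",
    params={"max_num_segments": 1},
)
print(resp.json())
```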

Tip 3: The number of shards a node can hold is proportional to the available heap memory, but Elasticsearch does not enforce a fixed limit. A good rule of thumb is to keep the number of shards per node below 20-25 per 1GB of heap memory. A node with a 30GB heap can therefore hold at most 600-750 shards, and the further below this limit you stay, the better; this generally helps keep the cluster healthy.
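A quick way to check a cluster against this rule of thumb, sketched below for a cluster assumed reachable at localhost (the 25-shards-per-GB factor comes from the tip above):

```python
import requests

ES = "http://localhost:9200"       # assumed local cluster address
SHARDS_PER_GB_HEAP = 25            # upper bound from the rule of thumb

# Count the shards currently allocated in the cluster.
shard_count = len(requests.get(f"{ES}/_cat/shards", params={"h": "index"}).text.split())

# Sum the configured heap across all nodes.
nodes = requests.get(f"{ES}/_nodes/stats/jvm").json()["nodes"]
heap_gb = sum(
    n["jvm"]["mem"]["heap_max_in_bytes"] for n in nodes.values()
) / 1024**3

budget = heap_gb * SHARDS_PER_GB_HEAP
print(f"{shard_count} shards allocated, rule-of-thumb budget ~{budget:.0f}")
```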

4. How does shard size affect performance?

In Elasticsearch, each query is executed in a single thread per shard. However, multiple shards can be processed in parallel, and multiple queries and aggregations can be performed on the same shard.

[Pros and cons of small shards] This means that when no caches are involved, the minimum query latency depends on the data, the type of query, and the size of the shard. Querying many small shards makes the per-shard processing faster, but since many more tasks have to be queued and processed in sequence, it is not necessarily faster than querying a smaller number of larger shards. Having many small shards also reduces query throughput when there are multiple concurrent queries.

Tip: The best way to determine the maximum shard size from a query performance perspective is to benchmark with realistic data and queries (real data, not simulated data). Always benchmark with query and indexing loads that are representative of what the node would handle in production, as optimizing for a single query can produce misleading results.
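A very simple latency benchmark might look like the sketch below; the index name and query are placeholders, and a dedicated benchmarking tool such as Rally would measure this far more rigorously.

```python
import time

import requests

ES = "http://localhost:9200"  # assumed local cluster address

# A representative query captured from production traffic (placeholder here).
query = {"query": {"match": {"message": "error"}}}

# Measure wall-clock latency over repeated runs; the first runs warm
# caches, so look at the distribution rather than a single number.
latencies = []
for _ in range(50):
    start = time.perf_counter()
    requests.post(f"{ES}/logs-2017.09.18/_search", json=query)
    latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"median={latencies[25]*1000:.1f}ms p90={latencies[44]*1000:.1f}ms")
```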

5. How to manage shard size?

When time-based indexes are used, each index has traditionally been associated with a fixed period of time. Daily indexes are very common and are often used for data with a short retention period or large daily volumes. They allow the retention period to be managed with fine granularity and make it easy to adjust for volumes that change from day to day.

Data with a longer retention period, especially if the daily volume does not justify daily indexes, is typically stored in weekly or monthly indexes to keep the shard size up. This reduces the number of indexes and shards that the cluster needs to hold over time.

Tip: If each index covers a fixed period of time, you can adjust the period an index covers based on the retention period and the expected data volume, so as to hit the target shard size.
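A back-of-the-envelope calculation, with all figures assumed purely for illustration: at ~10GB of new data per day, 2 primary shards per index, and a 30GB target shard size, each index should cover roughly six days, so weekly indexes would be a sensible choice.

```python
# All inputs are illustrative assumptions, not recommendations.
daily_volume_gb = 10        # data indexed per day
primary_shards = 2          # primary shards per index
target_shard_gb = 30        # target size per shard

days_per_index = primary_shards * target_shard_gb / daily_volume_gb
print(f"each index should cover ~{days_per_index:.0f} days")  # ~6 days
```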

[Steady vs. rapidly changing index volumes] Time-based indexes with fixed time intervals work well when the amount of data is reasonably predictable and changes slowly. If the indexing rate can change rapidly, it is difficult to maintain a uniform target shard size.

To better handle this kind of situation, the Rollover and Shrink APIs were introduced. They add flexibility to how indexes and shards are managed, especially for time-based indexes.

A full introduction to the Rollover and Shrink APIs is omitted here; the official documentation is recommended for a complete and deeper treatment.
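For a flavor of the two APIs, here is a minimal sketch. The alias and index names, rollover conditions, and shard counts are all assumptions, and the shrink API has preconditions (the source index must be read-only with a copy of every shard on one node) that are glossed over here.

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster address

# Rollover: if the index behind the "logs-write" alias is too old or
# holds too many documents, create a new index and repoint the alias.
resp = requests.post(
    f"{ES}/logs-write/_rollover",
    json={"conditions": {"max_age": "7d", "max_docs": 50_000_000}},
)
print(resp.json())

# Shrink: once an index no longer takes writes, reduce its primary
# shard count (the target count must divide the source count evenly).
resp = requests.post(
    f"{ES}/logs-000001/_shrink/logs-000001-shrunk",
    json={"settings": {"index.number_of_shards": 1}},
)
print(resp.json())
```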

6. Conclusion

This blog post has provided tips and practical guidance on how best to manage data in Elasticsearch. If you want to learn more, "Elasticsearch: The Definitive Guide" is recommended (a bit dated, but well worth reading).

However, many decisions about how to best distribute data across indexes and shards will depend on use-case specifics, and it can sometimes be difficult to determine how best to apply the available recommendations.

Below is a summary of the core recommendations from this article, answering the questions posed at the beginning.

1) "How many shards should I have?"
Answer: Keep the number of shards per node below 20-25 per 1GB of heap memory configured on that node.
2) "How large should my shards be?"
Answer: A shard size of 50GB is often quoted as a limit that works across a variety of use cases.

Some parts are still a little confusing, and some concepts are not yet explained simply and deeply enough.
I will keep updating this article as I gain more hands-on experience. For more details, discussion is welcome!
