Analysis of Tencent Cloud Big Data ES Indexing Principles and Best Practices for Write Performance Optimization


Drawing on a large number of cases and hard-won lessons, this article distills some commonly used optimization techniques and pitfall-avoidance guidelines for write performance tuning of Elasticsearch clusters.

While serving Tencent Cloud ES customers, we often receive feedback that the read and write performance of an ES cluster on the cloud has not met expectations, along with requests to cooperate on performance load testing and tuning. After studying the customer's business scenario and analyzing the cluster's performance bottlenecks in depth, we are usually able to offer suggestions and optimization measures that significantly improve read and write performance.

Based on practical experience with many large customer clusters, the Tencent Cloud Big Data ES team has summarized some commonly used optimization techniques and pitfall-avoidance guidelines for write performance optimization of Elasticsearch clusters.

Analysis of ES cluster indexing principles

Before introducing write performance optimization, let's briefly review the basic indexing flow of an Elasticsearch cluster, shown in Figure 1 below.


Figure 1. Schematic diagram of ES cluster index writing process

From the left half of Figure 1, the basic write flow of ES is as follows:

1. The client sends an index document request to Node1;

2. Node1 determines from the doc_id that the document belongs to shard 0 and forwards the request to Node3, where the primary shard is located;

3. Node3 executes the request on the primary shard and, on success, forwards it in parallel to the nodes holding the replica shards. Once all replica shards have been written successfully, Node3 reports success to the coordinating node, which in turn reports to the client that the document was written successfully.
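The shard selection in step 2 can be sketched as follows. This is a simplified stand-in for ES routing (Elasticsearch actually uses a murmur3 hash of the routing value, which defaults to the doc _id), shown here only to illustrate why a given _id always lands on the same primary shard:

```python
import hashlib

def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Simplified sketch of ES document routing: hash the routing value
    (the doc _id by default) and take it modulo the primary shard count.
    Elasticsearch itself uses murmur3, not md5."""
    h = int.from_bytes(hashlib.md5(doc_id.encode("utf-8")).digest()[:4], "big")
    return h % num_primary_shards

# The same _id always maps to the same shard number, which is also why
# the primary shard count of an index cannot be changed after creation.
print(shard_for("doc-42", 3))
```

Because the modulus is the primary shard count, changing that count would scatter existing docs to the wrong shards; hence rollover (section 1 below) rather than resharding is the usual way to grow.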

From the right half of Figure 1, the execution logic of a write on a single shard breaks down into four key steps, shown in Figure 2 below.


Figure 2. Schematic diagram of ES cluster write core process

1. Write process:

After the node holding the shard receives the write request from the coordinating node, it first writes the doc into an In-Memory Buffer, also known as the Indexing Buffer. At this point the doc is not yet searchable. When the In-Memory Buffer fills up, its contents are automatically written out as a segment file, after which the doc becomes searchable.

2. Refresh process:

As described above, a doc is first written into the In-Memory Buffer, and the buffer is emptied in two cases. The first is the one mentioned above: when the In-Memory Buffer fills up, ES automatically clears it and generates a segment. The second is the periodic Refresh operation triggered by ES, which runs every 1s by default. The Refresh process writes the data in the In-Memory Buffer into the Filesystem Cache, and this process also generates a segment file.

3. Flush process:

The Write process above puts the doc into the In-Memory Buffer, and the Refresh process moves it into the filesystem cache; in both cases the data is still in memory, so a machine failure or restart carries a risk of data loss. To guard against this, ES uses the translog mechanism to ensure data reliability: upon receiving a write request, the doc is simultaneously appended in key-value form to the translog file. When the data in the Filesystem Cache is flushed to disk, the translog is cleared; this flushing of the Filesystem Cache to disk is the Flush operation. As Figure 2 also shows, after a Flush the In-Memory Buffer and the translog are both cleared.
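For reference, the durability trade-off of the translog is tunable per index. The sketch below uses real index settings but an illustrative index name and values; the default durability of "request" fsyncs the translog on every request, while "async" trades a few seconds of durability for write throughput:

```
PUT my-index/_settings
{
  "index.translog.durability": "async",
  "index.translog.sync_interval": "60s",
  "index.translog.flush_threshold_size": "1gb"
}
```

With "async", data acknowledged in the last sync_interval can be lost on a crash, so this is only appropriate for data that can be replayed from an upstream source.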

4. Merge process:

Every Refresh operation generates a new segment file, which can cause the number of files to balloon in a short time. Too many segments bring a series of problems: they consume large numbers of file handles, memory, and CPU cycles, and every search request must check each segment file in turn, causing a significant drop in search performance. To solve this, Elasticsearch maintains a background thread that periodically merges these segment files, combining small segments into large segments and large segments into still larger ones. This merging is the Segment Merge process.
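As an aside, merging can also be triggered manually via the _forcemerge API on an index that is no longer being written to, such as a rolled-over log index (index name illustrative):

```
POST my-index/_forcemerge?max_num_segments=1
```

Force-merging an actively written index is counterproductive, since new segments keep appearing and the merge itself is expensive.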

ES cluster write performance optimization

1. For time-series scenarios, use ILM to roll over indices dynamically

For scenarios such as logs, monitoring, and APM, it is recommended to combine index lifecycle management (ILM) with snapshot lifecycle management (SLM). Use ILM to flexibly control the size of the index being written to, especially during sudden traffic spikes, so that write performance does not suffer because a single index grows too large.

Some of our cloud customers regularly run promotional campaigns, such as e-commerce shopping festivals. During these campaigns, the volume of business logs is often several times, even ten times, the normal amount. If you create one index per day, the index for the day of the event can become very large and easily exceed the officially recommended design principle of keeping a single shard within 30-50 GB. This triggers a series of problems and can even hurt performance: an oversized shard slows shard relocation and shard recovery after an abnormal node restart, and can even hit the hard limit of roughly 2.1 billion docs per shard early, resulting in data loss.

With index lifecycle management, we can design flexible index rollover policies, for example rolling over after one day, when the index reaches 1 TB, or when the doc count reaches a set threshold. Whichever condition triggers first rolls writes over to the next new index. In this way, the size of each index and shard is kept within a stable range. For more details on index lifecycle management, see "Principles and Practices of Tencent Cloud Elasticsearch Index Lifecycle Management".
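A minimal ILM policy implementing such a rollover might look like the following. The policy name and the max_docs threshold are illustrative; max_age and max_size mirror the one-day and 1 TB conditions described above, and whichever fires first triggers the rollover:

```
PUT _ilm/policy/logs-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_size": "1tb",
            "max_docs": 1000000000
          }
        }
      }
    }
  }
}
```

The policy is then referenced from an index template (index.lifecycle.name) so every rolled-over index inherits it automatically.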

2. Do not specify a doc_id when writing data; let ES generate it automatically

Writing with an explicit doc_id forces ES to check whether the doc already exists before writing: if it exists, ES performs an update, otherwise an insert. Specifying a doc_id therefore costs the cluster considerably more CPU, load, and disk IO. We previously compared the write performance of a large group-buying customer in the community with and without explicit doc_ids: with doc_ids specified, CPU usage rose by 30% and IO util rose by 42%.
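As a sketch (index name illustrative), the first action pair in the _bulk body below leaves the index action empty so ES auto-generates the _id, while the second pair shows the explicit-_id form that incurs the extra existence check:

```
POST my-index/_bulk
{ "index": {} }
{ "message": "auto-generated _id, no existence check" }
{ "index": { "_id": "explicit-id-1" } }
{ "message": "explicit _id, checked before write" }
```

Auto-generated IDs also happen to be append-friendly for Lucene, which is part of why they are cheaper.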

3. Use the bulk API to write data in batches, keeping each bulk request at about 10 MB

To improve write performance, ES provides a bulk write API that lets the client send multiple doc documents to the coordinating node in a single request. The size and number of docs per batch is a key factor in write performance. Based on extensive testing and official recommendations, keeping each bulk request at about 10 MB is ideal; at roughly 1 KB per doc, that works out to about 8,000-20,000 docs per batch. We can use the _tasks API to check whether the configured doc count falls in a reasonable range.

GET _tasks?detailed=true&human&group_by=parents&actions=indices:data/write/bulk


Figure 3. _tasks API to view bulk writing details
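The 10 MB sizing above can be enforced on the client side by chunking docs into _bulk bodies by byte size rather than by a fixed doc count. A minimal sketch (function name and sizes are illustrative, not a specific client library's API):

```python
import json

def bulk_chunks(docs, max_bytes=10 * 1024 * 1024):
    """Split docs into _bulk request bodies of at most ~max_bytes each.
    Each doc becomes an action line ({"index":{}}) plus a source line in
    NDJSON; the _id is omitted so ES auto-generates it (see section 2)."""
    chunk, size = [], 0
    for doc in docs:
        entry = '{"index":{}}\n' + json.dumps(doc) + "\n"
        if chunk and size + len(entry) > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(entry)
        size += len(entry)
    if chunk:
        yield "".join(chunk)

# With ~1 KB docs and a 10 MB cap, each body holds roughly 10,000 docs,
# inside the 8,000-20,000 range recommended above.
docs = [{"msg": "x" * 1000, "n": i} for i in range(25000)]
bodies = list(bulk_chunks(docs))
```

Each body string would then be POSTed to the _bulk endpoint with the application/x-ndjson content type.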

4. Enable bulk_routing to forward requests to fewer shards

By default, after a bulk request from the client reaches the coordinating node, the coordinating node splits the batch of docs into groups by shard according to its routing strategy and sends them in parallel to the nodes holding the primary shards. It returns to the client only after every involved node has acknowledged a successful write. The latency of the bulk request is therefore determined by the slowest-responding node: a typical long-tail effect.

To solve this problem, the Tencent Cloud ES team developed the advanced bulk_routing feature. By generating a random routing value on the coordinating node, each bulk batch is forwarded to only a specific shard's node. This reduces network overhead and CPU usage during writes and prevents long-tail shards from dragging down overall write performance. We can enable bulk_routing through the following API. Observing the log cluster of a large customer that enabled bulk_routing, we found that compared with leaving it off, peak write throughput rose by 25% and CPU usage dropped by 20%, with the improvement growing more pronounced as the node count increases.

PUT my-index/_settings
{
    "index.bulk_routing.enabled": true
}

5. For large-scale clusters, create indices in advance and use a fixed index mapping

In a large cluster, index capacity and the total shard count are usually also large. Because creating an index or updating a mapping involves updating the cluster's meta metadata, writes are briefly blocked while the master updates the metadata; if write volume is heavy at that moment, write throughput can drop to zero. Figure 4 below is a log screenshot from one of our cloud customers whose writes dropped to zero due to an update-mapping timeout when indices were switched at 8:00 every morning. If, however, indices are created in advance and a fixed index template is used, the bulk of metadata update operations during index switching can be avoided, preserving cluster stability and write performance.


Figure 4. Screenshot of cluster update mapping timeout
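A fixed mapping can be pinned down with a composable index template (available since ES 7.8); the template name, pattern, and fields below are illustrative:

```
PUT _index_template/logs-template
{
  "index_patterns": ["app-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message":    { "type": "text" }
      }
    }
  }
}
```

With the mapping fixed in the template and new daily indices created ahead of the switchover, the master has no mapping updates to apply at write time.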

6. Set the mapping in advance and reduce the number of indexed fields

For fields that will not be searched, especially binary fields and very long text fields, you can set the field's index property so that ES neither analyzes these fields nor builds an index for them ("index": false in current versions; not_analyzed or no in older ones). This avoids unnecessary work and CPU overhead and thereby improves the cluster's write performance. Figure 5 below shows a performance test we ran: with every field set to be indexed, re-running the write load test drove CPU usage all the way up to 90%.


Figure 5. The CPU usage of the cluster after word segmentation and indexing are enabled for all fields
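As a sketch (index and field names illustrative), a mapping that skips analysis and indexing for non-searched fields might look like this; note that the binary type is never indexed or searchable by default:

```
PUT my-index
{
  "mappings": {
    "properties": {
      "payload":  { "type": "binary" },
      "trace_id": { "type": "keyword", "index": false },
      "raw_body": { "type": "text", "index": false }
    }
  }
}
```

Such fields remain retrievable from _source; they simply cannot be used in queries, which is exactly the trade-off described above.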

7. When pursuing write performance, write with replicas disabled and re-enable them after the writing completes

We often encounter customers doing data synchronization between two clusters or two indices, for example migrating data between clusters with Logstash, or rebuilding an index with the reindex API. In such fast-import scenarios, you can first disable replicas on the target index: since the source index still holds a copy of the original data, there is no risk of data loss while replicas are off. Once the migration is complete, re-enable the replicas. At the same time, we can raise the refresh_interval, for example to 30s, to reduce both the generation of segment files and the number of segment merges, cutting CPU and IO overhead and protecting the cluster's write performance.
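Both knobs are ordinary dynamic index settings; a sketch with an illustrative index name, applied before the import begins:

```
PUT my-index/_settings
{
  "index.number_of_replicas": 0,
  "index.refresh_interval": "30s"
}
```

and restored once the migration finishes:

```
PUT my-index/_settings
{
  "index.number_of_replicas": 1,
  "index.refresh_interval": "1s"
}
```

Re-adding the replica afterwards triggers a one-time copy from the primaries, which is usually far cheaper than replicating every doc during the import.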

Above, we analyzed in depth the basic principles and flow of document indexing in an ES cluster and, drawing on operations experience and lessons learned from many major Tencent Cloud ES customers, summarized seven suggestions for write performance optimization. We hope they are helpful to every Tencent Cloud ES customer.


Origin: blog.csdn.net/cloudbigdata/article/details/131467703