ES Index Write Performance Optimization

Adapted from this excellent article:
https://blog.csdn.net/zhuzhuba008/article/details/77483199


1. Write with bulk batches
If you are pouring data into es and your business scenario allows you to accumulate a batch of documents and write them in a single request, use the bulk API, writing a few hundred documents per batch.

Bulk writes perform far better than writing one document at a time. To find the best bulk request size, benchmark against a single shard on a single es node: first bulk-write 100 documents, then 200, then 400, and so on, doubling the batch size each time. When write throughput stops improving, you have found the optimal bulk size. Bigger is not always better; it depends on your specific cluster environment, so you have to test it. Overly large bulk requests put too much pressure on memory, so it is best to keep each request under about 10MB.

Determine the bulk size first: test with a single thread against a single shard on one es node, and find how many documents per request gives the best single-threaded write performance.
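The ramp-up procedure above can be sketched as a small loop. This is a minimal sketch, not a real benchmark: `send_bulk` here is a hypothetical stub that simulates a bulk request (a real version would call the Elasticsearch `_bulk` endpoint), and the 5% flattening threshold is an assumed cutoff.

```python
import time

def send_bulk(docs):
    """Hypothetical stub for an es _bulk request; replace with a real
    client call in practice. Simulates sub-linear cost per batch."""
    time.sleep(0.001 * len(docs) ** 0.5)

def find_bulk_size(all_docs, start=100, max_size=3200):
    """Double the batch size until throughput (docs/sec) stops improving."""
    best_size, best_rate = start, 0.0
    size = start
    while size <= max_size:
        batch = all_docs[:size]
        t0 = time.perf_counter()
        send_bulk(batch)
        rate = len(batch) / (time.perf_counter() - t0)
        if rate <= best_rate * 1.05:  # <5% gain: throughput has flattened
            break
        best_size, best_rate = size, rate
        size *= 2
    return best_size

docs = [{"field": i} for i in range(10000)]
print(find_bulk_size(docs))
```

Against a real cluster, run this single-threaded against one shard on one node, as described above.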

2. Use multiple threads to write data to es
A single thread sending bulk requests cannot push the es cluster to its maximum write throughput. To make full use of the cluster's resources, write bulk data from multiple threads concurrently; concurrent writes also amortize the cost of each underlying disk fsync across more documents. Benchmark a single shard on a single es node first: start with 2 threads, then 4, then 8, then 16, doubling the thread count each time. Once es returns a TOO_MANY_REQUESTS error (EsRejectedExecutionException in the Java client), you have hit the cluster's concurrent-write bottleneck and now know the highest write concurrency it can sustain.
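The probing loop can be sketched with a thread pool. Everything here is a simulation under stated assumptions: `send_bulk` is a hypothetical stub that rejects requests once a made-up capacity of 6 concurrent requests is exceeded, standing in for a real cluster's 429 response; a real version would issue actual `_bulk` calls and catch the client's rejection exception.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class TooManyRequests(Exception):
    """Stand-in for an es 429 TOO_MANY_REQUESTS / EsRejectedExecutionException."""

CLUSTER_CAPACITY = 6   # hypothetical max concurrent bulk requests accepted
_inflight = 0
_lock = threading.Lock()

def send_bulk(batch):
    """Stub bulk call: rejects once too many requests are in flight."""
    global _inflight
    with _lock:
        if _inflight >= CLUSTER_CAPACITY:
            raise TooManyRequests()
        _inflight += 1
    try:
        time.sleep(0.01)  # simulated write latency
    finally:
        with _lock:
            _inflight -= 1

def probe_concurrency(n_batches=32, max_threads=16):
    """Double the thread count until bulk requests start being rejected;
    return the highest level that completed without rejection."""
    threads, last_ok = 2, 0
    while threads <= max_threads:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            futures = [pool.submit(send_bulk, [{"n": i}]) for i in range(n_batches)]
            rejected = any(f.exception() is not None for f in futures)
        if rejected:
            return last_ok
        last_ok = threads
        threads *= 2
    return last_ok

print(probe_concurrency())
```

With a capacity of 6, the probe succeeds at 2 and 4 threads, gets rejections at 8, and reports 4 as the highest sustainable concurrency.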

3. Increase the refresh interval
The default refresh interval is 1s and can be changed with the index.refresh_interval setting. Every second, es flushes the data buffered in memory to disk as a new segment file; this interval is why data becomes visible to search about 1s after it is written. If you can tolerate a longer delay, for example only seeing data 30s after it is written, raise the interval to 30s: es then creates a segment file only once every 30 seconds, and write throughput increases accordingly.
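The setting is applied per index. A minimal sketch of the request body, assuming a hypothetical index named `my_index` (the body would be sent as `PUT /my_index/_settings`):

```python
import json

def refresh_interval_settings(interval="30s"):
    """Body for PUT /my_index/_settings raising the refresh interval."""
    return {"index": {"refresh_interval": interval}}

body = json.dumps(refresh_interval_settings("30s"))
# e.g. requests.put("http://localhost:9200/my_index/_settings",
#                   data=body, headers={"Content-Type": "application/json"})
print(body)
```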

4. Disable refresh and replicas for bulk loading
If you need to load a large amount of data into es in one go, you can temporarily disable refresh and replication: set index.refresh_interval to -1 and index.number_of_replicas to 0. Data could be lost during this window, since there is no refresh and no replica to fall back on, but es no longer needs to create segment files or copy data over to replica shards, so writes become very fast. Once the load finishes, set refresh and replicas back to their normal values.
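The before/after settings pair can be sketched as two request bodies. The index name and the "normal" values (1s refresh, 1 replica) are assumptions; restore whatever your index actually used.

```python
import json

# Bodies for PUT /my_index/_settings (hypothetical index name).
BULK_LOAD = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
NORMAL = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}

def settings_body(bulk_load_phase):
    """Settings to apply before (True) and after (False) a bulk load."""
    return json.dumps(BULK_LOAD if bulk_load_phase else NORMAL)

print(settings_body(True))
```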

5. Disable swap
If the OS swaps es's JVM memory out to disk and later back into memory, the resulting disk I/O makes performance very poor, so swapping should be disabled.
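A sketch of two common ways to do this on Linux, assuming a standard es install: disable swap at the OS level, or lock the heap in RAM with `bootstrap.memory_lock`.

```shell
# Disable swap system-wide (Linux; remove swap entries from /etc/fstab to persist)
sudo swapoff -a

# Alternatively, lock the JVM heap in memory via elasticsearch.yml:
#   bootstrap.memory_lock: true
# then verify after startup that the lock took effect:
curl -s 'localhost:9200/_nodes?filter_path=**.mlockall'
```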

6. Give more memory to the filesystem cache
The filesystem cache is what absorbs I/O operations; if we leave more of the machine's memory to the filesystem cache, es write performance will be much better.

7. Use automatically generated ids
If we set a document's id manually, es has to check every time whether that id already exists, which is relatively time-consuming. If we let es generate ids automatically, it can skip that check and write performance improves. The id from your business table can simply be stored as an ordinary field in the es document.
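A minimal sketch of what this looks like in a `_bulk` request body: omitting `_id` from the action line lets es auto-generate one, and the business id is carried as a normal field. The index name `orders` and field names are illustrative.

```python
import json

def bulk_index_line(index, doc, doc_id=None):
    """Build one action/document pair for a _bulk request body."""
    action = {"index": {"_index": index}}
    if doc_id is not None:
        action["index"]["_id"] = doc_id  # forces es to check for an existing doc
    return json.dumps(action) + "\n" + json.dumps(doc)

# auto-generated id: the business id is kept as an ordinary field instead
print(bulk_index_line("orders", {"order_id": 42, "total": 9.5}))
```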

8. Use better hardware
We can give more memory to the filesystem cache, use SSDs instead of mechanical hard drives, avoid NAS and other network storage, and consider RAID 0 striping to increase concurrent disk throughput, and so on.

9. Index buffer
If we need to sustain very heavy concurrent writes, it is best to increase the index buffer via indices.memory.index_buffer_size. This setting is shared by all shards on the node; dividing it by the number of shards gives the average memory each shard can use. Around 512MB per shard is generally enough, as going beyond that yields little performance gain. es shares this buffer across shards, and particularly active shards will use more of it. The default value is 10% of the JVM heap: if we give the JVM a 10GB heap, the index buffer is 1GB, which is plenty for two shards to share.
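The arithmetic in this section is easy to check. A small sketch, with the caveat that this is only the static average, since es dynamically gives active shards a larger share:

```python
def per_shard_index_buffer_mb(heap_gb, n_shards, buffer_pct=10):
    """Average index buffer per shard: buffer_pct% of the heap,
    split evenly across the node's shards (actual use is dynamic)."""
    total_mb = heap_gb * 1024 * buffer_pct / 100
    return total_mb / n_shards

# default 10% of a 10GB heap = 1024MB total, i.e. 512MB each for two shards
print(per_shard_index_buffer_mb(10, 2))
```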
---------------------
Disclaimer: This article was originally written by CSDN blogger "zhuzhuba008" under the CC 4.0 BY-SA license; when reproducing it, please include the original source link and this statement.
Original link: https://blog.csdn.net/zhuzhuba008/article/details/77483199

Origin www.cnblogs.com/xinyumuhe/p/11363543.html