Elasticsearch in practice: disk IO saturated

Background

Here is what happened. At about 16:42 one day, the business side reported a problem with a service I had developed in the test environment: it was returning 0 resource records. Investigation turned up ES access timeouts in the logs, which was the equivalent of a database hanging. The problem lasted more than 20 minutes and then recovered on its own.
I consulted the ES team and eventually received the following reply:

Current status of the cluster:
 1) The index with the highest IO in the cluster is XXX, which holds very little data (100MB).
 2) Its read and write volumes are both large (read > 1000 QPS, write > 1000 QPS), and it runs on offline-environment machines.
 3) The index is split into 10 shards with 4 replicas.
Analysis:
 1) As far as we know, disk performance on offline (test) environment machines is worse than on online machines; the business SRE needs to confirm this.
 2) Each query hits all 10 shards at once and may touch 10 machines, so it is prone to the barrel effect (the slowest machine determines the response time), which lowers the performance of the cluster.
 3) For writes, 10 shards looks like it should increase write capacity, but with so few machines each machine ends up holding 5 shards. That is equivalent to having only 2 shards, so write capacity is not actually expanded.
Recommendations:
 1) Upgrade the hardware and switch to SSDs.
 2) Change to 2 shards. Read performance should then be better than before, and write capacity will be equivalent.
 3) The amount of data is very small, so we recommend putting it directly into Redis.
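
The shard arithmetic in the reply above (10 shards, 4 replicas, 10 machines) can be checked with a short calculation. This is only a sketch of one reading of the ES team's reasoning: it assumes that "4 copies" means 4 replicas in addition to each primary, that shard copies are spread evenly across machines, and that every write is applied to the primary and to each replica. The function name `shard_layout` is made up for illustration.

```python
def shard_layout(primaries: int, replicas: int, machines: int) -> dict:
    """Return simple per-machine shard figures for one index.

    Assumes an even spread of shard copies across machines and that each
    write is applied to the primary and to every replica.
    """
    copies_per_shard = 1 + replicas              # primary plus its replicas
    total_copies = primaries * copies_per_shard  # shard copies in the cluster
    return {
        "total_copies": total_copies,
        "copies_per_machine": total_copies / machines,
        # Each write keeps copies_per_shard machines busy, so the cluster
        # absorbs writes like this many independent single-copy shards.
        "effective_write_shards": machines / copies_per_shard,
    }


current = shard_layout(primaries=10, replicas=4, machines=10)
print(current)
```

Under these assumptions the 10 machines each hold 5 shard copies and the write path behaves like 2 independent shards, which matches the figures in the reply.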
I did some investigating. The test environment has ten ES servers, all VMs (non-local disk ESB). The IO on one of them was saturated, while load and IO on the other machines were very low. On this point, the answer given by the ES team was:
Load balancing in the ES service relies on its own discovery mechanism, which generally does not cause problems.
Our client is only a thin wrapper around the official client.
The best approach would be to modify the official client,
but we clearly do not have the manpower, so we can only keep using the old client for now.
We expect to have a self-developed client around October,
which will try to avoid queries being affected by a problem on a single machine,
but this cannot be avoided entirely:
the service discovery mechanism inside ES is something we cannot change, unless we change ES itself.

Investigation

1. We need to switch to local disks, and this test environment is effectively our formal environment. Should we switch directly to physical machines? How many are appropriate? How can the replacement be done smoothly?

There is no need to switch to physical machines. ES can use at most 32G of memory, so any extra memory would go unused and be wasted. Physical machines are therefore split into VMs.

The original 10 VMs are sufficient; we only need to replace them with the same number.

Use the machine-replacement function. The idea is to apply for new machines at deployment time and then click to replace the old ones. The shards are moved one by one onto the new machines, and once that completes the old machines are taken offline automatically.

2. Our test environment has 10 servers, 10 shards, and 4 replicas, with a write/read QPS ratio of about 7:6. How many shards and replicas are reasonable for the index?

Every write is synchronized to the shard and all of its replicas. Because writes make up a large proportion of the traffic, having many replicas has a big impact on performance. Changing the number of shards, on the other hand, requires rebuilding the index, which is hard to do smoothly. Therefore we only reduce the number of replicas.
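
Reducing the replica count does not require a reindex, because `number_of_replicas` is a dynamic index setting that can be changed through the Elasticsearch update index settings API (`PUT /<index>/_settings`). The sketch below only builds the HTTP request using the Python standard library. The host `http://localhost:9200`, the index name `my_index`, and the target value `1` are placeholders, not values from this incident, and the helper name `build_replica_update` is made up for illustration.

```python
import json
import urllib.request


def build_replica_update(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host, index name, and replica count (assumptions).
req = build_replica_update("http://localhost:9200", "my_index", 1)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# To apply it against a real cluster: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to review the exact URL and body before touching a live cluster.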

3. Is there anything in the program that can be optimized?

Add a Tair cache on top of ES. A data update operation reads a single record at a time, and such transactional operations are better served by Tair, which reduces the pressure on ES. ES then only handles complex queries.
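
The idea of putting a cache in front of ES can be sketched as a simple cache-aside wrapper. This is not the actual service code: a plain dict stands in for Tair, the ES calls are passed in as hypothetical callables (`es_get`, `es_put`, `es_search`), and the class name `CachedResourceStore` is made up for illustration.

```python
from typing import Callable, Dict, List, Optional


class CachedResourceStore:
    """Single-record reads go to the cache first; complex queries go to ES."""

    def __init__(self,
                 es_get: Callable[[str], Optional[dict]],
                 es_put: Callable[[str, dict], None],
                 es_search: Callable[[dict], List[dict]]) -> None:
        self._cache: Dict[str, dict] = {}  # stand-in for Tair (assumption)
        self._es_get = es_get
        self._es_put = es_put
        self._es_search = es_search

    def get(self, doc_id: str) -> Optional[dict]:
        # Serve from the cache when possible; fall back to ES on a miss.
        if doc_id in self._cache:
            return self._cache[doc_id]
        doc = self._es_get(doc_id)
        if doc is not None:
            self._cache[doc_id] = doc
        return doc

    def update(self, doc_id: str, changes: dict) -> dict:
        # The single-record read needed by an update is served by the cache.
        current = self.get(doc_id) or {}
        updated = {**current, **changes}
        self._es_put(doc_id, updated)
        self._cache[doc_id] = updated
        return updated

    def search(self, query: dict) -> List[dict]:
        # Complex queries are left to ES.
        return self._es_search(query)


# In-memory fake of ES, used only to exercise the wrapper.
fake_es = {"r1": {"name": "a", "size": 1}}
es_reads = []

def fake_get(doc_id):
    es_reads.append(doc_id)
    return fake_es.get(doc_id)

def fake_put(doc_id, doc):
    fake_es[doc_id] = doc

def fake_search(query):
    return [d for d in fake_es.values()
            if all(d.get(k) == v for k, v in query.items())]

store = CachedResourceStore(fake_get, fake_put, fake_search)
store.get("r1")
store.update("r1", {"size": 2})
print(store.get("r1"), "ES single-record reads:", len(es_reads))
```

In this sketch only the first lookup of a record reaches ES; the read performed by the update and the later lookups are served from the cache. A real deployment would also need expiry and invalidation rules, which are not covered here.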

Origin www.cnblogs.com/xiexj/p/11626706.html