Elasticsearch Slow Query Troubleshooting

I have recently been tuning Elasticsearch search performance. Along the way I read some documentation and some of the Lucene search source code, and this article summarizes what I learned about the tuning process.

Once a slow query has been caught in ES, you can use the profile API, or the Search Profiler console in Kibana, to see exactly where the query is slow. Before running a profiled search, slightly vary the query statement so that caching does not skew the test results.
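
For example (the index and field names here are illustrative), a profiled search can be issued directly through the search API, and turning the request cache off helps keep the timings honest:

```
GET /my-index/_search?request_cache=false
{
  "profile": true,
  "query": {
    "match": { "message": "slow query" }
  }
}
```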

The profile output is reported per shard and covers the time spent in the query itself, the time spent rewriting the query statement, and the time spent in the Lucene collector.

The query time is usually the main concern, and the query part of the profile output gives a detailed view of the rewritten query and the time spent in each sub-query: the sub-query's type (type), its description (description), its time (time_in_nanos), and a breakdown of the time spent in each phase of searching the Lucene segments. From this breakdown you can see fairly clearly why collection is slow, provided you understand what each indicator in the breakdown means and the internal logic behind it.
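
An abbreviated, illustrative profile response looks roughly like this (the index name and all timings are made up):

```
{
  "profile": {
    "shards": [
      {
        "id": "[nodeId][my-index][0]",
        "searches": [
          {
            "query": [
              {
                "type": "TermQuery",
                "description": "message:slow",
                "time_in_nanos": 391943,
                "breakdown": {
                  "build_scorer": 292847, "build_scorer_count": 8,
                  "next_doc": 24711,      "next_doc_count": 120,
                  "advance": 0,           "advance_count": 0,
                  "score": 12311,         "score_count": 112,
                  "match": 0,             "match_count": 0,
                  "create_weight": 41543, "create_weight_count": 1
                }
              }
            ],
            "rewrite_time": 51233,
            "collector": [ ... ]
          }
        ]
      }
    ]
  }
}
```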

The main indicators in the breakdown, and how Lucene implements them:

build_scorer: time spent building the scorer. The scorer is mainly responsible for matching documents and for scoring and sorting them. Internally, build_scorer constructs an iterator that can walk over all matching documents. Building this iterator is very expensive, because it involves constructing the docId result set (a bitset or postings list) for each sub-query and then generating the final conjunction bitset or postings list that will actually be iterated. For most slow queries, the bulk of the time is spent in this step.

next_doc: time spent finding the next matching document Id. Text field types such as keyword and text walk the postings using skip lists, while numeric types use a tree structure (the BKD tree) to quickly find the next matching document Id. The number of docs each sub-query hits is also recorded here, which is used for filtering such as the final min_should_match check.

advance: a lower-level relative of next_doc. Not every query can implement next_doc; for example, documents under a conjunction must be located via advance.
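
A sketch of where advance shows up: in a conjunction such as the bool query below (field names made up), Lucene lets the cheapest clause drive the iteration with next_doc and asks the other clauses to advance to each candidate docId, so in the profile the non-leading clauses accumulate advance counts rather than next_doc counts:

```
GET /my-index/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        { "term":  { "status": "error" } },
        { "range": { "response_ms": { "gte": 500 } } }
      ]
    }
  }
}
```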

score: records the time the scorer spends scoring documents. The score is computed by an algorithm such as tf-idf, combining data like the term frequency (freq) and the field norm (norm).
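
When score takes a large share of the time and relevance ranking is not actually needed, moving clauses into filter context avoids computing scores at all (and makes the clause cacheable); a minimal sketch with a made-up field:

```
GET /my-index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "error" } }
      ]
    }
  }
}
```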

match: records the time spent in the second matching phase. Some queries need two-phase matching. Take the phrase query "chinese love china": the first phase finds all documents containing the three terms "chinese", "love" and "china"; the second phase then checks, for each document matched in the first phase, whether the positions and order of the three terms satisfy the phrase condition. That check is very expensive, which is exactly why the first phase narrows the set of candidate documents beforehand.
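
As a query statement, the phrase example above would look like this (the field name content is assumed):

```
GET /my-index/_search
{
  "profile": true,
  "query": {
    "match_phrase": { "content": "chinese love china" }
  }
}
```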

create_weight: time spent creating the Weight. The Weight is essentially the context of a Lucene query; it holds the query, the collector, the IndexReader, and so on.

*_count: records the number of times each method was called; for example, next_doc_count: 2 means the next_doc method was called twice.

Besides the detailed statistics for the query itself, the profile output also includes:

rewrite_time: time spent rewriting the query statement. Lucene maintains its own set of query-rewrite rules; for example, if the number of terms in a terms query is less than 16, it is rewritten into multiple TermQuery clauses combined with OR, and if it is more than 16 it is rewritten into a TermInSetQuery.
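
For instance, a terms query like the one below (field and values made up) falls under the first case and is rewritten into a boolean OR of three TermQuery clauses; the cost of that transformation is what rewrite_time measures:

```
GET /my-index/_search
{
  "profile": true,
  "query": {
    "terms": { "tags": ["es", "lucene", "search"] }
  }
}
```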

collector: metrics for the collection phase of the query, including the name, reason and time of each collector the query used. By default ES uses SimpleTopScoreDocCollector. The collector's main job in Lucene is to merge and sort the matching results of each segment via the reduce method and return the topN.
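
In the profile response the collector section looks roughly like this (the timing is made up):

```
"collector": [
  {
    "name": "SimpleTopScoreDocCollector",
    "reason": "search_top_hits",
    "time_in_nanos": 32273
  }
]
```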

 

When troubleshooting, besides pinpointing slow queries with the profile API, you also need to pay attention to the overall resource usage of the ES cluster: whether the data nodes' CPU, memory, or disk IO is a bottleneck, and whether a single node holds too many shards. The general state of the cluster can be monitored with cerebro or with elasticsearch_exporter + Prometheus, and you can also inspect metrics through the ES APIs.
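
A few of the stock APIs are enough for a quick look, for example:

```
# per-node CPU, heap and load
GET _cat/nodes?v

# how shards are spread across nodes
GET _cat/shards?v

# OS and filesystem metrics per node
GET _nodes/stats/os,fs
```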

 
