Elasticsearch in Depth 6

In-depth search: a technical analysis of how _filter execution works under the hood

(1) Look up the search string in the inverted index to obtain the document list

Take a date field as an example:

word          doc1    doc2    doc3
2017-01-01    *       *
2017-02-02            *       *
2017-03-03    *       *       *

 filter:2017-02-02

Looking up 2017-02-02 in the inverted index shows that its document list is doc2, doc3.
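The lookup above can be sketched with a plain Python dict standing in for the inverted index. The names `inverted_index` and `lookup` are illustrative assumptions, not Elasticsearch APIs, and a real Lucene index stores its postings very differently.

```python
# Toy inverted index: term -> list of matching document names.
# It mirrors the example table above; it is not how Lucene stores postings.
inverted_index = {
    "2017-01-01": ["doc1", "doc2"],
    "2017-02-02": ["doc2", "doc3"],
    "2017-03-03": ["doc1", "doc2", "doc3"],
}


def lookup(term):
    """Return the document list for a term, or an empty list if absent."""
    return inverted_index.get(term, [])


print(lookup("2017-02-02"))  # ['doc2', 'doc3']
```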

(2) For each result found in the inverted index, build a bitset, e.g. [0, 0, 0, 1, 0, 1]

This step is very important.

The doc list that was found is used to build a bitset: a binary array in which every element is 0 or 1 and marks whether a doc matches the filter condition. 1 means the doc matches, 0 means it does not.

 [0, 1, 1]

doc1: does not match this filter

doc2 and doc3: match this filter

The idea is to implement complex functionality with the simplest possible data structure, which saves memory and improves performance.
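As a minimal sketch of this step, assuming documents are identified by name and kept in a fixed order, the doc list can be turned into a bitset like this. The helper `build_bitset` is hypothetical; Elasticsearch uses compact bit-level structures rather than Python lists.

```python
def build_bitset(all_docs, matching_docs):
    """Return a list of 0/1 flags, one per document, in the order of all_docs."""
    matching = set(matching_docs)
    return [1 if doc in matching else 0 for doc in all_docs]


all_docs = ["doc1", "doc2", "doc3"]
bitset = build_bitset(all_docs, ["doc2", "doc3"])
print(bitset)  # [0, 1, 1]
```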

(3) Traverse the bitset of each filter condition, starting with the sparsest one, to find the documents that satisfy all conditions

As will be explained later, a single search request can carry several filter conditions, and each filter condition has its own bitset.

The bitsets of all filter conditions are traversed, beginning with the sparsest one.

 [0, 0, 0, 1, 0, 0]: relatively sparse

[0, 1, 0, 1, 0, 1]

Traversing the sparse bitset first filters out as much data as possible right at the start.

After all bitsets have been traversed, the docs that match every filter condition are found.

Request: filter, postDate = 2017-01-01, userID = 1

 postDate: [0, 0, 1, 1, 0, 0]

userID:   [0, 1, 0, 1, 0, 1]

After both bitsets have been traversed, the only doc that matches all conditions is doc4.

That document can then be returned to the client as the result.
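The traversal described above can be sketched as follows, using the two example bitsets. This is only an illustration of the sparsest-first idea on Python lists; the function name `intersect_sparsest_first` is an assumption, and the real engine works on per-segment iterators rather than full arrays.

```python
def intersect_sparsest_first(bitsets):
    """AND several equal-length 0/1 lists, visiting the sparsest one first."""
    if not bitsets:
        return []
    # Fewest set bits first: that bitset yields the smallest candidate set.
    ordered = sorted(bitsets, key=sum)
    candidates = [i for i, bit in enumerate(ordered[0]) if bit]
    for bitset in ordered[1:]:
        candidates = [i for i in candidates if bitset[i]]
        if not candidates:
            break  # nothing can match any more, stop early
    return candidates


post_date = [0, 0, 1, 1, 0, 0]
user_id = [0, 1, 0, 1, 0, 1]
matches = intersect_sparsest_first([post_date, user_id])
print(["doc%d" % (i + 1) for i in matches])  # ['doc4']
```

Starting from the sparsest bitset keeps the candidate list short, so every later bitset is probed at only a few positions.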

(4) Cache bitsets: queries are tracked, and if a filter condition has been used more than a certain number of times within the most recent 256 queries, its bitset is cached. For small segments (fewer than 1000 documents, or less than 3% of the index), the bitset is not cached.

For example, the bitset for postDate = 2017-01-01, [0, 0, 1, 1, 0, 0], can be cached in memory. The next time this condition comes in, there is no need to rescan the inverted index and rebuild the bitset, which can greatly improve performance.

Among the most recent 256 filters, if a filter has been used more than a certain number of times (the number is not fixed), the bitset for that filter is cached automatically.

Segments (covered earlier in this series): the filter result for a small segment may not be cached, where small means the segment holds fewer than 1000 documents or its size is less than 3% of the total index size.

A small segment holds little data, so even a full scan is fast. Segments are also merged automatically in the background, so a small segment will soon be merged with other small segments into a larger one. Caching it is pointless because the segment will soon disappear.

The bitset for a small segment might be [0, 0, 1, 0].

One advantage of filter over query is that filter results are cached. Earlier it was not clear what exactly gets cached: it is not the complete doc list returned by the filter, but the filter's bitset. The next time the same filter arrives, the inverted index does not have to be scanned.
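The caching rule described above can be sketched roughly as follows. The 256-entry history, the 1000-document limit, and the 3% limit come from the text; `MIN_USES` is a made-up value, since the text says the usage threshold is not fixed. The class `FilterUsageTracker` is hypothetical, and for simplicity both size checks use document counts.

```python
from collections import Counter, deque

HISTORY_SIZE = 256        # from the text: the most recent 256 filters are tracked
MIN_USES = 5              # hypothetical: the text says this number is not fixed
MIN_SEGMENT_DOCS = 1000   # from the text
MIN_SEGMENT_RATIO = 0.03  # from the text: 3% of the index


class FilterUsageTracker:
    """Counts how often each filter appears in a sliding window of recent filters."""

    def __init__(self):
        self.history = deque()
        self.counts = Counter()

    def record(self, filter_key):
        self.history.append(filter_key)
        self.counts[filter_key] += 1
        if len(self.history) > HISTORY_SIZE:
            oldest = self.history.popleft()
            self.counts[oldest] -= 1

    def should_cache(self, filter_key, segment_docs, index_docs):
        # Small segments are skipped: they scan quickly and are merged away soon.
        if segment_docs < MIN_SEGMENT_DOCS:
            return False
        if segment_docs < MIN_SEGMENT_RATIO * index_docs:
            return False
        return self.counts[filter_key] >= MIN_USES


tracker = FilterUsageTracker()
for _ in range(5):
    tracker.record("postDate=2017-01-01")
print(tracker.should_cache("postDate=2017-01-01", 50000, 100000))  # True
print(tracker.should_cache("postDate=2017-01-01", 500, 100000))    # False
```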

(5) In most cases the filter runs before the query, so that as much data as possible is filtered out first

query: calculates a relevance score for each doc against the search criteria and sorts by that score

filter: simply filters out the data you want, without calculating a relevance score or sorting
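To make the distinction concrete, here is a sketch of a search body that combines a scored clause with filter clauses in a `bool` query, written as a Python dict. The `postDate` and `userID` fields come from the example above; the `title` field and its search text are assumptions added purely for illustration.

```python
import json

search_body = {
    "query": {
        "bool": {
            # Scored clause: contributes to the relevance score and the sort order.
            "must": [{"match": {"title": "elasticsearch"}}],  # hypothetical field
            # Non-scored clauses: they only include or exclude documents.
            "filter": [
                {"term": {"postDate": "2017-01-01"}},
                {"term": {"userID": 1}},
            ],
        }
    }
}

print(json.dumps(search_body, indent=2))
```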

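(6) If a document is added or modified, the cached bitset is updated automatically

postDate=2017-01-01, [0, 0, 1, 0]

If a document with id=5 and postDate=2017-01-01 is added, it is automatically reflected in the bitset of the postDate=2017-01-01 filter. This is fully automatic: the cache updates itself. The bitset for postDate=2017-01-01 becomes [0, 0, 1, 0, 1].

If the document with id=1 and postDate=2016-12-30 is changed to postDate=2017-01-01, the bitset is likewise updated automatically, to [1, 0, 1, 0, 1].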
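The two updates in this example can be replayed with a small helper. This is only a conceptual illustration of the behaviour described in the text, not a description of how Elasticsearch maintains its caches internally; `set_bit` is a hypothetical function and document ids are assumed to be 1-based positions in the bitset.

```python
def set_bit(bitset, doc_id, matches):
    """Set the flag for a 1-based doc_id, growing the bitset if needed."""
    while len(bitset) < doc_id:
        bitset.append(0)
    bitset[doc_id - 1] = 1 if matches else 0
    return bitset


cached = [0, 0, 1, 0]     # bitset for postDate=2017-01-01
set_bit(cached, 5, True)  # new document id=5 with postDate=2017-01-01
print(cached)             # [0, 0, 1, 0, 1]
set_bit(cached, 1, True)  # document id=1 changed to postDate=2017-01-01
print(cached)             # [1, 0, 1, 0, 1]
```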

 

(7) From then on, any request with the same filter condition directly uses the cached bitset for that condition

 

 

 


Origin www.cnblogs.com/jiahaoJAVA/p/11028720.html