Elasticsearch: the and, or, and bool filters

This is a translated and organized summary of a very good article. If your English is good, reading the original is recommended: http://euphonious-intuition.com/2013/05/all-about-elasticsearch-filter-bitsets/

Elasticsearch has a bool filter as well as AND, OR, and NOT filters. They look very similar, so what is the difference? When should you use the bool filter, and when the AND filter?

In fact, the bool filter works completely differently from the AND, OR, and NOT filters, and the choice has a large impact on query performance.

The first thing to understand is how a filter works. At its core is a structure called a BitSet, which can be thought of as a large bit array in which every element has two states, 0 and 1 (if you know Bloom filters, the idea will be familiar). A filter only decides whether a document matches or not; it does no scoring. If a document matches the filter, its bit is set to 1, otherwise it is set to 0.
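As a minimal sketch of the idea (not how Lucene actually stores its bitsets), a Python integer can stand in for the bit array. The function name `build_bitset` and the sample documents are made up for illustration.

```python
def build_bitset(docs, predicate):
    """Return an int used as a bit array: bit i is 1 if docs[i] matches."""
    bits = 0
    for doc_id, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << doc_id  # mark this document as a match
    return bits


# Hypothetical documents; the position in the list plays the role of the doc id.
docs = [
    {"user": "alice", "age": 31},
    {"user": "bob", "age": 25},
    {"user": "alice", "age": 19},
]

user_is_alice = build_bitset(docs, lambda d: d["user"] == "alice")
print(bin(user_is_alice))  # documents 0 and 2 match
```

No score is computed anywhere: each document contributes exactly one bit.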

When ES executes a filter, it opens each Lucene segment file and checks whether each document in it matches the filter. The result can be stored in a bitset, so the next time the same filter comes in, ES can answer directly from the in-memory bitset without opening the Lucene segment files again. Avoiding that I/O greatly speeds up query processing, which is why filters are so efficient.

Lucene segment files are immutable: Lucene writes new segments, but an existing segment never changes, so its bitsets can be reused. A separate bitset is generated for each combination of filter condition and segment. A query may need to intersect several bitsets, and computers are very good at this kind of bit-level operation, so it is very fast.
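The caching and intersection described above can be sketched roughly as follows. `Segment` and `BitsetCache` are hypothetical names for this illustration, not Elasticsearch or Lucene classes, and the cache key of (filter, segment) is an assumption based on the description in the text.

```python
class Segment:
    """Hypothetical immutable segment: a name plus a fixed tuple of documents."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = tuple(docs)


class BitsetCache:
    """Caches one bitset per (filter, segment) pair."""

    def __init__(self):
        self._cache = {}
        self.scans = 0  # how many times a segment really had to be scanned

    def get(self, filter_key, predicate, segment):
        key = (filter_key, segment.name)
        if key not in self._cache:
            self.scans += 1
            bits = 0
            for doc_id, doc in enumerate(segment.docs):
                if predicate(doc):
                    bits |= 1 << doc_id
            self._cache[key] = bits
        return self._cache[key]


def is_alice(doc):
    return doc["user"] == "alice"


def is_adult(doc):
    return doc["age"] >= 21


seg = Segment("seg_0", [
    {"user": "alice", "age": 31},
    {"user": "bob", "age": 25},
    {"user": "alice", "age": 19},
])
cache = BitsetCache()

alice_bits = cache.get("user:alice", is_alice, seg)
adult_bits = cache.get("age>=21", is_adult, seg)
both = alice_bits & adult_bits                      # intersection is one bitwise AND
alice_again = cache.get("user:alice", is_alice, seg)  # served from the cache
```

Because the segment never changes, the cached bitset stays valid for as long as the segment exists; only new segments need a fresh scan.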

In addition, if a filter matches nothing, every bit in its bitset is 0. When ES processes that filter later, the bitset is skipped entirely to improve performance.

With the basics covered, let's look at the difference between the bool filter and the AND filter.

The bool filter uses the bitset data structure described above (the bitset camp), while the AND/OR/NOT filters cannot use bitsets (the non-bitset camp). Why?

The AND, OR, and NOT filters work doc by doc. ES loads the field content of each document in turn and checks whether it satisfies the filter condition; documents that do not are dropped from the result set. This repeats until every document has been processed. No bitset is used along the way, so cached results cannot be reused.

If an AND, OR, or NOT filter contains several filter conditions (an array is supported), each filter passes its result set on to the next one in turn. In theory, the number of documents to process keeps shrinking, because filtering only removes documents and never adds them. It therefore generally pays to put the most restrictive filters first, so that the later filters have very few documents left to process, which can greatly improve overall speed. Besides the document count, the cost of each filter matters too. Some filters are expensive to execute, such as geo filters (heavy computation) or script-based filters (dynamic scripts), and these high-overhead filters are best placed last.
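To make the ordering argument concrete, here is a small simulation of a doc-by-doc chain that counts how many times a filter predicate is evaluated. It is only a sketch of the behaviour described above; `run_and_chain` and the sample data are invented for the example.

```python
def run_and_chain(docs, filters):
    """Apply filters in order, doc by doc; each filter sees only the survivors.

    Returns (surviving docs, number of predicate evaluations).
    """
    survivors = list(docs)
    evaluations = 0
    for predicate in filters:
        next_survivors = []
        for doc in survivors:
            evaluations += 1
            if predicate(doc):
                next_survivors.append(doc)
        survivors = next_survivors
    return survivors, evaluations


# 1000 hypothetical documents, of which 10 carry the tag "rare".
docs = [{"id": i, "tag": "rare" if i % 100 == 0 else "common"} for i in range(1000)]


def strict(doc):   # restrictive filter: matches 10 documents
    return doc["tag"] == "rare"


def loose(doc):    # loose filter: matches every document
    return doc["id"] >= 0


hits_a, strict_first = run_and_chain(docs, [strict, loose])
hits_b, loose_first = run_and_chain(docs, [loose, strict])
print(strict_first, loose_first)
```

Both orders return the same documents, but the strict-first order does roughly half the work in this example. If the second filter were a costly geo or script check, the gap in real time would be larger still.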

So the picture so far is this: AND, OR, and NOT process documents one by one, in order. If your result set is large, that is, a loose query with many hits, the AND, OR, and NOT filters are a poor fit. Some filters, however, can only be evaluated doc by doc, such as the following:

  • Geo* filters
  • Scripts
  • Numeric_range

So apart from the filters listed above, all other filters should always be combined with the bool filter to improve query performance.

If a query needs both bitset and non-bitset filters, you can combine the bool filter with the AND/OR/NOT filters.

As mentioned earlier, AND passes the result set along in order, so put the better-performing filter first and the non-bitset filters later in the AND filter, as in the following complex filter that contains several filter types:

{
  "and" : [
    {
      "bool" : {
        "must" : [
          { "term" : {} },
          { "range" : {} },
          { "term" : {} }
        ]
      }
    },
    {
      "or" : [
        { "custom_script" : {} },
        { "geo_distance" :{} }
      ]
    }
  ]
}

 

The and filter is the outermost wrapper. Its first filter is a bool filter containing three must sub-filters; processing it yields a document result set, and then the or sub-filter is executed against that set. The two filters inside the or are evaluated separately, and the final document result set is our search result.
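The evaluation order just described can be simulated in a few lines. This is a simplified sketch, not Elasticsearch internals: the bitsets are assumed to be already cached, and `near` and `high` are stand-ins for the expensive script and geo filters.

```python
def iter_set_bits(bits):
    """Yield the positions of the 1 bits, lowest first."""
    doc_id = 0
    while bits:
        if bits & 1:
            yield doc_id
        bits >>= 1
        doc_id += 1


def combined_filter(docs, cached_bitsets, expensive_predicates):
    # Step 1 (bool/must): intersect the cached bitsets.
    candidates = (1 << len(docs)) - 1
    for bits in cached_bitsets:
        candidates &= bits
    # Step 2 (or): run the expensive checks doc by doc, on survivors only.
    hits = []
    for doc_id in iter_set_bits(candidates):
        if any(p(docs[doc_id]) for p in expensive_predicates):
            hits.append(doc_id)
    return hits


calls = []  # records every expensive evaluation


def near(doc):   # stand-in for a geo distance check
    calls.append("near")
    return doc["dist"] < 10


def high(doc):   # stand-in for a script check
    calls.append("high")
    return doc["score"] > 5


docs = [
    {"dist": 2, "score": 9},
    {"dist": 50, "score": 1},
    {"dist": 1, "score": 0},
    {"dist": 80, "score": 8},
]
term_bits = 0b1011    # assumed cached result of a term filter: docs 0, 1, 3
range_bits = 0b1110   # assumed cached result of a range filter: docs 1, 2, 3

hits = combined_filter(docs, [term_bits, range_bits], [near, high])
```

Only the two documents that survive the bitset intersection ever reach the expensive predicates, which is the point of putting the bool filter first.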

In short, when using filters, reach for the bitset camp first, and then think about filter order and combination.

  • Geo, Script or Numeric_range filter: use And/Or/Not Filters
  • All others: use Bool Filter
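The rule of thumb above could be wrapped in a small helper that sorts filters into the two camps and builds the request body. This is a hypothetical sketch: `build_filter`, the prefix list, and the field names and values are all assumptions for illustration, and unlike the earlier example it places the doc-by-doc filters directly in the and chain rather than inside an or.

```python
import json

# Filter types the text lists as doc-by-doc (assumed name prefixes).
NON_BITSET_PREFIXES = ("geo_", "script", "custom_script", "numeric_range")


def filter_type(f):
    """A filter is a one-key dict; the key is its type."""
    return next(iter(f))


def build_filter(filters):
    if not filters:
        raise ValueError("at least one filter is required")
    bitset = [f for f in filters
              if not filter_type(f).startswith(NON_BITSET_PREFIXES)]
    others = [f for f in filters
              if filter_type(f).startswith(NON_BITSET_PREFIXES)]
    parts = []
    if bitset:
        parts.append({"bool": {"must": bitset}})  # bitset camp goes first
    parts.extend(others)                          # doc-by-doc filters go last
    return {"and": parts} if len(parts) > 1 else parts[0]


# Placeholder field names and values, not taken from the article.
term = {"term": {"user": "alice"}}
geo = {"geo_distance": {"distance": "10km", "location": {"lat": 40.0, "lon": -70.0}}}
rng = {"range": {"age": {"gte": 21}}}

body = build_filter([term, geo, rng])
print(json.dumps(body, indent=2))
```

Whatever order the caller passes the filters in, the bool wrapper always ends up ahead of the geo filter in the generated and array.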

Once you have mastered the above, writing high-performance queries is not difficult.

