Elasticsearch query filtering puzzle

Introduction

I used to be confused by query and filter: why is the same bool a query in one place and a filter in another?

Later, I took a closer look at the official documentation and found that they are all queries. What differs is the context they run in:

  1. query context
  2. filter context

Why distinguish between query context and filter context?

Because the efficiency is different. A query in filter context is more efficient: the filter context does not calculate a relevance score, and ES automatically caches frequently used filters.

What is the filter context

First of all, the filter clause obviously runs in filter context. In addition, must_not in a bool query is also executed in filter context.

filter

bool query:

{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "status": "1" }},
        { "range": { "create_date": { "lte": "2020-01-01" }}}
      ]
    }
  }
}

constant_score query:

{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user": "tim" }
      },
      "boost": 1.2
    }
  }
}

In a filter aggregation:

{
  "aggs": {
    "month": {
      "filter": { "term": { "type": "2" } }
    }
  }
}

must_not

{
  "query": {
    "bool": {
      "must_not": {
        "range": {
          "age": { "gte": 18, "lte": 30 }
        }
      }
    }
  }
}

bool query

Type        Description
must        The conditions must be met. Multiple conditions are ANDed, so all of them must be met at the same time.
should      The conditions should be met. Multiple conditions are ORed. If the bool query has no must or filter clause, at least one of them must match; otherwise they are optional and only raise the score.
filter      The conditions must be met. Unlike must, filter is executed in filter context and no relevance score is calculated.
must_not    The conditions must not be met. Executed in filter context; no relevance score is calculated.

Note: must_not is executed in filter context.

{
  "query": {
    "bool": {
      "must": {
        "term": { "status": 1 }
      },
      "filter": {
        "term": { "type": "Component" }
      },
      "must_not": {
        "range": {
          "ctime": { "gte": "2019-12-01", "lte": "2020-01-01" }
        }
      },
      "should": [
        { "term": { "tag": "Kafka" } },
        { "term": { "tag": "Elasticsearch" } }
      ]
    }
  }
}

As shown above, term is a term query, which matches a value exactly, and range is a range query, which matches a range of values.

The query above means: the status field must be 1, the type field must be "Component", the ctime field must not fall within 2019-12-01 to 2020-01-01, and documents whose tag field contains "Kafka" or "Elasticsearch" score higher. Because the query also has must and filter clauses, the should clauses are optional by default.
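When a bool query also has must or filter clauses, its should clauses only affect the score. If at least one tag really has to match, the bool query's minimum_should_match parameter can require it. A minimal sketch, reusing the field names from the example above:

```json
{
  "query": {
    "bool": {
      "filter": {
        "term": { "type": "Component" }
      },
      "should": [
        { "term": { "tag": "Kafka" } },
        { "term": { "tag": "Elasticsearch" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```

With minimum_should_match set to 1, a document whose tag is neither "Kafka" nor "Elasticsearch" is no longer returned.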

Full text search query

match

match is the most commonly used query. It runs a full-text search against the given field.

{
  "query": {
    "match": {
      "language": "Java"
    }
  }
}

{
  "query": {
    "match": {
      "message": {
        "query": "java python ruby"
      }
    }
  }
}
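By default, a match query with several terms matches documents that contain any one of them (the operator defaults to or). A sketch of requiring all terms instead, using the match query's operator parameter on the same hypothetical message field:

```json
{
  "query": {
    "match": {
      "message": {
        "query": "java python ruby",
        "operator": "and"
      }
    }
  }
}
```

The terms can still appear in any order and at any distance from each other; this is what distinguishes it from match_phrase.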

multi_match

multi_match is similar to match, but you can specify multiple fields to search.

{
  "query": {
    "multi_match": {
      "query": "java python ruby",
      "fields": [ "subject", "message" ]
    }
  }
}

match_all

match_all matches every document. It is what runs by default when no query is given.

{
  "query": {
    "match_all": {}
  }
}

match_phrase

match_phrase is somewhat similar to match, but there are two big differences:

  1. Every term the query text is analyzed into must appear in the analyzed field of the document
  2. The terms must also appear in the same order

{
  "query": {
    "match_phrase": {
      "message": "java python ruby"
    }
  }
}

The slop parameter relaxes the position requirement: it sets how many positions the terms may be moved while still counting as a phrase match.

{
  "query": {
    "match_phrase": {
      "message": {
        "query": "java python ruby",
        "slop": 2
      }
    }
  }
}

The following query examples illustrate this. The test data:

[Figure: slop-all (test data)]

[Figure: no-slop] The query result without the slop parameter.

[Figure: slop 2] The query result with the slop parameter set to 2.

[Figure: slop3] The query result with the slop parameter set to 3.

We can see that slop is the number of position moves allowed to line the document's terms up with the query phrase. Swapping two adjacent terms, for example, costs 2 moves.

match_phrase_prefix

match_phrase_prefix is similar to match_phrase, but the last term is matched as a prefix.

{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "java python r"
      }
    }
  }
}

Other common queries

Besides the bool query, the full-text queries, and the term and range queries introduced above, there are some other commonly used queries. Let's go through them briefly.

terms

We mentioned the term query earlier, which matches a value exactly. terms is basically the same as term, but it accepts multiple values. A document matches as long as one of the values matches exactly.

{
  "query": {
    "terms": {
      "component": ["kafka", "elasticsearch"],
      "boost": 1.5
    }
  }
}

As shown above, a document matches as long as its component field exactly matches kafka or elasticsearch.

boost affects the relevance score: a value greater than 1 increases the document's relevance, and a value less than 1 decreases it.

terms_set

The terms_set query is given an array of terms. A document matches if its field contains at least a required number of the terms in that array.

That sounds a bit convoluted, but never mind, let's look at an example. First create a user index and add 2 documents as follows.

PUT /user/_doc/1?refresh
{
  "name": "tim",
  "hobby": ["reading", "running"],
  "required_matches": 2
}

PUT /user/_doc/2?refresh
{
  "name": "allen",
  "hobby": ["reading", "meditation"],
  "required_matches": 2
}

Now we can execute the terms_set query.

GET /user/_search
{
  "query": {
    "terms_set": {
      "hobby.keyword": {
        "terms": ["reading", "swimming", "meditation"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

As shown above, the query uses hobby.keyword instead of hobby. No mapping was defined when the user index was created, so hobby was automatically mapped to the text type. The automatic mapping also adds a keyword sub-field through the fields parameter, which is why we can search on hobby.keyword.

If you would rather avoid this, you can define the mapping up front and set hobby to the keyword type.
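A minimal sketch of such a mapping, assuming Elasticsearch 7 or later (typeless mappings) and sent as the body of PUT /user before any document is indexed. The integer type for required_matches is an assumption here; the example above relies on automatic mapping for it.

```json
{
  "mappings": {
    "properties": {
      "hobby": { "type": "keyword" },
      "required_matches": { "type": "integer" }
    }
  }
}
```

With this mapping in place, the terms_set query would target hobby directly rather than hobby.keyword.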

The search above finds the document whose name is allen. If we replace "meditation" with "running", it finds the document whose name is tim.

What feels awkward is that the minimum number of matches is stored in each document instead of being given directly in the query: minimum_should_match_field names the document field that holds it.
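The terms_set query also accepts minimum_should_match_script in place of minimum_should_match_field, so the threshold can be computed in the query itself. A sketch under the same assumptions as the example above (user index, hobby.keyword sub-field); the fixed threshold of 2 is chosen only for illustration:

```json
{
  "query": {
    "terms_set": {
      "hobby.keyword": {
        "terms": ["reading", "swimming", "meditation"],
        "minimum_should_match_script": {
          "source": "Math.min(params.num_terms, 2)"
        }
      }
    }
  }
}
```

params.num_terms is the number of terms given in the query, so the script caps the threshold at the size of the terms array.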

exists

{
  "query": {
    "exists": {
      "field": "gid"
    }
  }
}

ids

{
  "query": {
    "ids": {
      "values": ["1", "3", "5", "7", "9"]
    }
  }
}

prefix

Prefix matching is sometimes very useful. For example, when we are looking for Lao Wang:

{
  "query": {
    "prefix": { "name": "王" }
  }
}

wildcard

If the prefix query does not meet your needs, you can also consider the wildcard query.

{
  "query": {
    "wildcard": {
      "name": {
        "value": "王*五",
        "boost": 1.5
      }
    }
  }
}

wildcard supports 2 wildcards:

  1. ? matches any single character
  2. * matches zero or more characters

The value of boost affects the relevance score: less than 1 decreases the document's relevance, and greater than 1 increases it.

If wildcards do not meet your requirements, you can also consider the regexp (regular expression) query.

However, in production it is strongly recommended to use only prefix. If you need to influence the relevance score, use wildcard, and do not start the pattern with a wildcard.
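For reference, a sketch of the regexp query mentioned above, assuming the same name field indexed as a single term (for example a keyword field). The pattern uses Lucene regular expression syntax and must match the whole term:

```json
{
  "query": {
    "regexp": {
      "name": {
        "value": "王.+五"
      }
    }
  }
}
```

Here . matches any single character and + repeats it one or more times, so unlike the wildcard pattern 王*五 this requires at least one character between 王 and 五.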

constant_score

When we only want to filter, we can consider constant_score query:

{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "status": 1
        }
      }
    }
  }
}

Equivalent to:

{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "status": 1
        }
      }
    }
  }
}

Documentation

Query and filter context

Structured query

bool query

match

Origin blog.csdn.net/trayvontang/article/details/107998923