Pitfalls to avoid with the minimum_should_match parameter in ES bool queries

In Elasticsearch (ES), the bool query combines multiple query clauses to control how documents are matched. Its minimum_should_match parameter specifies the minimum number of should clauses that a document must match.

The value of minimum_should_match can be an integer, a percentage, or a combination of the two. A percentage or combination is resolved against the number of should clauses in the query.

When minimum_should_match is an integer, it is the minimum number of should clauses that must match. For example, if minimum_should_match is set to 2 and the query has 4 should clauses, a document must match at least 2 of them.

When minimum_should_match is a percentage, the required count is that percentage of the total number of should clauses, rounded down. For example, with minimum_should_match set to "50%" and 6 should clauses, at least 3 of them must match; with only 3 should clauses, 50% is 1.5, which rounds down to 1.
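A minimal sketch of the percentage form, reusing the goods_en index from the examples in this article. The brand.keyword and description fields and their values are hypothetical and only pad the query out to 4 should clauses. With 4 clauses, "50%" resolves to 2; because Elasticsearch rounds the computed number down, "60%" (2.4) would also resolve to 2.

```console
# 4 should clauses, at least 50% of them (= 2) must match
# brand.keyword and description are assumed fields, used for illustration only
GET goods_en/_search
{
  "_source": false,
  "query": {
    "bool": {
      "should": [
        { "term":  { "type.keyword": "phone" } },
        { "match": { "name": "phone" } },
        { "term":  { "brand.keyword": "acme" } },
        { "match": { "description": "phone" } }
      ],
      "minimum_should_match": "50%"
    }
  }
}
```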

minimum_should_match also accepts a conditional syntax that combines a threshold with another value. For example, "3<90%" means: if the query has 3 or fewer should clauses, all of them must match; if it has more than 3, then 90% of them must match.
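The conditional form can be sketched the same way. In Elasticsearch, "3<90%" requires every should clause to match when there are 3 or fewer of them, and applies the 90% rule only when there are more. The request below has 4 should clauses, so the percentage applies: 90% of 4 is 3.6, which rounds down to 3. The brand.keyword and description fields are hypothetical, added only to reach 4 clauses.

```console
# 4 should clauses is more than 3, so 90% applies: at least 3 must match
# brand.keyword and description are assumed fields, used for illustration only
GET goods_en/_search
{
  "_source": false,
  "query": {
    "bool": {
      "should": [
        { "term":  { "type.keyword": "phone" } },
        { "match": { "name": "phone" } },
        { "term":  { "brand.keyword": "acme" } },
        { "match": { "description": "phone" } }
      ],
      "minimum_should_match": "3<90%"
    }
  }
}
```

Removing any one of the four clauses would leave 3, at which point the same setting requires all of them to match.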

The minimum_should_match parameter gives you fine-grained control over how strict a bool query is. Depending on how many should clauses the query has, choose an integer, a percentage, or a combination of the two.

1. First semantics

The minimum_should_match parameter specifies the number or percentage of should clauses that a returned document must match. If the bool query contains at least one should clause and no must or filter clauses, the default value is 1.

# Condition 1: name contains "phone"
# Condition 2: type equals "phone"
# When minimum_should_match is not set, the conditions are combined with OR
# Here it is set to 2, so at least 2 conditions must match

GET goods_en/_search
{
  "_source": false,
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "type.keyword": "phone"
          }
        },
        {
          "match": {
            "name": "phone"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}
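
For comparison, here is a sketch of the same request with minimum_should_match left out. Since the bool query has only should clauses, the default of 1 applies and a document is returned if it matches either condition.

```console
# No minimum_should_match: the default is 1, so the two conditions behave like OR
GET goods_en/_search
{
  "_source": false,
  "query": {
    "bool": {
      "should": [
        { "term":  { "type.keyword": "phone" } },
        { "match": { "name": "phone" } }
      ]
    }
  }
}
```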

2. Second semantics

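However, if the bool query also contains a must or filter clause alongside the should clauses, the default value of minimum_should_match becomes 0. The should clauses then only influence scoring, so set minimum_should_match explicitly whenever at least one of them has to match.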

# (must or filter) combined with should
# Condition 1: price is at most 20000
# Condition 2: name contains "phone" or type equals "phone"
GET goods_en/_search
{
  "_source": false,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "price": {
              "lte": 20000
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "type.keyword": "phone"
          }
        },
        {
          "match": {
            "name": "phone"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
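
To see the pitfall itself, here is a sketch of the same request with minimum_should_match left out. Because a filter clause is present, the default is 0: every document that passes the price filter is returned, even if it matches neither should clause, and the should clauses only affect the relevance score.

```console
# filter + should without minimum_should_match: the default is 0
# Documents with price <= 20000 are returned even if neither should clause matches
GET goods_en/_search
{
  "_source": false,
  "query": {
    "bool": {
      "filter": [
        { "range": { "price": { "lte": 20000 } } }
      ],
      "should": [
        { "term":  { "type.keyword": "phone" } },
        { "match": { "name": "phone" } }
      ]
    }
  }
}
```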

Origin blog.csdn.net/wlei0618/article/details/130577679