Elasticsearch advanced queries and indexing principles

Advanced Search

ES provides a powerful way to retrieve data, known as the Query DSL. With the Query DSL, you send a JSON-formatted request body through the REST API to interact with ES. Its rich query syntax makes retrieval in ES both more powerful and more concise.

Match queries [match_all, match, match_phrase, match_phrase_prefix]

  • match_all: returns all documents in the index
  • match: the search text is first split into terms, and each term is then matched against the target field. If any one of the terms matches the field, the document is returned
  • match_phrase: the search text is treated as one phrase. All of its terms must appear in the field in the same order and next to each other; only punctuation is ignored
  • match_phrase_prefix: used like match_phrase, except that the last term is allowed to match as a prefix

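The following example illustrates the difference between them.

  • First store a document whose content is "i like eating and cooking". The default tokenizer splits it into "i", "like", "eating", "and" and "cooking". The table shows whether each search text matches the document:

search text    | match | match_phrase | match_phrase_prefix
i              | yes   | yes          | yes
i like         | yes   | yes          | yes
i like singing | yes   | no           | no
i like ea      | yes   | no           | yes
and            | yes   | yes          | yes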

Summary:

  1. match splits the search text into terms and matches each term independently; match_phrase and match_phrase_prefix treat the search text as a single phrase
  2. match and match_phrase compare whole terms exactly; match_phrase additionally requires the terms to appear in the field in the same order and next to each other
  3. match_phrase_prefix is not an exact match: it behaves like match_phrase but allows the last term to match as a prefix
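
The three rules in the summary can be sketched in a few lines of Python. This is only a toy model, not how ES is implemented: the tokenize helper (lowercase, split on whitespace) is an assumed stand-in for the default tokenizer, and all function names are made up for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (rough stand-in for the default tokenizer)."""
    return text.lower().split()


def match(field_tokens, query):
    """match: at least one query term occurs somewhere in the field."""
    return any(term in field_tokens for term in tokenize(query))


def match_phrase(field_tokens, query, prefix_last=False):
    """match_phrase: all query terms occur adjacent and in order.

    With prefix_last=True the last query term only has to be a prefix
    of the field term at that position (match_phrase_prefix behaviour).
    """
    terms = tokenize(query)
    if not terms:
        return False
    n = len(terms)
    for start in range(len(field_tokens) - n + 1):
        window = field_tokens[start:start + n]
        if window[:-1] != terms[:-1]:
            continue
        if prefix_last:
            last_ok = window[-1].startswith(terms[-1])
        else:
            last_ok = window[-1] == terms[-1]
        if last_ok:
            return True
    return False


def match_phrase_prefix(field_tokens, query):
    return match_phrase(field_tokens, query, prefix_last=True)


doc = tokenize("i like eating and cooking")
for q in ["i", "i like", "i like singing", "i like ea", "and"]:
    print(q, match(doc, q), match_phrase(doc, q), match_phrase_prefix(doc, q))
```

Running the loop prints one row per search text, in the same column order as the comparison in the example above.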

Keyword query [term]

term keyword: queries by exact keyword; the search value itself is not word-segmented

  • keyword type: when term is used on a field of type keyword, the whole field value must match
  • integer, double and date types: no word segmentation, the whole value must match
  • text type: segmented with the ES standard tokenizer by default, which splits Chinese text into single characters and English text into words

So apart from the text type, no other type is word-segmented.

# Query
GET /products/_search
{
  "query": {
    "term": {
      "title": {
        "value": "猪猪侠"
      }
    }
  }
}
# Result
"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.2039728,
    "hits" : [
      {
        "_index" : "products",
        "_type" : "_doc",
        "_id" : "rj5iCYABiB8dDekOlCwE",
        "_score" : 1.2039728,
        "_source" : {
          "id" : 2,
          "title" : "猪猪侠",
          "price" : 0.5,
          "created_at" : "2022-04-08",
          "description" : "5毛钱一包"
        }
      }
    ]
  }

Range query [range]

range keyword: queries documents whose field value falls within a specified range

# Range query
GET /products/_search
{
  "query": {
    "range": {
      "FIELD_NAME": {
        "gte": 2,  # lower bound
        "lte": 4   # upper bound
      }
    }
  }
}

Prefix query [prefix]

prefix keyword: queries documents whose field value starts with the given prefix

# Prefix query
GET /products/_search
{
  "query": {
    "prefix": {
      "FIELD": {
        "value": ""
      }
    }
  }
}

Wildcard query [wildcard]

Wildcard queries can use two wildcards:

  • ? matches exactly one character
  • * matches zero or more characters

# Wildcard query
GET /products/_search
{
  "query": {
    "wildcard": {
      "FIELD": {
        "value": "VALUE"
      }
    }
  }
}
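
To make the two wildcards concrete, here is a small Python sketch that translates a wildcard pattern into a regular expression and tests it against a term. It only models the ? and * rules listed above; the function names are hypothetical, and other details of the real query (such as escaping) are deliberately left out.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? and * into regex syntax; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(term, pattern):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("iphone13", "iph*"))      # * covers "one13"
print(wildcard_match("iphone13", "iphone1?"))  # ? covers "3"
print(wildcard_match("iphone13", "iphone?"))   # one character is not enough
```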

Query by id array [ids]

Queries documents by an array of ids

# Query by a set of ids
GET /products/_search
{
  "query": {
    "ids": {
      "values": [1,2]
    }
  }
}

Fuzzy query [fuzzy]

Fuzzy query: finds documents that contain a term close to the specified keyword

Note: the maximum number of edits (the fuzziness) allowed in a fuzzy query must be between 0 and 2

  • Keyword length of 2 or less: no edits allowed, the term must match exactly
  • Keyword length of 3 to 5: one edit allowed
  • Keyword length greater than 5: at most two edits allowed

GET /products/_search
{
  "query": {
    "fuzzy": {
      "FIELD": "xxxx"
    }
  }
}
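
The length rule listed above can be written down as a small Python sketch. The names are hypothetical, and the distance used here is plain Levenshtein distance, chosen for brevity; treat it as an approximation of how ES counts edits rather than an exact reproduction.

```python
def allowed_edits(keyword):
    """Number of edits permitted for a keyword, following the length rule above."""
    length = len(keyword)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


def fuzzy_match(indexed_term, keyword):
    """True if the indexed term is within the allowed number of edits."""
    return edit_distance(indexed_term, keyword) <= allowed_edits(keyword)


print(fuzzy_match("apple", "appla"))  # one edit, length 5 allows one
print(fuzzy_match("apple", "axxle"))  # two edits, length 5 allows only one
```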

Boolean query [bool]

Elasticsearch can use the bool keyword to combine multiple conditions into complex queries, similar to the AND, OR and NOT operators in SQL.

The Boolean clause types supported by Elasticsearch are the following:

  • must: the document must satisfy all of the query conditions. With multiple conditions it behaves like AND in SQL (the && operator)

  • should: the document should satisfy one or more of the query conditions (the minimum_should_match parameter specifies how many conditions must be satisfied). With multiple conditions it behaves like OR in SQL (the || operator)

  • must_not: the document must not satisfy any of the query conditions, similar to NOT in SQL. It does not take part in scoring, and the returned scores are all 0

  • filter: selects the documents that meet the criteria without computing a score. In general, use filter first to narrow down the data, then use scoring queries to match within what is left, which improves query efficiency

must query

With a must query, documents must match all of the query conditions it contains.

{
  "query": {
    "bool": {
      "must": [
        { "term": { "age": 20 } }
      ]
    }
  }
}

This query is equivalent to the following SQL statement:

SELECT * FROM xxx WHERE age = 20;

With must you can specify several query conditions at once. In the DSL they are written as an array, and the effect is similar to AND in SQL. For example:

{
  "query": {
    "bool": {
      "must": [
        { "term": { "age": 20 } },
        { "term": { "gender": "male" } }
      ]
    }
  }
}

should query

A should query is similar to OR in SQL. When it contains two or more conditions, a result must satisfy at least one of them. When it contains only one condition, a result must satisfy that condition.

{
  "query": {
    "bool": {
      "should": [
        { "term": { "age": 20 } },
        { "term": { "gender": "male" } },
        { "range": { "height": { "gte": 170 } } }
      ]
    }
  }
}

This query is equivalent to the corresponding SQL statement below:

SELECT * FROM xxx WHERE age = 20 OR gender = "male" OR height >= 170;

The difference between a should query and OR in SQL is that a should query can use the minimum_should_match parameter to specify the minimum number of conditions that must be met. For example, in the following query a result must satisfy at least two of the conditions:

{
  "query": {
    "bool": {
      "should": [
        { "term": { "age": 20 } },
        { "term": { "gender": "male" } },
        { "term": { "height": 170 } }
      ],
      "minimum_should_match": 2
    }
  }
}

If the same bool statement contains no must or filter clause, the default value of minimum_should_match is 1, that is, at least one of the should conditions must be met. If a must or filter clause is present, the default value of minimum_should_match is 0. In other words, the should conditions then no longer restrict the results by default.

For example, without minimum_should_match the query below would return every document whose age is 20, including documents whose status is not "active". Adding "minimum_should_match": 1 to the bool query, as in the previous example, makes both conditions take effect.

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "age": 20
        }
      },
      "should": {
        "term": {
          "status": "active"
        }
      },
      "minimum_should_match": 1
    }
  }
}
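
The default described above can be expressed as a short Python helper that works on the body of a bool query given as a dict. The function names are hypothetical and the helper only mirrors the rule stated in this section; it is a sketch, not ES code.

```python
def default_minimum_should_match(bool_body):
    """Default for minimum_should_match, following the rule described above."""
    has_should = bool(bool_body.get("should"))
    has_must_or_filter = bool(bool_body.get("must")) or bool(bool_body.get("filter"))
    if has_should and not has_must_or_filter:
        return 1  # should clauses alone: at least one must match
    return 0      # must/filter present: should clauses become optional


def effective_minimum_should_match(bool_body):
    """An explicit minimum_should_match wins over the default."""
    return bool_body.get("minimum_should_match",
                         default_minimum_should_match(bool_body))


only_should = {"should": [{"term": {"age": 20}}]}
with_must = {
    "must": {"term": {"age": 20}},
    "should": {"term": {"status": "active"}},
}
print(effective_minimum_should_match(only_should))
print(effective_minimum_should_match(with_must))
print(effective_minimum_should_match({**with_must, "minimum_should_match": 1}))
```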

must_not query

A must_not query is similar to NOT in SQL: it only returns documents that do not meet the specified conditions. For example:

{
  "query": {
    "bool": {
      "must_not": [
        { "term": { "age": 20 } },
        { "term": { "gender": "male" } }
      ]
    }
  }
}

This query is equivalent to the following SQL statement (written here with != instead of NOT):

SELECT * FROM xxx WHERE age != 20 AND gender != "male";

In addition, like filter, must_not does not compute a score for the documents, so the score of the returned results is 0.

filter query

A filter query has the same matching effect as a must query. The difference is that filter only selects the documents that meet the conditions and does not compute a score.

For example, the following query returns all documents whose status is "active", each with a score of 0.0.

{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  }
}

Boolean combination query

Bool queries can also be nested inside each other. Note that Boolean clauses must always sit inside a bool statement, so the nested query has to use bool again.

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "term": { "age": 20 } },
              { "term": { "age": 25 } }
            ]
          }
        },
        {
          "range": {
            "level": {
              "gte": 3
            }
          }
        }
      ]
    }
  }
}

This query statement is equivalent to the following SQL statement:

SELECT * FROM xxx WHERE (age = 20 OR age = 25) AND level >= 3;
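
To check the equivalence with the SQL statement, the nested query can be evaluated against in-memory documents with a toy Python evaluator. This is a simplified sketch: it handles only term, range (gte/lte) and bool clauses, ignores scoring, and its function names are made up for illustration.

```python
def as_list(clauses):
    """Accept a single clause, a list of clauses, or nothing."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def matches(query, doc):
    """Return True if doc (a plain dict) satisfies the query clause."""
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        if isinstance(value, dict):  # long form: {"field": {"value": ...}}
            value = value["value"]
        return doc.get(field) == value
    if kind == "range":
        field, bounds = next(iter(body.items()))
        actual = doc.get(field)
        if actual is None:
            return False
        if "gte" in bounds and actual < bounds["gte"]:
            return False
        if "lte" in bounds and actual > bounds["lte"]:
            return False
        return True
    if kind == "bool":
        required = as_list(body.get("must")) + as_list(body.get("filter"))
        should = as_list(body.get("should"))
        must_not = as_list(body.get("must_not"))
        default_msm = 1 if should and not required else 0
        msm = body.get("minimum_should_match", default_msm)
        return (all(matches(q, doc) for q in required)
                and not any(matches(q, doc) for q in must_not)
                and sum(matches(q, doc) for q in should) >= msm)
    raise ValueError("unsupported clause: " + kind)


nested = {"bool": {"must": [
    {"bool": {"should": [{"term": {"age": 20}}, {"term": {"age": 25}}]}},
    {"range": {"level": {"gte": 3}}},
]}}

for doc in [{"age": 20, "level": 3}, {"age": 30, "level": 5}, {"age": 25, "level": 1}]:
    print(doc, matches(nested, doc))
```

Only the first document satisfies (age = 20 OR age = 25) AND level >= 3, which is also what the evaluator reports.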

Multi-field query [multi_match]

The query text is first split into terms, and each term is then searched for in the specified fields.

For example, 泡面 (instant noodles) is split into 泡 and 面, and each of the two is queried separately.

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "泡面",
      "fields": ["title","description"]
    }
  }
}

Default-field query [query_string]

  • If the type of the queried field is not word-segmented, the query text is used as a whole
  • If the type of the queried field is word-segmented, the query text is segmented before querying

GET /products/_search
{
  "query": {
    "query_string": {
      "default_field": "description",
      "query": "xxxx"
    }
  }
}

Highlight query [highlight]

Keywords in the matching documents can be highlighted

  • Only fields of type text can be highlighted
  • * matches all fields
  • Highlighting does not modify the original document; the highlighted result is returned in a separate highlight section

GET /products/_search
{
  "query": {
    "term": {
      "description": {
        "value": "泡面"
      }
    }
  },
  "highlight": {
    "fields": {
      "*":{}
    }
  }
}

Custom highlight HTML tags: use pre_tags and post_tags inside highlight

GET /products/_search
{
  "query": {
    "term": {
      "description": {
        "value": "xxx"
      }
    }
  },
  "highlight": {
    "post_tags": ["</span>"], 
    "pre_tags": ["<span style='color:red'>"],
    "fields": {
      "*":{}
    }
  }
}

Multi-field highlighting: set require_field_match to false to enable highlighting in multiple fields

GET /products/_search
{
  "query": {
    "term": {
      "description": {
        "value": "xxx"
      }
    }
  },
  "highlight": {
    "require_field_match": "false",
    "post_tags": ["</span>"], 
    "pre_tags": ["<span style='color:red'>"],
    "fields": {
      "*":{}
    }
  }
}

Return a specified number of results [size]

size keyword: specifies the number of results to return. The default is 10

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "size": 5
}

Paging query [from]

from keyword: specifies the position of the first result to return; combined with the size keyword it implements paging

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "size": 5,
  "from": 0  # (page - 1) * size
}
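
As a small illustration of how from and size work together, the following Python helper builds the request body for a given page number. The function name and the 1-based page numbering are assumptions made for this sketch.

```python
def page_request(page, size=10):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * size,  # number of results to skip
        "size": size,               # number of results per page
    }


print(page_request(1, size=5))  # first page starts at offset 0
print(page_request(3, size=5))  # third page skips the first 10 results
```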

Specify field sorting [sort]

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

Return the specified field [_source]

_source keyword: an array that specifies which fields are returned for each document

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["title","description"]
}

Indexing principle

An inverted index is also called a reverse index; where there is a reverse direction, there is also a forward one. A forward index finds the value through the key, while an inverted index finds the key through the value.

Under the hood, ES uses an inverted index when searching.

Test case

The existing index and mapping are as follows:

{
  "products" : {
    "mappings" : {
      "properties" : {
        "description" : {
          "type" : "text"
        },
        "price" : {
          "type" : "float"
        },
        "title" : {
          "type" : "keyword"
        }
      }
    }
  }
}

Insert the following data:

_id | title                        | price | description
1   | Blue Moon Laundry Detergent  | 19.9  | Blue Moon laundry detergent is very efficient
2   | iphone13                     | 19.9  | very nice phone
3   | Little raccoon crisp noodles | 1.5   | Raccoons are delicious
Visual representation

[Image: ElasticSearch.assets/image-20220410092110246.png]

  • ES builds the index according to whether the field can be word-segmented. If it can, the index is built on the individual words; if it cannot, the index is built on the whole field value:
    • The keyword type cannot be word-segmented: when indexing, the whole field value is used as the index key
    • The text type can be word-segmented: the field value is first split into words, and the index is then built on those words
  • An ES index is similar to an index in MySQL's InnoDB engine. The key of the index structure stores the indexed value, and the value stores the id of the whole document. When querying, ES first finds the id through the index, and then uses the id to fetch the whole document from the area where the documents are stored
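
The two indexing rules can be illustrated with a toy inverted index in Python, built from the three test documents. This is only a sketch: word segmentation is approximated by lowercasing and splitting on whitespace, and the function names are hypothetical.

```python
from collections import defaultdict

# The three test documents, keyed by _id.
docs = {
    1: {"title": "Blue Moon Laundry Detergent", "price": 19.9,
        "description": "Blue Moon laundry detergent is very efficient"},
    2: {"title": "iphone13", "price": 19.9,
        "description": "very nice phone"},
    3: {"title": "Little raccoon crisp noodles", "price": 1.5,
        "description": "Raccoons are delicious"},
}


def build_inverted_index(docs, field, segmented):
    """Map each index key to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        value = str(doc[field])
        # text-like fields are split into words; keyword-like fields are kept whole
        keys = value.lower().split() if segmented else [value]
        for key in keys:
            index[key].add(doc_id)
    return index


def search(index, key):
    """Look up the ids first, as the inverted index does."""
    return sorted(index.get(key, set()))


title_index = build_inverted_index(docs, "title", segmented=False)
description_index = build_inverted_index(docs, "description", segmented=True)

print(search(title_index, "iphone13"))    # whole keyword value is the key
print(search(title_index, "Blue"))        # a single word is not a key here
print(search(description_index, "very"))  # a word shared by two documents
# Second step: fetch the full documents by id.
print([docs[i]["title"] for i in search(description_index, "very")])
```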


Origin blog.csdn.net/qq_50596778/article/details/125381193