- Advanced Search
- Match query [match_all]
- Keyword query [term]
- range query [range]
- prefix query [prefix]
- Wildcard query [wildcard]
- Query by id array [ids]
- Fuzzy query [fuzzy]
- Boolean query [bool]
- Multi-field query [multi_match]
- Default field word segmentation query [query_string]
- Highlight query [highlight]
- Return the specified number of items [size]
- Paging query [from]
- Specify field sorting [sort]
- Return the specified field [_source]
- Index principle
Advanced Search
ES provides a powerful way of retrieving data known as the Query DSL. Query DSL uses the REST API to send a JSON-formatted request body to ES, and its rich query syntax makes retrieval in ES both more powerful and more concise.
match query [match_all]
- match_all: returns all documents in the index
- match: the search text is tokenized first, and each token is then matched against the target field. If any token matches, the document is returned
- match_phrase: the search text is treated as one phrase; all of its words must appear in the field in the same order and next to each other. Punctuation is ignored
- match_phrase_prefix: used like match_phrase, except that the last word may match as a prefix
The following example shows the difference between them.
- First store a piece of data: i like eating and cooking
- The default tokenizer should divide the content into "i", "like", "eating", "and", "cooking"
query term / match type | match | match_phrase | match_phrase_prefix |
---|---|---|---|
i | ✅ | ✅ | ✅ |
i like | ✅ | ✅ | ✅ |
i like singing | ✅ | ❌ | ❌ |
i like ea | ✅ | ❌ | ✅ |
and | ✅ | ✅ | ✅ |
Summary:
- match splits the search text into words and matches each word on its own; match_phrase and match_phrase_prefix treat the search text as a single phrase
- match and match_phrase match whole words exactly; match_phrase additionally requires the words to appear in the field in the same order and next to each other
- match_phrase_prefix is not an exact match: on top of match_phrase, it allows the last word to match as a prefix
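The table above can be reproduced with a small simulation. This is only a sketch of the matching rules as described in this section, not how Elasticsearch is implemented: the function names are made up for illustration, and the tokenizer is assumed to lowercase the text and split it on non-alphanumeric characters.

```python
import re

FIELD = "i like eating and cooking"


def tokenize(text):
    # Assumed stand-in for the default tokenizer: lowercase words only.
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query, field):
    # True if ANY query token appears among the field tokens.
    field_tokens = set(tokenize(field))
    return any(token in field_tokens for token in tokenize(query))


def match_phrase(query, field):
    # True if the query tokens appear in the field in order and adjacent.
    q, d = tokenize(query), tokenize(field)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def match_phrase_prefix(query, field):
    # Like match_phrase, but the last query token only has to be a prefix.
    q, d = tokenize(query), tokenize(field)
    if not q:
        return False
    head, last = q[:-1], q[-1]
    for i in range(len(d) - len(q) + 1):
        if d[i:i + len(head)] == head and d[i + len(head)].startswith(last):
            return True
    return False


if __name__ == "__main__":
    for query in ["i", "i like", "i like singing", "i like ea", "and"]:
        print(query, match(query, FIELD), match_phrase(query, FIELD),
              match_phrase_prefix(query, FIELD))
```

Running the script prints one row per search term, in the same column order as the table.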
Keyword query [term]
term keyword: query by an exact term
- keyword type: when term is used to query a field of type keyword, the whole value must match
- Integer, double and date types: not tokenized, the whole value must match
- text type: tokenized by the default ES standard tokenizer, which splits Chinese into single characters and English into words
So, except for the text type, the other types are not tokenized
# Query
GET /products/_search
{
"query": {
"term": {
"title": {
"value": "猪猪侠"
}
}
}
}
# Result
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2039728,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "rj5iCYABiB8dDekOlCwE",
"_score" : 1.2039728,
"_source" : {
"id" : 2,
"title" : "猪猪侠",
"price" : 0.5,
"created_at" : "2022-04-08",
"description" : "5毛钱一包"
}
}
]
}
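Why the field type matters for term can be sketched as follows. The helper names are hypothetical, and the tokenizer is a rough approximation of the standard tokenizer described above (one token per Chinese character, lowercase words for English); the point is only that term compares the search value as-is against whatever was indexed.

```python
import re


def standard_tokens(text):
    # Approximation: each CJK ideograph is its own token,
    # runs of ASCII letters/digits become lowercase words.
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)]


def term_matches(field_type, stored_value, term):
    # term does not tokenize the search value.
    if field_type == "text":
        # text fields are indexed token by token
        return term in standard_tokens(stored_value)
    # keyword, numeric and date fields are indexed as one whole value
    return stored_value == term


if __name__ == "__main__":
    print(term_matches("keyword", "猪猪侠", "猪猪侠"))
    print(term_matches("text", "猪猪侠", "猪猪侠"))
    print(term_matches("text", "猪猪侠", "猪"))
```

Under these assumptions, the full title only matches when the field is a keyword; on a text field, a single character matches instead.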
range query [range]
range keyword: used to query documents within a specified range
# Range query: gte is the lower bound, lte is the upper bound
GET /products/_search
{
  "query": {
    "range": {
      "FIELD": {
        "gte": 2,
        "lte": 4
      }
    }
  }
}
prefix query [prefix]
prefix keyword: query according to the prefix of the document
# Prefix query
GET /products/_search
{
"query": {
"prefix": {
"FIELD": {
"value": ""
}
}
}
}
Wildcard query [wildcard]
Wildcard queries can use:
- ? matches any single character
- * matches zero or more characters
# Wildcard query
GET /products/_search
{
"query": {
"wildcard": {
"FIELD": {
"value": "VALUE"
}
}
}
}
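The meaning of the two wildcard characters can be illustrated by translating a pattern into a regular expression. This is a sketch for intuition only: the function names are hypothetical, and it assumes the pattern is compared against one whole indexed term.

```python
import re


def wildcard_to_regex(pattern):
    # "?" -> exactly one character, "*" -> zero or more characters,
    # everything else is matched literally.
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    # The whole term has to match, not just a part of it.
    return wildcard_to_regex(pattern).fullmatch(term) is not None


if __name__ == "__main__":
    for pattern in ["ip*", "iphone1?", "iphone?", "*13"]:
        print(pattern, wildcard_match(pattern, "iphone13"))
```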
Query by id array [ids]
Query documents through an id array
# Query by a set of ids
GET /products/_search
{
"query": {
"ids": {
"values": [1,2]
}
}
}
Fuzzy query [fuzzy]
Fuzzy search for documents containing specified keywords
Note: the maximum number of edits allowed by a fuzzy query must be between 0 and 2
- If the search keyword is 2 characters long or shorter, no edits are allowed
- If the search keyword is 3-5 characters long, one edit is allowed
- If the search keyword is longer than 5 characters, at most 2 edits are allowed
GET /products/_search
{
"query": {
"fuzzy": {
"FIELD": "xxxx"
}
}
}
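The length rule above can be written down directly. The sketch below pairs it with a plain Levenshtein distance; the function names are hypothetical, and treating every insertion, deletion or substitution as one edit is a simplifying assumption (a swapped pair of letters counts as two edits here).

```python
def allowed_edits(keyword):
    # Length-based rule from the list above.
    n = len(keyword)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def fuzzy_match(keyword, term):
    # A term matches if it is within the edit budget of the keyword.
    return edit_distance(keyword, term) <= allowed_edits(keyword)


if __name__ == "__main__":
    print(fuzzy_match("phono", "phone"))
    print(fuzzy_match("ab", "ac"))
```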
Boolean query [bool]
Elasticsearch can use the bool keyword to combine multiple conditions into complex queries, similar to the AND, OR and NOT operations in SQL.
The Boolean clause types supported by Elasticsearch include the following:
- must: the document must meet all of the query conditions. With multiple conditions, it is similar to AND in SQL, or the && operator.
- should: the document must meet one or more of the query conditions (minimum_should_match specifies how many must be satisfied). With multiple conditions, it is similar to OR in SQL, or the || operator.
- must_not: the document must not meet any of the query conditions, similar to NOT in SQL. It does not take part in score calculation, and the returned scores are all 0.
- filter: filters out the documents that meet the criteria first, without calculating a score. Under normal circumstances, first use filter to narrow down the data, then use scoring queries to match precisely; this improves query efficiency.
must query
When using a must query, documents must match all of the query conditions it contains.
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "age": 20
          }
        }
      ]
    }
  }
}
This query is equivalent to the SQL statement below:
SELECT * FROM xxx WHERE age = 20;
When using must, you can specify multiple query conditions at the same time. In the DSL they are written as an array, and the effect is similar to the AND operation in SQL. For example:
{
"query": {
"bool": {
"must": [
{
"term": {
"age": 20 } },
{
"term": {
"gender": "male" } }
]
}
}
}
should query
A should query is similar to an OR statement in SQL. When it contains two or more conditions, the result must satisfy at least one of them. When it contains only one condition, the result must satisfy that condition.
{
"query": {
"bool": {
"should": [
{
"term": {
"age": 20 } },
{
"term": {
"gender": "male" } },
{
"range": {
"height": {
"gte": 170 } } }
]
}
}
}
This query is equivalent to the SQL statement below:
SELECT * FROM xxx WHERE age = 20 OR gender = "male" OR height >= 170;
The difference between should queries and the OR operation in SQL is that should queries can use the minimum_should_match parameter to specify the minimum number of conditions that must be met. For example, in the following query the result needs to meet two or more of the conditions:
{
"query": {
"bool": {
"should": [
{
"term": {
"age": 20 } },
{
"term": {
"gender": "male" } },
{
"term": {
"height": 170 } }
],
"minimum_should_match": 2
}
}
}
If there is no must or filter in the same bool statement, the default value of minimum_should_match is 1, that is, at least one of the should conditions must be met; but if a must or filter clause is also present, the default value of minimum_should_match is 0.
That is to say, in that case the should conditions are not required to match by default.
For example, in the query below, all returned documents must have an age value of 20. Without minimum_should_match, the results could include documents whose status value is not "active"; setting "minimum_should_match": 1 on the bool query, as the example does, makes both conditions take effect at the same time.
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "age": 20
        }
      },
      "should": {
        "term": {
          "status": "active"
        }
      },
      "minimum_should_match": 1
    }
  }
}
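The way the clauses combine, including the default for minimum_should_match, can be modelled with a small evaluator over in-memory documents. This is a minimal sketch under several assumptions: the function names are made up, only term (in its short form), range and nested bool clauses are handled, and scoring is ignored entirely.

```python
def as_list(clauses):
    # A clause may be given as a single object or as an array.
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def clause_matches(clause, doc):
    kind, body = next(iter(clause.items()))
    if kind == "bool":
        return bool_matches(body, doc)
    field, spec = next(iter(body.items()))
    value = doc.get(field)
    if kind == "term":
        return value == spec
    if kind == "range":
        if value is None:
            return False
        if "gte" in spec and value < spec["gte"]:
            return False
        if "lte" in spec and value > spec["lte"]:
            return False
        return True
    raise ValueError("unsupported clause: " + kind)


def bool_matches(body, doc):
    must = as_list(body.get("must")) + as_list(body.get("filter"))
    should = as_list(body.get("should"))
    must_not = as_list(body.get("must_not"))
    if not all(clause_matches(c, doc) for c in must):
        return False
    if any(clause_matches(c, doc) for c in must_not):
        return False
    # Default: 1 if there are only should clauses, otherwise 0.
    default_minimum = 1 if should and not must else 0
    minimum = body.get("minimum_should_match", default_minimum)
    return sum(clause_matches(c, doc) for c in should) >= minimum


if __name__ == "__main__":
    docs = [{"age": 20, "status": "active"}, {"age": 20, "status": "inactive"}]
    query = {"must": {"term": {"age": 20}},
             "should": {"term": {"status": "active"}}}
    print([bool_matches(query, d) for d in docs])
    query["minimum_should_match"] = 1
    print([bool_matches(query, d) for d in docs])
```

In this model, both documents match while minimum_should_match is left at its default, and only the active one matches once it is set to 1.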
must_not query
A must_not query is similar to a NOT operation in an SQL statement: it only returns documents that do not meet the specified criteria. For example:
{
"query": {
"bool": {
"must_not": [
{
"term": {
"age": 20 } },
{
"term": {
"gender": "male" } }
]
}
}
}
This query is equivalent to the following SQL statement (written here with != instead of NOT):
SELECT * FROM xxx WHERE age != 20 AND gender != "male";
In addition, as with filter, must_not does not calculate the score of the document, so the score of the returned results is 0.
filter query
A filter query has the same effect as a must query, but unlike must, it first filters out the documents that meet the conditions and does not calculate a score.
For example, the following query returns all documents whose status value is "active", each with a score of 0.0.
{
"query": {
"bool": {
"filter": {
"term": {
"status": "active"
}
}
}
}
}
Boolean combination query
We can also nest queries inside individual clauses. Note that Boolean clauses must be wrapped in a bool query, so a nested query has to use bool again on the inside.
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"age": 20 } },
{
"term": {
"age": 25 } }
]
}
},
{
"range": {
"level": {
"gte": 3
}
}
}
]
}
}
}
This query statement is equivalent to the following SQL statement:
SELECT * FROM xxx WHERE (age = 20 OR age = 25) AND level >= 3;
Multi-field query [multi_match]
The query text is tokenized, and each token is then queried separately against the listed fields
For example, "泡面" (instant noodles) is split into "泡" and "面", which are then queried separately
GET /products/_search
{
"query": {
"multi_match": {
"query": "泡面",
"fields": ["title","description"]
}
}
}
Default field word segmentation query [query_string]
- If the type of the query field is not tokenized, the query text is not tokenized either
- If the type of the query field is tokenized, the query text is tokenized before querying
GET /products/_search
{
"query": {
"query_string": {
"default_field": "description",
"query": "xxxx"
}
}
}
Highlight query [highlight]
Key words in eligible documents can be highlighted
- Only the fields whose type is text can be highlighted
- * means match all fields
- Highlighting does not modify the original document; the highlighted result is returned in a separate highlight section
GET /products/_search
{
"query": {
"term": {
"description": {
"value": "泡面"
}
}
},
"highlight": {
"fields": {
"*":{}
}
}
}
Custom highlight HTML tags: pre_tags and post_tags can be used inside highlight
GET /products/_search
{
"query": {
"term": {
"description": {
"value": "xxx"
}
}
},
"highlight": {
"post_tags": ["</span>"],
"pre_tags": ["<span style='color:red'>"],
"fields": {
"*":{}
}
}
}
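What pre_tags and post_tags do to the returned fragment can be sketched in a few lines. The helper names are hypothetical and the matching is a plain substring search, which is a simplification; the sketch only shows that the tagged text lives in a separate highlight section while _source stays unchanged.

```python
import re


def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    # Wrap every occurrence of the term in the given tags.
    pattern = re.compile(re.escape(term))
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


def highlighted_hit(source, field, term, pre_tag="<em>", post_tag="</em>"):
    # The original document is returned as-is; the tagged fragment
    # is placed next to it under "highlight".
    fragment = highlight(source[field], term, pre_tag, post_tag)
    return {"_source": source, "highlight": {field: [fragment]}}


if __name__ == "__main__":
    hit = highlighted_hit({"description": "very nice phone"}, "description",
                          "nice", "<span style='color:red'>", "</span>")
    print(hit["_source"]["description"])
    print(hit["highlight"]["description"][0])
```

When no tags are given, the sketch falls back to <em> and </em>, which are the tags Elasticsearch uses by default.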
Multi-field highlighting: set require_field_match to false to enable highlighting on multiple fields
GET /products/_search
{
"query": {
"term": {
"description": {
"value": "xxx"
}
}
},
"highlight": {
"require_field_match": "false",
"post_tags": ["</span>"],
"pre_tags": ["<span style='color:red'>"],
"fields": {
"*":{}
}
}
}
Return the specified number of items [size]
size keyword: specifies the number of items to return in the query result. The default is 10
GET /products/_search
{
"query": {
"match_all": {}
},
"size": 5
}
Paging query [from]
from keyword: specifies the position of the first item to return; used together with the size keyword to implement paging
GET /products/_search
{
"query": {
"match_all": {}
},
"size": 5,
"from": 0 #(page-1)*
}
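Turning a page number into from and size is simple arithmetic. The sketch below assumes 1-based page numbers, so the offset is (page - 1) * size; the function names are hypothetical.

```python
def page_params(page, size=10):
    # Page numbers are assumed to start at 1.
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


def search_body(page, size=10):
    # Request body for one page of a match_all query.
    body = {"query": {"match_all": {}}}
    body.update(page_params(page, size))
    return body


if __name__ == "__main__":
    for page in (1, 2, 3):
        print(page, page_params(page, size=5))
```

Note that by default from + size may not exceed 10000 (the index.max_result_window setting), so this style of paging is meant for the first pages of a result set.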
Specify field sorting [sort]
GET /products/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"price": {
"order": "desc"
}
}
]
}
Return the specified field [_source]
_source keyword: an array that specifies which fields to return
GET /products/_search
{
"query": {
"match_all": {}
},
"_source": ["title","description"]
}
Index principle
An inverted index is also called a reverse index; where there is a forward direction, there is a reverse one. A forward index finds the value through the key, and an inverted index finds the key through the value.
Under the hood, ES uses an inverted index when searching
Test case
The existing index and its mappings are as follows:
{
  "products" : {
    "mappings" : {
      "properties" : {
        "description" : {
          "type" : "text"
        },
        "price" : {
          "type" : "float"
        },
        "title" : {
          "type" : "keyword"
        }
      }
    }
  }
}
Enter the following data
_id | title | price | description |
---|---|---|---|
1 | Blue Moon Laundry Detergent | 19.9 | Blue Moon laundry detergent is very efficient |
2 | iphone13 | 19.9 | very nice phone |
3 | Little raccoon crisp noodles | 1.5 | Raccoons are delicious |
Visual representation
- ES builds the index based on whether the field can be tokenized. If it can, the index is built on the tokens; if it cannot, the index is built on the whole field value:
- For example, the keyword type cannot be tokenized: when indexing, the entire field value is used as the index key
- The text type can be tokenized: the field value is tokenized first, and the index is then built on the tokens
- The ES index resembles the indexes created by MySQL's InnoDB engine: the key of the index structure stores the indexed term, and the value stores the id of the whole document. When querying, first find the id through the index, then go to the metadata area and fetch the whole document by that id
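The structure described above can be sketched with the three sample documents. This is a toy model, not how ES stores data: the names are hypothetical, the tokenizer just lowercases and splits on non-alphanumeric characters, and the float field is indexed as a whole value purely for illustration.

```python
import re
from collections import defaultdict

MAPPING = {"title": "keyword", "price": "float", "description": "text"}
DOCS = {
    1: {"title": "Blue Moon Laundry Detergent", "price": 19.9,
        "description": "Blue Moon laundry detergent is very efficient"},
    2: {"title": "iphone13", "price": 19.9,
        "description": "very nice phone"},
    3: {"title": "Little raccoon crisp noodles", "price": 1.5,
        "description": "Raccoons are delicious"},
}


def analyze(value):
    # Assumed tokenizer for text fields.
    return re.findall(r"[a-z0-9]+", value.lower())


def build_inverted_index(docs, mapping):
    # One index per field: term -> set of document ids.
    index = {field: defaultdict(set) for field in mapping}
    for doc_id, doc in docs.items():
        for field, field_type in mapping.items():
            value = doc[field]
            # text is tokenized; other types keep the whole value as the key
            terms = analyze(value) if field_type == "text" else [value]
            for term in terms:
                index[field][term].add(doc_id)
    return index


def lookup(index, docs, field, term):
    # Step 1: term -> ids via the index. Step 2: ids -> whole documents.
    ids = sorted(index[field].get(term, ()))
    return [(doc_id, docs[doc_id]) for doc_id in ids]


if __name__ == "__main__":
    idx = build_inverted_index(DOCS, MAPPING)
    print([doc_id for doc_id, _ in lookup(idx, DOCS, "description", "very")])
    print([doc_id for doc_id, _ in lookup(idx, DOCS, "title", "Blue")])
```

In this model the description term "very" points at two documents, while "Blue" finds nothing in title because a keyword field is indexed under its full value only.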