ElasticSearch Must Know - Basics

JD Logistics Kang Rui Yao Zaiyi Li Zhen Liu Bin Wang Beiyong

Note: Everything below is based on Elasticsearch version 8.1

1. Definition of index

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/indices.html

Global Awareness of Indexes

Elasticsearch            MySQL
Index                    Table
Type (obsolete)          Table (obsolete)
Document                 Row
Field                    Column
Mapping                  Schema
Everything is indexed    Index
Query DSL                SQL
GET http://...           select * from ...
POST http://...          update table set ...
Aggregations             group by, sum
cardinality              distinct (deduplication)
reindex                  data migration

index definition

Definition: A collection of documents that share the same document structure (mapping), identified by a unique index name. A cluster contains multiple indexes, and different indexes hold the data of different business types.

Notes: Index names do not support uppercase letters. Field names do support uppercase, but using all lowercase consistently is recommended.

index creation

index-settings parameter analysis

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-modules.html

Note: Static settings cannot be modified once the index has been created, whereas dynamic settings can be changed afterwards.

Thinking:

  1. Why can't the number of primary shards be modified after the index is created? A document is routed to a particular shard in an index using the following (simplified) formula: shard_num = hash(_routing) % num_primary_shards. The default value used for _routing is the document's _id. When a document is written, Elasticsearch uses this formula to decide which shard stores it, and every later read uses the same formula. If the number of primary shards changed, the formula would point to a different shard and the data would not be found. Put simply: the ID is hashed and divided by the number of primary shards, and the remainder is the shard number; if the divisor changes, the result changes too.
  2. If the business really needs more primary shards as the data grows, what should we do? Use reindex to migrate the data to another index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/docs-reindex.html
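The routing argument above can be tried out with a small sketch. It is only an illustration: zlib.crc32 stands in for the real hash function (Elasticsearch uses Murmur3), and shard_for and the sample IDs are hypothetical names chosen for this example.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Simplified routing: hash(_routing) % num_primary_shards."""
    # Assumption: crc32 is a stand-in hash; Elasticsearch itself uses Murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 101)]

# Where each document is stored when the index has 5 primary shards...
placed_with_5 = {d: shard_for(d, 5) for d in doc_ids}
# ...and where the same formula would look for it with 6 primary shards.
placed_with_6 = {d: shard_for(d, 6) for d in doc_ids}

moved = [d for d in doc_ids if placed_with_5[d] != placed_with_6[d]]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up on a different shard")
```

Because the same ID always hashes to the same value, lookups are stable only as long as the divisor stays the same, which is why the data has to be reindexed when the shard count must change.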

Basic operations of indexes


2. Dynamic of Mapping-Param

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic.html

core function

Elasticsearch automatically detects a field's type and then adds the field to the mapping. In other words, even if you have not defined a field in the mapping, Elasticsearch will detect its type dynamically.

Getting to know dynamic

// Delete the test01 index so that we start from a clean state
DELETE test01

// Without defining a mapping, insert one document directly and see what happens
POST test01/_doc/1
{
  "name":"kangrui10"
}

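// Then check the mapping of test01 to see what type the name field was given
// As shown below, name is mapped as text with a keyword sub-field, which is not actually what we want:
// in our business queries name is never searched by analyzed terms; it is almost always an exact match (and name = xxx)
// With the result below, an exact match has to use name.keyword = xxx, which is more trouble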
GET test01/_mapping
{
  "test01" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

optional value for dynamic

true
  Description: New fields are added to the mapping (default).
  Explanation: If you do not specify dynamic when creating the mapping, it defaults to true. A field that has no explicitly defined type gets its type detected by Elasticsearch.

false
  Description: New fields are ignored. These fields will not be indexed or searchable, but will still appear in the _source field of returned hits. These fields will not be added to the mapping, and new fields must be added explicitly.
  Explanation: A field that is not defined in the mapping can still be written, but it cannot be queried and it does not appear in the mapping. In other words, the written field will not be indexed.

strict
  Description: If new fields are detected, an exception is thrown and the document is rejected. New fields must be explicitly added to the mapping.
  Explanation: Writing a field that is not defined in the mapping fails immediately with an error. This is the more rigorous option and is recommended for production; to add a new field you must add it to the mapping manually.
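The three modes can be compared with a toy model. This is a simplified sketch, not Elasticsearch code: index_document and guess_type are hypothetical helpers, and the type guessing covers only a few JSON types.

```python
def guess_type(value):
    """Very rough stand-in for dynamic type detection (assumption: simplified rules)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text + keyword sub-field"


def index_document(mapping, doc, dynamic="true"):
    """Return (mapping after the write, searchable fields, stored _source)."""
    mapping = dict(mapping)
    searchable = []
    for field, value in doc.items():
        if field in mapping:
            searchable.append(field)
        elif dynamic == "true":
            mapping[field] = guess_type(value)  # the new field is added to the mapping
            searchable.append(field)
        elif dynamic == "false":
            continue  # ignored: kept in _source only, not indexed
        elif dynamic == "strict":
            raise ValueError(f"unknown field [{field}] rejected by strict mapping")
        else:
            raise ValueError(f"unsupported dynamic value: {dynamic}")
    return mapping, searchable, dict(doc)


mapping = {"name": "keyword"}
doc = {"name": "kangrui10", "age": 30}

print(index_document(mapping, doc, "true"))
print(index_document(mapping, doc, "false"))
```

With "true" the age field appears in the mapping as long; with "false" the mapping is unchanged and age is present only in the stored source; with "strict" the call raises and nothing is written.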

Disadvantages of dynamic mapping

  • Field type detection is fairly accurate, but not necessarily what you expect. For example, a text field only gets the default standard analyzer, whereas what we usually need is the ik Chinese analyzer
  • It takes up extra storage space. A string is mapped to two types, text and keyword, which means it occupies more storage
  • Mapping explosion. If you accidentally send a PUT when you meant to send a GET, many fields are created by mistake

3. doc_values of Mapping-Param

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/doc-values.html

core function

Doc values are an additional, ordered forward index (a document => field value mapping) that Lucene builds alongside the inverted index. They are essentially serialized columnar storage, a structure well suited to aggregations, sorting, and field access from scripts. This layout also compresses well, especially for numeric types, which reduces disk usage and speeds up access. Doc values are supported for almost all field types, except text and annotated_text fields.

What is a forward index

A forward index is similar to a database table: it associates a document ID with its data, so the data is obtained by looking up the document ID
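The difference between the two structures can be sketched in a few lines. The documents and the numeric price field below are made-up sample data, and the dictionaries are only a conceptual model of what Lucene stores.

```python
from collections import defaultdict

# Hypothetical documents: id -> text of one field
docs = {1: "brown fox", 2: "quick brown dog", 3: "quick fox"}

# Inverted index: term -> list of document ids (answers "which docs contain X?")
inverted = defaultdict(list)
for doc_id in sorted(docs):
    for term in sorted(set(docs[doc_id].split())):
        inverted[term].append(doc_id)

# Forward index (doc_values style): document id -> value of a numeric field
# (answers "what is the value of this field for doc N?")
price = {1: 30, 2: 10, 3: 20}

hits = inverted["fox"]                                 # search uses the inverted index
sorted_hits = sorted(hits, key=lambda d: price[d])     # sorting uses the forward index
total_price = sum(price[d] for d in hits)              # so does aggregation

print(hits, sorted_hits, total_price)
```

Searching goes from term to documents, while sorting and aggregating go from document to value, which is why disabling doc_values removes sorting and aggregation on that field.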

doc_values optional values

  • true: default value, enabled by default
  • false: must be set explicitly. With doc_values set to false, the field can no longer be sorted on, aggregated on, or accessed from scripts, but disk space is saved

real practice

// Create an index, test03, whose fields meet the following conditions
//     1. speaker: keyword
//     2. line_id: keyword and not aggregatable
//     3. speech_number: integer
PUT test03
{
  "mappings": {
    "properties": {
      "speaker": {
        "type": "keyword"
      },
      "line_id":{
        "type": "keyword",
        "doc_values": false
      },
      "speech_number":{
        "type": "integer"
      }
    }
  }
}

4. Tokenizer analyzers

ik Chinese analyzer installation

https://github.com/medcl/elasticsearch-analysis-ik

What is an inverted index

The process of data indexing

Classification of tokenizers

Official website address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-analyzers.html


5. Custom analyzers

The three parts of a custom analyzer

1. Character filters

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-charfilters.html

Zero or more character filters can be configured.

HTML Strip Character Filter: removes HTML elements such as <b> and decodes HTML entities such as &amp;

Mapping Character Filter: replaces specified characters

Pattern Replace Character Filter: replaces characters that match a regular expression

2. Tokenizer

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenizers.html#_word_oriented_tokenizers

Exactly one tokenizer is configured; it splits the text into tokens.

3. Token filters

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenfilters.html

Zero or more token filters can be configured. They post-process the tokens produced by the tokenizer, for example lowercasing, removing stop words, or adding synonyms.
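The three stages run in a fixed order: character filters, then the tokenizer, then token filters. The following sketch mimics that order with plain Python functions. All function names are hypothetical, and the regular expression is only a crude stand-in for the standard tokenizer.

```python
import re


def char_filter_mapping(text, mappings):
    """Stage 1: rewrite characters before tokenization (like a mapping char filter)."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def tokenizer_simple(text):
    """Stage 2: split into tokens. Assumption: keeps only runs of word characters."""
    return re.findall(r"\w+", text)


def token_filter_lowercase(tokens):
    """Stage 3: post-process each token (like the lowercase token filter)."""
    return [token.lower() for token in tokens]


def analyze(text):
    text = char_filter_mapping(text, {"&": "and"})
    tokens = tokenizer_simple(text)
    return token_filter_lowercase(tokens)


print(analyze("Doc & Cat"))           # ['doc', 'and', 'cat']
print(tokenizer_simple("doc & cat"))  # ['doc', 'cat']
```

The second call shows what happens without the character filter: the tokenizer discards the & sign, so any later stage never sees it.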

real practice

There is a document whose content is similar to doc & cat. Index this document so that a match_phrase query for either doc & cat or doc and cat finds it.

Analysis:

  1. What is match_phrase: match_phrase analyzes the search keywords into tokens. All of those tokens must appear among the tokens of the searched field, in the same order and, by default, contiguously.
  2. To make & and and return the same results, a custom analyzer is required.
  3. How to define a custom analyzer: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-custom-analyzer.html
  4. Solution 1 core feature: Mapping Character Filter
  5. Solution 2 core feature: synonym token filter, https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-synonym-tokenfilter.html
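The phrase rule (same tokens, same order, contiguous) can be expressed as a sliding-window comparison over token lists. This is a conceptual sketch with hypothetical helper names; the analyze function simply replaces & with and before splitting on whitespace, and slop is not modelled.

```python
def analyze(text):
    """Assumption: a toy analyzer that maps & to and, lowercases, and splits on spaces."""
    return text.lower().replace("&", "and").split()


def match_phrase(field_text, query_text):
    """True if the query tokens occur in the field tokens, in order and contiguously."""
    field_tokens = analyze(field_text)
    query_tokens = analyze(query_text)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )


print(match_phrase("doc and cat", "doc & cat"))    # True
print(match_phrase("cat and doc", "doc & cat"))    # False: wrong order
print(match_phrase("doc and a cat", "doc & cat"))  # False: not contiguous
```

Because both the stored text and the query pass through the same analyzer, the two spellings end up as the same token sequence and therefore match each other.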

Solution 1

# Create the index (delete the earlier test01 first if it still exists)
PUT /test01
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": [
            "my_mappings_char_filter"
          ],
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
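// Notes
// Part 1, Character filters: char_filter performs the text replacement
// Part 2, Tokenizer: the default standard tokenizer is used
// Part 3, Token filters: none configured
// The content field uses the custom analyzer my_analyzer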

# Load test data
PUT test01/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test: searching for either doc & cat or doc and cat returns both documents
POST test01/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "doc & cat"
          }
        }
      ]
    }
  }
}

Solution 2

# Approach: make & and and synonyms by using a token filter
# Create the index
PUT /test02
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonym_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "my_synonym"
          ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "lenient": true,
          "synonyms": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
    }
  }
}
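// Notes
// Part 1, Character filters: none configured
// Part 2, Tokenizer: the whitespace tokenizer is used. Why not the default tokenizer? Because the default tokenizer drops & during tokenization, so the token filter would never get to see it
// Part 3, Token filters: the synonym token filter makes & and and synonyms
// The content field uses the custom analyzer my_synonym_analyzer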

# Load test data
PUT test02/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test
POST test02/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "doc & cat"
          }
        }
      ]
    }
  }
}

6. multi-fields

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/multi-fields.html

// One field, multiple types: for example, one field analyzed by two different analyzers
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "analyzer":"standard",
        "fields": {
          "fieldText": {
            "type":  "text",
            "analyzer":"ik_smart"
          }
        }
      }
    }
  }
}

7. runtime_field (runtime fields)

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime.html

Background

Suppose the business needs to sort by the difference between two numeric fields, that is, by a field that does not exist. What can be done? One option is to backfill the data and add a field that stores the difference. But what if backfilling the data to add a new field is not allowed?

solution

Application Scenario

  1. Add new fields to existing documents without reindexing
  2. Start working with data without fully understanding its structure
  3. Override the value returned from an indexed field at query time
  4. Define fields for specific purposes without modifying the underlying schema

Features

  1. Lucene is completely unaware of runtime fields, because they are not indexed and have no doc_values
  2. Scoring is not supported, because there is no inverted index
  3. They break with the traditional define-first, use-later approach
  4. They can prevent mapping explosion
  5. They increase API flexibility
  6. Note that they slow down searches
actual use

  • Specified at search time: the runtime field is defined in the search request (so it can be queried even though the mapping has no such field)
  • Specified in the mapping, dynamically or statically: the runtime field is added to the index mapping

Real question exercise 1

# Assume the following index and data
PUT test03
{
  "mappings": {
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
POST test03/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}

# Requirement: emotion > 5, return emotion_falg = '1'
# Requirement: emotion < 5, return emotion_falg = '-1'
# Requirement: emotion = 5, return emotion_falg = '0'

Solution 1

Specify the runtime field when searching: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html The field does not exist in _source, so the fields parameter (here "*") must be added to the search request for it to be returned.

GET test03/_search
{
  "fields": [
    "*"
  ], 
  "runtime_mappings": {
    "emotion_falg": {
      "type": "keyword",
      "script": {
        "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
      }
    }
  }
}
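As a cross-check of the script logic, the same three branches can be written as a plain function and applied to the four sample documents. This is a sketch in Python rather than Painless, and emotion_flag is a hypothetical name.

```python
def emotion_flag(emotion: int) -> str:
    """Mirror of the runtime script: '1' above 5, '-1' below 5, '0' at exactly 5."""
    if emotion > 5:
        return "1"
    if emotion < 5:
        return "-1"
    return "0"


# The sample documents from the exercise
docs = [
    {"_id": 1, "emotion": 2},
    {"_id": 2, "emotion": 5},
    {"_id": 3, "emotion": 10},
    {"_id": 4, "emotion": 3},
]

flags = {doc["_id"]: emotion_flag(doc["emotion"]) for doc in docs}
print(flags)  # {1: '-1', 2: '0', 3: '1', 4: '-1'}
```

The value is computed when the documents are read rather than when they are written, which is the essence of a runtime field.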

Solution 2

Specify runtime fields when creating an index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-mapping-fields.html With this approach the runtime field is part of the mapping, so searches can use it without redefining it in every request.

# Create the index and define the runtime field
PUT test03_01
{
  "mappings": {
    "runtime": {
      "emotion_falg": {
        "type": "keyword",
        "script": {
          "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
        }
      }
    },
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
# Load test data
POST test03_01/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
# Query test
GET test03_01/_search
{
  "fields": [
    "*"
  ]
}

Real test practice 2

# Given the following index and data
PUT task04
{
  "mappings": {
    "properties": {
      "A":{
        "type": "long"
      },
      "B":{
        "type": "long"
      }
    }
  }
}
PUT task04/_bulk
{"index":{"_id":1}}
{"A":100,"B":2}
{"index":{"_id":2}}
{"A":120,"B":2}
{"index":{"_id":3}}
{"A":120,"B":25}
{"index":{"_id":4}}
{"A":21,"B":25}

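# Requirement: on the task04 index, create a runtime field named A_B whose value is A - B; create a range aggregation with three buckets (below 0, 0 to 100, 100 and above) and return the document counts
// Knowledge points used:
// 1. Specify a runtime field when searching: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
// 2. Range aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-range-aggregation.html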

solution

# Test the result
GET task04/_search
{
  "fields": [
    "*"
  ], 
  "size": 0, 
  "runtime_mappings": {
    "A_B": {
      "type": "long",
      "script": {
        "source": """
          emit(doc['A'].value - doc['B'].value);
          """
      }
    }
  },
  "aggs": {
    "price_ranges_A_B": {
      "range": {
        "field": "A_B",
        "ranges": [
          { "to": 0 },
          { "from": 0, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}
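The expected bucket counts for the sample data can be worked out by hand or with a short sketch. The range_agg helper and its bucket keys are hypothetical; the one rule borrowed from the range aggregation is that from is inclusive and to is exclusive.

```python
docs = [
    {"A": 100, "B": 2},
    {"A": 120, "B": 2},
    {"A": 120, "B": 25},
    {"A": 21, "B": 25},
]


def range_agg(values, ranges):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    buckets = {}
    for low, high in ranges:
        key = f"{'*' if low is None else low}-{'*' if high is None else high}"
        buckets[key] = sum(
            1
            for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
    return buckets


# The runtime field A_B is simply A - B for every document
a_b = [doc["A"] - doc["B"] for doc in docs]
buckets = range_agg(a_b, [(None, 0), (0, 100), (100, None)])

print(a_b)      # [98, 118, 95, -4]
print(buckets)  # {'*-0': 1, '0-100': 2, '100-*': 1}
```

A value of exactly 100 would land in the last bucket, not in the middle one, because the upper bound of a range is exclusive.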

8. Search-highlighted

Introduction to highlighted grammar

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/highlighting.html

9. Search-Order

Introduction to Order Grammar

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/sort-search-results.html

// Note: text fields cannot be sorted or aggregated on by default; if you really must, enable fielddata
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_last_name": "wood"
    }
  },
  "highlight": {
    "number_of_fragments": 3,
    "fragment_size": 150,
    "fields": {
      "customer_last_name": {
        "pre_tags": [
          "<em>"
        ],
        "post_tags": [
          "</em>"
        ]
      }
    }
  },
  "sort": [
    {
      "currency": {
        "order": "desc"
      },
      "_score": {
        "order": "asc"
      }
    }
  ]
}

10. Search-Page

Introduction to page grammar

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/paginate-search-results.html

# Note: from starts at 0, not 1
GET kibana_sample_data_ecommerce/_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match": {
      "customer_last_name": "wood"
    }
  }
}
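The from and size parameters behave like a zero-based slice over the sorted hit list. The sketch below shows that correspondence; paginate, from_for_page and the doc-N labels are hypothetical names for illustration.

```python
def paginate(hits, from_=0, size=10):
    """Return one page of hits: skip from_ items, then take size items."""
    return hits[from_:from_ + size]


def from_for_page(page_number, size):
    """Convert a 1-based page number into the zero-based from offset."""
    return (page_number - 1) * size


hits = [f"doc-{i}" for i in range(100)]  # stand-in for a sorted result list

page = paginate(hits, from_=5, size=20)
print(page[0], page[-1], len(page))  # doc-5 doc-24 20
print(from_for_page(3, 20))          # 40
```

With from set to 5, the first five hits (positions 0 to 4) are skipped, so the page starts at the sixth hit.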

Real question exercise 1

# Question
In the spoken lines of the play, highlight the word Hamlet (in the text_entry field), starting the highlight with "#aaa#" and ending it with "#bbb#"
Return all of the speech_number field lines in reverse order; '20' speech lines per page, starting from line '40'

# highlight: apply to the text_entry field; highlight the keyword Hamlet
# pagination: from: 40; size: 20
# speech_number: descending order

POST test09/_search
{
  "from": 40,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "text_entry": "Hamlet"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "text_entry": {
        "pre_tags": [
          "#aaa#"
        ],
        "post_tags": [
          "#bbb#"
        ]
      }
    }
  },
  "sort": [
    {
      "speech_number.keyword": {
        "order": "desc"
      }
    }
  ]
}

11. Search-AsyncSearch

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/async-search.html

release version

7.7.0

Applicable scene

Lets users retrieve search results asynchronously as they become available, instead of having to wait for the final response after the query completes

Common commands

  • Run an asynchronous search: POST /sales*/_async_search?size=0
  • View the results of an asynchronous search: GET /_async_search/<id>
  • View the status of an asynchronous search: GET /_async_search/status/<id>
  • Delete or cancel an asynchronous search: DELETE /_async_search/<id>

Description of asynchronous query results

Return value    Meaning
id              The unique identifier returned by the asynchronous search
is_partial      Once the query is no longer running, indicates whether the search failed or completed successfully on all shards. While the query is still running, is_partial is always true
is_running      Whether the search is still running or has completed
total           How many shards the search will be executed on
successful      How many shards have successfully completed the search
12. Aliases (index aliases)

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/aliases.html

The role of Aliases

In ES, an index alias is like a shortcut or soft link that can point to one or more indexes. Aliases give us great flexibility; with them we can:

  1. Switch seamlessly from one index to another in a running ES cluster (without downtime)
  2. Group multiple indexes. For example, with indexes created per month, an alias can represent the data of the last 3 months
  3. Query only part of the data in an index, forming something similar to a database view

Assuming there is no alias, how to handle multi-index retrieval

Method 1: POST index_01,index_02,index_03/_search
Method 2: POST index*/_search

Three ways to create an alias

  1. Specify an alias while creating the index
# Set the alias of test05 to test05_aliases
PUT test05
{
  "mappings": {
    "properties": {
      "name":{
        "type": "keyword"
      }
    }
  },
  "aliases": {
    "test05_aliases": {}
  }
}
  2. Specify an alias by using an index template
PUT _index_template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    },
    "aliases": {
      "mydata": { }
    }
  },
  "priority": 500,
  "composed_of": ["component_template1", "runtime_component_template"], 
  "version": 3,
  "_meta": {
    "description": "my custom"
  }
}
  3. Create an alias for an existing index
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

delete alias

POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

Real question exercise 1

# Define an index alias for 'accounts-row' called 'accounts-male': Apply a filter to only show the male account owners

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "accounts-row",
        "alias": "accounts-male",
        "filter": {
          "bool": {
            "filter": [
              {
                "term": {
                  "gender.keyword": "male"
                }
              }
            ]
          }
        }
      }
    }
  ]
}

13. Search-template

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-template.html

Features

Templates accept parameters that can be specified at runtime. Search templates are stored server-side and can be modified without changing client-side code.

Getting to know search-template

# Create the search template
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "{{query_key}}": "{{query_value}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    }
  }
}

# Search using the template
GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_key": "your field",
    "query_value": "your field value",
    "from": 0,
    "size": 10
  }
}

Operations on search templates

Create search template

PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {
      "query_string": "My query string"
    }
  }
}

Validate search template

POST _render/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 20,
    "size": 10
  }
}

Execute Retrieval Template

GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 0,
    "size": 10
  }
}

Get all search templates

GET _cluster/state/metadata?pretty&filter_path=metadata.stored_scripts

delete search template

DELETE _scripts/my-search-template

14. Search-dsl simple search

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl.html

Search selection

search category

custom scoring

How to customize scoring

1. Index boost: adjusting relevance at the index level

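// A batch of data carries different labels. The structure is identical, and each label is stored in its own index (A, B, C). If the results must be displayed strictly grouped by label, which query works best?
// Requirement: show class A first, then class B, then class C

# Test data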
put /index_a_123/_doc/1
{
  "title":"this is index_a..."
}
put /index_b_123/_doc/1
{
  "title":"this is index_b..."
}
put /index_c_123/_doc/1
{
  "title":"this is index_c..."
}
# Plain query without boosts: the three returned documents all have the same score
POST index_*_123/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "this"
          }
        }
      ]
    }
  }
}

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-search.html
indices_boost
# Boost the weight at the index level
POST index_*_123/_search
{
  "indices_boost": [
    {
      "index_a_123": 10
    },
    {
      "index_b_123": 5
    },
    {
      "index_c_123": 1
    }
  ], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "this"
          }
        }
      ]
    }
  }
}

2. boosting: adjusting document relevance

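An index index_a has several fields. Implement the following query:
1) The title field matches 'ssas' or 'sasa'.
2) If the tags field (an array field) contains 'pingpang', boost the score.
Requirement: write the DSL that implements this.

# Test data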
put index_a/_bulk
{"index":{"_id":1}}
{"title":"ssas","tags":"basketball"}
{"index":{"_id":2}}
{"title":"sasa","tags":"pingpang; football"}

# Solution 1
POST index_a/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": "ssas"
                }
              },
              {
                "match": {
                  "title": "sasa"
                }
              }
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "tags": {
              "query": "pingpang",
              "boost": 1
            }
            
          }
        }
      ]
    }
  }
}
# Solution 2
// https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
POST index_a/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "match": {
                "tags": {
                  "query": "pingpang"
                }
              }
            },
            "boost": 1
          }
        }
      ],
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": "ssas"
                }
              },
              {
                "match": {
                  "title": "sasa"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

3. negative_boost: reducing relevance

If you are unhappy with certain results but do not want to exclude them with must_not, consider the negative_boost of the boosting query.
That is: lower their score.
negative_boost
(Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.
Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-boosting-query.html

POST index_a/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "tags": "football"
        }
      },
      "negative": {
        "term": {
          "tags": "pingpang"
        }
      },
      "negative_boost": 0.5
    }
  }
}
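The effect of negative_boost can be illustrated with a toy scorer. This is a simplified model under stated assumptions: every positive match gets a base score of 1.0 instead of a real relevance score, tags are treated as plain lists, document 3 is a made-up extra document, and boosting is a hypothetical helper name.

```python
docs = {
    1: {"title": "ssas", "tags": ["basketball"]},
    2: {"title": "sasa", "tags": ["pingpang", "football"]},
    3: {"title": "hypothetical extra doc", "tags": ["football"]},
}


def boosting(docs, positive_tag, negative_tag, negative_boost, base_score=1.0):
    """Keep docs matching the positive query; scale the score of those that
    also match the negative query by negative_boost instead of dropping them."""
    scores = {}
    for doc_id, doc in docs.items():
        if positive_tag not in doc["tags"]:
            continue  # does not match the positive query: not returned at all
        score = base_score
        if negative_tag in doc["tags"]:
            score *= negative_boost  # demoted, but still returned
        scores[doc_id] = score
    return scores


scores = boosting(docs, "football", "pingpang", negative_boost=0.5)
print(scores)  # {2: 0.5, 3: 1.0}
```

Document 2 stays in the results with half the score, whereas must_not would have removed it entirely.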

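4. function_score custom scoring

How can relevance be boosted by sales and view count at the same time?
Problem description: for products, we want a relevance calculation that takes both sales and view count into account,
for example oldScore * (sales + views)
**************************
Product     Sales       Views
A         10           10
B         20           20
C         30           30
**************************
# Sample data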
put goods_index/_bulk
{"index":{"_id":1}}
{"name":"A","sales_count":10,"view_count":10}
{"index":{"_id":2}}
{"name":"B","sales_count":20,"view_count":20}
{"index":{"_id":3}}
{"name":"C","sales_count":30,"view_count":30}

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
Knowledge point: script_score

POST goods_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "source": "_score * (doc['sales_count'].value+doc['view_count'].value)"
        }
      }
    }
  }
}
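The scores this query produces for the three sample products can be reproduced with simple arithmetic. The sketch assumes that match_all gives every document a query score of 1.0; script_score here is a hypothetical Python function mirroring the script source, not an Elasticsearch API.

```python
goods = [
    {"name": "A", "sales_count": 10, "view_count": 10},
    {"name": "B", "sales_count": 20, "view_count": 20},
    {"name": "C", "sales_count": 30, "view_count": 30},
]


def script_score(query_score, doc):
    """Mirror of the script: _score * (sales_count + view_count)."""
    return query_score * (doc["sales_count"] + doc["view_count"])


query_score = 1.0  # assumption: match_all scores every document 1.0

ranked = sorted(
    ((script_score(query_score, g), g["name"]) for g in goods),
    reverse=True,
)
print(ranked)  # [(60.0, 'C'), (40.0, 'B'), (20.0, 'A')]
```

function_score then combines the function result with the query score according to boost_mode, which defaults to multiply; with a query score of 1.0 that final step leaves these numbers unchanged.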

15. Search-dsl Bool complex search

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-bool-query.html

basic grammar

real practice

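Write a query that requires a keyword to appear in at least two of a document's four fields
Features used: bool query, should / minimum_should_match
    1. The bool query
    2. The detail: minimum_should_match
Note: if the bool query contains at least one should clause and no must or filter clauses, minimum_should_match defaults to 1; otherwise it defaults to 0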

POST test_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "filed1": "kr"
          }
        },
        {
          "match": {
            "filed2": "kr"
          }
        },
        {
          "match": {
            "filed3": "kr"
          }
        },
        {
          "match": {
            "filed4": "kr"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}
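The matching rule is simply a count of satisfied should clauses compared against a threshold. The sketch below models it with whitespace-split fields; bool_should and the two sample documents are hypothetical, and the field names follow the query above.

```python
def bool_should(doc, clauses, minimum_should_match):
    """True if at least minimum_should_match of the (field, term) clauses match."""
    matched = sum(
        1
        for field, term in clauses
        if term in str(doc.get(field, "")).lower().split()
    )
    return matched >= minimum_should_match


clauses = [("filed1", "kr"), ("filed2", "kr"), ("filed3", "kr"), ("filed4", "kr")]

doc_two_fields = {"filed1": "kr", "filed2": "kr test", "filed3": "x", "filed4": "y"}
doc_one_field = {"filed1": "kr", "filed2": "a", "filed3": "b", "filed4": "c"}

print(bool_should(doc_two_fields, clauses, 2))  # True
print(bool_should(doc_one_field, clauses, 2))   # False
print(bool_should(doc_one_field, clauses, 1))   # True
```

Lowering the threshold to 1 gives the default behavior of a bool query that has only should clauses.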

16. Search-Aggregations

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations.html

aggregate classification

Bucket aggregation (bucket)

terms

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-terms-aggregation.html
# Count documents per author
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      }
    }
  }
}

date_histogram

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-datehistogram-aggregation.html
# Count per month on up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_up_time": {
      "date_histogram": {
        "field": "up_time",
        "calendar_interval": "month"
      }
    }
  }
}

Metrics aggregation (metrics)

Max

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-max-aggregation.html
# Get the maximum up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_max_up_time": {
      "max": {
        "field": "up_time"
      }
    }
  }
}

Top_hits

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-top-hits-aggregation.html
# Aggregate by user and keep only one bucket, then fetch the top 3 hit documents in it, sorted by the given field
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "terms_agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      },
      "aggs": {
        "top_user_hits": {
          "top_hits": {
            "_source": {
              "includes": [
                "video_time",
                "title",
                "see",
                "user",
                "up_time"
              ]
            }, 
            "sort": [
              {
                "see":{
                  "order": "desc"
                }
              }
            ], 
            "size": 3
          }
        }
      }
    }
  }
}

// The response looks like this
{
  "took" : 91,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms_agg_user" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 975,
      "buckets" : [
        {
          "key" : "Elastic搜索",
          "doc_count" : 25,
          "top_user_hits" : {
            "hits" : {
              "total" : {
                "value" : 25,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "5ccCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "03:45",
                    "see" : "92",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会2021: 用加 Gatling 进行Elasticsearch的负载测试,寓教于乐。",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [
                    "92"
                  ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "8scCVoQBUyqsIDX6wIgn",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "10:18",
                    "see" : "79",
                    "up_time" : "2020-10-20",
                    "title" : "为Elasticsearch启动htpps访问",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [
                    "79"
                  ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "7scCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "04:41",
                    "see" : "71",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会2021: Elasticsearch作为一个地理空间的数据库",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [
                    "71"
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Pipeline aggregation (pipeline)

Pipeline: aggregations that operate on the output of other aggregations.

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline.html

bucket_selector

Official website document address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-bucket-selector-aggregation.html

# Group by order_date per month and keep only the months whose total sales exceed 1000
POST kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "date_his_aggs": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_aggs": {
          "sum": {
            "field": "total_unique_products"
          }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "sum_aggs"
            },
            "script": "params.totalSales > 1000"
          }
        }
      }
    }
  }
}

real practice

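The earthquakes index contains earthquake data for the past 30 months. With a single query, obtain the following:
  • The average mag of each month over the past 30 months
  • The month with the highest average mag in the past 30 months, and that average
  • The search must not return any documents

max_bucket official website address: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-max-bucket-aggregation.html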

POST earthquakes/_search
{
  "size": 0, 
  "query": {
    "range": {
      "time": {
        "gte": "now-30M/d",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "agg_time_his": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_aggs": {
          "avg": {
            "field": "mag"
          }
        }
      }
    },
    "max_mag_sales": {
      "max_bucket": {
        "buckets_path": "agg_time_his>avg_aggs" 
      }
    }
  }
}
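The two aggregation levels can be mirrored in plain Python: first an average per month, then a pass over those averages to find the largest. The earthquake readings below are invented sample data, not values from the earthquakes index.

```python
from collections import defaultdict

# Hypothetical (month, mag) readings
quakes = [
    ("2023-01", 4.0),
    ("2023-01", 5.0),
    ("2023-02", 6.0),
    ("2023-02", 6.5),
    ("2023-03", 3.0),
]

# Step 1: date_histogram + avg -> one average per monthly bucket
by_month = defaultdict(list)
for month, mag in quakes:
    by_month[month].append(mag)
monthly_avg = {month: sum(mags) / len(mags) for month, mags in sorted(by_month.items())}

# Step 2: max_bucket -> the bucket whose average is the highest
best_month = max(monthly_avg, key=monthly_avg.get)

print(monthly_avg)                          # {'2023-01': 4.5, '2023-02': 6.25, '2023-03': 3.0}
print(best_month, monthly_avg[best_month])  # 2023-02 6.25
```

The second step never touches the raw readings; it reads only the output of the first step, which is what distinguishes a pipeline aggregation from a bucket or metrics aggregation.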

Origin my.oschina.net/u/4090830/blog/5956362