Elasticsearch basics (part 1): basic syntax

1. Important concepts

1.1 Index (index)

An index is a collection of documents that share similar characteristics. For example, there can be an index of product data, an index of order data, and an index of user data. An index is identified by a name (which must be all lowercase), and that name is used whenever we index, search, update, or delete documents in it. An ES index can be thought of as a database in a MySQL server.

1.2 Mapping

Mapping is the process of defining how a document and the fields it contains are stored and indexed. With the default configuration, ES can create a mapping automatically from the inserted data (dynamic mapping), or you can create the mapping manually, which is the usual practice. A mapping mainly consists of field names and field types. An ES mapping can be thought of as a table in a MySQL database.
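To show why a manual mapping is usually preferred, here is a minimal Python sketch of the rules ES's default dynamic mapping roughly follows: integers become long, floats become float, booleans become boolean, date-like strings become date, and other strings become text with a keyword sub-field. The function names and the simplified yyyy-MM-dd date check are assumptions for illustration; ES's real date detection accepts more formats.

```python
import re

# Simplified date check (assumption): ES's default date detection
# recognises more formats than plain yyyy-MM-dd.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_field_mapping(value):
    """Approximate the type ES dynamic mapping picks for one JSON value."""
    # bool must be checked before int: in Python, True is an int
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_RE.match(value):
            return {"type": "date"}
        # other strings become analyzed text plus a keyword sub-field
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        # inner objects get their own properties
        return {"properties": infer_mapping(value)["properties"]}
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_mapping(doc):
    """Build a mapping body for a whole (flat or nested) document."""
    return {"properties": {name: infer_field_mapping(v) for name, v in doc.items()}}


doc = {"id": 1, "title": "布鲁克林篮网", "price": 233,
       "created_at": "2001-10-20", "description": "拥有KD-KI的一只超级球队"}
print(infer_mapping(doc)["properties"]["price"])  # {'type': 'long'}
```

Note that a price written as 233 would be mapped as long, not double, and title would become analyzed text rather than keyword; an explicit mapping avoids both surprises.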

Note

  • In ES 5.x and earlier, an index could contain multiple mapping types; indices created in 6.x may contain only one.
  • Since ES 7.0, mapping types have been removed and an index corresponds to exactly one mapping (that is, one database holds only one table).

1.3 Document

A document is a single piece of data stored in an index, and the smallest unit that can be indexed. Documents in ES are represented as lightweight JSON data.

A document in ES can be thought of as one row of a table in a MySQL database.

2. Basic use

2.1 Index related operations

# Create an index
PUT /index_name

# List all indices in ES; the v parameter adds a header row to the output
GET _cat/indices?v

# Delete an index
DELETE /index_name


Note: there is no operation for modifying an index (for example, renaming it).

2.2 Mapping related operations

2.2.1 Common data types

String types: keyword (not analyzed), text (analyzed)
Integer types: integer, long
Floating-point types: float, double
Boolean type: boolean
Date type: date

2.2.2 Create & Query Mapping

Note: ES has no operation for deleting a mapping or changing the mapping of an existing field (new fields can still be added). To delete or modify a mapping, delete the index and re-create it with the new mapping.

PUT /index_name
{
  // "mappings" is fixed syntax
  "mappings": {
    // "properties" is fixed syntax
    "properties": {
      // field name
      "id": {
        // fixed syntax: "type": <data type>
        "type": "integer"
      },
      "title": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "created_at": {
        "type": "date"
      },
      "description": {
        "type": "text"
      }
    }
  }
}

# View the mapping of an index
GET /index_name/_mapping

2.3 Document related operations

2.3.1 Insert document

Insert a document (manually specifying _id)

// Add a document to the index with an explicit _id
// Format: POST /index_name/_doc/<_id>
POST /index_name/_doc/1
{
  "id": 1, // Note: this id field is not the same thing as the document id (_id), but it is best to keep the two consistent
  "title": "布鲁克林篮网",
  "price": 233,
  "created_at": "2001-10-20",
  "description": "拥有KD-KI的一只超级球队"
}

Insert a document (automatically generated _id)

// With no _id after _doc, ES generates a random unique _id
POST /index_name/_doc/
{
  // Note: _id is generated automatically, so no id value is specified here
  "title": "金州勇士",
  "price": 233,
  "created_at": "2001-10-21",
  "description": "拥有SC的一只超级球队"
}

2.3.2 Query a document by _id

Note: this must be the _id, i.e. the document ID, not the id field inside the document

GET /index_name/_doc/_id

2.3.3 Delete a document by _id

Note: this must be the _id, i.e. the document ID, not the id field inside the document

DELETE /index_name/_doc/_id

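2.3.4 Update a document by _id

Method 1

// This replaces the whole document: the original document is deleted and the new one is indexed, so the original fields are not kept (not recommended)
PUT /index_name/_doc/_id
{
  "price": 233
}

Method 2

In ES 7.x and later the partial-update endpoint is /index_name/_update/_id; the older POST /index_name/_doc/_id/_update form is deprecated.

// Partial update: only the fields listed in doc are changed, all other fields are kept
POST /index_name/_update/_id
{
  "doc": {
    "field_name": "new_value"
  }
}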
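The difference between the two methods can be sketched with a plain Python dict standing in for the stored document. This is an in-memory model for illustration only (the helper names are hypothetical, not an ES client API): PUT replaces the stored source, while _update with doc merges the given fields into it, recursing into nested objects.

```python
def put_document(store, doc_id, body):
    """Model of PUT /index/_doc/<_id>: the stored document is replaced wholesale."""
    store[doc_id] = dict(body)


def update_document(store, doc_id, partial):
    """Model of POST /index/_update/<_id> with {"doc": ...}: fields are merged in.

    Nested objects are merged recursively; any other value is overwritten.
    A missing document raises KeyError here (ES returns an error instead).
    """
    def merge(target, patch):
        for key, value in patch.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                merge(target[key], value)
            else:
                target[key] = value

    merge(store[doc_id], partial)


original = {"id": 1, "title": "布鲁克林篮网", "price": 233}

a = {"1": dict(original)}
put_document(a, "1", {"price": 100})
print(a["1"])  # {'price': 100}

b = {"1": dict(original)}
update_document(b, "1", {"price": 100})
print(b["1"])  # {'id': 1, 'title': '布鲁克林篮网', 'price': 100}
```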

2.3.5 Batch operations (_bulk)

Note: the actions in a bulk request are independent; if one of them fails, the others still execute.

Example 1: Add multiple documents

Each action line and each document must be written on a single line: the _bulk body is newline-delimited JSON and cannot be pretty-printed. Here the _id of each document is specified manually.

POST /index_name/_bulk
{"index":{"_id":13}}
{"id":13,"title":"休斯顿火箭","price":123,"created_at":"2001-10-21","description":"拥有SC的一只超级球队"}
{"index":{"_id":14}}
{"id":14,"title":"底特律活塞","price":13,"created_at":"2003-10-21","description":"拥有SC的一只超级球队"}

Example 2: Add, delete and update in one request

index adds (or overwrites) a document, delete removes one, and update applies the partial update given in doc.

# Bulk operations: add, delete, update
POST /products/_bulk
{"index":{"_id":15}}
{"id":15,"title":"夏洛特黄蜂","price":123,"created_at":"2001-10-21","description":"拥有MJ的一只超级球队"}
{"delete":{"_id":13}}
{"update":{"_id":1}}
{"doc":{"title":"OKC"}}

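Because the _bulk body is newline-delimited JSON, it is often built programmatically rather than typed by hand. The following is a minimal Python sketch of such a builder; the function name and the (action, meta, source) tuple shape are assumptions for illustration. It writes one JSON object per line, omits the source line for delete, and ends the body with the trailing newline the bulk API requires.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, meta, source) tuples into a _bulk request body.

    action is "index", "create", "update" or "delete"; source is None for
    delete. Every JSON object goes on its own line, and the body must end
    with a newline.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if action != "delete":
            # index/create carry the document, update carries {"doc": ...}
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": 15}, {"id": 15, "title": "夏洛特黄蜂", "price": 123}),
    ("delete", {"_id": 13}, None),
    ("update", {"_id": 1}, {"doc": {"title": "OKC"}}),
])
print(body)
```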
3. Advanced Query DSL

3.1 Simple query

3.1.2 Query all

GET /index_name/_search
{
  "query": {
    "match_all": {}
  }
}

3.1.3 term query

  • A term query does not analyze the search value. Fields of type text are analyzed (split into tokens) at index time, so a term query on a text field only succeeds when the value exactly matches one of those tokens; all other field types are not analyzed, so a term query on them is an exact match.
  • The default analyzer in ES is the standard analyzer (built on the standard tokenizer). It splits English text into words and Chinese text into single characters.
  • A term query can only target one field.

GET /products/_search
{
  "query": {
    "term": {
      "price": {         // field name
        "value": "xxx"   // field value; everything else is fixed syntax
      }
    }
  }
}
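The interaction between analysis and term queries can be sketched in Python. The regex tokenizer below is an assumed, rough approximation of the standard analyzer (one token per CJK character, one lowercase token per run of ASCII letters or digits), not Lucene's actual implementation.

```python
import re

# Rough approximation of the standard analyzer (assumption for this sketch):
# each CJK character becomes its own token, runs of ASCII letters/digits
# become one lowercase token, everything else is a separator.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def standard_like_tokens(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


def term_matches(field_type, stored_value, query_value):
    """term query: the query value itself is never analyzed."""
    if field_type == "text":
        # text fields were tokenized at index time: the value must equal a token
        return query_value in standard_like_tokens(stored_value)
    # keyword (and other non-text) fields: exact match on the whole value
    return stored_value == query_value


desc = "拥有KD-KI的一只超级球队"
print(standard_like_tokens(desc))
```

With this model, a term query for 球 matches the description but 球队 does not (it was never stored as a single token), and KD does not match because the index holds the lowercased token kd.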

3.1.4 Range query (range)

GET /products/_search
{
  "query": {
    "range": {
      "price": {       // field name
        "gte": 10,     // >= 10
        "lte": 232     // <= 232
      }
    }
  }
}
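The four bound keys of a range query differ only in whether the bound itself is included. A minimal Python sketch of that rule (the function name is hypothetical), applied to the prices used in the test data of section 4:

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Mirror range-query bounds: gte/lte are inclusive, gt/lt are exclusive."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


prices = {"1": 123.2, "2": 23.2, "3": 13.4}
hits = [doc_id for doc_id, p in prices.items() if in_range(p, gte=10, lte=232)]
print(hits)  # ['1', '2', '3']
```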

3.1.5 Prefix query (prefix)

Prefix query on terms: it matches the whole value of a keyword field, or the tokens a text field was split into during analysis.

GET /products/_search
{
  "query": {
    "prefix": {
      "title": {             // field name
        "value": "夏洛特"    // prefix
      }
    }
  }
}

3.1.6 Wildcard query

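Wildcard query on terms: like prefix, it matches the whole value of a keyword field, or the tokens a text field was split into during analysis.

GET /products/_search
{
  "query": {
    "wildcard": {
      "title": {               // field name
        "value": "夏洛特*"     // * matches any number of characters, ? matches exactly one character
      }
    }
  }
}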
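The matching rules of prefix and wildcard can be sketched in Python as checks against a single term (for a keyword field, the whole value; for a text field, each token). Translating the pattern into a regular expression is an assumption made for illustration, not how Lucene evaluates wildcards internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: * = any run of characters, ? = one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            # every other character is literal
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(term, pattern):
    # the whole term must match the pattern, not just a substring
    return wildcard_to_regex(pattern).fullmatch(term) is not None


def prefix_matches(term, prefix):
    return term.startswith(prefix)


print(wildcard_matches("夏洛特黄蜂", "夏洛特*"))  # True
```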

3.1.7 Query by multiple _ids (ids)

GET /products/_search
{
  "query": {
    "ids": {
      "values": ["1", "2"]   // multiple _id values
    }
  }
}

3.1.8 Fuzzy query (fuzzy)

Commonly used on keyword fields.

GET /products/_search
{
  "query": {
    "fuzzy": {
      "title": "金州勇士"
    }
  }
}

Note: the fuzziness of a fuzzy query (the maximum number of edits between the search term and an indexed term) can only be 0, 1 or 2. With the default fuzziness (AUTO):

  • when the search term has 2 characters or fewer, no edits are allowed (exact match only)
  • when the search term has 3 to 5 characters, 1 edit is allowed
  • when the search term has more than 5 characters, at most 2 edits are allowed
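The AUTO rule above can be checked with a small Python sketch. It counts insertions, deletions, substitutions and swaps of two adjacent characters as one edit each (optimal string alignment distance), which approximates ES's default of treating transpositions as a single edit. The function names are hypothetical, and real fuzzy queries also involve settings such as prefix_length and max_expansions that this sketch ignores.

```python
def auto_fuzziness(term):
    """Max edits allowed by fuzziness AUTO (default bounds 3 and 6)."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def osa_distance(a, b):
    """Edit distance where insert, delete, substitute and adjacent swap cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def fuzzy_matches(query, term):
    return osa_distance(query, term) <= auto_fuzziness(query)


print(fuzzy_matches("金州勇士", "金州骑士"))  # True: 4 characters, 1 substitution
```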

3.1.9 Boolean (compound) query (bool)

The bool query combines multiple conditions to build complex queries:

  • must is equivalent to && (all conditions must hold)
  • should is equivalent to || (at least one condition must hold; when the bool query also has must or filter clauses, should clauses become optional and only raise the score)
  • must_not is equivalent to ! (none of the conditions may hold)

GET /products/_search
{
  "query": {
    "bool": {
      "must": [            // conditions that must match
        {
          "term": {
            "title": {
              "value": "金州勇士"
            }
          }
        },
        {
          "ids": {
            "values": ["1", "2"]
          }
        }
      ],
      "must_not": [        // conditions that must not match
        {
          "term": {
            "title": {
              "value": "金州勇士"
            }
          }
        }
      ],
      "should": [          // matching one or more of these is enough
        {
          // ... conditions
        }
      ]
    }
  }
}
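The matching logic of bool (ignoring scoring) can be sketched in Python with each clause modeled as a predicate function. This is an illustrative model with hypothetical helper names, not how ES evaluates queries; it includes the default minimum_should_match rule (1 when there is no must or filter clause, otherwise 0).

```python
def bool_matches(doc, must=(), should=(), must_not=(), filter_=(),
                 minimum_should_match=None):
    """Decide whether doc matches a bool query built from predicates.

    must / filter_: every clause must match (AND).
    must_not:       no clause may match (NOT).
    should:         at least minimum_should_match clauses must match (OR).
    """
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if not should:
        return True
    if minimum_should_match is None:
        # default: 1 if there is no must/filter clause, otherwise 0
        minimum_should_match = 0 if (must or filter_) else 1
    hits = sum(1 for clause in should if clause(doc))
    return hits >= minimum_should_match


docs = [
    {"_id": "1", "title": "杜兰特", "price": 123.2},
    {"_id": "2", "title": "库里", "price": 23.2},
    {"_id": "3", "title": "詹姆斯", "price": 13.4},
]


def title_is(value):
    # stands in for a term query on the keyword field title
    return lambda d: d["title"] == value


def id_in(*ids):
    # stands in for an ids query
    return lambda d: d["_id"] in ids


hits = [d["_id"] for d in docs
        if bool_matches(d, must=[id_in("1", "2")], must_not=[title_is("库里")])]
print(hits)  # ['1']
```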

3.1.10 Multi-field query (multi_match)

The same query text can be matched against several fields.

Note: if a field is analyzed (text type), the query text is analyzed into tokens before that field is searched; if the field is not analyzed, the query text is searched as a whole.

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "库里",
      "fields": ["title", "description"]
    }
  }
}

3.1.11 query_string

If the queried field is analyzed, the query text is analyzed before searching; if the field is not analyzed, the query text is searched without analysis.

GET /products/_search
{
  "query": {
    "query_string": {
      "default_field": "description",
      "query": "xxx"
    }
  }
}

3.1.12 Highlighting (highlight)

Works for both keyword and text fields.

GET /products/_search
{
  "query": {
    "term": {
      "description": {
        "value": "手"
      }
    }
  },
  "highlight": {
    "pre_tags": ["<span style='color:red;'>"],   // custom opening tag
    "post_tags": ["</span>"],                    // custom closing tag
    "fields": {
      "*": {}                                    // highlight every field that matched the query
    }
  }
}
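Conceptually, highlighting wraps each matching token of the original text in pre_tags and post_tags (ES uses <em> and </em> when none are given). The Python sketch below illustrates only that idea; its regex tokenizer is an assumed rough approximation of the standard analyzer, and ES's real highlighters also handle fragments, phrases and offsets that are ignored here.

```python
import re

# Rough standard-analyzer approximation (assumption): one token per CJK
# character, one lowercase token per run of ASCII letters/digits.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def highlight(text, query_tokens, pre_tag="<em>", post_tag="</em>"):
    """Wrap every token of text that matches a query token in pre/post tags."""
    out, last = [], 0
    for m in TOKEN_RE.finditer(text):
        if m.group().lower() in query_tokens:
            out.append(text[last:m.start()])          # untouched text before the hit
            out.append(pre_tag + m.group() + post_tag)
            last = m.end()
    out.append(text[last:])                            # remaining text after the last hit
    return "".join(out)


print(highlight("拥有KD-KI的一只超级球队", {"kd", "球"},
                "<span style='color:red;'>", "</span>"))
```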

3.1.13 Pagination query

size + from

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "size": 2,  // number of records per page
  "from": 0   // offset of the first record to return (the first record is 0)
}
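For a 1-based page number, from is simply (page - 1) * size. The Python sketch below (hypothetical helper name) computes the pair and also checks ES's default index.max_result_window of 10,000, which from + size may not exceed for a normal search request.

```python
# Default index.max_result_window: from + size may not exceed this
MAX_RESULT_WINDOW = 10_000


def page_params(page, size):
    """Return the from/size pair for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": size}


print(page_params(3, 2))  # {'from': 4, 'size': 2}
```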

3.1.14 Sorting

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"   // desc = descending, asc = ascending (field sorts default to asc)
      }
    }
  ]
}

3.1.15 Return the specified field _source

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["price", "description"]   // names of the fields to return
}

4. Index principle (inverted index)

Test data

Index:

PUT /products/
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "description": {
        "type": "text"
      }
    }
  }
}

Data:

POST /products/_bulk
{"index":{"_id":1}}
{"title":"杜兰特","description":"雷勇篮","price":123.2}
{"index":{"_id":2}}
{"title":"库里","description":"勇","price":23.2}
{"index":{"_id":3}}
{"title":"詹姆斯","description":"骑热湖","price":13.4}

Storage data model in ES

An inverted index is also called a reverse index; where there is an inverted index, there is also a forward index. Put simply, a forward index finds values by key, while an inverted index finds keys by value. Under the hood, ES searches using an inverted index.

Searching through an inverted index is very fast because each term appears only once in the index and points directly to the documents that contain it.

The index also stores how often each term occurs and the length of the field, which are used to compute the relevance score: the more occurrences and the shorter the field, the higher the score.
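The inverted index for the description field of the test data can be sketched in Python. Because description is a text field analyzed by the standard analyzer, each Chinese character becomes its own term. The tokenizer is an assumed approximation, and the score (term frequency divided by field length) is a toy stand-in chosen only to reproduce the "more occurrences, shorter field, higher score" rule; it is not Lucene's BM25.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def tokenize(text):
    # approximation of the standard analyzer: one token per CJK character
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_inverted_index(docs):
    """Return term -> {doc_id: term frequency} and each document's token count."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for tok in tokens:
            index[tok][doc_id] = index[tok].get(doc_id, 0) + 1
    return index, lengths


def search(index, lengths, term):
    """Rank documents by a toy score: term frequency / field length."""
    postings = index.get(term, {})
    scored = {d: tf / lengths[d] for d, tf in postings.items()}
    return sorted(scored, key=scored.get, reverse=True)


descriptions = {"1": "雷勇篮", "2": "勇", "3": "骑热湖"}
index, lengths = build_inverted_index(descriptions)
print(index["勇"])                    # {'1': 1, '2': 1}
print(search(index, lengths, "勇"))   # ['2', '1']
```

Document 2 ranks first for 勇 because its description is shorter, matching the scoring rule described above.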

5. IK tokenizer configuration

Installing the IK tokenizer is not covered here.

Usage

// Test the tokenizer
POST /_analyze
{
  "analyzer": "ik_max_word", // ik_max_word splits at the finest granularity; ik_smart is coarser
  "text": "中华人民共和国歌"      // the text to analyze
}


// Create a mapping that sets the analyzer of a field
PUT /test
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "ik_max_word" // use ik_max_word as the analyzer (ik_smart is coarser)
      }
    }
  }
}

// Insert data
POST /test/_doc/1
{
  "description": "中华人民共和国"
}


// Query: terms such as 人民, 中华 and 共和国 all find this document
GET /test/_search
{
  "query": {
    "term": {
      "description": {
        "value": "人民"
      }
    }
  }
}

Extension words and stop words configuration

IK supports a custom extension dictionary and a custom stop-word dictionary:

  • Extension dictionary: some words are not recognized as terms by the tokenizer, but you still want ES to treat them as terms for retrieval. Add such words to the extension dictionary.
  • Stop-word dictionary: some words are recognized as terms, but in your business scenario you do not want them to be searchable. Put such words into the stop-word dictionary.

To define the extension and stop-word dictionaries, edit the IKAnalyzer.cfg.xml file in the config directory of the IK tokenizer.


Note: in a custom dictionary file, each word goes on its own line.

6. Filter queries

Strictly speaking, queries in ES come in two kinds: query and filter. Query is what was covered above: by default it calculates a relevance score for every returned document and sorts the results by that score. A filter only selects the matching documents, without calculating scores, and its results can be cached. From a performance point of view, filtering is therefore faster than querying. In other words, filters suit narrowing down a large range of data, while queries suit precise matching; in practice, use a filter to narrow down the data first and then a query to match within it.


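GET /products/_search
{
  "query": {
    "bool": {
      // filter must be used inside a bool query
      "must": [
        { "match_all": {} }
      ],
      // each object in filter holds exactly one condition, and all of them must match;
      // keep only the conditions you need
      "filter": [
        // documents whose description contains the term
        { "term": { "description": "雷" } },
        // documents whose description contains any of the terms (OR)
        { "terms": { "description": ["雷", "骑"] } },
        // documents whose price is within the range
        { "range": { "price": { "gte": 10, "lte": 20 } } },
        // documents that have a description field
        { "exists": { "field": "description" } },
        // documents whose _id is one of the listed values
        { "ids": { "values": ["1", "2"] } }
      ]
    }
  }
}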


Origin blog.csdn.net/qq_46312987/article/details/125467193