1. Important concepts
1.1 Index
An index is a collection of documents that share broadly similar characteristics. For example, there can be an index of product data, an index of order data, and an index of user data. An index is identified by a name, which must be all lowercase.
We use that name whenever we index, search, update, or delete documents in the index. An ES index can be compared to a database in a MySQL server.
1.2 Mapping
A mapping defines how a document and the fields it contains are stored and indexed.
With the default configuration, ES creates a mapping automatically from the inserted data, but a mapping can also be created manually, which is what is usually done in practice. A mapping mainly consists of field names and field types. An ES mapping can be compared to a table in a MySQL database.
Note:
- Before ES 6.0, an index could have multiple mappings (mapping types); ES 6.x allows only one type per index
- Since ES 7.0, mapping types have been removed and an index has exactly one mapping (that is, one database contains only one table)
1.3 Document
A document is a single piece of data stored in an index and the smallest unit that can be indexed. Documents in ES are represented as lightweight JSON data.
A document in ES can be compared to a row of a table in a MySQL database.
2. Basic use
2.1 Index related operations
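# Create an index
PUT /index_name
# List all indices in ES; the v parameter adds a header row to the output
GET _cat/indices?v
# Delete an index
DELETE /index_name
Note: there is no operation for modifying an index.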
2.2 Mapping related operations
2.2.1 Common data types
String types: keyword (not analyzed), text (analyzed, i.e. split into terms)
Integer types: integer, long
Floating-point types: float, double
Boolean type: boolean
Date type: date
2.2.2 Create & Query Mapping
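Note: ES has no operation for changing or deleting an existing mapping (new fields can be added, but existing field definitions cannot be changed).
To delete or modify a mapping, delete the index and create it again with the new mapping.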
PUT /index_name
{
// fixed key: mappings
"mappings": {
// fixed key: properties
"properties": {
// field name
"id":{
// fixed key: type (the field's data type)
"type":"integer"
},
"title":{
"type":"keyword"
},
"price":{
"type":"double"
},
"created_at":{
"type": "date"
},
"description":{
"type": "text"
}
}
}
}
# View the mapping of an index
GET /index_name/_mapping
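The mapping body always nests the same fixed keys: mappings, then properties, then one entry per field holding its type. As a minimal sketch, the hypothetical Python helper `build_mapping` below builds the same request body as the PUT /index_name example from a plain field-to-type dictionary. It only produces JSON and does not contact ES, and its type whitelist is just the common types listed in 2.2.1 (an assumption for illustration; ES supports many more).

```python
import json

# The common field types from 2.2.1 (ES supports many more).
COMMON_TYPES = {"keyword", "text", "integer", "long", "float", "double", "boolean", "date"}


def build_mapping(fields: dict) -> dict:
    """Hypothetical helper: build a PUT /index_name body from {field_name: type}."""
    properties = {}
    for name, es_type in fields.items():
        if es_type not in COMMON_TYPES:
            raise ValueError(f"unsupported type for field {name!r}: {es_type}")
        # Each field gets an object whose "type" key holds its data type.
        properties[name] = {"type": es_type}
    # "mappings" and "properties" are the fixed outer keys.
    return {"mappings": {"properties": properties}}


body = build_mapping({
    "id": "integer",
    "title": "keyword",
    "price": "double",
    "created_at": "date",
    "description": "text",
})
print(json.dumps(body, indent=2))
```

Printing the result gives a body that can be pasted after PUT /index_name in the console.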
2.3 Document related operations
2.3.1 Insert document
Insert a document (specifying _id manually)
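// Add a document to the index with an explicit _id
// Format: POST /index_name/_doc/_id
POST /index_name/_doc/1
{
"id":1, // Note: this id field is not the same thing as the document id (_id), but it is best to keep them consistent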
"title":"布鲁克林篮网",
"price":233,
"created_at":"2001-10-20",
"description":"拥有KD-KI的一只超级球队"
}
Insert a document (auto-generated _id)
// Without an _id in the path, ES generates a unique random _id
POST /index_name/_doc/
{
// Note: _id is generated automatically, so the id field is not set here
"title":"金州勇士",
"price":233,
"created_at":"2001-10-21",
"description":"拥有SC的一只超级球队"
}
2.3.2 Get a document by _id
Note: this must be _id, that is, the document id
GET /index_name/_doc/_id
2.3.3 Delete a document by _id
Note: this must be _id, that is, the document id
DELETE /index_name/_doc/_id
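2.3.4 Update a document by _id
Method 1
// This replaces the whole document: the original is deleted and the new body is inserted, so fields not included are lost (not recommended)
PUT /index_name/_doc/_id
{
"price":233
}
Method 2 (partial update)
POST /index_name/_update/_id
{
"doc":{
"price":199 // only the fields inside doc are changed; other fields are kept
}
}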
2.3.5 Batch operations (_bulk)
Note: each action in a bulk request is executed independently; if one fails, the others are still executed.
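In the examples that follow, the body is newline-delimited JSON (NDJSON): an action line, then (for index and update) a source line, each written on a single line, with a newline after the last line. The hypothetical helper `build_bulk_body` below is a sketch that builds such a body with Python's json module for the index, delete, and update actions; it passes _id values as strings, and actually sending the body to ES (for example with an HTTP client and the Content-Type application/x-ndjson) is left out.

```python
import json


def to_line(obj: dict) -> str:
    # Compact, single-line JSON; ensure_ascii=False keeps Chinese text readable.
    return json.dumps(obj, ensure_ascii=False, separators=(",", ":"))


def build_bulk_body(actions: list) -> str:
    """Build an NDJSON _bulk body from (action, _id, payload) tuples.

    payload is the full document for "index", the changed fields for
    "update", and None for "delete".
    """
    lines = []
    for action, doc_id, payload in actions:
        if action not in ("index", "delete", "update"):
            raise ValueError(f"unsupported action: {action}")
        # Action/metadata line, always on its own line.
        lines.append(to_line({action: {"_id": doc_id}}))
        if action == "index":
            lines.append(to_line(payload))
        elif action == "update":
            # Partial updates wrap the changed fields in "doc".
            lines.append(to_line({"doc": payload}))
        # "delete" has no source line.
    # The bulk API expects a newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", "15", {"id": 15, "title": "夏洛特黄蜂", "price": 123}),
    ("delete", "13", None),
    ("update", "1", {"title": "OKC"}),
])
print(body, end="")
```

Generating the body in code avoids the most common mistake with _bulk: pretty-printed (multi-line) JSON.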
Example 1: insert multiple documents
# _id is specified manually; each action line and each document must be written on a single line (no pretty-printing)
POST /index_name/_bulk
{"index":{"_id":13}}
{"id":13,"title":"休斯顿火箭","price":123,"created_at":"2001-10-21","description":"拥有SC的一只超级球队"}
{"index":{"_id":14}}
{"id":14,"title":"底特律活塞","price":13,"created_at":"2003-10-21","description":"拥有SC的一只超级球队"}
Example 2: mixed insert, delete, and update
# index inserts a document (replacing it if the _id already exists), delete removes a document, update changes the fields given in the following "doc" line
POST /products/_bulk
{"index":{"_id":15}}
{"id":15,"title":"夏洛特黄蜂","price":123,"created_at":"2001-10-21","description":"拥有MJ的一只超级球队"}
{"delete":{"_id":13}}
{"update":{"_id":1}}
{"doc":{"title":"OKC"}}
3. Advanced Query DSL
3.1 Simple query
3.1.2 Query all
GET /index_name/_search
{
"query":{
"match_all": {
}
}
}
3.1.3 term query
- A term query does not analyze the search value; it looks for exactly that term in the index. text fields are analyzed (split into terms) at index time, so a term query on a text field only succeeds if the value equals one of the resulting terms. Other field types (keyword, numbers, dates, and so on) are not analyzed, so a term query on them is an exact match.
- ES uses the standard analyzer by default. It splits English text into words (and lowercases them) and splits Chinese text into individual characters.
- A term query can only target one field.
GET /products/_search
{
"query": {
"term": {
"price": {
// "price" is the field name
"value": 233 // the value to match; everything else is fixed syntax
}
}
}
}
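To see why a term query behaves differently on text and keyword fields, the sketch below approximates the standard analyzer with a regular expression. This is an assumption for illustration (the real analyzer follows Unicode word-break rules): it lowercases the text, keeps runs of Latin letters and digits as one token, and turns each CJK character into its own token. `analyze` and `term_matches` are hypothetical names.

```python
import re

# Rough stand-in for the standard analyzer (an assumption, not its real rules):
# Latin/digit runs become one token, every CJK ideograph its own token.
TOKEN_RE = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze(text: str) -> list:
    return TOKEN_RE.findall(text.lower())


def term_matches(field_type: str, stored_value: str, query_value: str) -> bool:
    """A term query never analyzes query_value."""
    if field_type == "text":
        # text was analyzed at index time: the term must equal one of the tokens.
        return query_value in analyze(stored_value)
    # keyword and other non-analyzed fields: exact match on the whole value.
    return stored_value == query_value


print(analyze("Golden State 勇士"))                             # ['golden', 'state', '勇', '士']
print(term_matches("text", "拥有SC的一只超级球队", "超"))        # True
print(term_matches("text", "拥有SC的一只超级球队", "超级"))      # False
print(term_matches("keyword", "金州勇士", "金州"))              # False
```

With this model, a term query for 超级 on a text field finds nothing because only the single characters 超 and 级 were indexed, while a term query on a keyword field needs the complete value.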
3.1.4 Range query (range)
GET /products/_search
{
"query": {
"range": {
"price": {
// "price" is the field name; gte = greater than or equal, lte = less than or equal
"gte": 10,
"lte": 232
}
}
}
}
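The bounds of a range query read as ordinary comparisons: gte and lte are inclusive, while gt and lt (also accepted by the range query) are exclusive. The hypothetical `in_range` function below is a plain-Python sketch of that logic applied to a few prices from the examples; it does not model date math or other range options.

```python
from typing import Optional


def in_range(value: float,
             gte: Optional[float] = None, gt: Optional[float] = None,
             lte: Optional[float] = None, lt: Optional[float] = None) -> bool:
    """gte/lte are inclusive bounds, gt/lt exclusive; omitted bounds are open."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


prices = [13, 123, 233]
print([p for p in prices if in_range(p, gte=10, lte=232)])  # [13, 123]
```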
3.1.5 Prefix query (prefix)
Prefix query on terms.
Works on keyword fields, or on the individual terms of a text field after analysis.
GET /products/_search
{
"query": {
"prefix": {
"title": {
// "title" is the field name
"value": "夏洛特" // the prefix
}
}
}
}
3.1.6 Wildcard query
Wildcard query on terms.
Works on keyword fields, or on the individual terms of a text field after analysis.
GET /products/_search
{
"query": {
"wildcard": {
"title": {
// "title" is the field name
"value": "夏洛特*" // * matches any number of characters (including none), ? matches exactly one character
}
}
}
}
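Both prefix and wildcard queries compare a pattern against whole terms (a keyword value, or one token of an analyzed text field), not against arbitrary substrings of the original text. The sketch below models that with hypothetical helpers: `prefix_match` uses str.startswith, and `wildcard_match` translates * and ? into a regular expression that must match the entire term. Escaping of * and ? inside ES patterns is not modeled.

```python
import re


def prefix_match(term: str, prefix: str) -> bool:
    # prefix query: the term must start with the given string.
    return term.startswith(prefix)


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """* = any run of characters (possibly empty), ? = exactly one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(term: str, pattern: str) -> bool:
    # The whole term must match the pattern, not just part of it.
    return wildcard_to_regex(pattern).fullmatch(term) is not None


titles = ["夏洛特黄蜂", "金州勇士", "休斯顿火箭"]
print([t for t in titles if prefix_match(t, "夏洛特")])     # ['夏洛特黄蜂']
print([t for t in titles if wildcard_match(t, "夏洛特*")])  # ['夏洛特黄蜂']
print([t for t in titles if wildcard_match(t, "金州??")])   # ['金州勇士']
```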
3.1.7 Query by multiple _ids (ids)
GET /products/_search
{
"query": {
"ids": {
"values": [1, 2] // list of _id values
}
}
}
3.1.8 Fuzzy query
Commonly used on keyword fields.
GET /products/_search
{
"query": {
"fuzzy": {
"title": "金州勇士"
}
}
}
Note: the fuzzy query allows at most 2 edits (single-character insertions, deletions, substitutions, or swaps of adjacent characters). With the default fuzziness (AUTO):
- Search term of 2 characters or fewer: no edits allowed (exact match only)
- Search term of 3 to 5 characters: 1 edit allowed
- Search term longer than 5 characters: at most 2 edits allowed
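The rules above correspond to fuzziness AUTO. The sketch below encodes them in a hypothetical `auto_fuzziness` function and checks candidates with an edit distance that also counts swapping two adjacent characters as one edit (optimal string alignment), roughly matching the fuzzy query's default of allowing transpositions. It only illustrates the matching rule; ES finds candidate terms in the index far more efficiently than comparing every term like this.

```python
def auto_fuzziness(term: str) -> int:
    """Maximum edits allowed by fuzziness AUTO for a search term of this length."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance where an adjacent swap also counts as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def fuzzy_matches(query: str, term: str) -> bool:
    return edit_distance(query, term) <= auto_fuzziness(query)


print(fuzzy_matches("金洲勇士", "金州勇士"))  # True: 4 characters, 1 substitution
print(fuzzy_matches("勇土", "勇士"))          # False: 2 characters, no edits allowed
```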
3.1.9 Boolean (combined) query (bool)
bool combines multiple conditions into a complex query:
- must is equivalent to &&: all conditions must hold
- should is equivalent to ||: at least one condition should hold (if must or filter clauses are also present, should clauses become optional and only raise the score)
- must_not is equivalent to !: none of the conditions may hold
GET /products/_search
{
"query": {
"bool": {
"must": [ // conditions that must all match
{
"term": {
"title": {
"value": "金州勇士"
}
}
},
{
"ids":{
"values": [1, 2]
}
}
],
"must_not": [ // conditions that must not match
{
"term": {
"title": {
"value": "金州勇士"
}
}
}
],
"should": [ // optional conditions; at least one must match only when there are no must/filter clauses
{
// ... conditions
}
]
}
}
}
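The bool semantics can be sketched as plain predicate logic. In the hypothetical `bool_matches` below, each clause is a Python function that returns True or False for a document, and the default minimum_should_match rule is applied; scoring and filter clauses are not modeled. `term_clause` and `ids_clause` are hypothetical stand-ins for the term and ids queries, not ES APIs.

```python
from typing import Callable, Optional, Sequence

Clause = Callable[[dict], bool]  # decides whether one document matches


def bool_matches(doc: dict,
                 must: Sequence[Clause] = (),
                 should: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = (),
                 minimum_should_match: Optional[int] = None) -> bool:
    if not all(clause(doc) for clause in must):      # must: logical AND
        return False
    if any(clause(doc) for clause in must_not):      # must_not: logical NOT
        return False
    if minimum_should_match is None:
        # Default: with no must clauses, at least one should clause has to
        # match; with must clauses present, should clauses are optional.
        minimum_should_match = 1 if should and not must else 0
    matched = sum(1 for clause in should if clause(doc))  # should: logical OR
    return matched >= minimum_should_match


def term_clause(field: str, value) -> Clause:
    """Hypothetical stand-in for a term query on a keyword field."""
    return lambda doc: doc.get(field) == value


def ids_clause(*values: str) -> Clause:
    """Hypothetical stand-in for an ids query."""
    return lambda doc: doc.get("_id") in values


docs = [
    {"_id": "1", "title": "布鲁克林篮网"},
    {"_id": "2", "title": "金州勇士"},
    {"_id": "13", "title": "休斯顿火箭"},
]
hits = [d["_id"] for d in docs
        if bool_matches(d, must=[ids_clause("1", "2")],
                        must_not=[term_clause("title", "布鲁克林篮网")])]
print(hits)  # ['2']
```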
3.1.10 Multi-field query (multi_match)
Queries multiple fields with the same condition.
Note: if a field is analyzed (text type), the query string is analyzed first and the resulting terms are searched in that field; if a field is not analyzed, the query string is matched as a whole.
GET /products/_search
{
"query": {
"multi_match": {
"query": "库里",
"fields": ["title", "description"]
}
}
}
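The note above can be sketched with the same rough tokenizer idea as the term-query discussion, redefined here so the block stands alone; it approximates the standard analyzer, not IK. The hypothetical `multi_match` function checks every listed field: a text field matches if the analyzed query shares at least one token with it (the OR behavior of match by default), and a keyword field only matches if the whole query string equals the value. Scores and the different multi_match types are ignored; the sample documents and field types follow the products test data in section 4.

```python
import re

# Rough stand-in for the standard analyzer (an assumption): lowercase,
# Latin/digit runs as one token, each CJK character as its own token.
TOKEN_RE = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")

# Field types as in the products mapping of section 4.
FIELD_TYPES = {"title": "keyword", "description": "text"}


def analyze(text: str) -> set:
    return set(TOKEN_RE.findall(text.lower()))


def field_matches(field_type: str, value: str, query: str) -> bool:
    if field_type == "text":
        # Analyzed field: query tokens are OR-ed, so one shared token is enough.
        return bool(analyze(query) & analyze(value))
    # Not analyzed: the whole query string must equal the stored value.
    return value == query


def multi_match(doc: dict, query: str, fields: list) -> bool:
    # A document matches if any listed field matches.
    return any(f in doc and field_matches(FIELD_TYPES[f], doc[f], query) for f in fields)


docs = [
    {"title": "杜兰特", "description": "雷勇篮"},
    {"title": "库里", "description": "勇"},
    {"title": "詹姆斯", "description": "骑热湖"},
]
print([d["title"] for d in docs if multi_match(d, "库里", ["title", "description"])])  # ['库里']
print([d["title"] for d in docs if multi_match(d, "勇士", ["title", "description"])])  # ['杜兰特', '库里']
```

The second query shows the effect of analysis: 勇士 is split into 勇 and 士, and 勇 appears in two description values.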
3.1.11 query_string
If the queried field is analyzed, the query string is analyzed before searching; if the field is not analyzed, the query string is searched without analysis.
GET /products/_search
{
"query": {
"query_string": {
"default_field": "description",
"query": "xxx"
}
}
}
3.1.12 Highlighting
Works for both keyword and text fields.
GET /products/_search
{
"query": {
"term": {
"description": {
"value": "手"
}
}
},
"highlight": {
"pre_tags": ["<span style='color:red;'>"], // custom opening tag
"post_tags": ["</span>"], // custom closing tag
"fields": {
"*":{ // apply highlighting to all fields
}
}
}
}
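Conceptually, highlighting wraps each matched term in the returned text with the configured tags (ES uses <em> and </em> when pre_tags/post_tags are not set). The hypothetical `highlight` function below is a string-based sketch of that output format only; the real highlighters work on analyzed tokens and their offsets, which this does not model.

```python
import re


def highlight(text: str, terms: list, pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap every occurrence of the matched terms in pre_tag ... post_tag."""
    terms = [t for t in terms if t]  # an empty term would match everywhere
    if not terms:
        return text
    # Longest terms first, so "超级" is wrapped as a whole rather than just "超".
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("拥有KD-KI的一只超级球队", ["超"], "<span style='color:red;'>", "</span>"))
# 拥有KD-KI的一只<span style='color:red;'>超</span>级球队
```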
3.1.13 Pagination query
size + from
GET /products/_search
{
"query": {
"match_all": {
}
},
"size": 2, // number of hits per page
"from": 0 // offset of the first hit to return (the first hit is 0)
}
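Because from is an offset, page n (counting from 1) with a page size of size starts at from = (n - 1) * size. The hypothetical `page_params` helper below sketches that calculation and also checks the default index.max_result_window of 10000, which ES enforces on from + size; paging deeper than that needs a different approach such as search_after.

```python
def page_params(page: int, page_size: int, max_result_window: int = 10_000) -> dict:
    """Translate a 1-based page number into the from/size fields of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size  # from: 0-based offset of the first hit
    if start + page_size > max_result_window:
        # ES rejects requests where from + size exceeds index.max_result_window.
        raise ValueError("from + size exceeds max_result_window")
    return {"from": start, "size": page_size}


print(page_params(1, 2))  # {'from': 0, 'size': 2}
print(page_params(3, 2))  # {'from': 4, 'size': 2}
```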
3.1.14 Sorting
GET /products/_search
{
"query": {
"match_all": {
}
},
"sort": [
{
"price": {
"order": "desc" // desc = descending, asc = ascending (sorting on a field defaults to asc)
}
}
]
}
3.1.15 Return only specified fields (_source)
GET /products/_search
{
"query": {
"match_all": {
}
},
"_source": ["price","description"] // names of the fields to return
}
4. How indexing works (inverted index)
Test data
Index:
PUT /products/
{
"mappings": {
"properties": {
"title":{
"type":"keyword"
},
"price":{
"type":"double"
},
"description":{
"type": "text"
}
}
}
}
Documents:
POST /products/_bulk
{"index":{"_id":1}}
{"title":"杜兰特","description":"雷勇篮","price":123.2}
{"index":{"_id":2}}
{"title":"库里","description":"勇","price":23.2}
{"index":{"_id":3}}
{"title":"詹姆斯","description":"骑热湖","price":13.4}
How ES stores data
ES uses an inverted index (also called a reverse index). Where there is an inverted index, there is also a forward index. Put simply, a forward index finds values through a key (document to the terms it contains), while an inverted index finds keys through a value (term to the documents that contain it). Searches in ES are served by the inverted index.
Searching an inverted index is very fast because each term appears only once in the index and points directly to the documents that contain it.
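Using the test data above, the sketch below builds a tiny inverted index for the description field. Each Chinese character is treated as one term, which is roughly what the standard analyzer does with Chinese text; the keyword field title would instead contribute each whole value as a single term. `build_inverted_index` is a hypothetical name, and real postings also store positions and frequencies.

```python
from collections import defaultdict

# Forward index: _id -> description of the three test documents above.
docs = {1: "雷勇篮", 2: "勇", 3: "骑热湖"}


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: every Chinese character is one term.
        for term in text:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


inverted = build_inverted_index(docs)
print(inverted["勇"])  # [1, 2]
print(inverted["湖"])  # [3]
```

A search for 勇 is then a single dictionary lookup instead of a scan over every document.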
The index also stores how many times each term occurs and the length of each field, which are used to compute a relevance score: the more often a term occurs and the shorter the field, the higher the score.
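Since version 5.0, ES scores relevance with BM25 by default. The sketch below implements the classic BM25 per-term formula with the usual defaults k1 = 1.2 and b = 0.75 and the idf form used by Lucene. Lucene's internals (for example, how field lengths are stored) make real scores differ slightly, so this only illustrates the trend described above, not ES's exact numbers. The function names are hypothetical.

```python
import math

K1, B = 1.2, 0.75  # classic BM25 defaults


def idf(n_docs: int, doc_freq: int) -> float:
    # Rarer terms (small doc_freq) get a higher weight.
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: int, doc_len: int, avg_len: float, n_docs: int, doc_freq: int) -> float:
    # K1 limits how much repeated occurrences help; B penalizes long fields.
    norm = K1 * (1 - B + B * doc_len / avg_len)
    return idf(n_docs, doc_freq) * tf * (K1 + 1) / (tf + norm)


# Term 勇 in the test data: doc 1 ("雷勇篮", 3 terms) and doc 2 ("勇", 1 term).
n_docs, doc_freq, avg_len = 3, 2, (3 + 1 + 3) / 3
score_doc1 = bm25_term_score(tf=1, doc_len=3, avg_len=avg_len, n_docs=n_docs, doc_freq=doc_freq)
score_doc2 = bm25_term_score(tf=1, doc_len=1, avg_len=avg_len, n_docs=n_docs, doc_freq=doc_freq)
print(score_doc2 > score_doc1)  # True: same term frequency, shorter field scores higher
```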
5. IK tokenizer configuration
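Installing the IK analyzer plugin is not covered here.
Usage
// Test an analyzer
POST /_analyze
{
"analyzer": "ik_max_word", // ik_max_word splits text into fine-grained tokens, ik_smart into coarse-grained ones
"text": "中华人民共和国歌" // the text to analyze
}
// Create a mapping that sets the analyzer for a field
PUT /test
{
"mappings": {
"properties": {
"description":{
"type":"text",
"analyzer": "ik_max_word" // analyze this field with ik_max_word (ik_smart is the coarse-grained alternative)
}
}
}
}
// Insert a document
POST /test/_doc/1
{
"description":"中华人民共和国"
}
// Search: terms such as 人民, 中华 and 共和国 all find this document
GET /test/_search
{
"query": {
"term": {
"description": {
"value": "人民"
}
}
}
}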
Extension words and stop words configuration
IK supports a custom extension dictionary and a custom stop word dictionary:
- Extension dictionary: some words are not recognized as single terms by default, but you still want ES to index and search them as keywords. Add such words to the extension dictionary.
- Stop word dictionary: some words are terms, but in your business scenario you do not want them to be searchable. Put such words into the stop word dictionary.
Both dictionaries are configured by editing IKAnalyzer.cfg.xml in the config directory of the IK analyzer.
Note: in a custom dictionary file, each word goes on its own line.
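IK is dictionary-based, which is why adding words changes its output. To illustrate the effect of the two dictionaries, the sketch below uses a toy forward-maximum-matching segmenter. This is a simpler technique swapped in for illustration, not IK's actual algorithm, and `load_words`, `segment` and the dictionary contents are hypothetical. Dictionary text is parsed one word per line, matching the file format described above.

```python
from typing import AbstractSet


def load_words(dic_text: str) -> set:
    """Parse dictionary file content: one word per line, blank lines ignored."""
    return {line.strip() for line in dic_text.splitlines() if line.strip()}


def segment(text: str, dictionary: AbstractSet[str],
            stop_words: AbstractSet[str] = frozenset()) -> list:
    """Toy forward-maximum-matching segmenter (NOT IK's real algorithm).

    At each position take the longest dictionary word starting there;
    if none matches, emit a single character. Stop words are dropped.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                break
        tokens.append(candidate)
        i += size
    return [t for t in tokens if t not in stop_words]


text = "布鲁克林篮网是超级球队"
main_dict = load_words("超级\n球队\n")      # hypothetical main dictionary
ext_words = load_words("布鲁克林篮网\n")    # hypothetical extension dictionary file
stop_words = load_words("是\n")             # hypothetical stop word file

print(segment(text, main_dict))
# ['布', '鲁', '克', '林', '篮', '网', '是', '超级', '球队']
print(segment(text, main_dict | ext_words, stop_words))
# ['布鲁克林篮网', '超级', '球队']
```

The extension entry turns six single-character tokens into one searchable term, and the stop word removes 是 from the output.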
6. Filter queries
Strictly speaking, search operations in ES come in two kinds: queries and filters. A query, as used in the previous sections, by default computes a relevance score for each returned document and sorts the results by that score. A filter only selects the matching documents without computing scores, and its results can be cached. From a performance point of view, filtering is therefore faster than querying. Filters suit narrowing down large amounts of data, while queries suit precise, relevance-scored matching; a common approach is to filter the data first and then run queries to match what remains.
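GET /products/_search
{
"query": {
"bool": {
// filter clauses are placed inside a bool query
"must": [
{
"match_all": {
}
}
],
"filter": [
// Each condition is its own object, and every clause in filter must match (logical AND).
// Keep only the clauses you need.
{
"term": {
// documents whose description contains the term 雷
"description": "雷"
}
},
{
"terms": {
// documents whose description contains any of these terms (OR)
"description": [
"雷",
"骑"
]
}
},
{
"range": {
// documents with price between 10 and 20
"price": {
"gte": 10,
"lte": 20
}
}
},
{
"exists": {
// documents that have a description field
"field": "description"
}
},
{
"ids": {
// documents with _id 1 or 2
"values": [
"1","2"
]
}
}
]
}
}
}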