Elasticsearch Basic Operations - RESTful Operations

1. Introduction to RESTful

REST refers to a set of architectural constraints and principles. An application or design that satisfies these constraints and principles is RESTful. The most important REST principle for web applications is that the interaction between client and server is stateless between requests. Every request from client to server must contain the information necessary to understand the request. If the server restarts at any point between requests, the client will not be notified. Additionally, stateless requests can be answered by any available server, which is ideal for environments such as cloud computing. Clients can cache data to improve performance.

On the server side, application state and functionality are grouped into resources. A resource is a conceptual entity of interest that is exposed to clients, for example an application object, a database record, or an algorithm. Each resource is given a unique address through a URI (Uniform Resource Identifier). All resources share a uniform interface for transferring state between client and server, based on the standard HTTP methods such as GET, PUT, POST, and DELETE.

In RESTful web services, each resource has an address. The resources themselves are the targets of method calls, and the list of methods is the same for all resources. These methods are standard: HTTP GET, POST, PUT, DELETE, and possibly HEAD and OPTIONS. Put simply, to access a resource on the Internet you send a request to the server where the resource lives, and the request must state the network path of the resource and the operation to perform on it (create, delete, update, or query).
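As a rough illustration of this uniform interface, the sketch below maps the four CRUD operations to HTTP methods with Python's standard urllib. The base URL, the helper name build_request, and the path shopping/_doc/1 are placeholders chosen for illustration; the requests are only built, never sent.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"  # assumed local server address

# CRUD operation -> HTTP method (one common convention)
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, path, body=None):
    """Build (but do not send) an HTTP request for one CRUD operation."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        data=data,
        headers=headers,
        method=CRUD_TO_HTTP[operation],
    )


# Same resource path every time; only the HTTP method changes.
for op in CRUD_TO_HTTP:
    req = build_request(op, "shopping/_doc/1")
    print(op, "->", req.get_method(), req.full_url)
```

The point of the sketch is that the address identifies the resource while the method identifies the operation.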

2. Client installation

A browser's address bar can only issue GET requests, and HTML forms support only GET and POST, so you cannot conveniently send PUT or DELETE requests to the Elasticsearch server straight from the browser. To make client access easier, use an API debugging tool such as Postman (no official Chinese interface, but very widely used), Apipost, or Apifox (both developed in China).

Postman: https://www.postman.com/downloads/

Apipost: https://www.apipost.cn//

Apifox: https://www.apifox.cn/

Postman: a long-established and powerful API debugging tool with a clean interface that is quick and pleasant to use. It has no official Chinese interface, and the community Chinese language pack stops at version 9.12.2: https://github.com/hlmd/postman-cn

Apipost & Apifox: both are developed in China, support Chinese, and offer a free personal edition. They cover essentially everything Postman does, and also support team collaboration, a web version, exporting documentation in several formats, and generating code in various languages. Their main advantage is that their servers are in China, so syncing your workspace to the cloud is fast. Postman's servers are overseas and access from China is very slow, so it is best not to register or log in to Postman, otherwise the client becomes very sluggish.

3. Data format

Elasticsearch is a document-oriented database: one record is one document.
To make this easier to picture, the concepts Elasticsearch uses for storing documents can be compared with those of the relational database MySQL.
An index in ES can be regarded as a database, a type as a table, and a document as a row of a table. The concept of types has been gradually phased out: in Elasticsearch 6.x an index can contain only one type, and in Elasticsearch 7.x types are deprecated (the APIs are typeless and use _doc).

Elasticsearch uses JSON as the document serialization format. For example, a piece of user information:

{
    "name": "John",
    "sex": "Male",
    "age": 25,
    "birthDate": "1990/05/01",
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"]
}

4. HTTP operation

4.1 Index operation

Create an index

Compared with a relational database, creating an index is roughly equivalent to creating a database.
Send a PUT request to the ES server: http://127.0.0.1:9200/shopping

{
    "acknowledged": true,          # response result: true means the operation succeeded
    "shards_acknowledged": true,   # shard result: the shard operation succeeded
    "index": "shopping"            # index name
}
# Note: a new index gets 1 primary shard by default; in Elasticsearch versions before 7.0.0 the default was 5.

If you try to create an index that already exists, the server returns an error message.

View a single index

GET request: http://127.0.0.1:9200/shopping
The request path for viewing an index is the same as the one for creating it; only the HTTP method differs. This is a good place to see what RESTful means in practice.
The server responds as follows:

{
    "shopping": {                                  # index name
        "aliases": {},                             # aliases
        "mappings": {},                            # mappings
        "settings": {                              # settings
            "index": {                             # settings - index
                "creation_date": "1614265373911",  # creation time
                "number_of_shards": "1",           # number of primary shards
                "number_of_replicas": "1",         # number of replica shards
                "uuid": "eI5wemRERTumxGCc1bAk2A",  # unique identifier
                "version": {                       # version
                    "created": "7080099"
                },
                "provided_name": "shopping"        # index name
            }
        }
    }
}

View all indexes

GET request: http://127.0.0.1:9200/_cat/indices?v
In this request path, _cat means "view" and indices means "indexes", so the request lists all indexes on the current ES server, much like show tables in MySQL. The columns in the server response have the following meanings:

Header          Meaning
health          Index health: green (all primary and replica shards allocated), yellow (all primary shards allocated, some replicas not), red (at least one primary shard not allocated)
status          Whether the index is open or closed
index           Index name
uuid            Unique identifier of the index
pri             Number of primary shards
rep             Number of replicas
docs.count      Number of available documents
docs.deleted    Number of documents marked as deleted (tombstones)
store.size      Total size of the primary and replica shards
pri.store.size  Space occupied by the primary shards

Delete an index

DELETE request: http://127.0.0.1:9200/shopping
If you request the index again after deleting it, the server responds that the index does not exist.
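The four index operations above differ only in HTTP method and path. A minimal sketch with Python's standard urllib follows; the helper names index_request and send are made up for this example, and send needs a running Elasticsearch node at the address used in this article.

```python
import urllib.error
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # server address used in this article


def index_request(action, index="shopping"):
    """Return an unsent Request for one index-level operation."""
    routes = {
        "create": ("PUT", f"/{index}"),
        "get": ("GET", f"/{index}"),
        "list": ("GET", "/_cat/indices?v"),
        "delete": ("DELETE", f"/{index}"),
    }
    method, path = routes[action]
    return urllib.request.Request(ES_URL + path, method=method)


def send(request):
    """Send a request; return (status code, body text) even for HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # For example: 400 if the index already exists, 404 if it is missing.
        return err.code, err.read().decode("utf-8")


# Usage (requires a running node):
#   status, text = send(index_request("create"))
#   status, text = send(index_request("list"))
```

Catching HTTPError keeps the error responses (duplicate index, missing index) readable instead of raising an exception.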

4.2 Document Operation

Create a document

A document here can be compared to a row of table data in a relational database. The data is submitted in JSON format.
POST request: http://127.0.0.1:9200/shopping/_doc
The content of the request body is (the request body is required; without it an error is returned):

{
    "title": "小米手机",
    "category": "小米",
    "images": "http://www.gulixueyuan.com/xm.jpg",
    "price": 3999.00
}

The request method here must be POST, not PUT; otherwise the server returns a 405 (Method Not Allowed) error.

Note: I ran into problems with Apipost 6.x during my study and switched to 5.x, so the interface may differ from the original screenshots.
The cause appears to be that Apipost 6.x treats the 201 response as a redirect, so the ES server ends up receiving a GET request and returns the 405 error described above. The issue has been reported to the vendor and should be fixed in a later version.
Domestic tools still have some catching up to do.

The normal server response is as follows:

{
    "_index": "shopping",               # index
    "_type": "_doc",                    # type: document
    "_id": "w_WoYoIBNKuSN7cz5FHR",      # unique identifier, comparable to a primary key in MySQL; randomly generated
    "_version": 1,                      # version
    "result": "created",                # result: created means the document was created
    "_shards": {                        # shards
        "total": 2,                     # shards - total
        "successful": 1,                # shards - successful
        "failed": 0                     # shards - failed
    },
    "_seq_no": 0,
    "_primary_term": 1
}

Because no unique identifier (ID) was specified when the data above was created, the ES server generated a random one.
To use a custom identifier, specify it in the URL when creating the document: http://127.0.0.1:9200/shopping/_doc/1
Note: if you specify the ID when adding data, the request method can also be PUT.
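The rule about POST and PUT can be captured in a small helper. This is a sketch only: create_doc_request is a hypothetical name, the request is built with Python's standard urllib, and nothing is sent.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # server address used in this article


def create_doc_request(index, doc, doc_id=None):
    """Build the request that creates a document.

    Without an ID the server generates one, and the method must be POST.
    With an explicit ID the method can be PUT (POST also works).
    """
    if doc_id is None:
        method, path = "POST", f"/{index}/_doc"
    else:
        method, path = "PUT", f"/{index}/_doc/{doc_id}"
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(doc, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


phone = {"title": "小米手机", "category": "小米", "price": 3999.00}
for req in (create_doc_request("shopping", phone),
            create_doc_request("shopping", phone, doc_id=1)):
    print(req.get_method(), req.full_url)
```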

View a single document

To view a document you need to specify its unique identifier, similar to a primary-key query in MySQL.

GET request: http://127.0.0.1:9200/shopping/_doc/1

{
    "_index": "shopping",       # index
    "_type": "_doc",            # document type
    "_id": "1",
    "_version": 2,
    "_seq_no": 2,
    "_primary_term": 2,
    "found": true,              # query result: true means found, false means not found
    "_source": {                # the original document
        "title": "华为手机",
        "category": "华为",
        "images": "http://www.gulixueyuan.com/hw.jpg",
        "price": 4999.00
    }
}

Modify a document (full replacement)

This works just like adding a new document: send a request to the same URL, and if the request body differs, the original content is overwritten.
POST/PUT request: http://127.0.0.1:9200/shopping/_doc/1

{
    "title": "小米手机",
    "category": "小米",
    "images": "http://www.gulixueyuan.com/xm.jpg",
    "price": 2999.00
}

The server responds:

{
    "_index": "shopping",
    "_type": "_doc",
    "_id": "1",
    "_version": 2,          # version
    "result": "updated",    # result: updated means the data was updated
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 2,
    "_primary_term": 2
}

Modify a field (partial update)

When modifying data, you can also change only some of the fields of a given document.
POST request: http://127.0.0.1:9200/shopping/_update/1
The content of the request body is:

{
    "doc": {
        "price": 3000.00
    }
}

Querying the document by its unique identifier afterwards shows that the data has been updated.
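To make the difference from a full replacement concrete, the sketch below builds the _update body and simulates its effect locally. The helper names are hypothetical, and the dictionary merge is only a stand-in for what the server does with a flat document like this one.

```python
def partial_update_body(**changed_fields):
    """Wrap the changed fields in the "doc" envelope that _update expects."""
    if not changed_fields:
        raise ValueError("at least one field is required")
    return {"doc": changed_fields}


def apply_locally(source, body):
    """Simulate the result for a flat document: changed fields win, the rest stay."""
    return {**source, **body["doc"]}


stored = {
    "title": "小米手机",
    "category": "小米",
    "images": "http://www.gulixueyuan.com/xm.jpg",
    "price": 2999.00,
}
body = partial_update_body(price=3000.00)
print(body)
print(apply_locally(stored, body))
```

A full replacement with the same body would instead leave a document containing only the doc field, which is why partial changes go through the _update endpoint.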

Delete a document

A deleted document is not removed from disk immediately; it is only marked as deleted (a tombstone).

DELETE request: http://127.0.0.1:9200/shopping/_doc/1

{
    "_index": "shopping",
    "_type": "_doc",
    "_id": "1",
    "_version": 4,          # version: every operation on the data increments it
    "result": "deleted",    # result: deleted means the data was marked as deleted
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 4,
    "_primary_term": 2
}

Querying the document after it has been deleted returns a response with found set to false.
If you delete a document that does not exist, the response is:

{
    "_index": "shopping",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "result": "not_found",  # result: not_found means the document was not found
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 5,
    "_primary_term": 2
}

Delete documents by condition

Data is usually deleted by the document's unique identifier, but in practice you can also delete multiple documents that match a condition.

First, add several documents, one request each:

{
    "title": "小米手机",
    "category": "小米",
    "images": "http://www.gulixueyuan.com/xm.jpg",
    "price": 4000
}

{
    "title": "华为手机",
    "category": "华为",
    "images": "http://www.gulixueyuan.com/hw.jpg",
    "price": 4000.00
}

Then send a POST request: http://127.0.0.1:9200/shopping/_delete_by_query
The content of the request body is:

{
    "query": {
        "match": {
            "price": 4000.00
        }
    }
}

The server responds:

{
    "took": 6,                  # time taken, in milliseconds
    "timed_out": false,         # whether the request timed out
    "total": 1,                 # total number of matching documents
    "deleted": 1,               # number of documents deleted
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until_millis": 0,
    "failures": []
}
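When this call is scripted, the fields worth checking in the response are total, deleted, version_conflicts, timed_out, and failures. A small sketch of building the body and summarizing the response; the helper names are hypothetical and the sample response is copied from above.

```python
def delete_by_query_body(field, value):
    """Body for POST /<index>/_delete_by_query using a match condition."""
    return {"query": {"match": {field: value}}}


def summarize_delete(response):
    """Reduce a _delete_by_query response to the fields that usually matter."""
    return {
        "matched": response["total"],
        "deleted": response["deleted"],
        "conflicts": response["version_conflicts"],
        "ok": not response["timed_out"] and not response["failures"],
    }


sample_response = {
    "took": 6, "timed_out": False, "total": 1, "deleted": 1, "batches": 1,
    "version_conflicts": 0, "noops": 0, "retries": {"bulk": 0, "search": 0},
    "throttled_millis": 0, "requests_per_second": -1,
    "throttled_until_millis": 0, "failures": [],
}

print(delete_by_query_body("price", 4000.00))
print(summarize_delete(sample_response))
```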

4.3 Mapping operation

Having an index is equivalent to having a database in a relational database system.

Next, you need to create the mapping for the index, which is similar to the table structure in a database. To create a database table you set the field names, types, lengths, constraints, and so on; an index is the same: you need to declare which fields it contains and what constraints each field has. This is called a mapping.

Create a mapping

  • Create an index named student
    PUT request: http://127.0.0.1:9200/student
  • Create the mapping
    PUT request: http://127.0.0.1:9200/student/_mapping

Description of the mapping data:

  • Field name: chosen freely; each field is then given the attributes below. Examples: title, subtitle, images, price
  • type: the field type. Elasticsearch supports a rich set of data types; the key ones are:
    • String types, of which there are two:
      • text: analyzed, i.e. split into terms for full-text search
      • keyword: not analyzed; the value is matched as a whole
    • Numeric types, in two groups:
      • Basic data types: long, integer, short, byte, double, float, half_float
      • High-precision floating-point type: scaled_float
    • Date: date type
    • Array: there is no dedicated array type; any field can hold multiple values of the same type
    • Object: object
  • index: whether the field is indexed. The default is true, so without any configuration every field is indexed.
    • true: the field is indexed and can be searched
    • false: the field is not indexed and cannot be searched
  • store: whether to store the field value separately. The default is false.
    The original document is stored in _source, and by default individual fields are not stored separately but extracted from _source. You can store a particular field separately by setting "store": true. Reading a separately stored field can be faster than parsing it out of _source, but it takes more disk space, so decide according to actual business needs.
  • analyzer: the analyzer (tokenizer) to use. ik_max_word refers to the IK analyzer, which is covered in a dedicated chapter later.
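The request body for the _mapping call is not shown in the text above. As a hedged sketch, the Python below assembles one from the attributes just described. The helper names are hypothetical, and the three fields and their settings are borrowed from the student1 example later in this section, not confirmed as the body originally sent to student.

```python
import json


def field(type_, index=True, store=False, analyzer=None):
    """Describe one mapped field using the attributes explained above."""
    spec = {"type": type_, "index": index}
    if store:
        spec["store"] = True          # only emitted when it differs from the default
    if analyzer is not None:
        spec["analyzer"] = analyzer   # only meaningful for text fields
    return spec


def mapping_body():
    """Body for PUT /student/_mapping (field choices are an assumption)."""
    return {
        "properties": {
            "name": field("text", index=True),
            "sex": field("text", index=False),
            "age": field("long", index=False),
        }
    }


print(json.dumps(mapping_body(), indent=4))
```

With index set to false on sex and age, those two fields cannot be searched, which matters for the bool and range queries later on.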

View the mapping

GET request: http://127.0.0.1:9200/student/_mapping

Create an index together with its mapping

PUT request: http://127.0.0.1:9200/student1

{
    "settings": {},
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "index": true
            },
            "sex": {
                "type": "text",
                "index": false
            },
            "age": {
                "type": "long",
                "index": false
            }
        }
    }
}

This is equivalent to creating the index and defining its mapping in a single request.

4.4 Advanced query

Elasticsearch provides a complete JSON-based query DSL for defining queries.
First, prepare some data:

# POST /student/_doc/1001
{
    "name": "zhangsan",
    "nickname": "zhangsan",
    "sex": "男",
    "age": 30
}
# POST /student/_doc/1002
{
    "name": "lisi",
    "nickname": "lisi",
    "sex": "男",
    "age": 20
}
# POST /student/_doc/1003
{
    "name": "wangwu",
    "nickname": "wangwu",
    "sex": "女",
    "age": 40
}
# POST /student/_doc/1004
{
    "name": "zhangsan1",
    "nickname": "zhangsan1",
    "sex": "女",
    "age": 50
}
# POST /student/_doc/1005
{
    "name": "zhangsan2",
    "nickname": "zhangsan2",
    "sex": "女",
    "age": 30
}

View all documents in all indexes

GET/POST request: http://127.0.0.1:9200/_search

View all documents in a specified index

GET/POST request: http://127.0.0.1:9200/student/_search

{
    "took": 1,                      # time the query took, in milliseconds
    "timed_out": false,             # whether the query timed out
    "_shards": {                    # shard information
        "total": 1,                 # total
        "successful": 1,            # successful
        "skipped": 0,               # skipped
        "failed": 0                 # failed
    },
    "hits": {                       # search hits
        "total": {                  # total number of documents matching the query
            "value": 5,             # value of the total hit count
            "relation": "eq"        # counting rule: eq means the count is exact, gte means it is a lower bound
        },
        "max_score": 1,             # highest relevance score
        "hits": [                   # the matching documents
            ... ...
        ]
    }
}
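On the client side, the parts of this response most often needed are the total, whether that total is exact, and the _source of each hit. A minimal sketch follows; summarize_hits is a hypothetical helper, and the sample response is abridged by hand from the structure above with only two hits filled in.

```python
def summarize_hits(response):
    """Return (total, is_exact, sources) from a _search response."""
    total = response["hits"]["total"]
    is_exact = total["relation"] == "eq"   # "gte" means value is a lower bound
    sources = [hit["_source"] for hit in response["hits"]["hits"]]
    return total["value"], is_exact, sources


# Hand-written sample in the shape shown above (only two hits included).
sample = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 5, "relation": "eq"},
        "max_score": 1,
        "hits": [
            {"_index": "student", "_id": "1001", "_score": 1,
             "_source": {"name": "zhangsan", "age": 30}},
            {"_index": "student", "_id": "1002", "_score": 1,
             "_source": {"name": "lisi", "age": 20}},
        ],
    },
}

total, exact, docs = summarize_hits(sample)
print(total, exact, [d["name"] for d in docs])
```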

Conditional match query

  1. Query with parameters in the URL (the second approach is recommended)

GET/POST request: http://127.0.0.1:9200/student/_search?q=name:zhangsan

Parameter   Description
?           Marks the start of the query parameters
q           Indicates a query
name        The field to query

  2. Query with parameters in the request body (recommended)

A match query analyzes (tokenizes) the query text and then searches; multiple terms are combined with OR.

GET/POST request: http://127.0.0.1:9200/student/_search
The content of the request body is:

{
    "query": {
        "match": {
            "name": "zhangsan"
        }
    }
}
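The two approaches can be put side by side in Python. This is only a sketch: the helper names are hypothetical, and the URL form is URL-encoded here, which is why the colon appears as %3A. The two forms express roughly the same search, although the q parameter is parsed with its own query-string syntax.

```python
import json
from urllib.parse import urlencode

ES_URL = "http://127.0.0.1:9200"  # server address used in this article


def uri_search_url(index, field, text):
    """Approach 1: the condition travels in the URL as q=field:text."""
    return f"{ES_URL}/{index}/_search?" + urlencode({"q": f"{field}:{text}"})


def body_search(index, field, text):
    """Approach 2: the condition travels in a JSON request body."""
    url = f"{ES_URL}/{index}/_search"
    body = {"query": {"match": {field: text}}}
    return url, json.dumps(body)


print(uri_search_url("student", "name", "zhangsan"))
print(body_search("student", "name", "zhangsan"))
```

The request-body form scales better as conditions grow, which is one reason it is the recommended approach.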

Multi-field match query

multi_match is similar to match, except that it can query multiple fields.

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "multi_match": {
            "query": "zhangsan",
            "fields": ["name", "nickname"]
        }
    }
}

Exact keyword query

A term query matches the keyword exactly; the query text is not analyzed.

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "term": {
            "name": {
                "value": "zhangsan"
            }
        }
    }
}

Exact query with multiple keywords

The terms query is the same as the term query, but it lets you specify multiple values to match.
If the field contains any of the specified values, the document matches, similar to IN in MySQL.

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "terms": {
            "name": ["zhangsan", "lisi"]
        }
    }
}

Specify the fields to return

By default, Elasticsearch returns all fields stored in the document's _source in the search results.
If we only want some of the fields, we can add _source filtering.

GET request: http://127.0.0.1:9200/student/_search

{
    "_source": ["name", "nickname"],
    "query": {
        "terms": {
            "nickname": ["zhangsan"]
        }
    }
}

Filter fields

We can also use:

  • includes: to specify the fields to return
  • excludes: to specify the fields not to return

GET request: http://127.0.0.1:9200/student/_search

{
    "_source": {
        "includes": ["name", "sex"],
        "excludes": ["nickname"]
    },
    "query": {
        "terms": {
            "nickname": ["zhangsan"]
        }
    }
}

Combined query

A bool query combines other queries through must (must match), must_not (must not match), and should (should match).

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "zhangsan"
                    }
                }
            ],
            "must_not": [
                {
                    "match": {
                        "age": "40"
                    }
                }
            ],
            "should": [
                {
                    "match": {
                        "sex": "男"
                    }
                }
            ]
        }
    }
}

Error description:
With the mapping created earlier, this query fails because age is not indexed and therefore cannot be searched. (The range query below runs into the same error.)

The reason is that index was set to false for age and sex when the mapping was created.
To test these queries, create a new index whose mapping sets index to true for age and sex; the query then returns a normal result.
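Because a bool query is just nested dictionaries, it is easy to assemble programmatically. A minimal sketch, with bool_query as a hypothetical helper that omits empty clause lists:

```python
import json


def bool_query(must=(), must_not=(), should=()):
    """Assemble a bool query from lists of sub-queries, skipping empty clauses."""
    clauses = {
        "must": list(must),
        "must_not": list(must_not),
        "should": list(should),
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[{"match": {"name": "zhangsan"}}],
    must_not=[{"match": {"age": "40"}}],
    should=[{"match": {"sex": "男"}}],
)
print(json.dumps(body, ensure_ascii=False, indent=4))
```

Called this way, the helper reproduces the request body shown above.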

Range query

A range query finds numbers or dates that fall within a specified range. It supports the following operators:

Operator    Description
gt          greater than (>)
gte         greater than or equal to (>=)
lt          less than (<)
lte         less than or equal to (<=)

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "range": {
            "age": {
                "gte": 30,
                "lte": 35
            }
        }
    }
}
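To check what the four operators select, the sketch below applies the same bounds to the ages of the five sample students in plain Python. It is a local simulation of the operator semantics, not a call to Elasticsearch, and the helper names are made up.

```python
import operator

# Range operator name -> Python comparison
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def range_query(field, **bounds):
    """Build a range query body; unknown operator names raise KeyError."""
    for name in bounds:
        OPS[name]
    return {"query": {"range": {field: bounds}}}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gte": 30, "lte": 35}."""
    return all(OPS[name](value, limit) for name, limit in bounds.items())


ages = {"1001": 30, "1002": 20, "1003": 40, "1004": 50, "1005": 30}
body = range_query("age", gte=30, lte=35)
matches = [doc_id for doc_id, age in ages.items()
           if in_range(age, body["query"]["range"]["age"])]
print(matches)  # ['1001', '1005']
```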

Fuzzy query

A fuzzy query returns documents that contain terms similar to the search term.
Edit distance is the number of single-character changes needed to turn one term into another. These changes can be:

  • changing a character (box → fox)
  • deleting a character (black → lack)
  • inserting a character (sic → sick)
  • transposing two adjacent characters (act → cat)

To find similar terms, a fuzzy query creates the set of all possible variations, or expansions, of the search term within a specified edit distance. The query then returns exact matches for each expansion.

Use the fuzziness parameter to set the maximum edit distance. The default, AUTO, is generally used; it derives the edit distance from the length of the term.

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "fuzzy": {
            "name": {
                "value": "zhangsan"
            }
        }
    }
}

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "fuzzy": {
            "name": {
                "value": "zhangsan",
                "fuzziness": 2
            }
        }
    }
}
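The edit distance described above can be computed by hand with a small dynamic-programming routine. The sketch below is a plain-Python illustration (the optimal string alignment variant, which counts an adjacent transposition as one edit) and is not how Elasticsearch implements fuzzy matching internally. auto_fuzziness mirrors the documented default thresholds for AUTO: terms of up to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two.

```python
def edit_distance(a, b):
    """Edit distance where change, delete, insert and adjacent swap each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # change (or keep)
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose
    return d[-1][-1]


def auto_fuzziness(term):
    """Maximum edits allowed by fuzziness AUTO with its default thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


for pair in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(pair, edit_distance(*pair))

limit = auto_fuzziness("zhangsan")
names = ["zhangsan", "lisi", "wangwu", "zhangsan1", "zhangsan2"]
print([n for n in names if edit_distance("zhangsan", n) <= limit])
```

Each of the four example pairs is one edit apart, and with the sample data zhangsan, zhangsan1, and zhangsan2 all fall within the AUTO limit for an eight-character term.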

Single-field sorting

sort lets us sort by different fields, with the direction given by order: desc for descending, asc for ascending.

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "match": {
            "name": "zhangsan"
        }
    },
    "sort": [
        {
            "age": {
                "order": "desc"
            }
        }
    ]
}

Multi-field sorting

Suppose we want to sort by age and _score together: matches are sorted first by age and then by relevance score.

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "age": {
                "order": "desc"
            }
        },
        {
            "_score": {
                "order": "desc"
            }
        }
    ]
}

Highlight query

When you run a keyword search, the matched keywords in the results are displayed in a different color; this is called highlighting.

Elasticsearch can wrap the matched keyword in the results with tags that carry a style.
Alongside the match query, add a highlight attribute:

  • pre_tags: the opening tag placed before the match
  • post_tags: the closing tag placed after the match
  • fields: the fields that need highlighting
  • name: declares that the name field is to be highlighted; field-specific settings can be added here, or it can be left empty

GET request: http://127.0.0.1:9200/student/_search

{
    "query": {
        "match": {
            "name": "zhangsan"
        }
    },
    "highlight": {
        "pre_tags": "<font color='red'>",
        "post_tags": "</font>",
        "fields": {
            "name": {}
        }
    }
}

Match-all query

(Usually used together with paging, because when the amount of data is large...)
GET/POST request: http://127.0.0.1:9200/student/_search
The content of the request body is:

{
    "query": {
        "match_all": {}
    }
}

Paging query

from: the starting offset of the current page, 0 by default. from = (pageNum - 1) * size
size: the number of items shown on each page

GET/POST request: http://127.0.0.1:9200/student/_search
The content of the request body is:

{
    "query": {
        "match_all": {}
    },
    "from": 0,
    "size": 2
}
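The formula from = (pageNum - 1) * size is easy to get off by one, so it helps to wrap it once. A minimal sketch, with page_body as a hypothetical helper and pages numbered from 1:

```python
def page_body(page_num, size, query=None):
    """Build a paged search body; page_num starts at 1."""
    if page_num < 1 or size < 1:
        raise ValueError("page_num and size must be at least 1")
    return {
        "query": query if query is not None else {"match_all": {}},
        "from": (page_num - 1) * size,   # offset of the first hit on this page
        "size": size,
    }


# With 5 documents and size 2, pages 1..3 start at offsets 0, 2 and 4.
for page in (1, 2, 3):
    print(page, page_body(page, 2)["from"])
```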

Aggregation query

Aggregations let users run statistical analysis on ES documents, similar to group by in a relational database. There are many other aggregations as well, such as taking the maximum value or the average.

  • Take the maximum value of a field: max
    GET request: http://127.0.0.1:9200/student/_search
{
    "aggs": {
        "max_age": {
            "max": {
                "field": "age"
            }
        }
    },
    "size": 0
}

  • Take the minimum value of a field: min
{
    "aggs": {
        "min_age": {
            "min": {
                "field": "age"
            }
        }
    },
    "size": 0
}

  • Sum the values of a field: sum
{
    "aggs": {
        "sum_age": {
            "sum": {
                "field": "age"
            }
        }
    },
    "size": 0
}

  • Take the average of a field: avg

  • Deduplicate the values of a field and then count them

  • Stats aggregation

Bucket aggregation query
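The text gives no request bodies for the last four items. As a hedged sketch, they can follow the same pattern as max, min, and sum using the standard avg, cardinality, stats, and terms aggregation types. Reading "State Aggregation" as stats, using terms for the bucket aggregation, and the aggregation names such as avg_age are assumptions made for this example.

```python
import json


def agg_body(name, kind, field):
    """Build a one-aggregation search body; size 0 suppresses the hit list."""
    return {"aggs": {name: {kind: {"field": field}}}, "size": 0}


examples = {
    "average": agg_body("avg_age", "avg", "age"),
    "distinct count": agg_body("distinct_age", "cardinality", "age"),
    "stats (count, min, max, avg, sum)": agg_body("stats_age", "stats", "age"),
    "bucket by value (like group by)": agg_body("age_groupby", "terms", "age"),
}

for label, body in examples.items():
    print(label)
    print(json.dumps(body, indent=4))
```

Each body would be sent as a GET request to http://127.0.0.1:9200/student/_search, like the max example above.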

Origin blog.csdn.net/weixin_52799373/article/details/126137070