Elasticsearch Basic Operations - RESTful Operations
- 1. Introduction to RESTful
- 2. Client installation
- 3. Data format
- 4. HTTP operation
- 4.1 Index operation
- 4.2 Document Operation
- 4.3 Mapping operation
- 4.4 Advanced query
- View all documents in the index library
- View all documents under the specified index
- condition matching query
- field match query
- keyword exact query
- Multi-keyword precise query
- specify query fields
- filter field
- combined query
- range query
- fuzzy query
- Single field sorting
- Multi-field sorting
- highlight query
- full query
- Paging query
- aggregation query
- bucket aggregation query
1. Introduction to RESTful
REST refers to a set of architectural constraints and principles. An application or design that satisfies these constraints and principles is RESTful. The most important REST principle for web applications is that the interaction between client and server is stateless between requests. Every request from client to server must contain the information necessary to understand the request. If the server restarts at any point between requests, the client will not be notified. Additionally, stateless requests can be answered by any available server, which is ideal for environments such as cloud computing. Clients can cache data to improve performance.
On the server side, application state and functionality are grouped into resources. A resource is a conceptual entity of interest that is exposed to clients. Examples of resources are application objects, database records, algorithms, and so on. Each resource gets a unique address through a URI (Uniform Resource Identifier). All resources share a uniform interface for transferring state between client and server, using standard HTTP methods such as GET, PUT, POST, and DELETE.
In RESTful web services, each resource has an address. The resources themselves are the targets of method calls, and the method list is the same for all resources. These methods are standard and include HTTP GET, POST, PUT, DELETE, and possibly HEAD and OPTIONS. Put simply, to access a resource on the Internet you send a request to the server where the resource is located, and the request must state the network path of the resource and the operation to perform on it (add, delete, modify, or query).
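As a rough illustration of the uniform interface, the sketch below builds (but does not send) several requests against the same resource address, differing only in the HTTP method. It uses only Python's standard library. The address `http://127.0.0.1:9200` and the `shopping` resource come from the examples later in this article; `build_request` is a hypothetical helper name introduced here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"  # assumed local Elasticsearch address


def build_request(method, path, body=None):
    """Build (without sending) an HTTP request for a resource path."""
    data = None
    headers = {}
    if body is not None:
        # JSON bodies are sent as UTF-8 bytes with a JSON content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# One address, several operations: only the HTTP method changes.
examples = [build_request(m, "/shopping") for m in ("PUT", "GET", "HEAD", "DELETE")]
for req in examples:
    print(req.get_method(), req.full_url)
```

Actually sending one of these requests, for example with `urllib.request.urlopen(req)`, requires a running server at that address.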
2. Client installation
Every request sent to the Elasticsearch server has to carry a standard HTTP method, but a browser on its own can in practice only issue GET and POST requests (through the address bar and HTML forms). To make client access easier, you can use an API debugging tool such as Postman (widely used, but without an official Chinese interface), Apipost, or Apifox (both developed in China).
Postman:https://www.postman.com/downloads/
Apipost:https://www.apipost.cn//
Apifox:https://www.apifox.cn/
Postman: A long-established and powerful API debugging tool with a simple, clear interface that is quick and convenient to use. It has no official Chinese interface, and the community Chinese language pack stops at version 9.12.2: https://github.com/hlmd/postman-cn
Apipost & Apifox: Both are developed in China, support Chinese, and offer a free personal edition. They cover essentially everything Postman does, and they also support collaboration, a web version, exporting documentation in various formats, and generating code in various languages. Their main advantage for users in China is that their servers are located there, so a workspace synchronizes to the remote end quickly. Postman's servers are abroad and access from China is very slow, so it is best to use Postman without registering or logging in; otherwise it can become very sluggish.
3. Data format
Elasticsearch is a document-oriented database, where a piece of data is a document.
The way Elasticsearch stores document data can be compared with the way the relational database MySQL stores data.
An index in ES can be regarded as a database, a type as a table, and a document as a row of a table. The concept of types has been gradually phased out: in Elasticsearch 6.X an index can contain only one type, and in Elasticsearch 7.X types are deprecated (they are removed entirely in 8.X).
Elasticsearch uses JSON as the document serialization format, for example a piece of user information:
{
"name" : "John",
"sex" : "Male",
"age" : 25,
"birthDate": "1990/05/01",
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
4. HTTP operation
4.1 Index operation
create index
Compared with relational databases, creating an index is equivalent to creating a database.
Send a PUT request to the ES server: http://127.0.0.1:9200/shopping
The server responds as follows:
{
"acknowledged"【response result】: true, # true: the operation succeeded
"shards_acknowledged"【shard result】: true, # the shard operation succeeded
"index"【index name】: "shopping"
}
# Note: a newly created index has 1 primary shard by default; in Elasticsearch versions before 7.0.0 the default was 5
If the same index is created again, an error message is returned.
view a single index
GET request: http://127.0.0.1:9200/shopping
The request path for viewing an index is the same as the one used to create it; only the HTTP method differs. This is where the meaning of RESTful becomes apparent.
After the request, the server responds as follows:
{
"shopping"【index name】: {
"aliases"【aliases】: {
},
"mappings"【mappings】: {
},
"settings"【settings】: {
"index"【settings - index】: {
"creation_date"【settings - index - creation time】: "1614265373911",
"number_of_shards"【settings - index - number of primary shards】: "1",
"number_of_replicas"【settings - index - number of replica shards】: "1",
"uuid"【settings - index - unique identifier】:"eI5wemRERTumxGCc1bAk2A",
"version"【settings - index - version】: {
"created": "7080099"
},
"provided_name"【settings - index - name】: "shopping"
}
}
}
}
view all indexes
GET request: http://127.0.0.1:9200/_cat/indices?v
Here _cat in the request path means "view" and indices means "indexes", so the request lists all indexes on the current ES server, much like show tables in MySQL. The columns of the server response are as follows:
Header | Meaning |
---|---|
health | Current health status: green (all primary and replica shards are allocated), yellow (all primary shards are allocated, but some replicas are not), red (some primary shards are not allocated) |
status | Whether the index is open or closed |
index | Index name |
uuid | Unique identifier of the index |
pri | Number of primary shards |
rep | Number of replicas |
docs.count | Number of available documents |
docs.deleted | Number of documents marked as deleted (tombstones) |
store.size | Total size of the primary and replica shards |
pri.store.size | Space occupied by the primary shards |
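Because `?v` adds a header row to the plain-text output, the response can be turned into one record per index with a few lines of Python. This is only a sketch: `parse_cat_table` is a hypothetical helper, and the sample text is made up to resemble the columns described above (only the index name and uuid are taken from this article). It assumes every column has a value, which may not hold for closed indexes.

```python
def parse_cat_table(text):
    """Parse whitespace-separated _cat output with a header row into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Pair each value with the column name at the same position.
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Hypothetical sample output; the sizes and counts are invented.
SAMPLE = """\
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   shopping eI5wemRERTumxGCc1bAk2A   1   1          0            0       208b           208b
"""

rows = parse_cat_table(SAMPLE)
for row in rows:
    print(row["index"], row["health"], row["pri"], row["rep"])
```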
delete index
DELETE request: http://127.0.0.1:9200/shopping
If you request the index again after deleting it, the server responds that the index does not exist.
4.2 Document Operation
create document
The documents here can be compared to table data in a relational database, and the added data format is in JSON format.
POST request: http://127.0.0.1:9200/shopping/_doc
The content of the request body is: (the request body must exist, otherwise an error message will be returned)
{
"title":"小米手机",
"category":"小米",
"images":"http://www.gulixueyuan.com/xm.jpg",
"price":3999.00
}
The request method here must be POST, not PUT; otherwise the server returns an error such as 405 (Method Not Allowed).
Note on Apipost: version 6.x showed problems during testing, so version 5.x was used instead. Apipost 6.x treats the 201 response code as a redirect, so the ES server ends up receiving a GET request and returns an error. The problem has been reported to the vendor and is expected to be fixed in a later version.
The normal server response results are as follows:
{
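"_index"【index】: "shopping",
"_type"【type - document】: "_doc",
"_id"【unique identifier】: "w_WoYoIBNKuSN7cz5FHR", # comparable to a primary key in MySQL; randomly generated
"_version"【version】: 1,
"result"【result】: "created", # created means the document was created successfully
"_shards"【shards】: {
"total"【shards - total】: 2,
"successful"【shards - successful】: 1,
"failed"【shards - failed】: 0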
},
"_seq_no": 0,
"_primary_term": 1
}
Because no unique identifier (ID) was specified when the data above was created, the ES server generated a random one by default.
To choose the identifier yourself, specify it when creating the document: http://127.0.0.1:9200/shopping/_doc/1
Note: when the ID is specified explicitly, the request method can also be PUT.
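The rule about methods and IDs can be written down as a small helper. This is a sketch of the rule as described in this section, not part of any Elasticsearch client; `create_document_request` is a hypothetical name.

```python
def create_document_request(index, doc_id=None, method=None):
    """Return (HTTP method, path) for creating a document.

    Without an ID only POST is accepted, because the server generates the ID.
    With an explicit ID either POST or PUT may be used (PUT by default here).
    """
    if doc_id is None:
        if method not in (None, "POST"):
            raise ValueError("auto-generated IDs require POST")
        return "POST", f"/{index}/_doc"
    if method not in (None, "POST", "PUT"):
        raise ValueError("use POST or PUT when the ID is given")
    return (method or "PUT"), f"/{index}/_doc/{doc_id}"


print(create_document_request("shopping"))
print(create_document_request("shopping", 1))
print(create_document_request("shopping", 1, "POST"))
```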
view a single document
When viewing a document, you need to specify the unique identifier of the document, similar to the primary key query of data in MySQL
GET request: http://127.0.0.1:9200/shopping/_doc/1
{
"_index"【index】: "shopping",
"_type"【document type】: "_doc",
"_id": "1",
"_version": 2,
"_seq_no": 2,
"_primary_term": 2,
"found"【query result】: true, # true means found, false means not found
"_source"【document source】: {
"title": "华为手机",
"category": "华为",
"images": "http://www.gulixueyuan.com/hw.jpg",
"price": 4999.00
}
}
Modify the document (full revision)
As when adding a new document, send the request to the same URL; if the request body differs, the original document content is overwritten in full.
POST/PUT request: http://127.0.0.1:9200/shopping/_doc/1
{
"title":"小米手机",
"category":"小米",
"images":"http://www.gulixueyuan.com/xm.jpg",
"price":2999.00
}
The server responds as follows:
{
"_index": "shopping",
"_type": "_doc",
"_id": "1",
"_version"【version】: 2,
"result"【result】: "updated", # updated means the data was updated
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 2
}
Modify field (local modification)
When modifying data, you can also change only part of a given document.
POST request: http://127.0.0.1:9200/shopping/_update/1
The content of the request body is:
{
"doc": {
"price":3000.00
}
}
Querying the document again by its unique identifier shows that the data has been updated.
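To make the difference from a full overwrite concrete, the sketch below imitates the effect of a partial update locally: the fields under `doc` are merged into the existing source, and everything else is left alone. It assumes a simple recursive merge of nested objects; `apply_partial_update` is a hypothetical helper, not an Elasticsearch API.

```python
import copy


def apply_partial_update(source, doc):
    """Return a new document with `doc` merged into `source`.

    Nested objects are merged recursively; other values are replaced.
    """
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


before = {
    "title": "小米手机",
    "category": "小米",
    "images": "http://www.gulixueyuan.com/xm.jpg",
    "price": 2999.00,
}
after = apply_partial_update(before, {"price": 3000.00})
print(after["price"], after["title"])
```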
delete document
A deleted document is not removed from disk immediately; it is only marked as deleted (a tombstone).
DELETE request: http://127.0.0.1:9200/shopping/_doc/1
{
"_index": "shopping",
"_type": "_doc",
"_id": "1",
"_version"【version】: 4, # every operation on the data updates the version
"result"【result】: "deleted", # deleted means the data was marked as deleted
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 4,
"_primary_term": 2
}
After the deletion, querying the document again reports that it cannot be found.
If you delete a document that does not exist, the server responds as follows:
{
"_index": "shopping",
"_type": "_doc",
"_id": "1",
"_version": 1,
"result"【result】: "not_found", # not_found means the document was not found
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 5,
"_primary_term": 2
}
Conditionally delete documents
Generally, data is deleted according to the unique identifier of the document. In actual operation, multiple pieces of data can also be deleted according to conditions.
- First add multiple pieces of data respectively:
{
"title": "小米手机",
"category": "小米",
"images": "http://www.gulixueyuan.com/xm.jpg",
"price": 4000
}
{
"title":"华为手机",
"category":"华为",
"images":"http://www.gulixueyuan.com/hw.jpg",
"price":4000.00
}
POST request: http://127.0.0.1:9200/shopping/_delete_by_query
The content of the request body is:
{
"query":{
"match":{
"price":4000.00
}
}
}
The server responds as follows:
{
"took"【time taken】: 6,
"timed_out"【whether the request timed out】: false,
"total"【total】: 1,
"deleted"【number deleted】: 1,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": []
}
4.3 Mapping operation
Once an index exists, it plays the role that a database plays in a relational database system.
Next, you need to define the mapping of the index, which is similar to the table structure in a database. To create a database table you set each field's name, type, length, constraints, and so on; the same applies to an index: you need to define which fields it contains and what constraints each field has. This is called the mapping.
create mapping
- Create an index named student
PUT request: http://127.0.0.1:9200/student
- Create a mapping
PUT request: http://127.0.0.1:9200/student/_mapping
Mapping data description:
- Field name: chosen freely, with the attributes below specified for each field; for example: title, subtitle, images, price
- type: the data type of the field. Elasticsearch supports a rich set of data types; the key ones are:
  - String, divided into two types:
    - text: analyzed, so the value is split into terms
    - keyword: not analyzed; the value is matched as a whole
  - Numerical, divided into two categories:
    - Basic data types: long, integer, short, byte, double, float, half_float
    - High-precision floating-point type: scaled_float
  - Date: date type
  - Array: array values (there is no dedicated array type; any field can hold several values of the same type)
  - Object: object
- index: whether the field is indexed. The default is true, so without any configuration every field is indexed.
  - true: the field is indexed and can be used for searching
  - false: the field is not indexed and cannot be used for searching
- store: whether to store the field value separately. The default is false. The original document is stored in _source, and by default individual fields are not stored separately but are extracted from _source. You can store a particular field separately by setting "store": true. Reading a separately stored field can be faster than parsing it out of _source, but it also takes more space, so set it according to actual business needs.
- analyzer: the analyzer (tokenizer) to use. ik_max_word refers to the IK analyzer, which is covered in a later chapter.
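The attributes above go into the request body of the `_mapping` request under a `properties` key. The body below is a hypothetical example assembled from the field names mentioned in the description (title, subtitle, images, price); it is not the exact body used by the author, and the `ik_max_word` analyzer only works if the IK plugin is installed.

```python
import json

# Hypothetical mapping body; field names come from the description above.
mapping_body = {
    "properties": {
        # Analyzed full-text field using the IK analyzer (plugin required).
        "title": {"type": "text", "index": True, "store": False, "analyzer": "ik_max_word"},
        "subtitle": {"type": "text", "index": True, "analyzer": "ik_max_word"},
        # Stored as a single unanalyzed value and not searchable.
        "images": {"type": "keyword", "index": False},
        "price": {"type": "float", "index": True},
    }
}

# This JSON text is what would be sent as the PUT request body.
print(json.dumps(mapping_body, indent=2))
```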
view mapping
GET request: http://127.0.0.1:9200/student/_mapping
associate a mapping when creating an index
PUT request: http://127.0.0.1:9200/student1
{
"settings": {
},
"mappings": {
"properties": {
"name": {
"type": "text",
"index": true
},
"sex": {
"type": "text",
"index": false
},
"age": {
"type": "long",
"index": false
}
}
}
}
This creates the index and associates the mapping with it in a single request.
4.4 Advanced query
Elasticsearch provides a complete JSON-based query DSL for defining queries.
First, define the test data:
# POST /student/_doc/1001
{
"name":"zhangsan",
"nickname":"zhangsan",
"sex":"男",
"age":30
}
# POST /student/_doc/1002
{
"name":"lisi",
"nickname":"lisi",
"sex":"男",
"age":20
}
# POST /student/_doc/1003
{
"name":"wangwu",
"nickname":"wangwu",
"sex":"女",
"age":40
}
# POST /student/_doc/1004
{
"name":"zhangsan1",
"nickname":"zhangsan1",
"sex":"女",
"age":50
}
# POST /student/_doc/1005
{
"name":"zhangsan2",
"nickname":"zhangsan2",
"sex":"女",
"age":30
}
View all documents in the index library
GET/POST request: http://127.0.0.1:9200/_search
View all documents under the specified index
GET/POST request: http://127.0.0.1:9200/student/_search
{
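"took"【time taken by the query, in milliseconds】: 1,
"timed_out"【whether the request timed out】: false,
"_shards"【shard information】: {
"total"【total】: 1,
"successful"【successful】: 1,
"skipped"【skipped】: 0,
"failed"【failed】: 0
},
"hits"【search hits】: {
"total"【total number of documents matching the search】: {
"value"【value of the total hit count】: 5,
"relation"【counting rule】: "eq" # eq means the count is exact, gte means the count is a lower bound
},
"max_score"【highest relevance score】: 1,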
"hits"【命中结果集合】: [
... ...
]
}
}
condition matching query
- Query with URL parameters (the second approach, with a request body, is recommended instead)
GET/POST request: http://127.0.0.1:9200/student/_search?q=name:zhangsan
Parameter | Description |
---|---|
? | Starts the query parameters in the URL |
q | The parameter that carries the query condition |
name | The name of the field to query |
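When the URL is built in code rather than typed by hand, the value of `q` should be URL-encoded. A minimal sketch with Python's standard library follows; `search_url` is a hypothetical helper and the base address is the one assumed throughout this article.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(index, field, value, base="http://127.0.0.1:9200"):
    """Build a search URL of the form /<index>/_search?q=<field>:<value>."""
    # urlencode percent-encodes characters that are unsafe in a query string.
    return f"{base}/{index}/_search?" + urlencode({"q": f"{field}:{value}"})


url = search_url("student", "name", "zhangsan")
print(url)
# Decoding the query string gives back the original condition.
print(parse_qs(urlsplit(url).query))
```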
- Query with parameters in the request body (recommended)
A match query analyzes the query condition into terms before searching; multiple terms are combined with OR.
GET/POST request: http://127.0.0.1:9200/student/_search
The content of the request body is:
{
"query": {
"match":{
"name":"zhangsan"
}
}
}
field match query
multi_match is similar to match, except that it can be queried on multiple fields.
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"multi_match": {
"query": "zhangsan",
"fields": ["name","nickname"]
}
}
}
keyword exact query
A term query matches the keyword exactly; the query condition is not analyzed into terms.
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"term": {
"name": {
"value": "zhangsan"
}
}
}
}
Multi-keyword precise query
The terms query is the same as the term query, except that it lets you specify multiple values to match against.
If the field contains any of the specified values, the document matches, similar to in in MySQL.
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"terms": {
"name": ["zhangsan","lisi"]
}
}
}
specify query fields
By default, Elasticsearch will return all the fields in the document stored in _source in the search results.
If we only want to get some of the fields, we can add _source filtering
GET request: http://127.0.0.1:9200/student/_search
{
"_source": ["name","nickname"],
"query": {
"terms": {
"nickname": ["zhangsan"]
}
}
}
filter field
We can also use:
- includes: to specify the fields to return
- excludes: to specify the fields to leave out
GET request: http://127.0.0.1:9200/student/_search
{
"_source": {
"includes": ["name","sex"],
"excludes": ["nickname"]
},
"query": {
"terms": {
"nickname": ["zhangsan"]
}
}
}
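The effect of includes and excludes on a single document can be imitated locally as follows. This is a simplified sketch that handles exact field names only (Elasticsearch also accepts wildcard patterns); `filter_source` is a hypothetical helper.

```python
def filter_source(source, includes=None, excludes=None):
    """Keep the fields listed in includes (all fields if none are listed),
    then drop any field listed in excludes."""
    kept = {k: v for k, v in source.items() if not includes or k in includes}
    return {k: v for k, v in kept.items() if k not in (excludes or ())}


student = {"name": "zhangsan", "nickname": "zhangsan", "sex": "男", "age": 30}
print(filter_source(student, includes=["name", "sex"], excludes=["nickname"]))
print(filter_source(student, excludes=["nickname"]))
```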
combined query
The bool query combines other queries through must (must match), must_not (must not match), and should (should match).
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "zhangsan"
}
}
],
"must_not": [
{
"match": {
"age": "40"
}
}
],
"should": [
{
"match": {
"sex": "男"
}
}
]
}
}
}
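The way the three clause types interact can be sketched locally with plain predicates. This is an illustration of the logic only, under the assumption that should clauses are optional when a must clause is present and merely raise relevance; `matches_bool` is a hypothetical helper, and the three documents are the first three test documents from this section.

```python
def matches_bool(doc, must=(), must_not=(), should=()):
    """Return (matched, should_hits) for predicate-based clauses.

    must: every predicate has to hold; must_not: none may hold;
    should: optional here, counted as a stand-in for extra relevance.
    """
    matched = all(p(doc) for p in must) and not any(p(doc) for p in must_not)
    should_hits = sum(1 for p in should if p(doc)) if matched else 0
    return matched, should_hits


students = {
    "1001": {"name": "zhangsan", "sex": "男", "age": 30},
    "1002": {"name": "lisi", "sex": "男", "age": 20},
    "1003": {"name": "wangwu", "sex": "女", "age": 40},
}
results = {
    doc_id: matches_bool(
        doc,
        must=[lambda d: d["name"] == "zhangsan"],
        must_not=[lambda d: d["age"] == 40],
        should=[lambda d: d["sex"] == "男"],
    )
    for doc_id, doc in students.items()
}
print(results)
```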
Error description:
This request fails because age is not indexed in the mapping and therefore cannot be searched (the range queries below run into the same error).
The cause is that index was set to false for age and sex when the index mapping was created.
To test these queries, create a new index whose mapping sets index to true for age and sex.
range query
The range query finds numbers or dates that fall within a specified range. Range queries accept the following operators:
Operator | Description |
---|---|
gt | greater than (>) |
gte | greater than or equal to (>=) |
lt | less than (<) |
lte | less than or equal to (<=) |
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"range": {
"age": {
"gte": 30,
"lte": 35
}
}
}
}
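The four operators map directly onto ordinary comparisons. The sketch below applies the same bounds as the request above to the ages of the five test documents; `in_range` and `OPS` are hypothetical names used only for this illustration.

```python
import operator

# Each range operator corresponds to one comparison function.
OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value, bounds):
    """Check a value against bounds such as {"gte": 30, "lte": 35}."""
    return all(OPS[name](value, limit) for name, limit in bounds.items())


# Ages of the five test documents defined earlier in this section.
ages = {"1001": 30, "1002": 20, "1003": 40, "1004": 50, "1005": 30}
hits = sorted(doc_id for doc_id, age in ages.items() if in_range(age, {"gte": 30, "lte": 35}))
print(hits)  # ['1001', '1005']
```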
fuzzy query
A fuzzy query returns documents that contain terms similar to the search term.
Edit distance is the number of single-character changes required to turn one term into another. These changes can include:
- change a character (box → fox)
- delete a character (black → lack)
- insert a character (sic → sick)
- transpose two adjacent characters (act → cat)
To find similar terms, a fuzzy query creates the set of all possible variations, or expansions, of the search term within a specified edit distance. The query then returns exact matches for each expansion.
The edit distance is controlled through fuzziness. The default value AUTO is generally used, which derives the allowed edit distance from the length of the term.
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"fuzzy": {
"name": {
"value": "zhangsan"
}
}
}
}
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"fuzzy": {
"name": {
"value": "zhangsan",
"fuzziness": 2
}
}
}
}
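The edit distance described above, with all four kinds of change costing 1, can be computed with a short dynamic-programming routine (the optimal string alignment variant). This is a sketch for understanding the numbers only; Elasticsearch does not evaluate fuzzy queries this way internally. `auto_fuzziness` assumes the default AUTO thresholds of 3 and 6 characters; both function names are hypothetical.

```python
def edit_distance(a, b):
    """Optimal-string-alignment distance: changing, deleting, inserting a
    character, or transposing two adjacent characters, each costs 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # change (or keep) a character
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose
    return d[len(a)][len(b)]


def auto_fuzziness(term):
    """Edit distance allowed by AUTO, assuming the default thresholds."""
    n = len(term)
    return 0 if n < 3 else 1 if n < 6 else 2


for pair in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(pair, edit_distance(*pair))
print(auto_fuzziness("zhangsan"), edit_distance("zhangsan", "zhangsan1"))
```

Under these assumptions the eight-character term zhangsan allows two edits, so zhangsan1 and zhangsan2 (one edit away each) fall within reach of the first request.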
Single field sorting
sort lets us sort by different fields and specify the direction through order: desc for descending, asc for ascending.
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"match": {
"name": "zhangsan"
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
Multi-field sorting
Suppose we want to sort by age and _score together: matches are sorted first by age and then by relevance score.
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"match_all": {
}
},
"sort": [
{
"age": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
}
]
}
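The ordering produced by two descending sort keys can be reproduced locally with a tuple key: the second key only matters when the first one ties. The ages below are those of the five test documents; the `_score` values are invented for illustration so that the tie between the two 30-year-olds becomes visible.

```python
# Ages come from the test data; the scores are hypothetical.
hits = [
    {"_id": "1001", "age": 30, "_score": 0.5},
    {"_id": "1002", "age": 20, "_score": 1.0},
    {"_id": "1003", "age": 40, "_score": 0.7},
    {"_id": "1004", "age": 50, "_score": 0.2},
    {"_id": "1005", "age": 30, "_score": 0.9},
]

# Negating both values sorts each key in descending order.
ordered = sorted(hits, key=lambda h: (-h["age"], -h["_score"]))
print([h["_id"] for h in ordered])  # ['1004', '1003', '1005', '1001', '1002']
```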
highlight query
When performing a keyword search, the matched keywords in the results can be displayed in a different color; this is called highlighting.
Elasticsearch lets you set the tags, and therefore the style, that wrap the keyword part of the returned content.
Alongside the match query, add a highlight attribute:
- pre_tags: the tag placed before each match
- post_tags: the tag placed after each match
- fields: the fields to highlight
- name: declares that the name field should be highlighted; field-specific settings can be added here, or it can be left empty
GET request: http://127.0.0.1:9200/student/_search
{
"query": {
"match": {
"name": "zhangsan"
}
},
"highlight": {
"pre_tags": "<em>",
"post_tags": "</em>",
"fields": {
"name": {}
}
}
}
full query
(Usually used in conjunction with paging, because when the amount of data is large...)
GET/POST request: http://127.0.0.1:9200/student/_search
The content of the request body is:
{
"query": {
"match_all":{
}
}
}
Paging query
from: the offset of the first result on the current page, starting from 0 by default. from = (pageNum - 1) * size
size: the number of results shown on each page
GET/POST request: http://127.0.0.1:9200/student/_search
The content of the request body is:
{
"query": {
"match_all": {
}
},
"from": 0,
"size": 2
}
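The formula for from is easy to get wrong by one page, so a small helper can be useful. This is a sketch; `page_params` is a hypothetical name, and page numbers are assumed to start at 1.

```python
def page_params(page_num, size):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page_num < 1 or size < 1:
        raise ValueError("page_num and size must be at least 1")
    return {"from": (page_num - 1) * size, "size": size}


body = {"query": {"match_all": {}}, **page_params(3, 2)}
print(page_params(1, 2))  # {'from': 0, 'size': 2}
print(body)
```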
aggregation query
Aggregations let users run statistical analysis over ES documents, similar to group by in relational databases; there are many other aggregations as well, such as taking the maximum or the average.
- Take the maximum value of a field max
GET request: http://127.0.0.1:9200/student/_search
{
"aggs": {
"max_age": {
"max": {
"field": "age"
}
}
},
"size": 0
}
- Take the minimum value min for a field
{
"aggs": {
"min_age": {
"min": {
"field": "age"
}
}
},
"size": 0
}
- sum a field
{
"aggs": {
"sum_age": {
"sum": {
"field": "age"
}
}
},
"size": 0
}
- Take the average avg of a field
- Deduplicate the values of a field and then count them
- Stats aggregation
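The last three items are listed without request bodies. A sketch of what they could look like, assuming they follow the same shape as the max, min and sum bodies above with the Elasticsearch aggregation types `avg`, `cardinality` and `stats`, is given below. `metric_agg` is a hypothetical helper, and the local calculation only shows what the first two would report for the ages of the five test documents (Elasticsearch's cardinality is an approximate count).

```python
import json


def metric_agg(kind, field):
    """Build a search body with one metric aggregation named <kind>_<field>."""
    return {"aggs": {f"{kind}_{field}": {kind: {"field": field}}}, "size": 0}


for kind in ("avg", "cardinality", "stats"):
    print(json.dumps(metric_agg(kind, "age")))

# Local equivalents for the ages in the test data.
ages = [30, 20, 40, 50, 30]
local = {
    "avg": sum(ages) / len(ages),   # average value
    "cardinality": len(set(ages)),  # number of distinct values
}
print(local)  # {'avg': 34.0, 'cardinality': 4}
```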