Stage 8: Advanced Service Framework (Chapter 6: Elasticsearch)
- Day-Chapter 6: Elasticsearch
- 0. Learning objectives
- 1. First introduction to elasticsearch
- 2. Index library operations
- 3. Document operations
- 4. RestAPI (operating the es index library from Java)
- 5. RestClient (operating documents from Java)
Day-Chapter 6: Elasticsearch
0.Learning Objectives
1. First introduction to elasticsearch
1.1. Understanding ES
1.1.1. The role of elasticsearch
elasticsearch is a very powerful open-source search engine with many capabilities that help us quickly find what we need in massive amounts of data.
For example:
- Searching for code on GitHub
- Search for products on e-commerce websites
- Search answers on Baidu
- Search for nearby cars on taxi-hailing apps
1.1.2. The ELK technology stack
elasticsearch combined with kibana, Logstash, and Beats forms the elastic stack (ELK). It is widely used in fields such as log data analysis and real-time monitoring, with elasticsearch as the core of the elastic stack, responsible for storing, searching, and analyzing data.
1.1.3. elasticsearch and Lucene
elasticsearch is implemented on top of Lucene.
Lucene is a Java search-engine class library (a jar package). It is a top-level project of the Apache Software Foundation, created by Doug Cutting in 1999. Official website: https://lucene.apache.org/
The development history of elasticsearch:
- In 2004, Shay Banon developed Compass based on Lucene.
- In 2010, Shay Banon rewrote Compass and named it Elasticsearch.
1.1.4. Why not other search technologies?
The current ranking of relatively well-known search engine technologies:
Although Apache Solr was the dominant search engine technology in the early days, elasticsearch has gradually surpassed Solr and taken the lead:
1.1.5. Summary
What is elasticsearch?
- An open source distributed search engine that can be used to implement functions such as search, log statistics, analysis, and system monitoring.
What is elastic stack (ELK)?
- A technology stack with elasticsearch at its core, including beats, Logstash, kibana, and elasticsearch
What is Lucene?
- It is Apache's open source search engine class library (jar package), which provides the core API of the search engine.
1.2.Inverted index
The concept of the inverted index is usually contrasted with the forward index used by databases such as MySQL.
1.2.1. Forward index
So what is a forward index? For example, suppose an index is created on the id column of the following table (tb_goods):
If the query is by id, the index is used directly and the lookup is very fast.
But a fuzzy query on title can only scan the rows one by one. The process is as follows:
1) The user searches for data with the condition that the title matches "%手机%".
2) Fetch a row, for example the row with id 1.
3) Check whether the row's title matches the user's search condition.
4) If it matches, put it into the result set; otherwise discard it. Then go back to step 2 and fetch the next row.
This row-by-row scan is a full table scan: as the data volume grows, query efficiency gets lower and lower, and once the data reaches the millions it is a disaster.
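In MySQL terms, the lookup described above corresponds to a query like the following (the table is the tb_goods example; concrete column names beyond id and title are not shown in the source, so SELECT * is used):

```sql
-- A leading-wildcard LIKE cannot use a B-tree index on title,
-- so MySQL has to fall back to a full table scan.
SELECT *
FROM tb_goods
WHERE title LIKE '%手机%';
```

An `EXPLAIN` on such a query would show `type: ALL`, i.e. a full scan.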
1.2.2. Inverted index
There are two very important concepts in the inverted index:
- Document: the data used for searching, where each piece of data is one document, for example a web page or a piece of product information
- Term: a word into which a document is split by semantics. Document data (or the user's search input) is split with some tokenization algorithm, and each meaningful word obtained is a term. For example, "我是中国人" can be split into terms such as 我, 是, 中国人, 中国, 国人.
Creating an inverted index is a special processing of the forward index. The process is as follows:
- Use a tokenization algorithm to split each document's data into terms
- Build a table in which each row records a term, the IDs of the documents containing it, the term's position, and so on
- Because terms are unique, an index (for example a hash-table index) can be created on the term column
As shown in the picture:
The search process of the inverted index is as follows (take searching for "Huawei mobile phone" as an example):
1) The user enters "华为手机" as the search condition.
2) The user's input is segmented into the terms 华为 and 手机.
3) Look up the terms in the inverted index to get the IDs of the documents containing them: 1, 2, 3.
4) Use the document IDs to fetch the concrete documents from the forward index.
As shown in the figure:
Although the inverted index requires two lookups (first the inverted index, then the forward index), both the term and the document ID are indexed, so the query is very fast and no full table scan is needed.
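The two lookups above can be illustrated with a toy inverted index in Java. This is only a sketch of the idea: the "tokenizer" here is a naive whitespace split standing in for a real analyzer such as IK, and real engines like Lucene additionally store term positions and frequencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class InvertedIndexDemo {

    // Build the inverted index: term -> sorted set of document ids.
    // Tokenization is a naive whitespace split for illustration only.
    static Map<String, TreeSet<Integer>> buildIndex(Map<Integer, String> docs) {
        Map<String, TreeSet<Integer>> index = new HashMap<>();
        docs.forEach((id, text) -> {
            for (String term : text.split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        });
        return index;
    }

    // Search: tokenize the query, look each term up in the inverted index,
    // union the doc ids, then fetch documents from the forward index by id.
    static List<String> search(Map<String, TreeSet<Integer>> index,
                               Map<Integer, String> forward, String query) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (String term : query.split("\\s+")) {
            ids.addAll(index.getOrDefault(term, new TreeSet<>()));
        }
        List<String> results = new ArrayList<>();
        for (int id : ids) {
            results.add(forward.get(id));
        }
        return results;
    }

    public static void main(String[] args) {
        // forward index: id -> document (pre-tokenized text for simplicity)
        Map<Integer, String> docs = new HashMap<>();
        docs.put(1, "华为 手机");
        docs.put(2, "小米 手机");
        docs.put(3, "华为 路由器");

        Map<String, TreeSet<Integer>> index = buildIndex(docs);
        // searching "华为 手机" reaches all three documents via its two terms
        System.out.println(search(index, docs, "华为 手机"));
    }
}
```

Note how the term lookup is a hash-map access, not a scan: this is exactly why the inverted index avoids the full table scan of the forward index.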
1.2.3. Forward vs. inverted
So why is one called a forward index and the other an inverted index?
- The forward index is the most traditional indexing approach, based on id. When querying by term, however, you must first fetch each document one by one and then check whether it contains the required terms: the process goes from document to terms.
- The inverted index works the other way around: first find the terms the user searched for, use them to get the IDs of the documents containing those terms, and then fetch the documents by ID: the process goes from terms to documents.
They are exactly opposite directions.
So what are the advantages and disadvantages of the two methods?
Forward index:
- Advantages:
  - Indexes can be created on multiple fields
  - Searching and sorting on indexed fields is very fast
- Disadvantages:
  - Searching on non-indexed fields, or on partial terms within an indexed field, requires a full table scan
Inverted index:
- Advantages:
  - Searching by term, including fuzzy search, is very fast
- Disadvantages:
  - Indexes can only be created on terms, not on fields
  - Cannot sort by field
1.3. Some concepts of es
elasticsearch has many unique concepts that differ slightly from those of mysql, but there are also similarities.
1.3.1. Documents and fields
elasticsearch stores data as documents: one piece of data is one document, such as a product record or an order record in a database. Document data is serialized into json format before being stored in elasticsearch.
A json document often contains many fields (Field), similar to columns in a database.
1.3.2.Indexing and mapping
An index (Index) is a collection of documents of the same type.
For example:
- All user documents can be organized together, called the user's index;
- All product documents can be organized together and are called product indexes;
- All order documents can be organized together, called the order index;
Therefore, we can think of indexes as tables in the database.
Database tables have constraints that define the table structure, field names, types, and so on. Similarly, an index library has a mapping, which defines the field constraints of the documents in the index, such as field names and types, a structural constraint similar to a table's schema.
1.3.3. mysql and elasticsearch
Let's compare the concepts of mysql and elasticsearch side by side:
| MySQL | Elasticsearch | Explanation |
|---|---|---|
| Table | Index | An index is a collection of documents, similar to a database table |
| Row | Document | A document is one piece of data, similar to a row in a database; documents are in JSON format |
| Column | Field | A field is a key in a JSON document, similar to a column in a database |
| Schema | Mapping | A mapping is the set of constraints on documents in an index, such as field type constraints, similar to a table schema |
| SQL | DSL | DSL is the JSON-style request language provided by elasticsearch, used to operate elasticsearch and implement CRUD |
Does it mean that after we learn elasticsearch, we no longer need mysql?
This is not the case; each has its own strengths:
- Mysql: good at transactional operations, ensuring data safety and consistency
- Elasticsearch: good at searching, analyzing, and computing over massive data
Therefore, in enterprises the two are often combined:
- Write operations with high safety requirements are handled by mysql
- Search requirements with high query-performance demands are handled by elasticsearch
- The two are kept consistent through some form of data synchronization
1.3.4. Summary
1.4. Installing es and kibana
1.4.1. Installation
Refer to the pre-course materials:
- 1. Deploy single-node es
- 2. Deploy kibana
1.4.2. Tokenizer
When es creates the inverted index it needs to segment documents into terms, and when searching it needs to segment the user's input. But the default tokenization rules do not handle Chinese well.
We can test this in kibana's DevTools:
For Chinese tokenization, the IK tokenizer is generally used: https://github.com/medcl/elasticsearch-analysis-ik
To install the IK tokenizer, refer to the pre-course material "Installing elasticsearch.md":
3. Install the IK tokenizer (for Chinese word segmentation).
The ik tokenizer provides two modes (choose according to your needs):
- ik_smart: least segmentation, coarse-grained
- ik_max_word: finest segmentation, fine-grained
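The two modes can be compared in kibana's DevTools with the standard _analyze API (the sample sentence is arbitrary):

```
# coarse-grained: fewer, longer terms
GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "黑马程序员学习java太棒了"
}

# fine-grained: more, overlapping terms
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "黑马程序员学习java太棒了"
}
```

Running both on the same text makes the granularity difference obvious: ik_max_word emits every sub-word it can find, while ik_smart emits a minimal, non-overlapping segmentation.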
1.4.3. Extending the dictionary
As the Internet develops, word coinage has become more and more frequent, and many new words appear that are not in the original vocabulary, for example "奥力给" or "传智播客" (Chuanzhi Podcast).
So our vocabulary also needs constant updating. The IK tokenizer provides a dictionary-extension feature.
1) Open the IK tokenizer's config directory:
2) Add the following to the IKAnalyzer.cfg.xml configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer 扩展配置</comment>
<!--用户可以在这里配置自己的扩展字典 *** 添加扩展词典-->
<entry key="ext_dict">ext.dic</entry>
<!--用户可以在这里配置自己的扩展停止词字典 *** 添加停用词词典-->
<entry key="ext_stopwords">stopword.dic</entry>
</properties>
3) Create a new ext.dic file (in the same directory as IKAnalyzer.cfg.xml). You can copy an existing configuration file in the config directory and modify it:
传智播客
奥力给
The stopword.dic file already exists by default.
4) Restart elasticsearch:
docker restart es
# view the logs
docker logs -f elasticsearch
The log should show that the ext.dic configuration file was loaded successfully.
5) Test effect:
GET /_analyze
{
"analyzer": "ik_max_word",
"text": "传智播客Java就业超过90%,奥力给!"
}
Note that the encoding of the current file must be in UTF-8 format, and editing with Windows Notepad is strictly prohibited.
1.4.4. Stop-word dictionary
- In Internet projects, content spreads across the network very quickly, so many words are not allowed to be transmitted, such as sensitive terms about religion or politics. We should therefore also ignore such words when searching.
The IK tokenizer also provides a powerful stop-word feature that lets us ignore the contents of the stop-word list when building the index.
1) Add the following to the IKAnalyzer.cfg.xml configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer 扩展配置</comment>
<!--用户可以在这里配置自己的扩展字典-->
<entry key="ext_dict">ext.dic</entry>
<!--用户可以在这里配置自己的扩展停止词字典 *** 添加停用词词典-->
<entry key="ext_stopwords">stopword.dic</entry>
</properties>
3) Add stop words to stopword.dic:
张大大
4) Restart elasticsearch
# restart the services
docker restart elasticsearch
docker restart kibana
# view the logs
docker logs -f elasticsearch
The log should show that the stopword.dic configuration file was loaded successfully.
5) Test effect:
GET /_analyze
{
"analyzer": "ik_max_word",
"text": "传智播客Java就业率超过95%,张大大都点赞,奥力给!"
}
Note that the encoding of the current file must be in UTF-8 format, and editing with Windows Notepad is strictly prohibited.
1.4.5. Summary
What does the tokenizer do?
- Segments documents into terms when creating the inverted index
- Segments the user's input when searching
How many modes does the IK tokenizer have?
- ik_smart: smart segmentation, coarse-grained
- ik_max_word: finest segmentation, fine-grained
How does the IK tokenizer extend terms? How are terms stopped?
- Edit the IkAnalyzer.cfg.xml file in the config directory to register an extension dictionary and a stop-word dictionary
- Add extension terms or stop terms to those dictionary files
2. Index library operations
An index library is similar to a database table, and its mapping is similar to the table structure.
To store data in ES, we must first create the "library" and the "table".
2.1. Mapping properties
mapping is the set of constraints on documents in the index library. Common mapping attributes include:
- type: the field data type. Common simple types are:
  - String: text (tokenizable text), keyword (exact value, not tokenized; for example brand, country, IP address)
  - Numeric: long, integer, short, byte, double, float
  - Boolean: boolean
  - Date: date
  - Object: object
  (There is no array type, but any field may hold multiple values)
- index: whether to create an index for the field; defaults to true
- analyzer: which tokenizer to use
- properties: sub-fields of this field (such as firstName under name below)
For example, consider the following json document:
{
"age": 21,
"weight": 52.1,
"isMarried": false,
"info": "黑马程序员Java讲师",
"email": "[email protected]",
"score": [99.1, 99.5, 98.9],
"name": {
"firstName": "云",
"lastName": "赵"
}
}
The corresponding mapping for each field:
- age: type is integer; participates in search, so index should be true; no tokenizer needed
- weight: type is float; participates in search, so index should be true; no tokenizer needed
- isMarried: type is boolean; participates in search, so index should be true; no tokenizer needed
- info: type is a string that needs tokenizing, so text; participates in search, so index should be true; the tokenizer can be ik_smart
- email: type is string but needs no tokenizing, so keyword; does not participate in search, so index should be false; no tokenizer needed
- score: although it is an array, we only look at the element type, which is float; participates in search, so index should be true; no tokenizer needed
- name: type is object; several sub-properties need to be defined
  - name.firstName: type is string but needs no tokenizing, so keyword; participates in search, so index should be true; no tokenizer needed
  - name.lastName: same as name.firstName
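Putting the analysis above together, the mapping for this document could be written as follows (a sketch; the index name `person` is made up purely for illustration):

```
PUT /person
{
  "mappings": {
    "properties": {
      "age":       { "type": "integer" },
      "weight":    { "type": "float" },
      "isMarried": { "type": "boolean" },
      "info":      { "type": "text", "analyzer": "ik_smart" },
      "email":     { "type": "keyword", "index": false },
      "score":     { "type": "float" },
      "name": {
        "properties": {
          "firstName": { "type": "keyword" },
          "lastName":  { "type": "keyword" }
        }
      }
    }
  }
}
```

Note that index defaults to true, so it only needs to be written out for email, and the array-valued score is mapped by its element type alone.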
2.1.1 Summary
2.2. CRUD of the index library
Here we use kibana to write the DSL for all demonstrations.
2.2.1. Create index library and mapping
Basic syntax:
- Request method: PUT
- Request path:/index library name, can be customized
- Request parameters: mapping mapping
Format:
PUT /索引库名称
{
"mappings": {
"properties": {
"字段名":{
"type": "text",
"analyzer": "ik_smart"
},
"字段名2":{
"type": "keyword",
"index": "false"
},
"字段名3":{
"properties": {
"子字段": {
"type": "keyword"
}
}
},
// ...略
}
}
}
Example:
# create the index library
PUT /heima
{
  "mappings": {               # the mappings definition
    "properties": {           # the individual fields
      "info": {               # field 1
        "type": "text",       # field data type
        "analyzer": "ik_smart"  # tokenizer
      },
      "email": {              # field 2
        "type": "keyword",
        "index": "false"      # whether to create an index, default true
      },
      "name": {               # field 3 (nested type)
        "type": "object",
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
      // ... omitted
    }
  }
}
operation result:
2.2.2. Query index library
Basic syntax :
- Request method: GET
- Request path:/index library name
- Request parameters: none
Format :
GET /索引库名
Example :
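To view the heima index library created above:

```
GET /heima
```

This returns the index's mappings and settings.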
2.2.3. Modify the index library
Although the inverted index structure is not complicated, once the data structure changes (for example, the tokenizer changes), the inverted index must be rebuilt, which is a disaster. Therefore, once an index library is created, its mapping cannot be modified.
Although existing fields in the mapping cannot be modified, adding new fields to the mapping is allowed, because it does not affect the existing inverted index.
Syntax description :
PUT /索引库名/_mapping
{
"properties": {
"新字段名":{
"type": "integer"
}
}
}
Example :
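For instance, adding a new age field to the heima index library (the field name is purely illustrative):

```
PUT /heima/_mapping
{
  "properties": {
    "age": {
      "type": "integer"
    }
  }
}
```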
2.2.4. Delete index library
Syntax:
- Request method: DELETE
- Request path:/index library name
- Request parameters: none
Format:
DELETE /索引库名
Test it in kibana:
Now the index can no longer be found.
2.2.5. Summary
What are the index library operations?
- Create index library:
PUT /索引库名
- Query index library:
GET /索引库名
- Add fields (modify index library):
PUT /索引库名/_mapping
- Delete index library:
DELETE /索引库名
3. Document operations
3.1. Add new documents
Syntax:
POST /索引库名/_doc/文档id
{
"字段1": "值1",
"字段2": "值2",
"字段3": {
"子属性1": "值3",
"子属性2": "值4"
},
// ...
}
Example:
POST /heima/_doc/1
{
"info": "程序员",
"email": "[email protected]",
"name": {
"firstName": "云",
"lastName": "赵"
}
}
response:
3.2. Query documents
Following rest conventions, creation uses post, so a query should use get; but a query generally needs a condition, so here we pass the document id.
Syntax:
GET /{索引库名称}/_doc/{id}
View data through kibana:
GET /heima/_doc/1
View Results:
3.3. Delete documents
Deletion uses a DELETE request and, likewise, deletes by id:
Syntax:
DELETE /{索引库名}/_doc/id值
Example:
# 根据id删除数据
DELETE /heima/_doc/1
result:
3.4. Modify documents
There are two ways to modify:
- Full modification: Directly overwrite the original document
- Incremental modification: Modify some fields in the document
3.4.1.Full modification
Full modification covers the original document, its essence is:
- Delete documents based on specified id
- Add a new document with the same ID
Note: if no document with that id exists during the delete step, the add in the second step still runs, turning the modification into a create operation.
Syntax:
PUT /{索引库名}/_doc/文档id
{
"字段1": "值1",
"字段2": "值2",
// ... 略
}
Example:
PUT /heima/_doc/1
{
"info": "黑马程序员高级Java讲师",
"email": "[email protected]",
"name": {
"firstName": "云",
"lastName": "赵"
}
}
3.4.2.Incremental modification
Incremental modification is to modify only some fields in the document matching the specified ID.
Syntax:
POST /{索引库名}/_update/文档id
{
"doc": {
"字段名": "新的值"
}
}
Example:
POST /heima/_update/1
{
"doc": {
"email": "[email protected]"
}
}
3.5. Summary
What are the document operations?
- Create document:
POST /{索引库名}/_doc/文档id { json文档 }
- Query documents:
GET /{索引库名}/_doc/文档id
- Delete document:
DELETE /{索引库名}/_doc/文档id
- Modify document:
- Full modification:
PUT /{索引库名}/_doc/文档id { json文档 }
- Incremental modification:
POST /{索引库名}/_update/文档id { "doc": {字段}}
4. RestAPI (operating the es index library from Java)
Our ultimate goal is to operate ES from Java code.
ES officially provides clients in many languages for operating ES. In essence these clients assemble DSL statements and send them to ES over http requests. Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/index.html
The Java Rest Client comes in two flavors:
- Java Low Level Rest Client
- Java High Level Rest Client
What we learn here is the Java High Level Rest Client API.
4.0. Import the demo project and initialize RestHighLevelClient
4.0.1. Import data
First import the database data provided by the pre-course materials.
The table structure is as follows:
CREATE TABLE `tb_hotel` (
`id` bigint(20) NOT NULL COMMENT '酒店id',
`name` varchar(255) NOT NULL COMMENT '酒店名称;例:7天酒店',
`address` varchar(255) NOT NULL COMMENT '酒店地址;例:航头路',
`price` int(10) NOT NULL COMMENT '酒店价格;例:329',
`score` int(2) NOT NULL COMMENT '酒店评分;例:45,就是4.5分',
`brand` varchar(32) NOT NULL COMMENT '酒店品牌;例:如家',
`city` varchar(32) NOT NULL COMMENT '所在城市;例:上海',
`star_name` varchar(16) DEFAULT NULL COMMENT '酒店星级,从低到高分别是:1星到5星,1钻到5钻',
`business` varchar(255) DEFAULT NULL COMMENT '商圈;例:虹桥',
`latitude` varchar(32) NOT NULL COMMENT '纬度;例:31.2497',
`longitude` varchar(32) NOT NULL COMMENT '经度;例:120.3925',
`pic` varchar(255) DEFAULT NULL COMMENT '酒店图片;例:/img/1.jpg',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Import Data:
-- ----------------------------
-- Records of tb_hotel
-- ----------------------------
INSERT INTO `tb_hotel` VALUES (36934, '7天连锁酒店(上海宝山路地铁站店)', '静安交通路40号', 336, 37, '7天酒店', '上海', '二钻', '四川北路商业区', '31.251433', '121.47522', 'https://m.tuniucdn.com/fb2/t1/G1/M00/3E/40/Cii9EVkyLrKIXo1vAAHgrxo_pUcAALcKQLD688AAeDH564_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (38609, '速8酒店(上海赤峰路店)', '广灵二路126号', 249, 35, '速8', '上海', '二钻', '四川北路商业区', '31.282444', '121.479385', 'https://m.tuniucdn.com/fb2/t1/G2/M00/DF/96/Cii-TFkx0ImIQZeiAAITil0LM7cAALCYwKXHQ4AAhOi377_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (38665, '速8酒店上海中山北路兰田路店', '兰田路38号', 226, 35, '速8', '上海', '二钻', '长风公园地区', '31.244288', '121.422419', 'https://m.tuniucdn.com/fb2/t1/G2/M00/EF/86/Cii-Tlk2mV2IMZ-_AAEucgG3dx4AALaawEjiycAAS6K083_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2003479905, '上海榕港万怡酒店', '新松江路1277号', 798, 46, '万怡', '上海', '四钻', '佘山/松江大学城', '31.038198', '121.210178', 'https://m.tuniucdn.com/fb3/s1/2n9c/2GM761BYH8k15qkNrJrja3cwfr2D_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2009548883, '和颐至尚酒店(北京首都机场新国展店)', '府前二街6号', 611, 46, '和颐', '北京', '三钻', '首都机场/新国展地区', '40.063953', '116.576829', 'https://m.tuniucdn.com/fb3/s1/2n9c/43zCTomkMSkUfZByZxn77YH2XidJ_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2011785622, '北京世园凯悦酒店', '阜康南路1号院1号楼A', 558, 47, '凯悦', '北京', '五星级', '延庆休闲度假区', '40.440732', '115.963259', 'https://m.tuniucdn.com/fb3/s1/2n9c/uhGcQze3zZQxe4avSU8BysgYVvx_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2022598930, '上海宝华喜来登酒店', '南奉公路3111弄228号', 2899, 46, '喜来登', '上海', '五钻', '奉贤开发区', '30.921659', '121.575572', 'https://m.tuniucdn.com/fb2/t1/G6/M00/45/BD/Cii-TF3ZaBmIStrbAASnoOyg7FoAAFpYwEoz9oABKe4992_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2048050570, '汉庭酒店(深圳坪山火车站店)', '新和路127-2号', 436, 47, '汉庭', '深圳', '二钻', '坪山高铁站商圈', '22.700753', '114.339089', 'https://m.tuniucdn.com/fb3/s1/2n9c/2nXN2bWjfoqoTkPwHvLJQPYz17qD_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2056132395, '深圳深铁皇冠假日酒店', '深南大道9819号', 340, 47, '皇冠假日', '深圳', '五钻', '科技园', '22.538923', '113.944794', 'https://m.tuniucdn.com/fb3/s1/2n9c/eBLtrED2uJs7yURWfjnWge9dT1P_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2062643512, '深圳国际会展中心希尔顿酒店', '展丰路80号', 285, 46, '希尔顿', '深圳', '五钻', '深圳国际会展中心商圈', '22.705335', '113.77794', 'https://m.tuniucdn.com/fb3/s1/2n9c/2SHUVXNrN5NsXsTUwcd1yaHKbrGq_w200_h200_c1_t0.jpg');
SET FOREIGN_KEY_CHECKS = 1;
4.0.2.Import project
Then import the project provided by the pre-course materials.
The project structure is shown below (change the configuration file to your own settings):
4.0.3. Mapping analysis
When creating an index library, the most critical part is the mapping. The information to consider in the mapping includes:
- Field name
- Field data type
- Whether the field participates in search
- Whether the field needs tokenizing
- If tokenized, which tokenizer to use
Among these:
- Field names and data types can be taken from the names and types in the data table structure
- Whether a field participates in search must be determined from the business requirements; for example, the image URL does not need to participate in search
- Whether to tokenize depends on the content: if the content is an indivisible whole, no tokenizing is needed; otherwise it is
- For the tokenizer, we can uniformly use ik_max_word
Based on the data table structure, the index library structure for the hotel data is as follows:
PUT /hotel
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "ik_max_word",
"copy_to": "all"
},
"address":{
"type": "keyword",
"index": false  # do not create an index for this field
},
"price":{
"type": "integer"
},
"score":{
"type": "integer"
},
"brand":{
"type": "keyword",
"copy_to": "all"
},
"city":{
"type": "keyword",
"copy_to": "all"
},
"starName":{
"type": "keyword"
},
"business":{
"type": "keyword"
},
"location":{
"type": "geo_point"
},
"pic":{
"type": "keyword",
"index": false
},
"all":{
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
Notes on a few special fields:
- location: geographic coordinates, containing latitude and longitude
- all: a combined field whose purpose is to merge the values of several fields via copy_to, so users can search them together
About geographic coordinates:
About copy_to:
4.0.4.Initialize RestHighLevelClient
In the API provided by elasticsearch, all interaction with elasticsearch is encapsulated in a class named RestHighLevelClient. We must first initialize this object to establish the connection with elasticsearch.
Divided into three steps:
1) Add the RestHighLevelClient dependency for es:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.12.1</version>
</dependency>
2) Because SpringBoot manages a default ES version (7.6.2), we need to override it:
<properties>
    <java.version>1.8</java.version>
    <!-- override the managed elasticsearch version -->
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>
3) Initialize RestHighLevelClient:
The initialization code is as follows:
RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
HttpHost.create("http://192.168.150.101:9200")
));
For convenient unit testing, we create a test class HotelIndexTest and put the initialization code in a @BeforeEach method:
package cn.itcast.hotel;

import java.io.IOException;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;

public class HotelIndexTest {
    // keep the RestHighLevelClient as a member variable so the tests below can reuse it
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        // create and initialize the client
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        // close the client
        this.client.close();
    }
}
4.1. Create index library
4.1.1. Code interpretation
The API for creating an index library is as follows:
The code consists of three steps:
- 1) Create a Request object. Since this creates an index library, the Request type is CreateIndexRequest.
- 2) Add the request parameters, which are in fact the JSON body of the DSL. Because the json string is very long, a static string constant MAPPING_TEMPLATE is defined to keep the code tidy.
- 3) Send the request. The client.indices() method returns an IndicesClient, which encapsulates all index-library operations.
4.1.2. Complete example
Under the cn.itcast.hotel.constants package of hotel-demo, create a class defining the mapping's JSON string constant:
package cn.itcast.hotel.constants;
public class HotelConstants {
public static final String MAPPING_TEMPLATE = "{\n" +
" \"mappings\": {\n" +
" \"properties\": {\n" +
" \"id\": {\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"name\":{\n" +
" \"type\": \"text\",\n" +
" \"analyzer\": \"ik_max_word\",\n" +
" \"copy_to\": \"all\"\n" +
" },\n" +
" \"address\":{\n" +
" \"type\": \"keyword\",\n" +
" \"index\": false\n" +
" },\n" +
" \"price\":{\n" +
" \"type\": \"integer\"\n" +
" },\n" +
" \"score\":{\n" +
" \"type\": \"integer\"\n" +
" },\n" +
" \"brand\":{\n" +
" \"type\": \"keyword\",\n" +
" \"copy_to\": \"all\"\n" +
" },\n" +
" \"city\":{\n" +
" \"type\": \"keyword\",\n" +
" \"copy_to\": \"all\"\n" +
" },\n" +
" \"starName\":{\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"business\":{\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"location\":{\n" +
" \"type\": \"geo_point\"\n" +
" },\n" +
" \"pic\":{\n" +
" \"type\": \"keyword\",\n" +
" \"index\": false\n" +
" },\n" +
" \"all\":{\n" +
" \"type\": \"text\",\n" +
" \"analyzer\": \"ik_max_word\"\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
}
Then, in the HotelIndexTest test class of hotel-demo, write a unit test that creates the index:
@Test
void createHotelIndex() throws IOException {
// 1.创建Request对象
CreateIndexRequest request = new CreateIndexRequest("hotel");
// 2.准备请求的参数:DSL语句
request.source(MAPPING_TEMPLATE, XContentType.JSON);
// 3.发送请求
client.indices().create(request, RequestOptions.DEFAULT);
}
4.2. Delete the index library
Deleting the DSL statement of the index library is very simple:
DELETE /hotel
Compared to creating an index library:
- The request method changes from PUT to DELETE
- The request path is unchanged
- There are no request parameters
So the difference in code is mainly the Request object. Still three steps:
- 1) Create a Request object; this time a DeleteIndexRequest
- 2) Prepare the parameters; here there are none
- 3) Send the request, using the delete method instead
In the HotelIndexTest test class of hotel-demo, write a unit test that deletes the index:
@Test
void testDeleteHotelIndex() throws IOException {
// 1.创建Request对象
DeleteIndexRequest request = new DeleteIndexRequest("hotel");
// 2.发送请求
client.indices().delete(request, RequestOptions.DEFAULT);
}
4.3. Determine whether the index library exists
Checking whether an index library exists is essentially a query. The corresponding DSL is:
GET /hotel
The Java code flow is therefore similar to delete. Still three steps:
- 1) Create a Request object; this time a GetIndexRequest
- 2) Prepare the parameters; here there are none
- 3) Send the request, using the exists method instead
@Test
void testExistsHotelIndex() throws IOException {
// 1.创建Request对象
GetIndexRequest request = new GetIndexRequest("hotel");
// 2.发送请求
boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
// 3.输出
System.err.println(exists ? "索引库已经存在!" : "索引库不存在!");
}
4.4. Summary
JavaRestClient operations on elasticsearch follow a broadly similar flow. The core is the client.indices() method, which returns the object for index-library operations.
Basic steps for index-library operations:
- Initialize RestHighLevelClient
- Create an XxxIndexRequest, where Xxx is Create, Get, or Delete
- Prepare the DSL (required for create; the others take no parameters)
- Send the request by calling a RestHighLevelClient#indices().xxx() method, where xxx is create, exists, or delete
5. RestClient (operating documents from Java)
To keep this separate from the index-library operations, we add another test class that does two things:
- Initialize RestHighLevelClient
- Our hotel data is in the database and must be queried through IHotelService, so we inject that interface
package cn.itcast.hotel;

import java.io.IOException;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
// plus the import for IHotelService from the demo project

@SpringBootTest
public class HotelDocumentTest {
    @Autowired
    private IHotelService hotelService; // used to query hotel data from the database

    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        // create and initialize the client
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        // close the client
        this.client.close();
    }
}
5.1. Add new documents - write database data into the index library
We need to query the hotel data from the database and write it into elasticsearch.
5.1.1. Index library entity class
The result of the database query is a Hotel type object. The structure is as follows:
@Data
@TableName("tb_hotel")
public class Hotel {
@TableId(type = IdType.INPUT)
private Long id;
private String name;
private String address;
private Integer price;
private Integer score;
private String brand;
private String city;
private String starName;
private String business;
private String longitude;
private String latitude;
private String pic;
}
This differs from our index library structure: longitude and latitude need to be merged into a single location field.
Therefore, we define a new type that matches the index library structure and implements the conversion from database data to index library data:
package cn.itcast.hotel.pojo;
import lombok.Data;
import lombok.NoArgsConstructor;
@Data
@NoArgsConstructor
public class HotelDoc {
private Long id;
private String name;
private String address;
private Integer price;
private Integer score;
private String brand;
private String city;
private String starName;
private String business;
private String location;
private String pic;
public HotelDoc(Hotel hotel) {
this.id = hotel.getId();
this.name = hotel.getName();
this.address = hotel.getAddress();
this.price = hotel.getPrice();
this.score = hotel.getScore();
this.brand = hotel.getBrand();
this.city = hotel.getCity();
this.starName = hotel.getStarName();
this.business = hotel.getBusiness();
this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
this.pic = hotel.getPic();
}
}
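The key point of the conversion above is merging latitude and longitude into the "lat, lon" string that elasticsearch's geo_point type accepts. A self-contained sketch of just that merge (plain Java, class and method names hypothetical):

```java
public class LocationMergeDemo {
    // Merge latitude and longitude into the "lat, lon" format
    // expected by elasticsearch's geo_point field type.
    static String toLocation(String latitude, String longitude) {
        return latitude + ", " + longitude;
    }

    public static void main(String[] args) {
        // Illustrative coordinates only
        System.out.println(toLocation("31.21", "121.5")); // prints "31.21, 121.5"
    }
}
```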
5.1.2. Syntax description
The DSL statement for adding a new document looks like this:
POST /{index library name}/_doc/1
{
    "name": "Jack",
    "age": 21
}
The corresponding Java code is shown in the figure.
As you can see, it is similar to creating an index library: also a three-step process:
- 1) Create a Request object
- 2) Prepare the request parameters, i.e. the JSON document from the DSL
- 3) Send the request
The difference is that here the API is called directly on the client, as client.xxx(); client.indices() is no longer needed.
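Since the figure is not reproduced here, a minimal sketch of those three steps, mirroring the DSL statement above (and assuming the same `client` field from the test class), might look like:

```java
// Sketch: add a simple document, mirroring the DSL statement above.
// Assumes the RestHighLevelClient field `client` from the test class.
@Test
void testAddSimpleDocument() throws IOException {
    // 1. Create the Request object: index "hotel", document id "1"
    IndexRequest request = new IndexRequest("hotel").id("1");
    // 2. Prepare the request parameters: the JSON document
    request.source("{\"name\": \"Jack\", \"age\": 21}", XContentType.JSON);
    // 3. Send the request
    client.index(request, RequestOptions.DEFAULT);
}
```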
5.1.3. Complete code
When we import hotel data, the basic process is the same, but there are a few changes that need to be considered:
- The hotel data comes from the database. We need to query it first and get the hotel object.
- The hotel object needs to be converted into a HotelDoc object
- HotelDoc needs to be serialized into json format
Therefore, the overall steps of the code are as follows:
- 1) Query hotel data Hotel based on id
- 2) Encapsulate Hotel as HotelDoc
- 3) Serialize HotelDoc to JSON
- 4) Create IndexRequest and specify the index library name and id
- 5) Prepare request parameters, which is the JSON document
- 6) Send request
In the HotelDocumentTest test class of hotel-demo, write the unit test:
@Test
void testAddDocument() throws IOException {
    // 1. Query the hotel data by id
    Hotel hotel = hotelService.getById(61083L);
    // 2. Convert to the document type required by the index library
    HotelDoc hotelDoc = new HotelDoc(hotel);
    // 3. Serialize HotelDoc to JSON
    String json = JSON.toJSONString(hotelDoc);
    // 4. Prepare the Request object, specifying the index library name and id
    IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
    // 5. Prepare the JSON document
    request.source(json, XContentType.JSON);
    // 6. Send the request
    client.index(request, RequestOptions.DEFAULT);
}
5.2. Query documents
5.2.1. Syntax description
The query DSL statement is as follows:
GET /hotel/_doc/{id}
It's very simple, so the code is roughly two steps:
- Prepare the Request object
- Send the request
However, the purpose of the query is to get the result and parse it into a HotelDoc, so the difficulty lies in parsing the result. As you can see from the response, the result is JSON in which the document sits in the _source attribute; parsing means extracting _source and deserializing it into a Java object.
Similar to before, it is a three-step process:
- 1) Prepare the Request object. This time it is a query, so it is a GetRequest
- 2) Send the request and get the result. Because it is a query, the client.get() method is called
- 3) Parse the result, i.e. deserialize the JSON
5.2.2. Complete code
In the HotelDocumentTest test class of hotel-demo, write the unit test:
@Test
void testGetDocumentById() throws IOException {
    // 1. Prepare the Request
    GetRequest request = new GetRequest("hotel", "61082");
    // 2. Send the request and get the response
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Parse the response: get _source and deserialize the JSON
    String json = response.getSourceAsString();
    HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
    System.out.println(hotelDoc);
}
5.3. Delete documents
The delete DSL looks like this:
DELETE /hotel/_doc/{id}
Compared with the query, only the request method changes from GET to DELETE. As you can imagine, the Java code should still take three steps:
- 1) Prepare the Request object. Because this is a delete, it is a DeleteRequest object, specifying the index library name and id
- 2) Prepare the parameters: there are none
- 3) Send the request. Because this is a delete, call the client.delete() method
In the HotelDocumentTest test class of hotel-demo, write the unit test:
@Test
void testDeleteDocument() throws IOException {
    // 1. Prepare the Request
    DeleteRequest request = new DeleteRequest("hotel", "61083");
    // 2. Send the request
    client.delete(request, RequestOptions.DEFAULT);
}
5.4. Modify documents
5.4.1. Syntax description
We have covered two ways to modify a document:
- Full modification: essentially delete by id first, then add
- Incremental modification: modify the specified field values in the document
In the RestClient API, full modification is completely consistent with adding a new document; the distinction is made by id:
- If the id already exists when adding, it is a modification
- If the id does not exist when adding, it is an addition
We won't go into details here; we mainly focus on incremental modification.
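As a reminder, the incremental-modification DSL covered earlier looks roughly like this (the field values are illustrative):

```
POST /hotel/_update/{id}
{
    "doc": {
        "price": "952",
        "starName": "四钻"
    }
}
```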
The code example is shown in the figure. Similar to before, it is a three-step process:
- 1) Prepare the Request object. This time it is a modification, so it is an UpdateRequest
- 2) Prepare the parameters, i.e. the JSON document containing the fields to be modified
- 3) Update the document by calling the client.update() method
5.4.2. Complete code
In the HotelDocumentTest test class of hotel-demo, write the unit test:
@Test
void testUpdateDocument() throws IOException {
    // 1. Prepare the Request
    UpdateRequest request = new UpdateRequest("hotel", "61083");
    // 2. Prepare the request parameters: alternating field names and values
    request.doc(
        "price", "952",
        "starName", "四钻"
    );
    // 3. Send the request
    client.update(request, RequestOptions.DEFAULT);
}
5.5. Import documents in batches - write database data into the index library in batches
Case requirement: use BulkRequest to import the database data into the index library in batches.
Proceed as follows:
- Use mybatis-plus to query the hotel data
- Convert the queried hotel data (Hotel) into document type data (HotelDoc)
- Use BulkRequest batch processing in JavaRestClient to add documents in batches
5.5.1. Syntax description
The essence of BulkRequest batch processing is to combine multiple ordinary CRUD requests and send them together.
It provides an add method for adding other requests. As you can see, the requests that can be added include:
- IndexRequest, i.e. add
- UpdateRequest, i.e. modify
- DeleteRequest, i.e. delete
Therefore, adding multiple IndexRequests to a Bulk amounts to a batch add. Example:
In fact, there are still three steps:
- 1) Create the Request object. Here it is a BulkRequest
- 2) Prepare the parameters. The parameters of batch processing are other Request objects, here multiple IndexRequests
- 3) Send the request. Because this is batch processing, the method called is client.bulk()
When we import the hotel data, we just wrap the above code in a for loop.
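Since the example figure is not reproduced here, a minimal sketch with hardcoded placeholder documents (the ids and JSON strings are illustrative) could look like:

```java
// Sketch: batch-add documents with BulkRequest.
// The ids and JSON strings below are illustrative placeholders.
@Test
void testSimpleBulk() throws IOException {
    // 1. Create the BulkRequest object
    BulkRequest request = new BulkRequest();
    // 2. Prepare the parameters: add multiple IndexRequests
    request.add(new IndexRequest("hotel").id("101")
            .source("{\"name\": \"hotel A\"}", XContentType.JSON));
    request.add(new IndexRequest("hotel").id("102")
            .source("{\"name\": \"hotel B\"}", XContentType.JSON));
    // 3. Send the request: call client.bulk()
    client.bulk(request, RequestOptions.DEFAULT);
}
```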
5.5.2. Complete code
In the HotelDocumentTest test class of hotel-demo, write the unit test:
@Test
void testBulkRequest() throws IOException {
    // Query all hotel data from the database
    List<Hotel> hotels = hotelService.list();
    // 1. Create the Request
    BulkRequest request = new BulkRequest();
    // 2. Prepare the parameters: add one new-document Request per hotel
    for (Hotel hotel : hotels) {
        // 2.1. Convert to the document type HotelDoc
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // 2.2. Create the Request object for the new document
        request.add(new IndexRequest("hotel")              // specify the index library name
                .id(hotelDoc.getId().toString())           // specify the id
                .source(JSON.toJSONString(hotelDoc), XContentType.JSON)); // specify the JSON
    }
    // 3. Send the request
    client.bulk(request, RequestOptions.DEFAULT);
}
5.6. Summary
Basic steps for document operations:
- Initialize the RestHighLevelClient
- Create an XxxRequest, where Xxx is Index, Get, Update, Delete, or Bulk
- Prepare the parameters (required for Index, Update, and Bulk)
- Send the request by calling the RestHighLevelClient#xxx() method, where xxx is index, get, update, delete, or bulk
- Parse the result (required for Get)