Stage 8: Advanced Service Framework (Chapter 6: ElasticSearch1)


0. Learning Objectives


1. First introduction to elasticsearch

1.1. Understanding ES

1.1.1. The role of elasticsearch

  elasticsearch is a very powerful open-source search engine. It offers many capabilities that help us quickly find what we need in massive amounts of data.

For example:

  • Searching code on GitHub
  • Searching for products on e-commerce websites
  • Searching for answers on Baidu
  • Searching for nearby cars in ride-hailing apps

1.1.2. The ELK technology stack

  elasticsearch, combined with kibana, Logstash, and Beats, forms the elastic stack (ELK),
which is widely used in fields such as log data analysis and real-time monitoring.
elasticsearch is the core of the elastic stack, responsible for storing, searching, and analyzing data.

1.1.3. elasticsearch and Lucene

  Under the hood, elasticsearch is implemented on top of Lucene.

  Lucene is a search-engine class library (a jar package) written in Java. It is a top-level project of the Apache Software Foundation, originally developed by Doug Cutting in 1999. Official website: https://lucene.apache.org/ .

The development history of elasticsearch :

  • In 2004, Shay Banon developed Compass based on Lucene.
  • In 2010 Shay Banon rewrote Compass and named it Elasticsearch.


1.1.4. Why not other search technologies?

The current ranking of well-known search engine technologies:

Although Apache Solr was the leading search technology in the early days, elasticsearch has gradually surpassed it and taken the lead:

1.1.5. Summary

What is elasticsearch?

  • An open source distributed search engine that can be used to implement functions such as search, log statistics, analysis, and system monitoring.

What is elastic stack (ELK)?

  • A technology stack with elasticsearch at its core, including beats, Logstash, kibana, and elasticsearch

What is Lucene?

  • It is Apache's open source search engine class library (jar package), which provides the core API of the search engine.

1.2. Inverted index

The concept of the inverted index is best understood by contrast with the forward index used by databases such as MySQL.

1.2.1. Forward index

So what is a forward index? For example, create an index on the id column of the following table (tb_goods):
If the query is based on the id, then the index is used directly, and the query speed is very fast.

But if you do a fuzzy query based on title, you can only scan the data line by line. The process is as follows:

1) The user searches for data, and the condition is that the title matches. "%手机%"
2) Obtains the data row by row, such as the data with id 1.
3) Determines whether the title in the data matches the user's search conditions.
4) If it matches, put it into the result set; if not, discard it. Then go back to step 2 for the next row.

Row-by-row scanning, i.e. a full table scan: as the amount of data increases, query efficiency gets lower and lower. When the data reaches the millions, it is a disaster.

1.2.2.Inverted index

There are two very important concepts in the inverted index:

  • Document: the data used for search; each piece of data is a document, e.g. a web page or a product record.
  • Term: the words a document is split into semantically (document data and user search input are segmented by some algorithm, and each meaningful word obtained is a term). For example, "我是中国人" can be split into the terms: 我, 是, 中国人, 中国, 国人.

Creating an inverted index is a special processing of the forward index. The process is as follows:

  • Use a tokenization algorithm to segment each document's data into terms.
  • Create a table in which each row records a term, the IDs of the documents containing it, the term's position, and so on.
  • Because terms are unique, an index (e.g. a hash-table index) can be created on the term column.

(Figure omitted: the resulting inverted index table.)

The search process of the inverted index is as follows (take searching for "华为手机" as an example):

1) The user enters the search term "华为手机".
2) The user's input is segmented into the terms: 华为, 手机.
3) Each term is looked up in the inverted index, yielding the IDs of the documents that contain it: 1, 2, 3.
4) The document IDs are used to fetch the actual documents from the forward index.

  Although an inverted-index search queries the inverted index first and then the forward index, both the terms and the document IDs are indexed, so the query is very fast and no full table scan is needed.
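To make the two-step lookup concrete, here is a minimal, self-contained sketch in plain Java (a hypothetical toy example, not part of the course project; real engines such as Lucene use far more sophisticated structures and analyzers): it builds a tiny inverted index over whitespace-tokenized documents and searches it exactly as described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexDemo {
    // forward index: document id -> document content
    static final Map<Integer, String> forward = new LinkedHashMap<>();
    // inverted index: term -> ids of the documents containing that term
    static final Map<String, Set<Integer>> inverted = new HashMap<>();

    // naive "tokenizer": split on spaces (a real engine would run an analyzer such as IK)
    static void index(int id, String content) {
        forward.put(id, content);
        for (String term : content.split(" ")) {
            inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    // search: tokenize the query, collect document ids from the inverted index,
    // then fetch the documents themselves from the forward index
    static List<String> search(String query) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : query.split(" ")) {
            ids.addAll(inverted.getOrDefault(term, Collections.emptySet()));
        }
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(forward.get(id));
        }
        return hits;
    }

    static {
        index(1, "华为 手机");
        index(2, "小米 手机");
        index(3, "华为 路由器");
    }

    public static void main(String[] args) {
        // the terms 华为 and 手机 together hit documents 1, 2 and 3
        System.out.println(search("华为 手机"));
    }
}
```

Note how the forward index is still needed at the end: the inverted index only maps terms to document IDs, and the actual documents are fetched by ID.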


1.2.3. Forward and reverse

So why is one called a forward index and the other an inverted index?

  • The forward index is the most traditional way of indexing, based on id. When querying by term, however, you must first fetch each document one by one and then check whether it contains the required term. This is the process of finding terms from documents.
  • The inverted index does the opposite: it first finds the terms the user is searching for, obtains the IDs of the documents that contain those terms, and then fetches the documents by ID. This is the process of finding documents from terms.

The two are exactly opposite directions of lookup, hence the names.

So what are the advantages and disadvantages of the two methods?

Forward index :

  • Advantages:
    • Indexes can be created for multiple fields
    • Searching and sorting based on index fields are very fast
  • Disadvantages:
    • When searching based on non-indexed fields or partial entries in indexed fields, only the entire table can be scanned.

Inverted index :

  • Advantages:
    • When searching based on terms or fuzzy search, the speed is very fast.
  • Disadvantages:
    • Indexes can only be created for terms, not fields
    • Unable to sort based on fields

1.3. Some concepts of es

  elasticsearch has many concepts of its own, which differ somewhat from mysql's, but there are also similarities.

1.3.1. Documents and fields

  elasticsearch stores data as documents. A piece of data is a document, e.g. a product record or an order record in a database. Document data is serialized to JSON format and stored in elasticsearch:

A JSON document usually contains many fields (Field), similar to columns in a database.

1.3.2. Indexes and mappings

An index (Index) is a collection of documents of the same type.

For example:

  • All user documents can be organized together, called the user's index;
  • All product documents can be organized together and are called product indexes;
  • All order documents can be organized together, called the order index;

Therefore, we can think of indexes as tables in the database.

  A database table has schema constraints that define the table structure, field names, types, and so on. Similarly, an index has a mapping: the field constraints for the documents in the index, such as field names and types, a structural constraint analogous to a table schema.

1.3.3. mysql and elasticsearch

Let's compare the concepts of mysql and elasticsearch side by side:

MySQL  | Elasticsearch | Description
-------|---------------|------------------------------------------------------------
Table  | Index         | An index is a collection of documents, similar to a database table
Row    | Document      | A document is a piece of data, similar to a row in a database; documents are in JSON format
Column | Field         | A field is a key in a JSON document, similar to a column (Column) in a database
Schema | Mapping       | A mapping constrains the documents in an index, e.g. field type constraints, similar to a table schema
SQL    | DSL           | DSL is the JSON-style request language provided by elasticsearch, used to operate elasticsearch and implement CRUD

Does it mean that after we learn elasticsearch, we no longer need mysql?
This is not the case, each has its own advantages and disadvantages:

  • Mysql: good at transactional operations; it can guarantee data safety and consistency
  • Elasticsearch: good at searching, analyzing, and computing over massive amounts of data

Therefore, in enterprises, the two are often used in combination:

  • Write operations with high safety requirements are implemented with mysql
  • Search requirements with high query-performance demands are implemented with elasticsearch
  • The two are then kept in sync by some data-synchronization mechanism to ensure consistency


1.3.4. Summary


1.4. Installing es and kibana

1.4.1. Installation

Refer to the pre-course materials:
1. Deploy single-node es; 2. Deploy kibana

1.4.2. Tokenizer

When es creates the inverted index, documents must be tokenized; when searching, the user's input must also be tokenized. The default tokenization rules, however, do not handle Chinese well.
We can test this in kibana's DevTools:
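For example, running a request like the following in DevTools (a sketch; the exact token output depends on your es version) demonstrates the problem:

```
GET /_analyze
{
  "analyzer": "standard",
  "text": "我是中国人"
}
```

The built-in standard analyzer breaks Chinese text into single characters (我, 是, 中, 国, 人), one token per character, which makes meaningful Chinese search impossible.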
For Chinese tokenization, the IK analyzer is generally used: https://github.com/medcl/elasticsearch-analysis-ik
To install the IK analyzer, refer to the pre-course material "Installing elasticsearch.md", section
"3. Install the IK word segmenter (applicable to Chinese word segmentation)".

The IK analyzer provides two modes (choose according to your needs):

  • ik_smart: Minimum segmentation, coarse-grained
  • ik_max_word: Finest segmentation, fine-grained

1.4.3. Extended dictionary

  With the development of the Internet, word coinage has become more and more frequent, and many new words have appeared that are not in the default vocabulary. For example: "奥力给" (Aoligei), "传智播客" (Chuanzhi Podcast), etc.

Therefore, our vocabulary also needs to be constantly updated. The IK word segmenter provides the function of expanding vocabulary.

1) Open the IK word segmenter config directory:

2) Add the following to the IKAnalyzer.cfg.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extended configuration</comment>
        <!-- configure your own extended dictionary here *** add the extended dictionary -->
        <entry key="ext_dict">ext.dic</entry>
        <!-- configure your own stop-word dictionary here *** add the stop-word dictionary -->
        <entry key="ext_stopwords">stopword.dic</entry>
</properties>

3) Create a new ext.dic file (in the same directory as IKAnalyzer.cfg.xml). You can copy one of the existing files in the config directory and modify it.

传智播客
奥力给

The stopword.dic file already exists.

4) Restart elasticsearch

# restart the container (use your own container name, e.g. es or elasticsearch)
docker restart es

# view the logs
docker logs -f elasticsearch

The log should show that the ext.dic configuration file was loaded successfully.

5) Test effect:

GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "传智播客Java就业超过90%,奥力给!"
}

Note that the encoding of the current file must be in UTF-8 format, and editing with Windows Notepad is strictly prohibited.

1.4.4. Stop-word dictionary

  • In Internet projects, content spreads across the network very quickly, so some words are not allowed to be transmitted online, for example sensitive terms concerning religion or politics. We should also ignore such words when indexing and searching.

The IK word segmenter also provides a powerful stop word function, allowing us to directly ignore the contents of the current stop word list during indexing.

1) Add the contents of the IKAnalyzer.cfg.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extended configuration</comment>
        <!-- configure your own extended dictionary here -->
        <entry key="ext_dict">ext.dic</entry>
        <!-- configure your own stop-word dictionary here *** add the stop-word dictionary -->
        <entry key="ext_stopwords">stopword.dic</entry>
</properties>

3) Add stop words to stopword.dic:

张大大

4) Restart elasticsearch

# restart the services
docker restart elasticsearch
docker restart kibana

# view the logs
docker logs -f elasticsearch

The log should show that the stopword.dic configuration file was loaded successfully.

5) Test effect:

GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "传智播客Java就业率超过95%,张大大都点赞,奥力给!"
}

Note that the encoding of the current file must be in UTF-8 format, and editing with Windows Notepad is strictly prohibited.

1.4.5. Summary

What is the function of tokenizer?

  • Tokenize documents when creating an inverted index
  • When the user searches, the input content is segmented into words

How many modes does the IK word segmenter have?

  • ik_smart: Intelligent segmentation, coarse-grained
  • ik_max_word: Finest segmentation, fine-grained

How does the IK analyzer extend its vocabulary? How are terms stopped?

  • Edit the IKAnalyzer.cfg.xml file in the config directory to register the extended dictionary and the stop-word dictionary
  • Add the extended terms or stop words to the corresponding dictionary files

2. Index library operation

An index is similar to a database table, and its mapping is similar to the table structure.
To store data in ES, we must first create the "database" and "table".

2.1. Mapping properties

A mapping constrains the documents in an index. Common mapping properties include
(note: there is no array type, but a field is allowed to hold multiple values):

  • type: the field's data type. Common simple types:
    • String: text (tokenized text) or keyword (an exact, untokenized value, e.g. brand, country, IP address)
    • Numeric: long, integer, short, byte, double, float
    • Boolean: boolean
    • Date: date
    • Object: object
  • index: whether to create an index for the field; defaults to true
  • analyzer: which tokenizer to use
  • properties: the sub-fields of this field (e.g. firstName under name below)

For example, the following JSON document:

{
    "age": 21,
    "weight": 52.1,
    "isMarried": false,
    "info": "黑马程序员Java讲师",
    "email": "[email protected]",
    "score": [99.1, 99.5, 98.9],
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}

The corresponding mapping for each field:

  • Age: type is integer; participates in search, so index needs to be true; no word separator is required
  • weight: type is float; participates in search, so index needs to be true; no word separator is required
  • isMarried: type is boolean; participates in search, so index needs to be true; no word separator is required
  • info: The type is a string and requires word segmentation, so it is text; it participates in search, so the index needs to be true; the word segmenter can be ik_smart
  • email: type is string, but does not require word segmentation, so it is keyword; does not participate in search, so index needs to be false; no word segmentation is required
  • Score: Although it is an array, we only look at the type of the element, which is float; it participates in the search, so the index needs to be true; no word separator is needed
  • name: type is object, multiple sub-attributes need to be defined
    • name.firstName; type is string, but does not require word segmentation, so it is keyword; participates in search, so index needs to be true; no word segmentation is required
    • name.lastName; type is string, but does not require word segmentation, so it is keyword; participates in search, so index needs to be true; no word segmentation is required
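Putting the analysis above together, a possible mapping for this document could look like the following (a sketch: the index name person is made up for illustration, and the field choices simply follow the bullet list above):

```
PUT /person
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "weight": { "type": "float" },
      "isMarried": { "type": "boolean" },
      "info": { "type": "text", "analyzer": "ik_smart" },
      "email": { "type": "keyword", "index": false },
      "score": { "type": "float" },
      "name": {
        "properties": {
          "firstName": { "type": "keyword" },
          "lastName": { "type": "keyword" }
        }
      }
    }
  }
}
```

Note that score is declared simply as float: arrays need no special type, since any field may hold multiple values.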

2.1.1 Summary


2.2. CRUD of index library

Here we use Kibana to write the DSL for all demonstrations.

2.2.1. Create index library and mapping

Basic syntax:

  • Request method: PUT
  • Request path:/index library name, can be customized
  • Request parameters: mapping mapping

Format:

PUT /索引库名称
{
  "mappings": {
    "properties": {
      "字段名":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "字段名2":{
        "type": "keyword",
        "index": "false"
      },
      "字段名3":{
        "properties": {
          "子字段": {
            "type": "keyword"
          }
        }
      },
      // ... omitted
    }
  }
}

Example:


# create the index
PUT /heima
{
  "mappings": {            # the mapping
    "properties": {        # the individual fields
      "info":{             # field 1
        "type": "text",         # field data type
        "analyzer": "ik_smart"  # analyzer
      },
      "email":{            # field 2
        "type": "keyword",
        "index": "false"        # whether to index; defaults to true
      },
      "name":{             # field 3 (a nested object)
        "type":"object",
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      },
      // ... omitted
    }
  }
}

Operation result: (screenshot omitted)

2.2.2. Query index library

Basic syntax :

  • Request method: GET
  • Request path:/index library name
  • Request parameters: none

Format :

GET /索引库名

Example: (screenshot omitted)

2.2.3. Modify the index library

  Although the inverted index structure is not complicated, once the data structure changes (e.g. the analyzer is changed), the inverted index must be rebuilt, which is a disaster. Therefore, once an index is created, its mapping cannot be modified.

  Although existing fields in a mapping cannot be modified, new fields can be added to the mapping, because that does not affect the existing inverted index.

Syntax description :

PUT /索引库名/_mapping
{
  "properties": {
    "新字段名":{
      "type": "integer"
    }
  }
}

Example: (screenshot omitted)

2.2.4. Delete index library

grammar:

  • Request method: DELETE
  • Request path:/index library name
  • Request parameters: none

Format:

DELETE /索引库名

Test in kibana:

Querying the deleted index now fails: it can no longer be found.

2.2.5. Summary

What are the index library operations?

  • Create index: PUT /索引库名
  • Query index: GET /索引库名
  • Add a field (modify index): PUT /索引库名/_mapping
  • Delete index: DELETE /索引库名


3. Document operations

3.1. Add new documents

Syntax:

POST /索引库名/_doc/文档id
{
    "字段1": "值1",
    "字段2": "值2",
    "字段3": {
        "子属性1": "值3",
        "子属性2": "值4"
    },
    // ...
}

Example:

POST /heima/_doc/1
{
    "info": "程序员",
    "email": "[email protected]",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}

Response: (screenshot omitted)

3.2. Query documents

  Following REST conventions, creating uses POST, so querying should use GET; but a query generally needs a condition, so here we pass the document id.
Syntax:

GET /{索引库名称}/_doc/{id}

View data through kibana:

GET /heima/_doc/1

View results: (screenshot omitted)

3.3.Delete documents

Deletion uses a DELETE request and, likewise, deletes by id:
Syntax:

DELETE /{索引库名}/_doc/id值

Example:

# delete data by id
DELETE /heima/_doc/1

Result: (screenshot omitted)

3.4. Modify documents

There are two ways to modify:

  • Full modification: Directly overwrite the original document
  • Incremental modification: Modify some fields in the document

3.4.1. Full modification

Full modification overwrites the original document; in essence it will:

  • Delete documents based on specified id
  • Add a new document with the same ID

Note: if no document with the given id exists, the delete step finds nothing but the add in the second step still runs, so the modification turns into a create operation.

Syntax:

PUT /{索引库名}/_doc/文档id
{
    "字段1": "值1",
    "字段2": "值2",
    // ... omitted
}

Example:

PUT /heima/_doc/1
{
    "info": "黑马程序员高级Java讲师",
    "email": "[email protected]",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}

3.4.2. Incremental modification

Incremental modification is to modify only some fields in the document matching the specified ID.

Syntax:

POST /{索引库名}/_update/文档id
{
    "doc": {
         "字段名": "新的值"
    }
}

Example:

POST /heima/_update/1
{
  "doc": {
    "email": "[email protected]"
  }
}

3.5. Summary

What are the document operations?

  • Create document: POST /{索引库名}/_doc/文档id { json文档 }
  • Query document: GET /{索引库名}/_doc/文档id
  • Delete document: DELETE /{索引库名}/_doc/文档id
  • Modify document:
    • Full modification: PUT /{索引库名}/_doc/文档id { json文档 }
    • Incremental modification: POST /{索引库名}/_update/文档id { "doc": {字段}}

4. RestAPI [operating the es index from Java]

Our ultimate goal is to operate ES from Java code.
  ES officially provides clients in various languages for operating ES. In essence, these clients all assemble DSL statements and send them to ES over HTTP. Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/index.html

The Java Rest Client comes in two flavors:

  • Java Low Level Rest Client
  • Java High Level Rest Client

What we will learn is the Java High Level Rest Client API.

4.0. Import the demo project and initialize RestHighLevelClient

4.0.1. Import the data

First import the database data provided by the pre-course materials:
The data structure is as follows:

CREATE TABLE `tb_hotel` (
  `id` bigint(20) NOT NULL COMMENT '酒店id',
  `name` varchar(255) NOT NULL COMMENT '酒店名称;例:7天酒店',
  `address` varchar(255) NOT NULL COMMENT '酒店地址;例:航头路',
  `price` int(10) NOT NULL COMMENT '酒店价格;例:329',
  `score` int(2) NOT NULL COMMENT '酒店评分;例:45,就是4.5分',
  `brand` varchar(32) NOT NULL COMMENT '酒店品牌;例:如家',
  `city` varchar(32) NOT NULL COMMENT '所在城市;例:上海',
  `star_name` varchar(16) DEFAULT NULL COMMENT '酒店星级,从低到高分别是:1星到5星,1钻到5钻',
  `business` varchar(255) DEFAULT NULL COMMENT '商圈;例:虹桥',
  `latitude` varchar(32) NOT NULL COMMENT '纬度;例:31.2497',
  `longitude` varchar(32) NOT NULL COMMENT '经度;例:120.3925',
  `pic` varchar(255) DEFAULT NULL COMMENT '酒店图片;例:/img/1.jpg',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Import Data:

-- ----------------------------
-- Records of tb_hotel
-- ----------------------------
INSERT INTO `tb_hotel` VALUES (36934, '7天连锁酒店(上海宝山路地铁站店)', '静安交通路40号', 336, 37, '7天酒店', '上海', '二钻', '四川北路商业区', '31.251433', '121.47522', 'https://m.tuniucdn.com/fb2/t1/G1/M00/3E/40/Cii9EVkyLrKIXo1vAAHgrxo_pUcAALcKQLD688AAeDH564_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (38609, '速8酒店(上海赤峰路店)', '广灵二路126号', 249, 35, '速8', '上海', '二钻', '四川北路商业区', '31.282444', '121.479385', 'https://m.tuniucdn.com/fb2/t1/G2/M00/DF/96/Cii-TFkx0ImIQZeiAAITil0LM7cAALCYwKXHQ4AAhOi377_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (38665, '速8酒店上海中山北路兰田路店', '兰田路38号', 226, 35, '速8', '上海', '二钻', '长风公园地区', '31.244288', '121.422419', 'https://m.tuniucdn.com/fb2/t1/G2/M00/EF/86/Cii-Tlk2mV2IMZ-_AAEucgG3dx4AALaawEjiycAAS6K083_w200_h200_c1_t0.jpg');

INSERT INTO `tb_hotel` VALUES (2003479905, '上海榕港万怡酒店', '新松江路1277号', 798, 46, '万怡', '上海', '四钻', '佘山/松江大学城', '31.038198', '121.210178', 'https://m.tuniucdn.com/fb3/s1/2n9c/2GM761BYH8k15qkNrJrja3cwfr2D_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2009548883, '和颐至尚酒店(北京首都机场新国展店)', '府前二街6号', 611, 46, '和颐', '北京', '三钻', '首都机场/新国展地区', '40.063953', '116.576829', 'https://m.tuniucdn.com/fb3/s1/2n9c/43zCTomkMSkUfZByZxn77YH2XidJ_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2011785622, '北京世园凯悦酒店', '阜康南路1号院1号楼A', 558, 47, '凯悦', '北京', '五星级', '延庆休闲度假区', '40.440732', '115.963259', 'https://m.tuniucdn.com/fb3/s1/2n9c/uhGcQze3zZQxe4avSU8BysgYVvx_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2022598930, '上海宝华喜来登酒店', '南奉公路3111弄228号', 2899, 46, '喜来登', '上海', '五钻', '奉贤开发区', '30.921659', '121.575572', 'https://m.tuniucdn.com/fb2/t1/G6/M00/45/BD/Cii-TF3ZaBmIStrbAASnoOyg7FoAAFpYwEoz9oABKe4992_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2048050570, '汉庭酒店(深圳坪山火车站店)', '新和路127-2号', 436, 47, '汉庭', '深圳', '二钻', '坪山高铁站商圈', '22.700753', '114.339089', 'https://m.tuniucdn.com/fb3/s1/2n9c/2nXN2bWjfoqoTkPwHvLJQPYz17qD_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2056132395, '深圳深铁皇冠假日酒店', '深南大道9819号', 340, 47, '皇冠假日', '深圳', '五钻', '科技园', '22.538923', '113.944794', 'https://m.tuniucdn.com/fb3/s1/2n9c/eBLtrED2uJs7yURWfjnWge9dT1P_w200_h200_c1_t0.jpg');
INSERT INTO `tb_hotel` VALUES (2062643512, '深圳国际会展中心希尔顿酒店', '展丰路80号', 285, 46, '希尔顿', '深圳', '五钻', '深圳国际会展中心商圈', '22.705335', '113.77794', 'https://m.tuniucdn.com/fb3/s1/2n9c/2SHUVXNrN5NsXsTUwcd1yaHKbrGq_w200_h200_c1_t0.jpg');

SET FOREIGN_KEY_CHECKS = 1;

4.0.2. Import the project

Then import the project provided by the pre-course materials:
Import the project and change the configuration file to your own settings (project-structure screenshot omitted).

4.0.3. Mapping analysis

When creating an index library, the most critical thing is mapping. The information to be considered in mapping includes:

  • The field name
  • The field's data type
  • Whether the field participates in search
  • Whether the field needs to be tokenized
  • If tokenized, which analyzer to use

in:

  • For field names and data types, refer to the names and types in the data table structure
  • Whether a field participates in search must be determined from the business, e.g. the image URL does not need to be searchable
  • Whether to tokenize depends on the content: if the content is one indivisible whole, there is no need to tokenize; otherwise, tokenize
  • For the analyzer, we can uniformly use ik_max_word


Let's look at the index structure for the hotel data:

PUT /hotel
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "address":{
        "type": "keyword",
        "index": false  # do not create an index for this field
      },
      "price":{
        "type": "integer"
      },
      "score":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      },
      "city":{
        "type": "keyword",
        "copy_to": "all"
      },
      "starName":{
        "type": "keyword"
      },
      "business":{
        "type": "keyword"
      },
      "location":{
        "type": "geo_point"
      },
      "pic":{
        "type": "keyword",
        "index": false
      },
      "all":{
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

Description of several special fields:

  • location: geographic coordinates, containing longitude and latitude
  • all: a composite field whose purpose is to merge the values of several fields via copy_to, giving users a single field to search against

(Figures omitted: geo-coordinate description and copy_to illustration.)

4.0.4. Initialize RestHighLevelClient

  In the API provided by elasticsearch, all interaction with elasticsearch is encapsulated in a class named RestHighLevelClient. We must first initialize this object and establish the connection to elasticsearch.

This takes three steps:
1) Introduce the es RestHighLevelClient dependency:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.12.1</version>
</dependency>


2) Because SpringBoot manages a default ES version (7.6.2), we need to override it:

<properties>
    <java.version>1.8</java.version>
    <!-- this line overrides the managed version -->
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>

3) Initialize the RestHighLevelClient.
The initialization code is as follows:

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));

For convenient unit testing, we create a test class HotelIndexTest and put the initialization code in a @BeforeEach method:

package cn.itcast.hotel;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;

import java.io.IOException;

public class HotelIndexTest {
    // keep the client as a member variable so the tests below can reuse it
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        // create and initialize the client
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        // destroy the client
        this.client.close();
    }
}

4.1. Create index library

4.1.1. Code interpretation

The API for creating an index is shown below.
The code is divided into three steps:

  • 1) Create a Request object. Because this operation creates an index, the Request is a CreateIndexRequest.
  • 2) Add the request parameters, i.e. the JSON part of the DSL. Because the JSON string is very long, a static string constant MAPPING_TEMPLATE is defined here to keep the code tidy.
  • 3) Send the request. The client.indices() method returns an IndicesClient, which encapsulates all index-related operations.

4.1.2. Complete example

  In the cn.itcast.hotel.constants package of hotel-demo, create a class that defines the mapping JSON string constant:

package cn.itcast.hotel.constants;

public class HotelConstants {
    public static final String MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"name\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"address\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"price\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"score\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"brand\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"city\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"copy_to\": \"all\"\n" +
            "      },\n" +
            "      \"starName\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"business\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"location\":{\n" +
            "        \"type\": \"geo_point\"\n" +
            "      },\n" +
            "      \"pic\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"all\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\"\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
}
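
As a side note, the concatenated-string style above is easy to get wrong. If the project can use Java 15 or newer, the same mapping can be written as a text block. A sketch (class name hypothetical), carrying the same JSON as MAPPING_TEMPLATE:

```java
public class HotelConstantsTextBlock {
    // The same mapping JSON as MAPPING_TEMPLATE above, written as a Java 15+ text block
    public static final String MAPPING_TEMPLATE = """
        {
          "mappings": {
            "properties": {
              "id":       { "type": "keyword" },
              "name":     { "type": "text", "analyzer": "ik_max_word", "copy_to": "all" },
              "address":  { "type": "keyword", "index": false },
              "price":    { "type": "integer" },
              "score":    { "type": "integer" },
              "brand":    { "type": "keyword", "copy_to": "all" },
              "city":     { "type": "keyword", "copy_to": "all" },
              "starName": { "type": "keyword" },
              "business": { "type": "keyword" },
              "location": { "type": "geo_point" },
              "pic":      { "type": "keyword", "index": false },
              "all":      { "type": "text", "analyzer": "ik_max_word" }
            }
          }
        }
        """;

    public static void main(String[] args) {
        // the string can be passed to request.source(...) exactly like MAPPING_TEMPLATE
        System.out.println(MAPPING_TEMPLATE.contains("geo_point"));  // true
    }
}
```

Either form produces the same DSL body; the text block simply removes the escaping and `+` noise.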

In the HotelIndexTest test class of hotel-demo, write a unit test to create the index:

@Test
void createHotelIndex() throws IOException {
    // 1. Create the Request object
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    // 2. Prepare the request parameters: the DSL statement
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. Send the request
    client.indices().create(request, RequestOptions.DEFAULT);
}

4.2. Delete the index library

Deleting the DSL statement of the index library is very simple:

DELETE /hotel

Compared to creating an index library:

  • The request method changes from PUT to DELETE
  • The request path remains unchanged
  • No request parameters

Therefore, the difference in the code is reflected in the Request object. It is still three steps:

  • 1) Create a Request object. This time it is a DeleteIndexRequest object
  • 2) Prepare the parameters. There are none here
  • 3) Send the request. Use the delete method instead

In the HotelIndexTest test class of hotel-demo, write a unit test to delete the index:

@Test
void testDeleteHotelIndex() throws IOException {
    // 1. Create the Request object
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    // 2. Send the request
    client.indices().delete(request, RequestOptions.DEFAULT);
}

4.3. Determine whether the index library exists

Determining whether the index library exists is essentially a query. The corresponding DSL is:

GET /hotel

So the Java code flow is similar to that of deletion. It is still three steps:

  • 1) Create a Request object. This time it is a GetIndexRequest object
  • 2) Prepare the parameters. There are none here
  • 3) Send the request. Use the exists method instead

@Test
void testExistsHotelIndex() throws IOException {
    // 1. Create the Request object
    GetIndexRequest request = new GetIndexRequest("hotel");
    // 2. Send the request
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.err.println(exists ? "Index already exists!" : "Index does not exist!");
}

4.4. Summary

  The JavaRestClient follows essentially the same flow for every elasticsearch operation. The core is the client.indices() method, which returns the operation object for index libraries.

Basic steps for index library operation:

  • Initialize the RestHighLevelClient
  • Create an XxxIndexRequest, where Xxx is Create, Get, or Delete
  • Prepare the DSL (required for Create; the others take no parameters)
  • Send the request by calling the RestHighLevelClient#indices().xxx() method, where xxx is create, exists, or delete

5. RestClient [operating documents in Java]

To keep document operations separate from the index library operations, we create another test class that does two things:

  • Initialize the RestHighLevelClient
  • Our hotel data is in the database and needs to be queried through IHotelService, so we inject this interface

package cn.itcast.hotel;

@SpringBootTest
public class HotelDocumentTest {

    @Autowired
    private IHotelService hotelService;      // used to query the hotel data from the database

    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        // create and initialize the client
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        // destroy the client
        this.client.close();
    }
}

5.1. Add new documents - write database data into the index library

We need to query the hotel data from the database and write it into elasticsearch.

5.1.1. Index library entity class

The result of the database query is a Hotel type object. The structure is as follows:

@Data
@TableName("tb_hotel")
public class Hotel {
    @TableId(type = IdType.INPUT)
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String longitude;
    private String latitude;
    private String pic;
}

Its structure differs from that of our index library:

  • longitudeand latitudeneed to be merged intolocation

Therefore, we need to define a new type that matches the index library structure and implements the conversion from the database entity:

package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
    }
}
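
The key step in the constructor is merging latitude and longitude into the "lat, lon" string that elasticsearch's geo_point type accepts. A minimal standalone check of that merge logic (hypothetical coordinate values, plain Java, no elasticsearch dependency):

```java
public class GeoPointMergeDemo {
    // Mirrors the HotelDoc constructor: a geo_point string is "latitude, longitude"
    static String mergeLocation(String latitude, String longitude) {
        return latitude + ", " + longitude;
    }

    public static void main(String[] args) {
        // hypothetical coordinates
        String location = mergeLocation("31.245409", "121.506379");
        System.out.println(location);  // 31.245409, 121.506379
    }
}
```

Note that latitude comes first; swapping the order would silently place documents at the wrong coordinates.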

5.1.2. Syntax description

The DSL statement for adding a new document is as follows:

POST /{index-library-name}/_doc/1
{
    "name": "Jack",
    "age": 21
}

The corresponding Java code is shown in the figure:
Insert image description here
As you can see, it is similar to creating an index library, also a three-step process:

  • 1) Create a Request object
  • 2) Prepare request parameters, which is the JSON document in DSL
  • 3) Send request

What changes is that the client.xxx() API is called directly here; client.indices() is no longer needed.

5.1.3. Complete code

When we import the hotel data, the basic process is the same, but a few changes need to be considered:

  • The hotel data comes from the database. We need to query it first and get the hotel object.
  • The hotel object needs to be converted into a HotelDoc object
  • HotelDoc needs to be serialized into json format

Therefore, the overall steps of the code are as follows:

  • 1) Query hotel data Hotel based on id
  • 2) Encapsulate Hotel as HotelDoc
  • 3) Serialize HotelDoc to JSON
  • 4) Create IndexRequest and specify the index library name and id
  • 5) Prepare request parameters, which is the JSON document
  • 6) Send request

In the HotelDocumentTest test class of hotel-demo, write the unit test:

@Test
void testAddDocument() throws IOException {
    // 1. Query the hotel data by id
    Hotel hotel = hotelService.getById(61083L);
    // 2. Convert it to the document type
    HotelDoc hotelDoc = new HotelDoc(hotel);      // convert from the database type to the type the index library needs
    // 3. Serialize HotelDoc to JSON
    String json = JSON.toJSONString(hotelDoc);

    // 4. Prepare the Request object, specifying the index library name and id
    IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
    // 5. Prepare the JSON document
    request.source(json, XContentType.JSON);
    // 6. Send the request
    client.index(request, RequestOptions.DEFAULT);
}

5.2. Query documents

5.2.1. Syntax description

The query DSL statement is as follows:

GET /hotel/_doc/{id}

It's very simple, so the code is roughly divided into two steps:

  • Prepare the Request object
  • send request

However, the purpose of the query is to get the result and parse it into a HotelDoc, so the difficulty lies in parsing the result. The complete code is as follows:
Insert image description here
As you can see, the result is JSON in which the document is placed in the _source attribute, so parsing means extracting _source and deserializing it into a Java object.
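
For reference, a typical response to GET /hotel/_doc/{id} has the following shape (field values here are illustrative, not taken from the actual index):

```json
{
  "_index": "hotel",
  "_type": "_doc",
  "_id": "61082",
  "_version": 1,
  "found": true,
  "_source": {
    "id": 61082,
    "name": "...",
    "price": 952,
    "location": "31.24, 121.51"
  }
}
```

The document fields to deserialize live under "_source", which is exactly what response.getSourceAsString() returns.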

Similar to before, it is also a three-step process:

  • 1) Prepare the Request object. This time it’s a query, so it’s GetRequest
  • 2) Send a request and get the result. Because it is a query, the client.get() method is called here.
  • 3) Parse the result by deserializing the JSON

5.2.2. Complete code

In the HotelDocumentTest test class of hotel-demo, write the unit test:

@Test
void testGetDocumentById() throws IOException {
    // 1. Prepare the Request
    GetRequest request = new GetRequest("hotel", "61082");
    // 2. Send the request and get the response
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Parse the response result
    String json = response.getSourceAsString();

    HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);  // deserialize the JSON
    System.out.println(hotelDoc);
}

5.3.Delete documents

The DSL for deleting a document looks like this:

DELETE /hotel/_doc/{id}

Compared with the query, only the request method changes from GET to DELETE. As you can imagine, the Java code still takes three steps:

  • 1) Prepare the Request object. Because this is a deletion, it is a DeleteRequest object, specifying the index library name and id
  • 2) Prepare the parameters. There are none
  • 3) Send the request. Because it is a deletion, use the client.delete() method

In the HotelDocumentTest test class of hotel-demo, write the unit test:

@Test
void testDeleteDocument() throws IOException {
    // 1. Prepare the Request
    DeleteRequest request = new DeleteRequest("hotel", "61083");
    // 2. Send the request
    client.delete(request, RequestOptions.DEFAULT);
}

5.4. Modify documents

5.4.1. Syntax description

We have talked about two ways to modify:

  • Full modification: The essence is to delete based on the ID first and then add it.
  • Incremental modification: Modify the specified field value in the document

In the RestClient API, full modification is exactly the same as the add (index) API; the distinction is made by ID:

  • If the ID already exists, the document is modified
  • If the ID does not exist, the document is added

We will not go into details here; we mainly focus on incremental modification.
The code example is shown in the figure:
Insert image description here
Similar to before, it is also a three-step process:

  • 1) Prepare the Request object. This time it is a modification, so it is an UpdateRequest
  • 2) Prepare the parameters, i.e. the JSON document containing the fields to be modified
  • 3) Update the document by calling the client.update() method

5.4.2. Complete code

In the HotelDocumentTest test class of hotel-demo, write the unit test:

@Test
void testUpdateDocument() throws IOException {
    // 1. Prepare the Request
    UpdateRequest request = new UpdateRequest("hotel", "61083");
    // 2. Prepare the request parameters
    request.doc(
        "price", "952",
        "starName", "四钻"
    );
    // 3. Send the request
    client.update(request, RequestOptions.DEFAULT);
}
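
The request.doc(...) varargs alternate field names and values, which the client turns into a JSON body such as {"price": "952", "starName": "四钻"}. A stdlib-only sketch of that pairing (an illustration of the semantics, not the client's actual implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocPairsDemo {
    // Pairs up alternating key/value varargs, as UpdateRequest.doc(Object...) does conceptually
    static Map<String, Object> pairsToMap(Object... fields) {
        if (fields.length % 2 != 0) {
            throw new IllegalArgumentException("fields must be key/value pairs");
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i += 2) {
            doc.put((String) fields[i], fields[i + 1]);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = pairsToMap("price", "952", "starName", "四钻");
        System.out.println(doc);  // {price=952, starName=四钻}
    }
}
```

This is why an odd number of arguments to doc(...) is an error: every name must be followed by its value.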

5.5. Import documents in batches - write database data into the index library in batches

Case requirement: use BulkRequest to import the database data into the index library in batches.

Proceed as follows:

  • Use mybatis-plus to query hotel data
  • Convert the queried hotel data (Hotel) into document type data (HotelDoc)
  • Use BulkRequest batch processing in JavaRestClient to add new documents in batches

5.5.1. Syntax description

The essence of batch processing with BulkRequest is to combine multiple ordinary CRUD requests and send them together.

It provides an add method to attach other requests:
Insert image description here
As you can see, the requests that can be added include:

  • IndexRequest, that is, new
  • UpdateRequest, which is to modify
  • DeleteRequest, that is, delete

Therefore, adding multiple IndexRequests to a Bulk request is exactly a batch insert. Example:
Insert image description here

In fact, there are still three steps:

  • 1) Create a Request object. Here it is a BulkRequest
  • 2) Prepare the parameters. The parameters of a batch request are other Request objects, in this case multiple IndexRequests
  • 3) Send the request. Here it is the client.bulk() method

When we import the hotel data, we simply wrap the code above in a for loop.

5.5.2. Complete code

In the HotelDocumentTest test class of hotel-demo, write the unit test:

@Test
void testBulkRequest() throws IOException {
    // query all hotel data in bulk
    List<Hotel> hotels = hotelService.list();

    // 1. Create the Request
    BulkRequest request = new BulkRequest();
    // 2. Prepare the parameters: add multiple index Requests (document conversion plus request creation)
    for (Hotel hotel : hotels) {
        // 2.1. Convert to the document type HotelDoc
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // 2.2. Create the Request object for the new document
        request.add(new IndexRequest("hotel")                              // specify the index library name
                    .id(hotelDoc.getId().toString())                       // specify the id
                    .source(JSON.toJSONString(hotelDoc), XContentType.JSON));  // specify the JSON
    }
    // 3. Send the request
    client.bulk(request, RequestOptions.DEFAULT);
}

Insert image description here

5.6. Summary

Basic steps for document operations:

  • Initialize RestHighLevelClient
  • Create an XxxRequest, where Xxx is Index, Get, Update, Delete, or Bulk
  • Prepare the parameters (required for Index, Update, and Bulk)
  • Send the request by calling the RestHighLevelClient#xxx() method, where xxx is index, get, update, delete, or bulk
  • Parse the result (required for Get)

Insert image description here

Origin blog.csdn.net/weixin_52223770/article/details/128672612