Stage 8: Advanced Service Framework. Chapter 6: ElasticSearch, Distributed Search Engine (Part 3)
- 0. Learning Objectives
- 1. Data aggregation
- 2. Autocomplete
- 3. Data synchronization
- 4. Cluster
0. Learning Objectives
1. Data aggregation
Aggregations make it extremely convenient to run statistics, analysis, and calculations on data. For example:
- Which brand of mobile phone is the most popular?
- What are the average, maximum, and minimum prices of these phones?
- What are the monthly sales of these phones?
Implementing these statistics in Elasticsearch is much more convenient than with SQL in a database, the queries are very fast, and the results update in near real time.
1.1. Types of aggregation
There are three common types of aggregation:
- Bucket aggregation: groups documents and counts the number in each group.
  - TermAggregation: groups by a document field value, e.g. by brand or by country.
  - Date Histogram: groups by date interval, e.g. one bucket per week or one per month.
  - The aggregated fields are not analyzed (not segmented into terms).
- Metric aggregation: computes values over the documents, such as the maximum, minimum, or average.
  - Avg: average value
  - Max: maximum value
  - Min: minimum value
  - Stats: computes max, min, avg, sum, etc. at the same time
- Pipeline aggregation: aggregates based on the results of other aggregations.
Note: fields that participate in aggregation must be of keyword, date, numeric, or boolean type; in other words, all fields participating in aggregation are fields that cannot be segmented into terms.
1.2. Implementing aggregation with DSL
Now suppose we want to count the hotel brands that appear in the data. That means grouping the documents by brand, which can be done by aggregating on the brand name field, i.e. a Bucket aggregation.
1.2.1. Bucket aggregation syntax
The syntax is as follows:
GET /hotel/_search
{
  "size": 0, // set size to 0 so the response contains only aggregation results, no documents
  "aggs": {
    // define the aggregations
    "brandAgg": {
      // name of the aggregation, any name you like
      "terms": {
        // aggregation type: term aggregation (TermAggregation), grouping by a document field value
        "field": "brand", // the field to aggregate on
        "size": 20 // the number of aggregation results to return
      }
    }
  }
}
The key elements above are the aggregation name ("brandAgg"), the aggregation type ("terms"), and the field to aggregate on ("brand").
The result is as shown below:
1.2.2. Sorting aggregation results
By default, a Bucket aggregation counts the number of documents in each bucket, reports it as _count, and sorts the buckets by _count in descending order.
We can specify the order attribute to customize the sorting method of the aggregation:
GET /hotel/_search
{
  "size": 0, // set size to 0 so the response contains only aggregation results, no documents
  "aggs": {
    // define the aggregations
    "brandAgg": {
      // name of the aggregation, any name you like
      "terms": {
        // aggregation type: term aggregation (TermAggregation), grouping by a document field value
        "field": "brand", // the field to aggregate on
        "order": {
          "_count": "asc" // sort by _count in ascending order
        },
        "size": 20 // the number of aggregation results to return
      }
    }
  }
}
1.2.3. Limiting the aggregation scope
By default, a Bucket aggregation covers all documents in the index library. In real scenarios, however, users enter search conditions, so the aggregation should apply only to the search results; the aggregation scope must be limited.
We can limit the scope of the documents being aggregated simply by adding a query condition:
GET /hotel/_search
{
"query": {
"range": {
"price": {
"lte": 200 // only aggregate documents priced at 200 or below
}
}
},
"size": 0,
"aggs": {
"brandAgg": {
"terms": {
"field": "brand",
"size": 20
}
}
}
}
This time, the number of brands aggregated was significantly smaller:
1.2.4. Metric aggregation syntax
Above, we grouped hotels by brand, forming buckets. Now we need to run calculations on the hotels within each bucket, to obtain values such as the min, max, and avg of the user score for each brand.
This requires a Metric aggregation, for example the stats aggregation, which returns min, max, avg, and other statistics at once.
The syntax is as follows:
GET /hotel/_search
{
  "size": 0,
  "aggs": {
    // define the aggregations
    "brandAgg": {
      // name of the aggregation, any name you like
      "terms": {
        // terms aggregation
        "field": "brand",
        "size": 20 // the number of aggregation results to return
      },
      "aggs": {
        // sub-aggregation of brandAgg, computed separately for each bucket after grouping
        "score_stats": {
          // name of the sub-aggregation
          "stats": {
            // aggregation type: stats computes min, max, avg, etc.
            "field": "score" // the field to aggregate on, here score
          }
        }
      }
    }
  }
}
The score_stats aggregation here is a sub-aggregation nested inside the brandAgg aggregation, because it needs to be calculated separately within each bucket.
In addition, we can also sort the aggregation results, for example, by the average hotel score in each bucket:
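A sketch of that, reusing the brandAgg and score_stats names from the example above; the order clause can reference a metric of the sub-aggregation:

```json
GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "order": {
          "score_stats.avg": "desc" // sort buckets by the avg metric of the score_stats sub-aggregation
        },
        "size": 20
      },
      "aggs": {
        "score_stats": {
          "stats": { "field": "score" }
        }
      }
    }
  }
}
```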
1.2.5. Summary
aggs stands for aggregation and sits at the same level as query. What is the role of query in that case?
- It limits the scope of the documents being aggregated
Three elements are required for an aggregation:
- Aggregation name
- Aggregation type
- Aggregation field
Configurable aggregation properties include:
- size: the number of aggregation results to return
- order: how the aggregation results are sorted
- field: the field to aggregate on
1.3. Implementing aggregation with the RestAPI
1.3.1. API syntax
The aggregation condition sits at the same level as the query condition, so request.source() is used to specify the aggregation.
Syntax of the aggregation condition:
The aggregation results also differ from query results, and the API for them is special too. Still, it is the same JSON, parsed layer by layer:
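A minimal sketch of both halves (a fragment, assuming the RestHighLevelClient field `client` from earlier chapters; the names follow the DSL examples above):

```java
// Build: the aggregation is set on request.source(), at the same level as the query
SearchRequest request = new SearchRequest("hotel");
request.source().size(0);
request.source().aggregation(AggregationBuilders
        .terms("brandAgg")   // aggregation name
        .field("brand")      // field to aggregate on
        .size(20));          // number of buckets to return
SearchResponse response = client.search(request, RequestOptions.DEFAULT);

// Parse: walk the response layer by layer, mirroring the JSON structure
Aggregations aggregations = response.getAggregations();
Terms brandTerms = aggregations.get("brandAgg");       // look up by aggregation name
for (Terms.Bucket bucket : brandTerms.getBuckets()) {  // the "buckets" array
    String brandName = bucket.getKeyAsString();        // each bucket's "key"
    long docCount = bucket.getDocCount();              // each bucket's "doc_count"
}
```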
1.3.2.Business needs
Case: define a method in IHotelService to aggregate brands, cities, and star ratings.
Requirements: The brand, city, and other information on the search page should not be hard-coded on the page, but obtained by aggregating the hotel data in the index library:
Analysis:
Currently, the city, star-rating, and brand lists on the page are hard-coded and do not change with the search results. But when the user's search conditions change, the search results change accordingly.
For example, if a user searches for "Oriental Pearl Tower", the matching hotels must be near the Oriental Pearl Tower in Shanghai, so the city can only be Shanghai. The city list should then not show Beijing, Shenzhen, or Hangzhou.
In other words, the page should list exactly the cities that appear in the search results, and exactly the brands that appear in the search results.
How do we know which brands and cities are included in the search results? By using the aggregation feature: a Bucket aggregation groups the documents in the search results by brand and by city, which tells us which brands and cities are present.
Because we aggregate over the search results, the aggregation must be a scoped aggregation; that is, its conditions must be consistent with the conditions of the document search.
Looking at the browser, you can find that the front end has actually issued such a request:
The request parameters are exactly the same as those of the search request.
The return value type is the final result to be displayed on the page, a Map structure:
- key: a string, such as city, star rating, brand, or price
- value: a collection, e.g. the names of multiple cities
1.3.3. Business implementation
Add a method to HotelController in the cn.itcast.hotel.web package, with the following requirements:
- Request method: POST
- Request path: /hotel/filters
- Request parameters: RequestParams, consistent with the parameters for searching documents
- Return value type: Map<String, List<String>>
Code:
@PostMapping("filters")
public Map<String, List<String>> getFilters(@RequestBody RequestParams params){
return hotelService.filters(params);
}
The IHotelService.filters method called here has not been implemented yet.
Define the new method in cn.itcast.hotel.service.IHotelService:
Map<String, List<String>> filters(RequestParams params);
Implement this method in cn.itcast.hotel.service.impl.HotelService:
@Override
public Map<String, List<String>> filters(RequestParams params) {
    try {
        // 1. Prepare the Request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        // 2.1. The query only limits the aggregation scope
        buildBasicQuery(params, request);
        // 2.2. Set size to 0
        request.source().size(0);
        // 2.3. Aggregations
        buildAggregation(request);
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the results
        Map<String, List<String>> result = new HashMap<>();
        Aggregations aggregations = response.getAggregations();
        // 4.1. Get the brand results by aggregation name
        List<String> brandList = getAggByName(aggregations, "brandAgg");
        result.put("品牌", brandList);
        // 4.2. Get the city results by aggregation name
        List<String> cityList = getAggByName(aggregations, "cityAgg");
        result.put("城市", cityList);
        // 4.3. Get the star-rating results by aggregation name
        List<String> starList = getAggByName(aggregations, "starAgg");
        result.put("星级", starList);
        return result;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
private void buildAggregation(SearchRequest request) {
request.source().aggregation(AggregationBuilders
.terms("brandAgg")
.field("brand")
.size(100)
);
request.source().aggregation(AggregationBuilders
.terms("cityAgg")
.field("city")
.size(100)
);
request.source().aggregation(AggregationBuilders
.terms("starAgg")
.field("starName")
.size(100)
);
}
private List<String> getAggByName(Aggregations aggregations, String aggName) {
    // 4.1. Get the aggregation result by its name
    Terms brandTerms = aggregations.get(aggName);
    // 4.2. Get the buckets
    List<? extends Terms.Bucket> buckets = brandTerms.getBuckets();
    // 4.3. Iterate over the buckets
    List<String> brandList = new ArrayList<>();
    for (Terms.Bucket bucket : buckets) {
        // 4.4. Get the bucket key
        String key = bucket.getKeyAsString();
        brandList.add(key);
    }
    return brandList;
}
2. Autocomplete
When the user enters a character in the search box, we should prompt search terms related to the character, as shown in the figure:
This feature, which suggests complete entries based on the letters the user has typed, is called autocomplete.
Because the suggestions must be inferred from pinyin letters, the pinyin tokenizer is used.
2.1. Pinyin tokenizer
To complete based on letters, documents must be tokenized by pinyin. There happens to be a pinyin tokenizer plugin for elasticsearch. Address: https://github.com/medcl/elasticsearch-analysis-pinyin
The pre-course materials also provide an installation package for the pinyin tokenizer.
The installation method is the same as for the IK tokenizer, in four steps:
① Unzip
② Upload to the plugin directory of elasticsearch on the virtual machine
③ Restart elasticsearch
④ Test
For detailed installation steps, refer to the installation process of the IK tokenizer.
The test usage is as follows:
POST /_analyze
{
  "text": "如家酒店还不错", # the text to analyze
  "analyzer": "pinyin" # the analyzer to use
}
result:
2.2. Custom analyzer
The default pinyin tokenizer turns each individual Chinese character into pinyin, but what we want is for each term to form one group of pinyin. So we need to customize the pinyin tokenizer's behavior and build a custom analyzer.
An analyzer in elasticsearch consists of three parts:
- character filters: process the text before the tokenizer, e.g. deleting or replacing characters
- tokenizer: cuts the text into terms according to certain rules, e.g. keyword (no segmentation) or ik_smart
- tokenizer filter: further processes the terms produced by the tokenizer, e.g. case conversion, synonyms, pinyin conversion
When a document is analyzed, it passes through these three parts in sequence:
We can configure a custom analyzer in the settings when creating an index library.
The syntax for declaring a custom analyzer is as follows:
PUT /test // create an index library named test
{
  "settings": {
    // define the index library's analysis settings
    "analysis": {
      "analyzer": {
        // custom analyzer
        "my_analyzer": {
          // analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        // custom tokenizer filter
        "py": {
          // filter name
          "type": "pinyin", // filter type, here pinyin
          "keep_full_pinyin": false, // avoids emitting the pinyin of every single character
          "keep_joined_full_pinyin": true, // keep the joined full pinyin
          "keep_original": true, // keep the original Chinese
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    // the mappings
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer", // name uses the custom my_analyzer; analyzer is used when indexing
        "search_analyzer": "ik_smart" // search_analyzer is used when searching
      }
    }
  }
}
Test: (The results include pinyin and Chinese characters)
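For example, the custom analyzer can be tested against the test index library like this (a sketch, reusing the sample sentence from earlier):

```json
POST /test/_analyze
{
  "text": "如家酒店还不错",
  "analyzer": "my_analyzer"
}
```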
Summary:
How do you use the pinyin tokenizer?
- ① Download the pinyin tokenizer
- ② Unzip it into the plugin directory of elasticsearch
- ③ Restart
How do you define a custom analyzer?
- When creating the index library, configure it in settings; it can include three parts:
- ① character filter
- ② tokenizer
- ③ filter
What should you watch out for with the pinyin tokenizer?
- To avoid matching homophones, use the pinyin tokenizer when creating the index, but not when searching.
2.3. Autocomplete query
elasticsearch provides the Completion Suggester query to implement autocomplete. The query matches and returns terms that begin with what the user has typed. To keep completion queries efficient, there are constraints on the document fields involved:
- Fields that participate in a completion query must be of type completion.
- The field content is generally an array of the terms used for completion.
For example, an index library like this:
// create the index library
PUT test
{
"mappings": {
"properties": {
"title":{
"type": "completion"
}
}
}
}
Then insert the following data:
// sample data
POST test/_doc
{
"title": ["Sony", "WH-1000XM3"]
}
POST test/_doc
{
"title": ["SK-II", "PITERA"]
}
POST test/_doc
{
"title": ["Nintendo", "switch"]
}
The query DSL statement is as follows:
// autocomplete query
GET /test/_search
{
  "suggest": {
    "title_suggest": {
      // name of the suggestion, any name you like
      "text": "s", // the keyword prefix
      "completion": {
        // the autocomplete suggester type
        "field": "title", // the field to run completion against
        "skip_duplicates": true, // skip duplicates
        "size": 10 // return the top 10 results
      }
    }
  }
}
Summary:
2.4. Implement automatic completion of hotel search box
Our hotel index library does not yet have the pinyin tokenizer configured, so its configuration must change. But as we know, an index library cannot be modified; it can only be deleted and re-created.
In addition, we need a new field for autocomplete, holding the brand, city, business district, and similar values as completion suggestions.
To summarize, the things we need to do are:
- Modify the hotel index library structure and configure a custom pinyin analyzer
- Have the name field and the all field of the index library use the custom analyzer
- Add a new suggestion field of type completion to the index library, using the custom analyzer
- Add a suggestion field to the HotelDoc class, containing brand and business
- Re-import the data into the hotel index library
2.4.1. Modify the hotel mapping structure
The code is as follows:
// hotel data index library
PUT /hotel
{
"settings": {
// define the analyzers
"analysis": {
"analyzer": {
"text_anlyzer": {
// analyzer used for full-text search
"tokenizer": "ik_max_word",
"filter": "py"
},
"completion_analyzer": {
// analyzer used for autocomplete
"tokenizer": "keyword",
"filter": "py"
}
},
"filter": {
"py": {
"type": "pinyin",
"keep_full_pinyin": false,
"keep_joined_full_pinyin": true,
"keep_original": true,
"limit_first_letter_length": 16,
"remove_duplicated_term": true,
"none_chinese_pinyin_tokenize": false
}
}
}
},
"mappings": {
"properties": {
"id":{
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "text_anlyzer", // analyzer used when indexing
"search_analyzer": "ik_smart", // analyzer used when searching
"copy_to": "all"
},
"address":{
"type": "keyword",
"index": false
},
"price":{
"type": "integer"
},
"score":{
"type": "integer"
},
"brand":{
"type": "keyword",
"copy_to": "all"
},
"city":{
"type": "keyword"
},
"starName":{
"type": "keyword"
},
"business":{
"type": "keyword",
"copy_to": "all"
},
"location":{
"type": "geo_point"
},
"pic":{
"type": "keyword",
"index": false
},
"all":{
"type": "text",
"analyzer": "text_anlyzer", // analyzer used when indexing
"search_analyzer": "ik_smart" // analyzer used when searching
},
"suggestion":{
// the autocomplete field
"type": "completion",
"analyzer": "completion_analyzer" // the analyzer for this field
}
}
}
}
2.4.2. Modify the HotelDoc entity
A field for autocomplete needs to be added to HotelDoc; its content can be the hotel brand, city, business district, and similar information. Given the requirements for completion fields, it is best made an array of those values.
Therefore we add a suggestion field of type List<String> to HotelDoc and put the brand, city, business, and other information into it.
The code is as follows:
package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        // Assemble the suggestion list
        if (this.business.contains("/")) {
            // business holds multiple values and needs to be split
            String[] arr = this.business.split("/");
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            // bulk-add: appends each array element to the list
            Collections.addAll(this.suggestion, arr);
        } else {
            this.suggestion = Arrays.asList(this.brand, this.business);
        }
    }
}
2.4.3. Re-import the data
Re-execute the previously written import data function, and you can see that the new hotel data contains suggestions:
2.4.4. Java API for autocomplete queries
Earlier we learned the DSL for the autocomplete query, but not the corresponding Java API. Here is an example:
The field name in the completion query above should be replaced with your own field.
The autocomplete results are also rather special; the parsing code is as follows:
2.4.5. Implement autocomplete for the search box
Looking at the front-end page, we can see that as we type into the input box, the front end issues an ajax request.
The return value is the collection of completion terms, of type List<String>.
1) Add a new endpoint to HotelController in the cn.itcast.hotel.web package to handle this request:
@GetMapping("suggestion")
public List<String> getSuggestions(@RequestParam("key") String prefix) {
return hotelService.getSuggestions(prefix);
}
2) Add the method to IHotelService in the cn.itcast.hotel.service package:
List<String> getSuggestions(String prefix);
3) Implement the method in cn.itcast.hotel.service.impl.HotelService:
@Override
public List<String> getSuggestions(String prefix) {
    // prefix is the keyword passed in from the front end
    try {
        // 1. Prepare the Request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        request.source().suggest(new SuggestBuilder().addSuggestion(
            "suggestions",
            SuggestBuilders.completionSuggestion("suggestion") // the completion field
                .prefix(prefix) // the keyword prefix from the front end
                .skipDuplicates(true)
                .size(10)
        ));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the results
        Suggest suggest = response.getSuggest();
        // 4.1. Get the completion result by the suggestion name
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        // 4.2. Get the options
        List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
        // 4.3. Iterate over the options
        List<String> list = new ArrayList<>(options.size());
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
        return list;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
3. Data synchronization
The hotel data in elasticsearch comes from the mysql database, so whenever the mysql data changes, elasticsearch must change with it. This is data synchronization between elasticsearch and mysql.
3.1. Solution analysis
There are three common data synchronization solutions:
- Synchronous call
- Asynchronous notification
- Monitoring binlog
3.1.1. Synchronous call
Option 1: synchronous call
The basic steps are as follows:
- hotel-demo exposes an interface for modifying the data in elasticsearch
- after the hotel management service finishes its database operation, it directly calls the interface provided by hotel-demo
3.1.2. Asynchronous notification
Option 2: asynchronous notification
The process is as follows:
- after hotel-admin adds, deletes, or modifies mysql data, it sends an MQ message
- hotel-demo listens for the MQ message and, upon receiving it, updates the data in elasticsearch
3.1.3. Monitoring binlog
Option 3: monitoring binlog
The process is as follows:
- enable the binlog feature of mysql
- mysql's insert, update, and delete operations are then recorded in the binlog
- hotel-demo uses canal to monitor binlog changes and updates the content in elasticsearch in real time
3.1.4. Choosing a solution
Option 1: synchronous call
- Advantages: simple and crude to implement
- Disadvantages: high coupling between services
Option 2: asynchronous notification
- Advantages: low coupling, moderate implementation difficulty
- Disadvantages: depends on the reliability of MQ
Option 3: monitoring binlog
- Advantages: completely decouples the services
- Disadvantages: enabling binlog increases the database load, and implementation is complex
3.2. Implement data synchronization
Use MQ to implement data synchronization between mysql and elasticsearch.
3.2.1. Approach
Use the hotel-admin project provided in the pre-course materials as the hotel management microservice. Whenever hotel data is added, deleted, or modified, the same operation must be applied to the data in elasticsearch.
Steps:
- Import the hotel-admin project provided in the pre-course materials, start it, and test the hotel CRUD functionality
- Declare the exchange, queues, and RoutingKeys
- Send messages from the add, delete, and modify operations in hotel-admin
- Listen for the messages in hotel-demo and update the data in elasticsearch
- Start everything and test the data synchronization
3.2.2. Import the demo
Import the hotel-admin project provided in the pre-course materials.
After starting it, visit http://localhost:8099; it provides the hotel CRUD functionality:
3.2.3. Declare the exchange and queues
The MQ structure is as shown in the figure:
In this tutorial, the exchange and queues are declared in the consumer, hotel-demo.
1) Introduce the dependency
Introduce the rabbitmq dependency in both hotel-admin and hotel-demo:
<!--amqp-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-amqp</artifactId>
</dependency>
2) Configure the amqp address
Configure the amqp address in the .yml file:
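A minimal sketch of that configuration; the host and credentials below are placeholders that must match your own RabbitMQ instance:

```yaml
spring:
  rabbitmq:
    host: 192.168.150.101   # placeholder: your MQ host
    port: 5672
    username: itcast        # placeholder credentials
    password: 123321
    virtual-host: /
```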
3) Declare the queue and exchange names
Create a new MqConstants class in the cn.itcast.hotel.constatnts package in both hotel-admin and hotel-demo:
package cn.itcast.hotel.constatnts;

public class MqConstants {
    /**
     * Exchange
     */
    public final static String HOTEL_EXCHANGE = "hotel.topic";
    /**
     * Queue listening for inserts and updates
     */
    public final static String HOTEL_INSERT_QUEUE = "hotel.insert.queue";
    /**
     * Queue listening for deletes
     */
    public final static String HOTEL_DELETE_QUEUE = "hotel.delete.queue";
    /**
     * RoutingKey for inserts and updates
     */
    public final static String HOTEL_INSERT_KEY = "hotel.insert";
    /**
     * RoutingKey for deletes
     */
    public final static String HOTEL_DELETE_KEY = "hotel.delete";
}
4) Declare the queues and exchange (define the exchange, the queues, and their RoutingKey bindings)
In hotel-demo, define a MqConfig configuration class in the cn.itcast.hotel.config package and declare the queues and the exchange:
package cn.itcast.hotel.config;

@Configuration
public class MqConfig {
    @Bean
    public TopicExchange topicExchange(){
        // the exchange; true means durable, false means not auto-delete
        return new TopicExchange(MqConstants.HOTEL_EXCHANGE, true, false);
    }
    @Bean
    public Queue insertQueue(){
        // queue for inserts and updates
        return new Queue(MqConstants.HOTEL_INSERT_QUEUE, true);
    }
    @Bean
    public Queue deleteQueue(){
        // queue for deletes
        return new Queue(MqConstants.HOTEL_DELETE_QUEUE, true);
    }
    @Bean
    public Binding insertQueueBinding(){
        // bind insertQueue() to topicExchange() with the HOTEL_INSERT_KEY RoutingKey
        return BindingBuilder.bind(insertQueue()).to(topicExchange()).with(MqConstants.HOTEL_INSERT_KEY);
    }
    @Bean
    public Binding deleteQueueBinding(){
        // bind deleteQueue() to topicExchange() with the HOTEL_DELETE_KEY RoutingKey
        return BindingBuilder.bind(deleteQueue()).to(topicExchange()).with(MqConstants.HOTEL_DELETE_KEY);
    }
}
3.2.4. Send MQ messages
Copy the MqConstants class (which declares the queue and exchange names) from hotel-demo's cn.itcast.hotel.constatnts package into the same package in hotel-admin, so that the names used on both sides stay consistent.
Also add the same dependency as in 3.2.3 to hotel-admin and configure the same amqp address.
Then send the MQ messages from the add, delete, and modify operations in hotel-admin (in the HotelController class under the cn.itcast.hotel.web package of the hotel-admin project).
Inject the API needed to send the messages:
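As a sketch, the injected API is Spring AMQP's RabbitTemplate, and each write operation then sends only the hotel id (a fragment; the surrounding controller methods are the existing ones in hotel-admin):

```java
@Autowired
private RabbitTemplate rabbitTemplate;

// e.g. at the end of the save/update handlers:
rabbitTemplate.convertAndSend(
        MqConstants.HOTEL_EXCHANGE,    // exchange
        MqConstants.HOTEL_INSERT_KEY,  // RoutingKey for inserts and updates
        hotel.getId());                // payload: just the hotel id

// and at the end of the delete handler:
rabbitTemplate.convertAndSend(
        MqConstants.HOTEL_EXCHANGE,
        MqConstants.HOTEL_DELETE_KEY,
        id);
```

Sending only the id keeps the messages small; the consumer queries the database itself for the full record.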
3.2.5. Receive MQ messages
When hotel-demo receives an MQ message, it needs to:
- insert message: query the hotel information by the id passed in, then add a document to the index library
- delete message: delete the document from the index library by the id passed in
1) First add the insert and delete operations to IHotelService in the cn.itcast.hotel.service package of hotel-demo:
void deleteById(Long id);
void insertById(Long id);
2) Implement them in HotelService under the cn.itcast.hotel.service.impl package of hotel-demo:
@Override
public void deleteById(Long id) {
    try {
        // 1. Prepare the Request
        DeleteRequest request = new DeleteRequest("hotel", id.toString());
        // 2. Send the request
        client.delete(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

@Override
public void insertById(Long id) {
    try {
        // 0. Query the hotel data by id
        Hotel hotel = getById(id);
        // Convert it to the document type
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // 1. Prepare the Request object
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
        // 2. Prepare the JSON document
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // 3. Send the request
        client.index(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
3) Write a listener: add a new class in the cn.itcast.hotel.mq package of hotel-demo:
package cn.itcast.hotel.mq;

@Component
public class HotelListener {

    @Autowired
    private IHotelService hotelService;

    /**
     * Listen for hotel insert or update messages
     * @param id the hotel id
     */
    @RabbitListener(queues = MqConstants.HOTEL_INSERT_QUEUE)
    public void listenHotelInsertOrUpdate(Long id){
        hotelService.insertById(id);
    }

    /**
     * Listen for hotel delete messages
     * @param id the hotel id
     */
    @RabbitListener(queues = MqConstants.HOTEL_DELETE_QUEUE)
    public void listenHotelDelete(Long id){
        hotelService.deleteById(id);
    }
}
4. Cluster
When storing data on a single elasticsearch machine, you inevitably face two problems: storing massive amounts of data, and single points of failure.
- Massive data storage: logically split the index library into N shards and store them across multiple nodes
- Single point of failure: back up shard data as replicas on different nodes
ES cluster concepts:
- Cluster: a group of nodes sharing a common cluster name.
- Node: an Elasticsearch instance in the cluster.
- Shard: an index can be split into parts for storage; each part is a shard. In a cluster, the shards of an index can be spread across different nodes. This solves the problem of data volumes exceeding the storage capacity of a single node. Here, we split the data into 3 shards: shard0, shard1, shard2.
- Primary shard: defined relative to replica shards.
- Replica shard: each primary shard can have one or more replicas, holding the same data as the primary.
Data backup can ensure high availability, but if you back up each shard, the number of nodes required will double, and the cost is too high!
In order to find a balance between high availability and cost, we can do this:
- First, the data is fragmented and stored in different nodes.
- Then back up each shard and put it on the other node to complete mutual backup.
This can greatly reduce the number of service nodes required. As shown in the figure, we take 3 shards and one backup for each shard as an example:
Now, each shard has 1 backup, stored on 3 nodes:
- node0: saves shards 0 and 1
- node1: saves shards 0 and 2
- node2: saves shards 1 and 2
4.1. Build ES cluster
For setup, refer to the pre-course materials document, Chapter 4.
4.2. Cluster split-brain problem
4.2.1. Division of cluster responsibilities
Cluster nodes in elasticsearch have different responsibilities:
By default, every node in the cluster holds all four roles.
A real cluster, however, should separate these responsibilities:
- master node: high CPU requirements, low memory requirements
- data node: high requirements for both CPU and memory
- coordinating node: high requirements for network bandwidth and CPU
Separating duties lets us choose hardware suited to each node type's needs and avoids interference between workloads.
A typical division of responsibilities in an es cluster looks like this:
4.2.2. Split-brain problem
Split-brain is caused by nodes in the cluster losing contact with each other.
For example, suppose the master node loses contact with the other nodes:
node2 and node3 decide that node1 is down and elect a new master. Once node3 is elected, the cluster keeps serving requests, but node2 and node3 now form one cluster while node1 forms another. The data of the two clusters is no longer synchronized and diverges.
When the network recovers, the cluster contains two master nodes and its state is inconsistent; this is the split-brain situation.
The solution to split-brain is to require that the number of votes exceed (number of master-eligible nodes + 1) / 2 for a master to be elected, which is why the number of eligible nodes is best kept odd. The corresponding setting is discovery.zen.minimum_master_nodes; since es7.0 this is handled by default, so split-brain problems generally no longer occur.
For example, in a cluster of 3 nodes, the votes must reach (3 + 1) / 2 = 2. node3 receives the votes of node2 and node3 and is elected master; node1 has only its own vote and is not elected. The cluster still has only one master and no split brain occurs.
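That election rule is simple arithmetic. A minimal sketch, assuming quorum means more than half of the master-eligible nodes (mirroring the pre-7.0 discovery.zen.minimum_master_nodes recommendation):

```java
public class MasterQuorum {
    // Minimum votes needed to elect a master: more than half of the eligible nodes.
    static int quorum(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    public static void main(String[] args) {
        // 3 eligible nodes: 2 votes needed, so a lone partitioned node can never elect itself
        System.out.println("3 nodes -> quorum " + quorum(3));
        // an even count does not improve fault tolerance: 4 nodes already need 3 votes
        System.out.println("4 nodes -> quorum " + quorum(4));
        System.out.println("5 nodes -> quorum " + quorum(5));
    }
}
```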
4.2.3. Summary
What is the role of master-eligible nodes?
- Participate in master election
- The elected master manages the cluster state and shard information, and handles requests to create and delete index libraries
What is the role of data nodes?
- CRUD operations on data
What is the role of the coordinating node?
- Route requests to other nodes
- Merge query results and return them to the user
4.3. Cluster distributed storage
When a new document is added, it should be saved to a shard in a way that keeps the data balanced. So how does the coordinating node determine which shard the data is stored in?
4.3.1. Shard storage test
Insert three pieces of data:
From the test, you can see that the three pieces of data end up in different shards:
result:
4.3.2. Principle of sharded storage
elasticsearch
An algorithm is used hash
to calculate which shard the document should be stored in:
illustrate:
_routing
The default is documentid
- The algorithm is related to the number of shards, so once the index library is created, the number of shards cannot be modified !
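The routing formula is shard = hash(_routing) % number_of_primary_shards. A minimal, runnable sketch of why this freezes the shard count (String.hashCode() stands in purely for illustration; elasticsearch actually uses a Murmur3 hash):

```java
import java.util.List;

public class ShardRouting {
    // Illustrative stand-in for ES routing: shard = hash(_routing) % number_of_shards.
    // String.hashCode() is NOT the real hash (ES uses Murmur3); the principle is the same.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 3;
        for (String id : List.of("1", "2", "3")) {
            System.out.println("doc " + id + " -> shard " + shardFor(id, shards));
        }
        // With a different shard count the same document maps to a different shard,
        // so changing the count after indexing would make existing documents unfindable:
        System.out.println("doc 1 with 5 shards -> shard " + shardFor("1", 5));
    }
}
```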
The process for adding a new document is as follows:
- 1) A new document with id=1 is added
- 2) The id is hashed; if the result is 2, the document belongs in shard-2
- 3) The primary shard of shard-2 is on node3, so the data is routed to node3
- 4) node3 saves the document
- 5) The document is synchronized to replica-2 of shard-2, on node2
- 6) The result is returned to the coordinating node
4.4. Cluster distributed query
An elasticsearch query runs in two phases:
- scatter phase: the coordinating node distributes the request to every shard
- gather phase: the coordinating node gathers the search results from the data nodes, merges them into a final result set, and returns it to the user
Summary:
4.5. Cluster failover
The master node monitors the status of the nodes in the cluster. If it finds that a node has gone down, it immediately migrates that node's shard data to other nodes to keep the data safe. This is called failover.
1) For example, a cluster is structured as shown in the figure:
node1 is currently the master node, and the other two nodes are slave nodes.
2) Suddenly, node1 fails:
The first thing to happen after the outage is a new master election; suppose node2 is elected:
After node2 becomes the master node, it checks the cluster monitoring status and finds that shard-1 and shard-0 have no replicas. The data that was on node1 therefore needs to be migrated to node2 and node3:
Summary: