SpringCloud microservice technology stack, Dark horse part 7
Today's goals:
- 1. Data aggregation
- 2. Auto-completion
- 3. Data synchronization
- 4. Cluster
1. Data aggregation
**Aggregations** let us run statistics and analysis over data with great convenience. For example:
- What brand of mobile phone is the most popular?
- The average price, the highest price, the lowest price of these phones?
- How are these phones selling monthly?
Implementing these statistics with aggregations is much more convenient than with SQL in a database, and the queries are fast enough to give near-real-time results.
1.1. Types of Aggregation
There are three common types of aggregation:
- **Bucket** aggregations: group documents into buckets
  - Term aggregation: group by a field value, e.g. by brand or by country
  - Date histogram: group by date interval, e.g. one bucket per week or per month
- **Metric** aggregations: compute values over documents, such as maximum, minimum, average, etc.
  - Avg: average value
  - Max: maximum value
  - Min: minimum value
  - Stats: max, min, avg, sum, etc. in one request
- **Pipeline** aggregations: aggregate based on the results of other aggregations
**Note:** Fields participating in an aggregation must be of keyword, date, numeric, or Boolean type.
1.2. Implementing aggregations with DSL
Now we want to count the hotel brands across all the data, which means grouping the documents by brand value. Since we group on the brand name, this is a Bucket aggregation.
1.2.1. Bucket aggregation syntax
The syntax is as follows:
GET /hotel/_search
{
"size": 0, // size 0: the result contains no documents, only the aggregation results
"aggs": {
// define the aggregation
"brandAgg": {
// give the aggregation a name
"terms": {
// aggregation type: term, because we group by the brand value
"field": "brand", // field to aggregate on
"size": 20 // number of aggregation buckets to return
}
}
}
}
The result is shown in the figure:
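The aggregation response has roughly the following shape (the brand keys and counts here are only illustrative):

```json
{
  "took": 5,
  "timed_out": false,
  "hits": {
    "total": { "value": 201, "relation": "eq" },
    "hits": []
  },
  "aggregations": {
    "brandAgg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "7天酒店", "doc_count": 30 },
        { "key": "如家", "doc_count": 30 }
      ]
    }
  }
}
```

Because size is 0, the hits array is empty and only the buckets carry information.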
1.2.2. Aggregation result sorting
By default, Bucket aggregation will count the number of documents in the Bucket, record it as _count, and sort in descending order of _count.
We can specify the order attribute to customize the sorting method of the aggregation:
GET /hotel/_search
{
"size": 0,
"aggs": {
"brandAgg": {
"terms": {
"field": "brand",
"order": {
"_count": "asc" // sort by _count ascending
},
"size": 20
}
}
}
}
1.2.3. Limit the scope of aggregation
By default, Bucket aggregation aggregates all documents in the index library, but in real scenarios, users will enter search conditions, so the aggregation must be the aggregation of search results. Then the aggregation has to be qualified.
We can limit the range of documents to be aggregated by adding query conditions:
GET /hotel/_search
{
"query": {
"range": {
"price": {
"lte": 200 // aggregate only documents priced at 200 or less
}
}
},
"size": 0,
"aggs": {
"brandAgg": {
"terms": {
"field": "brand",
"size": 20
}
}
}
}
This time, the aggregated brands are significantly less:
1.2.4. Metric aggregation syntax
In the previous section, we grouped hotels by brand into buckets. Now we need to run computations on the hotels within each bucket, to obtain the min, max, and avg of the user scores for each brand.
This requires a Metric aggregation, such as the stats aggregation, which returns min, max, avg, and more at once.
The syntax is as follows:
GET /hotel/_search
{
"size": 0,
"aggs": {
"brandAgg": {
"terms": {
"field": "brand",
"size": 20
},
"aggs": {
// a sub-aggregation of brandAgg, computed separately for each bucket
"score_stats": {
// aggregation name
"stats": {
// aggregation type: stats computes min, max, avg, etc.
"field": "score" // field to aggregate on
}
}
}
}
}
}
Here score_stats is a sub-aggregation nested inside the brandAgg aggregation, because the statistics must be computed separately within each bucket.
In addition, we can sort the buckets by their aggregated values, for example by each bucket's average hotel score:
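For instance, building on the stats sub-aggregation above, the buckets can be ordered through the order attribute (a sketch; the path score_stats.avg must match the sub-aggregation name):

```json
GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "order": { "score_stats.avg": "desc" },
        "size": 20
      },
      "aggs": {
        "score_stats": {
          "stats": { "field": "score" }
        }
      }
    }
  }
}
```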
1.2.5. Summary
aggs stands for aggregations and sits at the same level as query. What role does query play here?
- It limits the range of documents being aggregated
An aggregation must have three elements:
- an aggregation name
- an aggregation type
- an aggregation field
Configurable aggregation properties include:
- size: specify the number of aggregation results
- order: specify the sorting method of aggregation results
- field: specify the aggregation field
1.3. RestAPI implements aggregation
1.3.1. API Syntax
Aggregation conditions sit at the same level as query conditions, so request.source() is used to specify them.
Syntax for aggregation conditions:
The aggregation result also differs from a normal query result, so the parsing API is a little special, but it is still parsed layer by layer like the JSON response.
The final code:
HotelSearchTest.java
@Test
public void testAggregation() throws IOException {
    // 1. Prepare the request
    SearchRequest searchRequest = new SearchRequest("hotel");
    // 2. Prepare the DSL
    searchRequest.source().size(0);
    searchRequest.source().aggregation(AggregationBuilders.terms("brandAgg").field("brand").size(10).order(BucketOrder.count(true)));
    // 3. Send the request
    SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // 4. Parse the result
    //System.out.println(response);
    Aggregations aggregations = response.getAggregations();
    Terms brandTerms = aggregations.get("brandAgg");
    List<? extends Terms.Bucket> buckets = brandTerms.getBuckets();
    for (Terms.Bucket bucket : buckets) {
        String brandName = bucket.getKeyAsString();
        System.out.println(brandName);
    }
}
Output result:
1.3.2. Business requirements
Requirement: The brand, city and other information of the search page should not be hard-coded on the page, but obtained by aggregated hotel data in the index library:
Analysis:
At present, the city list, star list, and brand list on the page are all hard-coded, and will not change as the search results change. But when the user's search conditions change, the search results will change accordingly.
For example, if a user searches for "Oriental Pearl", the searched hotel must be near the Shanghai Oriental Pearl Tower. Therefore, the city can only be Shanghai. At this time, Beijing, Shenzhen, and Hangzhou should not be displayed in the city list.
In other words, which cities appear in the search results determines which cities are listed on the page, and which brands appear in the results determines which brands are listed.
How do I know which brands are included in my search results? How do I know which cities are included in my search results?
Use the aggregation function and Bucket aggregation to group the documents in the search results based on brands and cities, and you can know which brands and cities are included.
Because it is an aggregation of search results, the aggregation is a limited-range aggregation , that is to say, the limiting conditions of the aggregation are consistent with the conditions of the search document.
Looking at the browser, we can find that the front end has actually sent such a request:
The request parameters are exactly the same as those for searching documents .
The return value type is the final result to be displayed on the page:
The result is a Map structure:
- key is a string, city, star, brand, price
- value is a collection, such as the names of multiple cities
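So the endpoint would return JSON of roughly this shape (values purely illustrative):

```json
{
  "city": ["上海", "北京", "深圳"],
  "brand": ["如家", "7天酒店", "皇冠假日"],
  "starName": ["四钻", "五钻", "五星级"]
}
```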
1.3.3. Business Realization
Add a method to HotelController in the cn.itcast.hotel.web package, following these requirements:
- Request method: POST
- Request path: /hotel/filters
- Request parameters: RequestParams, consistent with the search-document parameters
- Return value type: Map<String, List<String>>
The code:
@PostMapping("filters")
public Map<String, List<String>> getFilters(@RequestBody RequestParams params){
    return hotelService.filters(params);
}
This calls the filters method of IHotelService, which has not been implemented yet.
Define the new method in cn.itcast.hotel.service.IHotelService:
Map<String, List<String>> filters(RequestParams params);
Implement the method in cn.itcast.hotel.service.impl.HotelService:
@Override
public Map<String, List<String>> filters(RequestParams params) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        // 2.1. query
        buildBasicQuery(params, request);
        // 2.2. set size
        request.source().size(0);
        // 2.3. aggregations
        buildAggregation(request);
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the result
        Map<String, List<String>> result = new HashMap<>();
        Aggregations aggregations = response.getAggregations();
        // 4.1. Get the brand results by aggregation name
        List<String> brandList = getAggByName(aggregations, "brandAgg");
        result.put("brand", brandList);
        // 4.2. Get the city results by aggregation name
        List<String> cityList = getAggByName(aggregations, "cityAgg");
        result.put("city", cityList);
        // 4.3. Get the star-rating results by aggregation name
        List<String> starList = getAggByName(aggregations, "starAgg");
        result.put("starName", starList);
        return result;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
private void buildAggregation(SearchRequest request) {
    request.source().aggregation(AggregationBuilders
            .terms("brandAgg")
            .field("brand")
            .size(100)
    );
    request.source().aggregation(AggregationBuilders
            .terms("cityAgg")
            .field("city")
            .size(100)
    );
    request.source().aggregation(AggregationBuilders
            .terms("starAgg")
            .field("starName")
            .size(100)
    );
}
private List<String> getAggByName(Aggregations aggregations, String aggName) {
    // 4.1. Get the aggregation result by name
    Terms brandTerms = aggregations.get(aggName);
    // 4.2. Get the buckets
    List<? extends Terms.Bucket> buckets = brandTerms.getBuckets();
    // 4.3. Iterate over the buckets
    List<String> brandList = new ArrayList<>();
    for (Terms.Bucket bucket : buckets) {
        // 4.4. Get the key
        String key = bucket.getKeyAsString();
        brandList.add(key);
    }
    return brandList;
}
View Results:
2. Auto-completion
When the user enters a character in the search box, we should prompt the search item related to the character, as shown in the figure:
This feature, which suggests complete entries based on the letters the user types, is called auto-completion.
Because the completion must be inferred from pinyin letters, we use the pinyin analysis plugin.
2.1. Pinyin tokenizer
To implement completion from letters, documents must be analyzed by pinyin. A pinyin analysis plugin for elasticsearch is available on GitHub, and its installation package is also provided in the pre-class materials.
The installation is the same as for the IK tokenizer, in these steps:
① Decompress the package
② Upload it to the elasticsearch plugin directory on the virtual machine, located at:
/var/lib/docker/volumes/es-plugins/_data
③ Restart elasticsearch:
docker restart es
④ Test it
For detailed installation steps, refer to the IK tokenizer installation process.
The test usage is as follows:
POST /_analyze
{
"text": "如家酒店还不错",
"analyzer": "pinyin"
}
Result:
You can see two problems with the pinyin tokenizer:
1. It outputs only pinyin, no Chinese characters; pinyin should supplement the Chinese terms, not replace them
2. The input is not segmented into words first; each character is simply converted to pinyin one by one
Because of these problems, we need to customize the pinyin tokenizer.
2.2. Custom tokenizer
The default pinyin tokenizer converts each Chinese character into pinyin, but we want each term to form a group of pinyin, so we need to customize the pinyin behavior as part of a custom analyzer.
An analyzer in elasticsearch consists of three parts:
- character filters: process the text before the tokenizer, e.g. deleting or replacing characters
- tokenizer: cuts the text into terms according to certain rules; for example, keyword does no segmentation, and ik_smart is another option
- tokenizer filters: further process the terms output by the tokenizer, e.g. case conversion, synonyms, pinyin conversion
These three parts process the document in turn during analysis.
The syntax for declaring a custom analyzer is as follows:
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { // custom analyzer
        "my_analyzer": { // analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": { // custom tokenizer filter
        "py": { // filter name
          "type": "pinyin", // filter type: pinyin
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Test:
Another way to test:
# Test the analyzer
POST /test/_doc/1
{
"id" : 1,
"name":"狮子"
}
POST /test/_doc/2
{
"id" : 2,
"name":"虱子"
}
GET /test/_search
{
"query": {
"match": {
"name": "shizi"
}
}
}
Test result:
We searched with Chinese characters yet matched 虱子 (lice), a homophone, which is obviously wrong.
Summary:
How to use the pinyin tokenizer?
- ① Download the pinyin tokenizer
- ② Unzip it into the elasticsearch plugin directory
- ③ Restart elasticsearch
How to customize a tokenizer?
- ① When creating the index, configure it in settings; it can contain three parts:
- ② character filters
- ③ tokenizer
- ④ filters
Precautions for the pinyin tokenizer?
- To avoid matching homophones, do not use the pinyin tokenizer at search time.
The solution: add a search_analyzer:
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { // custom analyzer
        "my_analyzer": { // analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": { // custom tokenizer filter
        "py": { // filter name
          "type": "pinyin", // filter type: pinyin
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
Now search again:
2.3. Autocomplete query
Elasticsearch provides the Completion Suggester query to implement auto-completion. This query matches terms beginning with the user's input and returns them. To keep completion queries efficient, there are constraints on the field types in documents:
- The fields participating in the completion query must be of completion type.
- The content of the field is generally an array formed by multiple entries for completion.
You can delete the previously tested index library
DELETE /test
For example, an index library like this:
// Create the index
PUT test
{
"mappings": {
"properties": {
"title":{
"type": "completion"
}
}
}
}
Then insert the following data:
// Sample data
POST test/_doc
{
"title": ["Sony", "WH-1000XM3"]
}
POST test/_doc
{
"title": ["SK-II", "PITERA"]
}
POST test/_doc
{
"title": ["Nintendo", "switch"]
}
The query DSL statement is as follows:
// Auto-completion query
GET /test/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", // the keyword prefix
      "completion": {
        "field": "title", // the field to complete on
        "skip_duplicates": true, // skip duplicates
        "size": 10 // return the first 10 results
      }
    }
  }
}
Display after query:
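Abridged, the completion response nests the matched options under suggest, roughly like this (scores and metadata illustrative):

```json
{
  "suggest": {
    "title_suggest": [
      {
        "text": "s",
        "offset": 0,
        "length": 1,
        "options": [
          { "text": "SK-II", "_index": "test", "_score": 1.0 },
          { "text": "Sony", "_index": "test", "_score": 1.0 }
        ]
      }
    ]
  }
}
```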
Summary:
Requirements for auto-completion fields:
- the field type must be completion
- the field value is an array of multiple completion terms
2.4. Realize automatic completion of hotel search box
Now, our hotel index library has not set up a pinyin word breaker, and we need to modify the configuration in the index library. But we know that the index library cannot be modified, it can only be deleted and then recreated.
In addition, we need to add a field for auto-completion, and put the brand, suggestion, city, etc. into it as a prompt for auto-completion.
So, to summarize, the things we need to do include:
- Modify the hotel index structure and configure a custom pinyin analyzer
- Change the name and all fields to use the custom analyzer
- Add a new suggestion field of type completion to the index, using the custom analyzer
- Add a suggestion field to the HotelDoc class, which contains brand and business
- Re-import data to the hotel library
2.4.1. Modify the hotel mapping structure
The code is as follows. First delete the previous index:
DELETE /hotel
# Hotel data index
PUT /hotel
{
"settings": {
"analysis": {
"analyzer": {
"text_anlyzer": {
"tokenizer": "ik_max_word",
"filter": "py"
},
"completion_analyzer": {
"tokenizer": "keyword",
"filter": "py"
}
},
"filter": {
"py": {
"type": "pinyin",
"keep_full_pinyin": false,
"keep_joined_full_pinyin": true,
"keep_original": true,
"limit_first_letter_length": 16,
"remove_duplicated_term": true,
"none_chinese_pinyin_tokenize": false
}
}
}
},
"mappings": {
"properties": {
"id":{
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "text_anlyzer",
"search_analyzer": "ik_smart",
"copy_to": "all"
},
"address":{
"type": "keyword",
"index": false
},
"price":{
"type": "integer"
},
"score":{
"type": "integer"
},
"brand":{
"type": "keyword",
"copy_to": "all"
},
"city":{
"type": "keyword"
},
"starName":{
"type": "keyword"
},
"business":{
"type": "keyword",
"copy_to": "all"
},
"location":{
"type": "geo_point"
},
"pic":{
"type": "keyword",
"index": false
},
"all":{
"type": "text",
"analyzer": "text_anlyzer",
"search_analyzer": "ik_smart"
},
"suggestion":{
"type": "completion",
"analyzer": "completion_analyzer"
}
}
}
}
2.4.2. Modify the HotelDoc entity
A field needs to be added to HotelDoc for auto-completion. Its content can be the hotel brand, city, business district, and so on; per the requirements for completion fields, it should preferably be an array of those values.
So we add a suggestion field of type List<String> to HotelDoc, and put the brand, city, business, etc. into it.
The code is as follows:
package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        this.suggestion = Arrays.asList(this.brand, this.business);
    }
}
2.4.3. Reimport
Re-run the batch-import unit test written earlier. The new hotel documents now contain a suggestion field, but when a hotel belongs to two business districts they appear in one string joined by "/", so that separator needs to be split.
Modify the entity class HotelDoc.java to add the splitting:
package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        // Assemble the suggestion list
        if (this.business.contains("/")) {
            // business has multiple values and needs to be split
            String[] arr = this.business.split("/");
            // add the elements
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            this.suggestion.add(this.city);
            Collections.addAll(this.suggestion, arr);
        } else {
            this.suggestion = Arrays.asList(this.brand, this.business, this.city);
        }
    }
}
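The suggestion-assembly logic above can be exercised in isolation; below is a minimal plain-Java sketch, where SuggestionDemo and the sample values are hypothetical and not part of the project:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SuggestionDemo {
    // Mirrors the HotelDoc constructor logic: if business holds multiple
    // districts joined by "/", split them and add brand and city as well.
    static List<String> buildSuggestion(String brand, String city, String business) {
        if (business.contains("/")) {
            List<String> suggestion = new ArrayList<>();
            suggestion.add(brand);
            suggestion.add(city);
            Collections.addAll(suggestion, business.split("/"));
            return suggestion;
        }
        return Arrays.asList(brand, business, city);
    }

    public static void main(String[] args) {
        // Hypothetical sample values
        System.out.println(buildSuggestion("7天", "上海", "虹桥/中山公园"));
        System.out.println(buildSuggestion("如家", "北京", "王府井"));
    }
}
```

This makes it easy to verify the split behavior without touching elasticsearch.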
Then check the results:
Also test the auto-completion
# Test the suggestion query
GET /hotel/_search
{
"suggest": {
"suggestions": {
"text": "sd",
"completion": {
"field": "suggestion",
"skip_duplicates" : true,
"size" : 10
}
}
}
}
Query result:
All returned results start with "sd".
2.4.4. Java API for auto-completion query
Earlier we learned the DSL for auto-completion queries, but not the corresponding Java API. Here is an example; the auto-completion result is also special, and is parsed a little differently from a normal search.
Let's write a test first, in HotelSearchTest.java:
@Test
public void testSuggestionsSearch() throws IOException {
// 1.准备SearchRequest
SearchRequest searchRequest = new SearchRequest("hotel");
// 2.准备DSL
searchRequest.source().suggest(new SuggestBuilder().addSuggestion("suggestions",
SuggestBuilders.completionSuggestion("suggestion")
.prefix("sd").skipDuplicates(true).size(10)));
// 3.发送请求
SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
// 4.解析结果
Suggest suggest = response.getSuggest();
CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
List<String> list = new ArrayList<>(options.size());
for (CompletionSuggestion.Entry.Option option : options) {
String text = option.getText().toString();
list.add(text);
}
System.out.println(list);
}
search result:
2.4.5. Realize the automatic completion of the search box
Looking at the front-end page, we can see that when we type in the input box, the front end sends an ajax request.
The return value is a collection of completion terms, of type List<String>.
1) Add a new endpoint to HotelController under the cn.itcast.hotel.web package to receive the request:
@GetMapping("suggestion")
public List<String> getSuggestions(@RequestParam("key") String prefix) {
    return hotelService.getSuggestions(prefix);
}
2) Add the method to IHotelService under the cn.itcast.hotel.service package:
List<String> getSuggestions(String prefix);
3) Implement the method in cn.itcast.hotel.service.impl.HotelService:
@Override
public List<String> getSuggestions(String prefix) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        request.source().suggest(new SuggestBuilder().addSuggestion(
                "suggestions",
                SuggestBuilders.completionSuggestion("suggestion")
                        .prefix(prefix)
                        .skipDuplicates(true)
                        .size(10)
        ));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the result
        Suggest suggest = response.getSuggest();
        // 4.1. Get the completion result by name
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        // 4.2. Get the options
        List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
        // 4.3. Iterate
        List<String> list = new ArrayList<>(options.size());
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
        return list;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
search result:
3. Data synchronization
The hotel data in elasticsearch comes from the mysql database, so when the mysql data changes, elasticsearch must change accordingly. This is data synchronization between elasticsearch and mysql.
3.1. Thinking analysis
There are three common data synchronization schemes:
- synchronous call
- asynchronous notification
- monitor binlog
3.1.1. Synchronous call
Solution 1: Synchronous call
The basic steps are as follows:
- hotel-demo provides an interface to modify the data in elasticsearch
- after hotel-admin (the hotel management service) completes a database operation, it directly calls the interface provided by hotel-demo
3.1.2. Asynchronous notification
Solution 2: Asynchronous notification
The process is as follows:
- Hotel-admin sends MQ message after adding, deleting and modifying mysql database data
- Hotel-demo listens to MQ and completes elasticsearch data modification after receiving the message
3.1.3. Monitor binlog
Solution 3:
The process of monitoring binlog is as follows:
- Enable the binlog function for mysql
- The addition, deletion, and modification operations of mysql will be recorded in the binlog
- Hotel-demo monitors binlog changes based on canal, and updates the content in elasticsearch in real time
3.1.4. Selection
Method 1: Synchronous call
- Advantages: simple and crude to implement
- Disadvantages: high coupling between services
Method 2: Asynchronous notification
- Advantages: low coupling, moderate implementation difficulty
- Disadvantages: depends on the reliability of mq
Method 3: Monitor binlog
- Advantages: complete decoupling between services
- Disadvantages: enabling binlog adds load on the database, and the implementation is complex
3.2. Realize data synchronization
3.2.1. Ideas
Use the hotel-admin project provided by the pre-class materials as a microservice for hotel management. When the hotel data is added, deleted, or modified, the same operation is required for the data in elasticsearch.
Steps:
- Import the hotel-admin project provided by the pre-course materials, start and test the CRUD of hotel data
- Declare exchange, queue, RoutingKey
- Complete message sending in the add, delete, and change business in hotel-admin
- Complete message monitoring in hotel-demo and update data in elasticsearch
- Start and test the data sync function
3.2.2. Import demo
Import the hotel-admin project provided by the pre-course materials:
After running, visit http://localhost:8099 , which contains the hotel CRUD functions:
3.2.3. Declare the exchange and queues
The MQ structure is as shown in the figure:
Start mq:
docker start mq
1) Introduce dependencies
Introduce the dependency of rabbitmq in hotel-admin and hotel-demo:
<!--amqp-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-amqp</artifactId>
</dependency>
2) Declare the exchange and queue names
Create a new class MqConstants under the cn.itcast.hotel.constants package in both hotel-admin and hotel-demo:
package cn.itcast.hotel.constants;

public class MqConstants {
    /**
     * Exchange
     */
    public final static String HOTEL_EXCHANGE = "hotel.topic";
    /**
     * Queue listening for inserts and updates
     */
    public final static String HOTEL_INSERT_QUEUE = "hotel.insert.queue";
    /**
     * Queue listening for deletes
     */
    public final static String HOTEL_DELETE_QUEUE = "hotel.delete.queue";
    /**
     * RoutingKey for inserts and updates
     */
    public final static String HOTEL_INSERT_KEY = "hotel.insert";
    /**
     * RoutingKey for deletes
     */
    public final static String HOTEL_DELETE_KEY = "hotel.delete";
}
3) Declare the queues and exchange
Define configuration classes in hotel-demo and hotel-admin respectively, declaring the queues and the exchange:
package cn.itcast.hotel.config;

import cn.itcast.hotel.constants.MqConstants;
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.TopicExchange;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MqConfig {
    @Bean
    public TopicExchange topicExchange(){
        return new TopicExchange(MqConstants.HOTEL_EXCHANGE, true, false);
    }

    @Bean
    public Queue insertQueue(){
        return new Queue(MqConstants.HOTEL_INSERT_QUEUE, true);
    }

    @Bean
    public Queue deleteQueue(){
        return new Queue(MqConstants.HOTEL_DELETE_QUEUE, true);
    }

    @Bean
    public Binding insertQueueBinding(){
        return BindingBuilder.bind(insertQueue()).to(topicExchange()).with(MqConstants.HOTEL_INSERT_KEY);
    }

    @Bean
    public Binding deleteQueueBinding(){
        return BindingBuilder.bind(deleteQueue()).to(topicExchange()).with(MqConstants.HOTEL_DELETE_KEY);
    }
}
3.2.4. Send MQ message
Send MQ messages in the add, update, and delete services in hotel-admin respectively.
The code is as follows:
@PostMapping
public void saveHotel(@RequestBody Hotel hotel) {
    hotelService.save(hotel);
    rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_INSERT_KEY, hotel.getId());
}

@PutMapping()
public void updateById(@RequestBody Hotel hotel) {
    if (hotel.getId() == null) {
        throw new InvalidParameterException("id cannot be null");
    }
    hotelService.updateById(hotel);
    rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_INSERT_KEY, hotel.getId());
}

@DeleteMapping("/{id}")
public void deleteById(@PathVariable("id") Long id) {
    hotelService.removeById(id);
    rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_DELETE_KEY, id);
}
3.2.5. Receive MQ message
Things to do when hotel-demo receives MQ messages include:
- New message: Query hotel information according to the passed hotel id, and then add a piece of data to the index library
- Delete message: Delete a piece of data in the index library according to the passed hotel id
1) First, add insert and delete method declarations to IHotelService under the cn.itcast.hotel.service package in hotel-demo:
void deleteById(Long id);

void insertById(Long id);
2) Implement the business logic in HotelService under the cn.itcast.hotel.service.impl package in hotel-demo:
@Override
public void deleteById(Long id) {
    try {
        // 1. Prepare the request
        DeleteRequest request = new DeleteRequest("hotel", id.toString());
        // 2. Send the request
        client.delete(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

@Override
public void insertById(Long id) {
    try {
        // 0. Query the hotel data by id
        Hotel hotel = getById(id);
        // Convert it to the document type
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // 1. Prepare the request object
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
        // 2. Prepare the JSON document
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // 3. Send the request
        client.index(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
3) Write a listener
Add a new class to the cn.itcast.hotel.mq package in hotel-demo:
package cn.itcast.hotel.mq;

import cn.itcast.hotel.constants.MqConstants;
import cn.itcast.hotel.service.IHotelService;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class HotelListener {

    @Autowired
    private IHotelService hotelService;

    /**
     * Listen for hotel insert or update events
     * @param id hotel id
     */
    @RabbitListener(queues = MqConstants.HOTEL_INSERT_QUEUE)
    public void listenHotelInsertOrUpdate(Long id){
        hotelService.insertById(id);
    }

    /**
     * Listen for hotel delete events
     * @param id hotel id
     */
    @RabbitListener(queues = MqConstants.HOTEL_DELETE_QUEUE)
    public void listenHotelDelete(Long id){
        hotelService.deleteById(id);
    }
}
Start the SpringBoot application first, then open the mq console: you can see the exchange,
and the queue bindings are as follows:
Let's test the whole flow. First install the Vue.js browser extension: open the extensions page, search for Vue, and install Vue.js Devtools.
First look up a hotel id, then go to hotel management and change its price to 334.
On the MQ management page we can see that 1 message was indeed sent, and the hotel search page confirms the modification succeeded.
Next, test deletion with the Shanghai Hilton hotel: first copy its information from Vue Devtools, then delete the Hilton in hotel management.
After the deletion, the MQ console shows a new delete message, and searching the hotels confirms the Hilton is gone (there were originally 13 items).
Then add the Hilton back using the values copied earlier; the addition succeeds.
After the addition, the MQ console shows 1 new message, and the hotel search confirms the new hotel is there.
4. Cluster
Storing data on a standalone elasticsearch node inevitably faces two problems: massive data storage and single point of failure.
- Massive data storage: logically split the index library into N shards and store them across multiple nodes
- Single point of failure: back up shard data on different nodes (replicas)
Related ES cluster concepts:
- **Cluster**: a group of nodes sharing a common cluster name.
- **Node**: a single Elasticsearch instance in the cluster.
- **Shard**: an index can be split into different parts for storage, called shards. In a cluster, the shards of one index can be distributed across different nodes, which solves the problem of a single node's limited storage capacity. Here, we split the data into 3 shards: shard0, shard1, shard2.
- **Primary shard**: defined relative to replica shards; the shard that holds the primary copy of the data.
- **Replica shard**: each primary shard can have one or more replicas whose data is identical to the primary.
Replicas guarantee high availability, but if every shard is fully replicated, the number of required nodes doubles and the cost is too high!
To strike a balance between high availability and cost, we can do this:
- First shard the data and store the shards on different nodes
- Then back up each shard and place the replica on a different node, so the nodes back each other up
This greatly reduces the number of required service nodes. As shown in the figure, we take 3 shards with 1 replica each as an example.
Now every shard has 1 backup, stored across 3 nodes:
- node0: holds shards 0 and 1
- node1: holds shards 0 and 2
- node2: holds shards 1 and 2
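The placement idea above can be sketched in a few lines of Java. This is a simplified illustration of the principle only, not Elasticsearch's actual allocation algorithm: each shard's primary goes on one node and its replica on the next node round-robin, so no node ever holds both copies of the same shard.

```java
public class ShardPlacementSketch {
    // Hypothetical rule: primary on node (shard % nodes),
    // replica on the next node -- NOT the real ES allocator.
    static int primaryNode(int shard, int nodes) { return shard % nodes; }
    static int replicaNode(int shard, int nodes) { return (shard + 1) % nodes; }

    public static void main(String[] args) {
        int shards = 3, nodes = 3;
        for (int shard = 0; shard < shards; shard++) {
            // A node must never hold both copies of the same shard
            System.out.printf("shard-%d: primary on node%d, replica on node%d%n",
                    shard, primaryNode(shard, nodes), replicaNode(shard, nodes));
        }
    }
}
```

With 3 shards and 3 nodes, every node ends up holding exactly two different shards, which is exactly the mutual-backup layout described above.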
4.1. Building an ES cluster
Refer to chapter 4 of the pre-class materials document:
1. Deploy the es cluster
We will use docker containers to run multiple es instances on a single machine to simulate an es cluster. In a production environment, however, it is recommended to deploy only one es instance per server node.
An es cluster can be deployed directly with docker-compose, but this requires your Linux virtual machine to have at least 4GB of memory.
1.1. Create es cluster
First write a docker-compose file with the following content:
version: '2.2'
services:
  es01:
    image: elasticsearch:7.12.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: elasticsearch:7.12.1
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data02:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
    networks:
      - elastic
  es03:
    image: elasticsearch:7.12.1
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data03:/usr/share/elasticsearch/data
    ports:
      - 9202:9200
    networks:
      - elastic
volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local
networks:
  elastic:
    driver: bridge
Running es requires raising a Linux kernel limit. Edit the /etc/sysctl.conf file:
vi /etc/sysctl.conf
Add the following content:
vm.max_map_count=262144
Then execute the command to make the configuration take effect:
sysctl -p
Start the cluster via docker-compose:
docker-compose up -d
View container status
docker ps
1.2. Cluster status monitoring
Kibana can monitor es clusters, but newer versions depend on es's x-pack feature, and the configuration is fairly complex.
Instead, it is recommended to use cerebro to monitor the es cluster status. Official website: https://github.com/lmenezes/cerebro
The pre-class materials already provide the installation package; just decompress it and it is ready to use, which is very convenient.
The decompressed directory is as follows:
Enter the corresponding bin directory:
Double-click the cerebro.bat file to start the service.
Visit http://localhost:9000 to enter the management interface:
Enter the address and port of any node of your elasticsearch, and click connect:
A green bar indicates that the cluster is green (healthy).
1.3. Create an index library
1) Use kibana's DevTools to create an index library
Enter the command in DevTools:
PUT /itcast
{
  "settings": {
    "number_of_shards": 3, // number of shards
    "number_of_replicas": 1 // number of replicas
  },
  "mappings": {
    "properties": {
      // mapping definitions ...
    }
  }
}
2) Use cerebro to create an index library
You can also create an index library with cerebro:
Fill in the index library information:
Click the create button in the lower right corner:
1.4. View the shard distribution
Go back to the cerebro home page, where you can see how the index library's shards are distributed across the nodes:
4.2. Cluster split-brain problem
4.2.1. Division of Cluster Responsibilities
Cluster nodes in elasticsearch have different responsibilities:
By default, every node in the cluster holds all four of the above roles at the same time.
A real production cluster, however, should separate these responsibilities:
- master node: high CPU requirements, but low memory requirements
- data node: high requirements for both CPU and memory
- coordinating node: high requirements for network bandwidth and CPU
Separating duties lets us provision suitable hardware for each node type and avoids the services interfering with one another.
A typical es cluster responsibility division is shown in the figure:
4.2.2. Split brain problem
A split-brain occurs when nodes in the cluster lose contact with each other.
For example, suppose the master node loses its connection to the other nodes: node2 and node3 then believe node1 is down and re-elect a master.
After node3 is elected, the cluster keeps serving requests: node2 and node3 form one cluster while node1 forms another. The two clusters are not synchronized, so their data diverges.
When the network recovers, the cluster contains two master nodes with inconsistent state. This is the split-brain situation:
The solution to split-brain is to require a candidate to win the votes of more than half of the master-eligible nodes, i.e. at least (eligible nodes + 1) / 2 votes, before it can become master. For this to work cleanly, the number of master-eligible nodes is best kept odd. The corresponding setting is discovery.zen.minimum_master_nodes; since es7.0 this is configured automatically by default, so split-brain generally no longer occurs.
For example: in a cluster of 3 master-eligible nodes, a candidate needs (3 + 1) / 2 = 2 votes. node3 gets the votes of node2 and node3 and is elected master; node1 only has its own vote and is not elected. The cluster still has exactly one master node, and there is no split brain.
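The majority rule described above is easy to check in code. Here is a hypothetical helper (not part of Elasticsearch, purely illustrative) that computes the standard majority quorum, i.e. more than half of the master-eligible nodes:

```java
public class QuorumSketch {
    // Minimum votes to win an election: strictly more than half
    // of the master-eligible nodes (integer division).
    static int quorum(int eligibleNodes) {
        return eligibleNodes / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(quorum(3)); // 2: a 3-node cluster needs 2 votes
        System.out.println(quorum(4)); // 3
        System.out.println(quorum(5)); // 3
    }
}
```

Note that going from 3 nodes to 4 raises the quorum from 2 to 3 without improving fault tolerance (both survive only one lost node), which is why an odd node count is recommended.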
4.2.3. Summary
What is the role of a master-eligible node?
- Participates in master elections
- As the master node: manages the cluster state, manages shard information, and handles requests to create or delete index libraries
What is the role of a data node?
- CRUD operations on data
What is the role of a coordinating node?
- Routes requests to other nodes
- Merges the query results and returns them to the user
4.3. Cluster distributed storage
When a new document is added, it should be saved in different shards to ensure data balance, so how does the coordinating node determine which shard the data should be stored in?
4.3.1. Shard storage test
Insert three pieces of data:
You can see from the test that the three pieces of data are in different shards:
result:
4.3.2. Shard storage principle
Elasticsearch uses a hash algorithm to decide which shard a document should be stored in:
shard = hash(_routing) % number_of_shards
Explanation:
- _routing defaults to the document id
- the algorithm depends on the number of shards, so once the index library is created, the number of shards can never be modified!
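The routing rule can be sketched as follows. This is a simplified stand-in: real Elasticsearch hashes _routing with murmur3, not String.hashCode, but the modulo logic is the same, and it shows why changing the shard count would break existing lookups.

```java
public class RoutingSketch {
    // Simplified: shard = hash(_routing) % number_of_shards.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        // _routing defaults to the document id
        System.out.println(shardFor("1", 3));
        // The same id maps to a different shard if the shard count changes --
        // which is why the shard count cannot be modified after index creation:
        System.out.println(shardFor("1", 5));
    }
}
```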
The process of adding a new document is as follows:
Interpretation:
- 1) Add a document with id=1
- 2) Hash the id; suppose the result is 2, so the document should be stored in shard-2
- 3) The primary shard of shard-2 is on node3, so the data is routed to node3
- 4) node3 saves the document
- 5) The document is synchronized to the replica of shard-2, which is on node2
- 6) The result is returned to the coordinating node
4.4. Cluster distributed query
An elasticsearch query runs in two phases:
- scatter phase: the coordinating node distributes the request to every shard
- gather phase: the coordinating node collects the search results from the data nodes, merges them into the final result set, and returns it to the user
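The gather step can be illustrated with a tiny top-k merge. This is purely conceptual, not the actual Elasticsearch implementation: each shard returns its own sorted top hits, and the coordinating node merges them into a single global top-k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GatherSketch {
    // Merge per-shard sorted score lists into a global top-k (descending)
    static List<Double> gatherTopK(List<List<Double>> shardResults, int k) {
        List<Double> all = new ArrayList<>();
        for (List<Double> shard : shardResults) {
            all.addAll(shard);                          // scatter results arrive per shard
        }
        all.sort(Comparator.reverseOrder());            // coordinating node re-sorts globally
        return all.subList(0, Math.min(k, all.size())); // keep only the global top-k
    }

    public static void main(String[] args) {
        List<List<Double>> shards = List.of(
                List.of(9.1, 4.2),   // shard-0's top hits
                List.of(8.5, 7.3),   // shard-1's top hits
                List.of(6.0, 5.5));  // shard-2's top hits
        System.out.println(gatherTopK(shards, 3)); // [9.1, 8.5, 7.3]
    }
}
```

Notice that each shard must return its own top k (not just top k/3), because the global winners could in principle all come from one shard.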
4.5. Cluster failover
The master node monitors the status of every node in the cluster. If it finds that a node has gone down, it immediately migrates that node's shard data to other nodes to keep the data safe. This is called failover.
1) For example, take the cluster structure shown in the figure: node1 is the master node, and the other two are ordinary nodes.
2) Suddenly, node1 goes down. The first thing that happens after the outage is a new master election; suppose node2 is elected.
3) After node2 becomes master, it checks the cluster health and finds that shard-1 and shard-0 no longer have a replica anywhere. It therefore migrates the data that was on node1 to node2 and node3: