Stage 8: Advanced Service Framework (Chapter 6: ElasticSearch3)

Chapter Six: ElasticSearch (Distributed Search Engine, Part 3)


0.Learning Objectives


1. Data aggregation

Aggregations let us conveniently perform statistics, analysis, and other calculations over data. For example:

  • Which mobile phone brand is the most popular?
  • What is the average price, maximum price, and minimum price of these mobile phones?
  • What are the monthly sales of these phones?

Implementing these statistics in elasticsearch is far more convenient than with SQL in a database, queries are very fast, and the results are effectively real time.

1.1. Types of aggregation

There are three common types of aggregation:

  • Bucket aggregation: groups documents and counts the documents in each group

    • TermAggregation: groups by a document field value, e.g. grouping by brand or by country
    • Date Histogram: groups by date intervals, e.g. one bucket per week or one per month

    Fields used for aggregation are not analyzed into terms

  • Metric aggregation: computes values over documents, such as the maximum, minimum, or average

    • Avg: average value
    • Max: Find the maximum value
    • Min: Find the minimum value
    • Stats: Find max, min, avg, sum, etc. at the same time
  • Pipeline aggregation: aggregates based on the results of other aggregations

Note: fields participating in aggregation must be of keyword, date, numeric, or boolean type
(that is, fields whose values are not split into terms)


1.2. DSL implements aggregation

  Now we want to count the hotel brands present in the data, which means grouping the documents by brand. Since we group by the brand field's values, this is a Bucket aggregation.

1.2.1.Bucket aggregation syntax

The syntax is as follows:

GET /hotel/_search
{
  "size": 0,  // set size to 0 so the response contains no documents, only aggregation results
  "aggs": {   // define the aggregation
    "brandAgg": {   // aggregation name, chosen freely
      "terms": {    // aggregation type: term aggregation (TermAggregation), grouping by a document field value
        "field": "brand", // the field to aggregate on
        "size": 20        // the number of aggregation results to return
      }
    }
  }
}

The important parts above are the aggregation name, the aggregation type, and the field:

    "brandAgg": {
    
     //给聚合起个名字,随便起;
      "terms": {
    
     // 聚合的类型,按照字段值聚合,所以选择term
        "field": "brand", // 参与聚合的字段

The result carries an aggregations section with one bucket per brand, as in the truncated sample below.
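A truncated sample response for reference (the counts and brand names are illustrative, assuming the course's hotel data):

{
  "took": 3,
  "hits": {
    "total": { "value": 201, "relation": "eq" },
    "hits": []
  },
  "aggregations": {
    "brandAgg": {
      "buckets": [
        { "key": "7天酒店", "doc_count": 30 },
        { "key": "如家", "doc_count": 30 }
      ]
    }
  }
}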

1.2.2.Aggregation result sorting

  By default, a Bucket aggregation counts the documents inside each bucket, reports the count as _count, and sorts the buckets by _count in descending order.

  We can specify the order attribute to customize the sorting method of the aggregation:

GET /hotel/_search
{
  "size": 0,  // set size to 0 so the response contains no documents, only aggregation results
  "aggs": {   // define the aggregation
    "brandAgg": {   // aggregation name, chosen freely
      "terms": {    // aggregation type: term aggregation (TermAggregation), grouping by a document field value
        "field": "brand",   // the field to aggregate on
        "order": {
          "_count": "asc"   // sort by _count in ascending order
        },
        "size": 20   // the number of aggregation results to return
      }
    }
  }
}

1.2.3.Limit aggregation scope

  By default, a Bucket aggregation covers all documents in the index library. In real scenarios, however, users enter search conditions, so the aggregation should cover only the search results; its scope must be restricted.

We can limit the range of documents being aggregated simply by adding a query condition:

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "lte": 200 // only aggregate documents priced at 200 or less
      }
    }
  },
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20
      }
    }
  }
}

This time, noticeably fewer brands appear in the aggregation result.

1.2.4.Metric aggregation syntax

  Above, we grouped hotels by brand into buckets. Now we need to run calculations within each bucket, e.g. obtain the min, max, and avg of the user score for each brand.

  This calls for a Metric aggregation, e.g. the stats aggregation, which returns min, max, avg, and more in one request.

The syntax is as follows:

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {   // aggregation name, chosen freely
      "terms": {    // terms aggregation
        "field": "brand",
        "size": 20  // the number of aggregation results to return
      },
      "aggs": {     // sub-aggregation of brandAgg, computed separately for each group
        "score_stats": {   // aggregation name
          "stats": {       // aggregation type; stats computes min, max, avg, etc.
            "field": "score" // the field to aggregate on, here score
          }
        }
      }
    }
  }
}

  This score_stats aggregation is nested inside brandAgg as a sub-aggregation, because it must be computed separately for each bucket.

  In addition, we can sort the buckets by the aggregation results, for example by each brand's average hotel score, as sketched below.
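A minimal sketch: the order clause can reference a metric of the sub-aggregation via the "score_stats.avg" path (standard terms-aggregation ordering):

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "order": {
          "score_stats.avg": "desc"  // sort buckets by each bucket's average score
        },
        "size": 20
      },
      "aggs": {
        "score_stats": {
          "stats": { "field": "score" }
        }
      }
    }
  }
}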

1.2.5.Summary

aggs defines the aggregation and sits at the same level as query. What, then, is query's role here?

  • Limit the scope of documents to be aggregated

Three elements are necessary for aggregation:

  • Aggregation name
  • Aggregation type
  • Aggregate fields

The configurable aggregation properties are:

  • size: the number of aggregation results to return
  • order: how to sort the aggregation results
  • field: the field to aggregate on


1.3. RestAPI implements aggregation

1.3.1.API syntax

Aggregation conditions sit at the same level as query conditions, so request.source() is used to specify them.
Building the aggregation mirrors the DSL structure, and although an aggregation result differs from a query result and has its own API, it is still the same JSON parsed layer by layer; both halves are sketched below.
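A minimal Java sketch of both halves, assuming the course's injected RestHighLevelClient field named client (the aggregation name and field mirror the DSL above):

// build the aggregation: same structure as the DSL
SearchRequest request = new SearchRequest("hotel");
request.source().size(0);
request.source().aggregation(AggregationBuilders
        .terms("brandAgg")   // aggregation name
        .field("brand")      // the field to aggregate on
        .size(20));          // the number of buckets to return
SearchResponse response = client.search(request, RequestOptions.DEFAULT);

// parse the result: aggregations -> brandAgg -> buckets -> key
Aggregations aggregations = response.getAggregations();
Terms brandTerms = aggregations.get("brandAgg");
for (Terms.Bucket bucket : brandTerms.getBuckets()) {
    String brandName = bucket.getKeyAsString(); // the bucket key, i.e. the brand
    long count = bucket.getDocCount();          // the bucket's _count
    System.out.println(brandName + ": " + count);
}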

1.3.2.Business needs

Case: define a method in IHotelService that aggregates brands, cities, and star ratings.
Requirements: The brand, city, and other information on the search page should not be hard-coded on the page, but obtained by aggregating the hotel data in the index library:

Analysis:
  Currently, the city list, star list, and brand list on the page are hard-coded and will not change with the search results. But when the user's search conditions change, the search results will change accordingly.

  For example, if a user searches for "Oriental Pearl Tower", the hotel searched must be near the Oriental Pearl Tower in Shanghai. Therefore, the city can only be Shanghai. At this time, the city list should not display information such as Beijing, Shenzhen, and Hangzhou.

  In other words, the page should list exactly the cities contained in the search results, and exactly the brands contained in the search results.

  How do I know which brands are included in my search results? How do I know which cities are included in my search results?

  Use the aggregation function and Bucket aggregation to group the documents in the search results based on brands and cities, so you can know which brands and which cities are included.

  Because we aggregate over the search results, this is a scoped aggregation: the aggregation conditions must be consistent with the document search conditions.

  Looking at the browser, you can see that the front end already issues such a request, and its request parameters are exactly the same as those used for searching documents.

The return value is the final result to display on the page, a Map structure:

  • key is a String: city, star rating, brand, or price
  • value is a collection, e.g. the names of multiple cities

1.3.3.Business implementation


Add a method to HotelController in the cn.itcast.hotel.web package, with the following requirements:

  • Request method: POST
  • Request path: /hotel/filters
  • Request parameters: RequestParams, consistent with the document search parameters
  • Return value type: Map<String, List<String>>

Code:

    @PostMapping("filters")
    public Map<String, List<String>> getFilters(@RequestBody RequestParams params){
        return hotelService.filters(params);
    }

The filters method called here does not exist in IHotelService yet.

Define the new method in cn.itcast.hotel.service.IHotelService:

Map<String, List<String>> filters(RequestParams params);

Implement this method in cn.itcast.hotel.service.impl.HotelService:

@Override
public Map<String, List<String>> filters(RequestParams params) {
    try {
        // 1. prepare the Request
        SearchRequest request = new SearchRequest("hotel");
        // 2. prepare the DSL
        // 2.1. the query only limits the aggregation scope
        buildBasicQuery(params, request);
        // 2.2. set size to 0: no documents, only aggregation results
        request.source().size(0);
        // 2.3. the aggregations
        buildAggregation(request);
        // 3. send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. parse the result
        Map<String, List<String>> result = new HashMap<>();
        Aggregations aggregations = response.getAggregations();
        // 4.1. get the brand list from the brand aggregation
        List<String> brandList = getAggByName(aggregations, "brandAgg");
        result.put("品牌", brandList);
        // 4.2. get the city list from the city aggregation
        List<String> cityList = getAggByName(aggregations, "cityAgg");
        result.put("城市", cityList);
        // 4.3. get the star-rating list from the star aggregation
        List<String> starList = getAggByName(aggregations, "starAgg");
        result.put("星级", starList);

        return result;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

private void buildAggregation(SearchRequest request) {
    request.source().aggregation(AggregationBuilders
                                 .terms("brandAgg")
                                 .field("brand")
                                 .size(100)
                                );
    request.source().aggregation(AggregationBuilders
                                 .terms("cityAgg")
                                 .field("city")
                                 .size(100)
                                );
    request.source().aggregation(AggregationBuilders
                                 .terms("starAgg")
                                 .field("starName")
                                 .size(100)
                                );
}

private List<String> getAggByName(Aggregations aggregations, String aggName) {
    // 4.1. get the aggregation result by its name
    Terms brandTerms = aggregations.get(aggName);
    // 4.2. get the buckets
    List<? extends Terms.Bucket> buckets = brandTerms.getBuckets();
    // 4.3. iterate over the buckets
    List<String> brandList = new ArrayList<>();
    for (Terms.Bucket bucket : buckets) {
        // 4.4. collect the bucket key
        String key = bucket.getKeyAsString();
        brandList.add(key);
    }
    return brandList;
}


2.Autocomplete


When the user enters a character in the search box, we should prompt search terms related to the character, as shown in the figure:
This function that prompts complete entries based on the letters entered by the user is automatic completion.

Because completion must be inferred from pinyin letters, pinyin word segmentation is required.

2.1. Pinyin word segmenter

  To complete based on letters, documents must be analyzed into pinyin. A pinyin analysis plugin for elasticsearch happens to exist. Address: https://github.com/medcl/elasticsearch-analysis-pinyin

The pre-course materials also provide an installation package for the pinyin tokenizer. Installation works the same way as for the IK tokenizer:

​ ① Unzip
​ ② Upload to the elasticsearch plugin directory in the virtual machine
​ ③ Restart elasticsearch
​ ④ Test

For detailed installation steps, please refer to: Installation process of IK word segmenter .

The test usage is as follows:

POST /_analyze
{
  "text": "如家酒店还不错",  # the text to analyze
  "analyzer": "pinyin"       # the analyzer to use
}

Result: every single Chinese character is converted into pinyin, which is what the customization below addresses.

2.2. Custom word segmenter

  The default pinyin tokenizer converts each individual Chinese character into pinyin, whereas we want each term to yield a single group of pinyin. So we need to customize the pinyin tokenizer's behavior, i.e. build a custom analyzer.

In elasticsearch, an analyzer consists of three parts:

  • character filters: process the text before the tokenizer, e.g. deleting or replacing characters
  • tokenizer: cuts the text into terms according to certain rules, e.g. keyword (no segmentation) or ik_smart
  • tokenizer filter: further processes the terms output by the tokenizer, e.g. case conversion, synonyms, pinyin conversion

When the document is segmented, the document will be processed by these three parts in sequence:

We can configure a custom analyzer through settings when creating an index library.
The syntax for declaring a custom analyzer is as follows:

PUT /test   // create an index library named test
{
  "settings": {
    "analysis": {   // define analyzers for this index library
      "analyzer": {     // custom analyzer
        "my_analyzer": {    // analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {     // custom tokenizer filter
        "py": {     // filter name
          "type": "pinyin", // filter type, here pinyin
          "keep_full_pinyin": false,           // avoid pinyin for every single character
          "keep_joined_full_pinyin": true,     // keep the full joined pinyin
          "keep_original": true,               // keep the original Chinese
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {   // mappings
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",       // name uses the custom my_analyzer; analyzer is used at index time
        "search_analyzer": "ik_smart"    // search_analyzer is used at search time
      }
    }
  }
}

Test: the results now include both the pinyin and the original Chinese, as the sketch below shows.
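A minimal test sketch against the analyzer defined above (the sample phrase is illustrative):

POST /test/_analyze
{
  "text": "如家酒店还不错",
  "analyzer": "my_analyzer"
}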

Summary:

How to use the pinyin tokenizer?

  • ① Download the pinyin tokenizer plugin
  • ② Unzip it into the elasticsearch plugin directory
  • ③ Restart elasticsearch

How to customize an analyzer?

  • When creating an index library, configure it under settings; it can include three parts:
    • character filter
    • tokenizer
    • filter

What should you watch out for with the pinyin tokenizer?

  • To avoid matching homophones, use the pinyin analyzer when indexing, but not when searching.

2.3.Autocomplete query

elasticsearch provides the Completion Suggester query to implement autocompletion. It matches and returns terms that begin with the user's input. To keep completion queries efficient, there are constraints on the document fields involved:

  • Fields participating in a completion query must be of type completion
  • The field content is generally an array of the terms to be used for completion

For example, an index library like this:

// create the index library
PUT test
{
  "mappings": {
    "properties": {
      "title":{
        "type": "completion"
      }
    }
  }
}

Then insert the following data:

// sample data
POST test/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}
POST test/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test/_doc
{
  "title": ["Nintendo", "switch"]
}

The query DSL statement is as follows:

// autocomplete query
GET /test/_search
{
  "suggest": {
    "title_suggest": {   // suggestion name, chosen freely
      "text": "s", // the keyword prefix
      "completion": {    // the autocomplete type
        "field": "title", // the field to complete against
        "skip_duplicates": true, // skip duplicates
        "size": 10 // return the top 10 results
      }
    }
  }
}


2.4. Implement automatic completion of hotel search box

  Our hotel index library was created without a pinyin analyzer, so its configuration must change. But we know an index library cannot be modified in place; it can only be deleted and re-created.

  In addition, we need a new field for autocompletion that gathers brand, city, business district, and similar values as completion suggestions.

To summarize, the things we need to do include:

  1. Modify the hotel index library structure and configure a custom pinyin analyzer

  2. Have the name and all fields of the index library use the custom analyzer

  3. Add a new suggestion field of type completion to the index library, using the custom analyzer

  4. Add a suggestion field to the HotelDoc class, containing brand and business

  5. Re-import the data into the hotel index library

2.4.1.Modify hotel mapping structure

The code is as follows:

// hotel data index library
PUT /hotel
{
  "settings": {
    "analysis": {   // define the analyzers
      "analyzer": {
        "text_anlyzer": {          // analyzer used for full-text search
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {   // analyzer used for autocompletion
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "text_anlyzer",      // analyzer used at index time
        "search_analyzer": "ik_smart",   // analyzer used at search time
        "copy_to": "all"
      },
      "address":{
        "type": "keyword",
        "index": false
      },
      "price":{
        "type": "integer"
      },
      "score":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      },
      "city":{
        "type": "keyword"
      },
      "starName":{
        "type": "keyword"
      },
      "business":{
        "type": "keyword",
        "copy_to": "all"
      },
      "location":{
        "type": "geo_point"
      },
      "pic":{
        "type": "keyword",
        "index": false
      },
      "all":{
        "type": "text",
        "analyzer": "text_anlyzer",      // "text_anlyzer" at index time
        "search_analyzer": "ik_smart"    // "ik_smart" at search time
      },
      "suggestion":{
        "type": "completion",            // the autocompletion field
        "analyzer": "completion_analyzer"
      }
    }
  }
}

2.4.2.Modify HotelDoc entity

  HotelDoc needs a new field for autocompletion, containing the hotel brand, city, business district, and similar information. Per the requirements on completion fields, it is best stored as an array of these values.

  Therefore, we add a suggestion field of type List<String> to HotelDoc and put the brand, city, business, and similar values inside it.

The code is as follows:

package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;

    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        // assemble the suggestion list
        if(this.business.contains("/")){
            // business has multiple values and needs to be split
            String[] arr = this.business.split("/");
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            // add every array element into the list
            Collections.addAll(this.suggestion, arr);
        }else {
            this.suggestion = Arrays.asList(this.brand, this.business);
        }
    }
}


2.4.3.Reimport

Re-execute the previously written import function; the newly indexed hotel data now contains the suggestion field.

2.4.4.Java API for autocomplete queries

Previously we learned the DSL for autocomplete queries, but not the corresponding Java API. The request is built to mirror the DSL, with the completion field name written as your own; the result of an autocomplete query also has its own special structure and is parsed layer by layer. Both halves are sketched below.
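A minimal sketch, assuming the hotel index and suggestion field from this chapter (the full service implementation follows in 2.4.5):

// build the suggest request, mirroring the DSL
SearchRequest request = new SearchRequest("hotel");
request.source().suggest(new SuggestBuilder().addSuggestion(
        "suggestions",                                     // suggestion name, chosen freely
        SuggestBuilders.completionSuggestion("suggestion") // the completion field
                .prefix("h")                               // the user's input
                .skipDuplicates(true)
                .size(10)));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);

// parse the result: suggest -> suggestions -> options -> text
Suggest suggest = response.getSuggest();
CompletionSuggestion suggestion = suggest.getSuggestion("suggestions");
for (CompletionSuggestion.Entry.Option option : suggestion.getOptions()) {
    System.out.println(option.getText().toString());
}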

2.4.5.Implement automatic completion of search box

Looking at the front-end page, you can see that typing in the input box triggers an ajax request whose return value is a collection of completion terms, of type List<String>.

1) Add a new endpoint to HotelController under the cn.itcast.hotel.web package to receive the request:

@GetMapping("suggestion")
public List<String> getSuggestions(@RequestParam("key") String prefix) {
    return hotelService.getSuggestions(prefix);
}

2) Add the method to IHotelService under the cn.itcast.hotel.service package:

List<String> getSuggestions(String prefix);

3) Implement the method in cn.itcast.hotel.service.impl.HotelService:

@Override
public List<String> getSuggestions(String prefix) {
    // prefix is the keyword passed from the front end
    try {
        // 1. prepare the Request
        SearchRequest request = new SearchRequest("hotel");
        // 2. prepare the DSL
        request.source().suggest(new SuggestBuilder().addSuggestion(
            "suggestions",
            SuggestBuilders.completionSuggestion("suggestion")   // the completion field
            .prefix(prefix)         // the keyword from the front end
            .skipDuplicates(true)
            .size(10)
        ));
        // 3. send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. parse the result
        Suggest suggest = response.getSuggest();
        // 4.1. get the completion result by the suggestion name
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        // 4.2. get the options
        List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
        // 4.3. iterate
        List<String> list = new ArrayList<>(options.size());
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
        return list;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

3. Data synchronization

  The hotel data in elasticsearch comes from the mysql database, so whenever the mysql data changes, elasticsearch must change too. This is data synchronization between elasticsearch and mysql.

3.1. Idea analysis

There are three common data synchronization solutions:

  • Synchronous call
  • Asynchronous notification
  • Monitoring binlog

3.1.1.Synchronous call

Option 1: Synchronous call

The basic steps are as follows:

  • hotel-demo exposes an interface for modifying the data in elasticsearch
  • After the hotel management service completes a database operation, it directly calls the interface provided by hotel-demo

3.1.2. Asynchronous notification

Option 2: Asynchronous notification

The process is as follows:

  • After hotel-admin adds, deletes, or modifies mysql data, it sends a message to MQ
  • hotel-demo listens to MQ and updates the data in elasticsearch once the message arrives

3.1.3.Monitoring binlog

Option 3: monitoring binlog

The process is as follows:

  • Enable the binlog function in mysql
  • mysql's insert, update, and delete operations are recorded in the binlog
  • hotel-demo uses canal to monitor binlog changes and updates the content in elasticsearch in real time

3.1.4.Choosing a solution

Method 1: synchronous call

  • Advantages: simple, if crude, to implement
  • Disadvantages: high coupling between the businesses

Method 2: asynchronous notification

  • Advantages: low coupling, moderate implementation difficulty
  • Disadvantages: depends on the reliability of MQ

Method 3: monitoring binlog

  • Advantages: completely decouples the services
  • Disadvantages: enabling binlog adds load on the database, and implementation is complex

3.2. Implement data synchronization

Use MQ to implement data synchronization between mysql and elasticsearch.

3.2.1. Ideas

  Use the hotel-admin project provided in the pre-course materials as the hotel management microservice. Whenever hotel data is added, deleted, or modified, the same operation must be applied to the data in elasticsearch.

Steps:

  • Import the hotel-admin project provided in the pre-course materials, start it, and test hotel CRUD
  • Declare the exchange, queues, and RoutingKeys
  • Complete the message sending in hotel-admin's add, delete, and update business logic
  • Complete the message listening in hotel-demo and update the data in elasticsearch
  • Start everything and test the data synchronization

3.2.2.Import demo

Import the hotel-admin project provided in the pre-course materials. After starting it, visit http://localhost:8099.

It includes hotel CRUD functionality.

3.2.3. Declare the exchange and queues

The MQ structure uses one topic exchange with an insert/update queue and a delete queue. In this tutorial, the exchange and queues are declared in the consumer, hotel-demo.

1) Introduce dependencies

Introduce the rabbitmq dependency in both hotel-admin and hotel-demo:

<!--amqp-->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-amqp</artifactId>
</dependency>

2) Configure the amqp address

Configure the amqp address in the .yml file, as sketched below.
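A minimal sketch of the configuration (host and credentials are placeholders for your own environment):

spring:
  rabbitmq:
    host: 192.168.150.101 # your MQ host
    port: 5672
    username: itcast
    password: 123321
    virtual-host: /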

3) Declare the queue and exchange names

Create a new MqConstants class under the cn.itcast.hotel.constatnts package in both hotel-admin and hotel-demo:

package cn.itcast.hotel.constatnts;

public class MqConstants {
    /**
     * Exchange
     */
    public final static String HOTEL_EXCHANGE = "hotel.topic";
    /**
     * Queue listening for inserts and updates
     */
    public final static String HOTEL_INSERT_QUEUE = "hotel.insert.queue";
    /**
     * Queue listening for deletes
     */
    public final static String HOTEL_DELETE_QUEUE = "hotel.delete.queue";
    /**
     * RoutingKey for inserts and updates
     */
    public final static String HOTEL_INSERT_KEY = "hotel.insert";
    /**
     * RoutingKey for deletes
     */
    public final static String HOTEL_DELETE_KEY = "hotel.delete";
}

4) Declare the queues and exchange (define the bindings between queues, exchange, and RoutingKeys)

In hotel-demo, define an MqConfig configuration class under the cn.itcast.hotel.config package and declare the queues and exchange:

package cn.itcast.hotel.config;

@Configuration
public class MqConfig {
    @Bean
    public TopicExchange topicExchange(){
        // the exchange; true means durable
        return new TopicExchange(MqConstants.HOTEL_EXCHANGE, true, false);
    }

    @Bean
    public Queue insertQueue(){
        // queue for inserts and updates
        return new Queue(MqConstants.HOTEL_INSERT_QUEUE, true);
    }

    @Bean
    public Queue deleteQueue(){
        // queue for deletes
        return new Queue(MqConstants.HOTEL_DELETE_QUEUE, true);
    }

    @Bean
    public Binding insertQueueBinding(){
        // bind insertQueue() to topicExchange() with the HOTEL_INSERT_KEY RoutingKey
        return BindingBuilder.bind(insertQueue()).to(topicExchange()).with(MqConstants.HOTEL_INSERT_KEY);
    }

    @Bean
    public Binding deleteQueueBinding(){
        // bind deleteQueue() to topicExchange() with the HOTEL_DELETE_KEY RoutingKey
        return BindingBuilder.bind(deleteQueue()).to(topicExchange()).with(MqConstants.HOTEL_DELETE_KEY);
    }
}

3.2.4.Send MQ messages

  Copy the MqConstants class from hotel-demo's cn.itcast.hotel.constatnts package into the cn.itcast.hotel.constatnts package of hotel-admin so the various names resolve without errors; also add the same dependency and the same amqp address configuration as in 3.2.3.

Send the MQ messages in hotel-admin's add, update, and delete business logic (in the HotelController class under the cn.itcast.hotel.web package of the hotel-admin project).

Inject the API needed to send messages and publish a message after each database operation, as sketched below.
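A minimal sketch of what the screenshots show, assuming Spring AMQP's RabbitTemplate and the MyBatis-Plus style service methods used in this course (only the hotel id is sent, keeping messages small):

@Autowired
private RabbitTemplate rabbitTemplate;

@PostMapping
public void saveHotel(@RequestBody Hotel hotel){
    // 1. write to mysql first
    hotelService.save(hotel);
    // 2. then publish an insert/update message carrying only the hotel id
    rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_INSERT_KEY, hotel.getId());
}

@DeleteMapping("/{id}")
public void deleteById(@PathVariable("id") Long id) {
    hotelService.removeById(id);
    // publish a delete message carrying the hotel id
    rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_DELETE_KEY, id);
}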

3.2.5.Receive MQ messages

What hotel-demo must do when it receives an MQ message:

  • Hotel insert/update message: query the hotel information by the passed id, then write a document to the index library
  • Delete message: delete the document from the index library by the passed hotel id

1) First add the insert and delete methods to IHotelService under hotel-demo's cn.itcast.hotel.service package:

void deleteById(Long id);

void insertById(Long id);

2) Implement the business in HotelService under hotel-demo's cn.itcast.hotel.service.impl package:

@Override
public void deleteById(Long id) {
    try {
        // 1. prepare the Request
        DeleteRequest request = new DeleteRequest("hotel", id.toString());
        // 2. send the request
        client.delete(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

@Override
public void insertById(Long id) {
    try {
        // 0. query the hotel data by id
        Hotel hotel = getById(id);
        // convert to the document type
        HotelDoc hotelDoc = new HotelDoc(hotel);

        // 1. prepare the Request object
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
        // 2. prepare the JSON document
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // 3. send the request
        client.index(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

3) Write a listener: add a new class under hotel-demo's cn.itcast.hotel.mq package:

package cn.itcast.hotel.mq;

@Component
public class HotelListener {
    @Autowired
    private IHotelService hotelService;

    /**
     * Listen for hotel insert or update messages
     * @param id hotel id
     */
    @RabbitListener(queues = MqConstants.HOTEL_INSERT_QUEUE)
    public void listenHotelInsertOrUpdate(Long id){
        hotelService.insertById(id);
    }

    /**
     * Listen for hotel delete messages
     * @param id hotel id
     */
    @RabbitListener(queues = MqConstants.HOTEL_DELETE_QUEUE)
    public void listenHotelDelete(Long id){
        hotelService.deleteById(id);
    }
}

4. Cluster


Storing data on a single elasticsearch machine inevitably runs into two problems: massive data storage and single point of failure.

  • Massive data storage: logically split the index library into N shards and store them across multiple nodes

  • Single point of failure: back up shard data on different nodes (replica)

ES cluster-related concepts:

  • Cluster: A group of nodes with a common cluster name.

  • Node : an Elasticsearch instance in the cluster

  • Shard: an index can be split into different parts for storage, called shards. In a cluster, the different shards of an index can be spread across different nodes.

    Problem solved: the data volume exceeds the storage capacity of a single node.

    Here, the data is split into 3 shards: shard0, shard1, shard2

  • Primary shard ( Primary shard ): defined in contrast to replica shards; holds the primary copy of the data.

  • Replica shard ( Replica shard ): each primary shard can have one or more replicas whose data is identical to the primary shard's.

  Data backup ensures high availability, but giving every shard a dedicated backup node would double the number of nodes required, which costs too much!
To balance high availability against cost, we can do this:

  • First, the data is fragmented and stored in different nodes.
  • Then back up each shard and put it on the other node to complete mutual backup.

This greatly reduces the number of service nodes required. Take 3 shards with one backup each as an example.
Now each shard has 1 backup, stored across 3 nodes:

  • node0: holds shards 0 and 1
  • node1: holds shards 0 and 2
  • node2: holds shards 1 and 2

4.1. Build ES cluster

For how to build the cluster, see the reference document in the pre-course materials (Chapter 4).

4.2. Cluster split-brain problem

4.2.1. Division of cluster responsibilities

Cluster nodes in elasticsearch can take on different responsibilities, summarized below.
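In outline (node.master, node.data, and node.ingest are standard elasticsearch settings, all true by default):

  • master eligible (node.master): may be elected master; the master manages cluster state, shard allocation, and index library creation and deletion
  • data (node.data): stores data and handles document CRUD, search, and aggregation
  • ingest (node.ingest): pre-processes documents before indexing
  • coordinating: routes requests to other nodes and merges the results; every node can coordinate, and a node with the three settings above disabled does nothing else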

By default, every node in the cluster holds all four roles.
A real-world cluster, however, should separate these responsibilities:

  • master nodes: high CPU requirements, but low memory requirements
  • data nodes: high requirements for both CPU and memory
  • coordinating nodes: high requirements for network bandwidth and CPU

Separation of duties allows us to allocate different hardware for deployment according to the needs of different nodes. And avoid mutual interference between businesses.

A typical es cluster separates these duties, with dedicated master, coordinating, and data nodes.

4.2.2.Split-brain problem

Split-brain is caused by cluster nodes losing contact with one another.
For example, suppose the master node node1 loses contact with the other nodes. node2 and node3 then consider node1 down and re-elect a master.
If node3 is elected, the cluster keeps serving requests, but node2 and node3 now form one cluster while node1 forms another. The two clusters' data is no longer synchronized, and differences appear.

When the network recovers, the cluster contains two master nodes with inconsistent cluster state: a split brain.
  The fix for split-brain is to require at least (number of eligible nodes + 1) / 2 votes to win the master election, which is why the number of eligible nodes is best kept odd. The corresponding configuration item is discovery.zen.minimum_master_nodes; since es7.0 this behavior is the default, so split-brain generally no longer occurs.

  For example, in a 3-node cluster, election requires at least (3 + 1) / 2 = 2 votes. node3 receives votes from node2 and node3 and is elected master; node1 has only its own single vote and is not elected. The cluster still has exactly one master, and no split-brain occurs.
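Before es7.0 that quorum had to be configured by hand; a sketch of the elasticsearch.yml entry for a 3-node cluster:

# at least (3 + 1) / 2 = 2 master-eligible nodes must agree on the election
discovery.zen.minimum_master_nodes: 2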

4.2.3. Summary

What is the role of master-eligible nodes?

  • Participate in cluster leader election
  • The master node can manage cluster status, manage shard information, and handle requests to create and delete index libraries.

What is the role of data nodes?

  • Data CRUD

What is the role of coordinating nodes?

  • Route requests to other nodes
  • Combine the query results and return them to the user

4.3. Cluster distributed storage

When new documents are added, they should be spread across different shards to keep the data balanced. So how does the coordinating node decide which shard a document should be stored in?

4.3.1. Sharded storage test

Insert three documents with different ids, then query them. The test shows that the three documents land on different shards, which the sketch below reproduces using explain.
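A sketch of such a test (the index name and ids are illustrative; "explain": true is standard DSL that adds a _shard field to every hit):

// insert three documents with different ids
POST /test/_doc/1
{ "title": "doc 1" }
POST /test/_doc/3
{ "title": "doc 3" }
POST /test/_doc/5
{ "title": "doc 5" }

// query with explain to see which shard each hit lives on
GET /test/_search
{
  "explain": true,
  "query": { "match_all": {} }
}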

4.3.2. Principle of sharded storage

elasticsearch uses a hash algorithm to calculate which shard a document should be stored in.
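The routing formula (standard elasticsearch behavior):

shard = hash(_routing) % number_of_shards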

Notes:

  • _routing defaults to the document id
  • The algorithm depends on the number of shards, so once the index library is created, the number of shards cannot be changed!

The process for adding new documents is as follows:

Interpretation:

  • 1) A new document with id=1 is added
  • 2) The id is hashed; say the result is 2, so the document belongs on shard-2
  • 3) shard-2's primary shard is on node3, so the data is routed to node3
  • 4) node3 saves the document
  • 5) The document is synchronized to shard-2's replica (replica-2) on node2
  • 6) The result is returned to the coordinating-node

4.4. Cluster distributed query

An elasticsearch query runs in two phases:

  • scatter phase: the coordinating node distributes the request to every shard
  • gather phase: the coordinating node gathers the search results from the data nodes, merges them into the final result set, and returns it to the user


4.5.Cluster failover

  The cluster's master node monitors the status of the nodes in the cluster. If a node goes down, the master immediately migrates the failed node's shard data to other nodes to keep the data safe. This is called failover.

1) For example, take a cluster in which node1 is the master node and the other two nodes are slave nodes.

2) Suddenly, node1 fails. The first thing after the outage is to re-elect a master, say node2.
  After node2 becomes master, it checks the cluster monitoring status and finds that shard-1 and shard-0 have no replicas left. The data that was on node1 therefore has to be migrated to node2 and node3.


Origin: blog.csdn.net/weixin_52223770/article/details/128701275