SpringCloud microservice technology stack. Dark horse follow-up (7)

Today's goal

insert image description here

1. Data aggregation

**Aggregations** allow us to perform statistics and analysis on data extremely conveniently. For example:

  • What brand of mobile phone is the most popular?
  • What are the average, highest, and lowest prices of these phones?
  • How are the monthly sales of these phones trending?

Implementing these statistics with Elasticsearch is much more convenient than with database SQL, and queries are fast enough to achieve near-real-time results.

1.1. Types of Aggregation

There are three common types of aggregation:

  • **Bucket** aggregations: used to group documents

    • TermAggregation: group by a document field value, e.g. by brand or by country
    • Date Histogram: group by date intervals, e.g. one week or one month per group
  • **Metric** aggregations: used to calculate values, such as the maximum, minimum, average, etc.

    • Avg: average
    • Max: maximum value
    • Min: minimum value
    • Stats: max, min, avg, sum, etc. at the same time
  • **Pipeline** aggregations: aggregations based on the results of other aggregations

**Note:** Fields participating in aggregations must be of type keyword, date, numeric, or Boolean

1.2. Implementing aggregations with DSL

Now we want to count the hotel brands across all the data, which means grouping the documents by brand. Since we aggregate on the hotel brand name, this is a Bucket aggregation.

1.2.1. Bucket aggregation syntax
The syntax is as follows:

GET /hotel/_search
{
  "size": 0,  // size 0: the result contains no documents, only the aggregation results
  "aggs": { // define the aggregation
    "brandAgg": { // give the aggregation a name
      "terms": { // aggregation type: aggregate by brand value, so use terms
        "field": "brand", // field to aggregate on
        "size": 20 // number of aggregation results to return
      }
    }
  }
}

The result is shown in the figure:
insert image description here

1.2.2. Aggregation result sorting

By default, a Bucket aggregation counts the number of documents in each bucket, records it as _count, and sorts the buckets by _count in descending order.
We can specify the order attribute to customize how the aggregation results are sorted:

GET /hotel/_search
{
  "size": 0, 
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "order": {
          "_count": "asc" // sort by _count in ascending order
        },
        "size": 20
      }
    }
  }
}

1.2.3. Limit the scope of aggregation

By default, a Bucket aggregation runs over all documents in the index library. In real scenarios, however, users enter search conditions, so the aggregation should only cover the search results, which means its scope has to be limited.

We can limit the range of documents to be aggregated by adding query conditions:

GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "lte": 200 // only aggregate documents priced at 200 or less
      }
    }
  }, 
  "size": 0, 
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20
      }
    }
  }
}

This time, significantly fewer brands appear in the aggregation result:
insert image description here

1.2.4. Metric aggregation syntax

In the previous section, we grouped hotels by brand to form buckets. Now we need to perform calculations on the hotels within each bucket to obtain the min, max, and avg of the user ratings for each brand.

This requires a Metric aggregation, such as the stats aggregation, which can return min, max, avg, and other results.

The syntax is as follows:

GET /hotel/_search
{
  "size": 0, 
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand", 
        "size": 20
      },
      "aggs": { // sub-aggregation of brandAgg: computed separately for each bucket
        "score_stats": { // aggregation name
          "stats": { // aggregation type; stats computes min, max, avg, etc.
            "field": "score" // field to aggregate on, here the rating score
          }
        }
      }
    }
  }
}

This time the score_stats aggregation is a sub-aggregation nested inside the brandAgg aggregation, because the calculation has to be done separately within each bucket.

In addition, we can also sort the aggregation results, for example, according to the average hotel score of each bucket:
insert image description here
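
The figure shows ordering by the sub-aggregation's average score. A DSL sketch of that (reusing the score_stats sub-aggregation defined above; the sort direction is just an example) looks roughly like this:

GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "order": {
          "score_stats.avg": "desc" // order buckets by the avg computed in the score_stats sub-aggregation
        },
        "size": 20
      },
      "aggs": {
        "score_stats": {
          "stats": { "field": "score" }
        }
      }
    }
  }
}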

1.2.5. Summary

aggs stands for aggregations and is at the same level as query. What, then, is the function of query here?

  • It limits the scope of the documents being aggregated

An aggregation must have three elements:

  • aggregation name

  • aggregation type

  • aggregation field

The configurable aggregation properties are:

  • size: specify the number of aggregation results
  • order: specify the sorting method of aggregation results
  • field: specify the aggregation field

1.3. Implementing aggregations with the RestAPI

1.3.1. API Syntax

Aggregation conditions are at the same level as query conditions, so request.source() needs to be used to specify aggregation conditions.

Syntax for aggregate conditions:
insert image description here

The aggregation result also differs from a normal query result, so the parsing API is a bit special, but it is still just parsing the JSON layer by layer:
insert image description here
The final code, in HotelSearchTest.java:

 @Test
    public void testAggregation() throws IOException {
        // 1. Prepare the request
        SearchRequest searchRequest = new SearchRequest("hotel");
        // 2. Prepare the DSL: no documents, only the brandAgg terms aggregation
        searchRequest.source().size(0);
        searchRequest.source().aggregation(AggregationBuilders
                .terms("brandAgg")
                .field("brand")
                .size(10)
                .order(BucketOrder.aggregation("_count", true)));

        // 3. Send the request
        SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // 4. Parse the result
        // System.out.println(response);
        Aggregations aggregations = response.getAggregations();
        // Get the aggregation result by its name
        Terms brandTerms = aggregations.get("brandAgg");
        // Get the buckets and iterate over them
        List<? extends Terms.Bucket> buckets = brandTerms.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            String brandName = bucket.getKeyAsString();
            System.out.println(brandName);
        }
    }

Output result:
insert image description here

1.3.2. Business requirements

Requirement: the brand, city, and other filter information on the search page should not be hard-coded on the page, but obtained by aggregating the hotel data in the index library:
insert image description here

Analysis:

At present, the city list, star list, and brand list on the page are all hard-coded, and will not change as the search results change. But when the user's search conditions change, the search results will change accordingly.

For example, if a user searches for "Oriental Pearl", the searched hotel must be near the Shanghai Oriental Pearl Tower. Therefore, the city can only be Shanghai. At this time, Beijing, Shenzhen, and Hangzhou should not be displayed in the city list.

In other words, only the cities that appear in the search results should be listed on the page, and only the brands that appear in the search results should be listed on the page.

How do I know which brands are included in my search results? How do I know which cities are included in my search results?

Use the aggregation function and Bucket aggregation to group the documents in the search results based on brands and cities, and you can know which brands and cities are included.

Because it is an aggregation of the search results, it is a limited-scope aggregation; that is, the limiting conditions of the aggregation are the same as the conditions of the search query.
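
Expressed as DSL, this amounts to one request that combines the user's search conditions with several terms aggregations. A rough sketch (the aggregation names brandAgg, cityAgg, and starAgg match the Java code further below; the match query is just a placeholder for whatever the user searched for):

GET /hotel/_search
{
  "query": {
    "match": { "all": "外滩" } // the user's current search conditions (illustrative)
  },
  "size": 0,
  "aggs": {
    "brandAgg": { "terms": { "field": "brand", "size": 100 } },
    "cityAgg":  { "terms": { "field": "city", "size": 100 } },
    "starAgg":  { "terms": { "field": "starName", "size": 100 } }
  }
}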

Looking at the browser, we can find that the front end has actually sent such a request:
insert image description here

The request parameters are exactly the same as those used for searching documents.

The return value type is the final result to be displayed on the page:
insert image description here

The result is a Map structure:

  • key is a string, city, star, brand, price
  • value is a collection, such as the names of multiple cities
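
For example, the returned Map might look like the following (the values are purely illustrative):

{
  "city": ["上海", "北京", "深圳"],
  "brand": ["如家", "汉庭", "希尔顿"],
  "starName": ["二钻", "三钻", "五星级"]
}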

1.3.3. Business implementation

Add a method to HotelController in the cn.itcast.hotel.web package, following these requirements:

  • Request method: POST
  • Request path: /hotel/filters
  • Request parameters: RequestParams, consistent with the parameters for searching documents
  • Return value type: Map<String, List<String>>

code:

    @PostMapping("filters")
    public Map<String, List<String>> getFilters(@RequestBody RequestParams params){
        // delegate to the filters method defined in IHotelService
        return hotelService.filters(params);
    }

The filters method of IHotelService is called here, but it has not been implemented yet. Define the new method in cn.itcast.hotel.service.IHotelService:

Map<String, List<String>> filters(RequestParams params);

Implement the method in cn.itcast.hotel.service.impl.HotelService:

@Override
public Map<String, List<String>> filters(RequestParams params) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        // 2.1. query
        buildBasicQuery(params, request);
        // 2.2. set size to 0
        request.source().size(0);
        // 2.3. aggregations
        buildAggregation(request);
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the result
        Map<String, List<String>> result = new HashMap<>();
        Aggregations aggregations = response.getAggregations();
        // 4.1. Get the brand results by aggregation name
        List<String> brandList = getAggByName(aggregations, "brandAgg");
        result.put("brand", brandList);
        // 4.2. Get the city results by aggregation name
        List<String> cityList = getAggByName(aggregations, "cityAgg");
        result.put("city", cityList);
        // 4.3. Get the star-rating results by aggregation name
        List<String> starList = getAggByName(aggregations, "starAgg");
        result.put("starName", starList);

        return result;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

private void buildAggregation(SearchRequest request) {
    request.source().aggregation(AggregationBuilders
                                 .terms("brandAgg")
                                 .field("brand")
                                 .size(100));
    request.source().aggregation(AggregationBuilders
                                 .terms("cityAgg")
                                 .field("city")
                                 .size(100));
    request.source().aggregation(AggregationBuilders
                                 .terms("starAgg")
                                 .field("starName")
                                 .size(100));
}

private List<String> getAggByName(Aggregations aggregations, String aggName) {
    // 4.1. Get the aggregation result by its name
    Terms brandTerms = aggregations.get(aggName);
    // 4.2. Get the buckets
    List<? extends Terms.Bucket> buckets = brandTerms.getBuckets();
    // 4.3. Iterate over the buckets
    List<String> brandList = new ArrayList<>();
    for (Terms.Bucket bucket : buckets) {
        // 4.4. Get the key of each bucket
        String key = bucket.getKeyAsString();
        brandList.add(key);
    }
    return brandList;
}

View Results:
insert image description here

2. Auto-completion

When the user types characters into the search box, we should suggest search terms related to that input, as shown in the figure:
insert image description here
This feature of suggesting complete terms based on the letters the user has typed is called auto-completion.
Because the suggestions need to be inferred from pinyin letters, a pinyin analysis function is required.

2.1. Pinyin tokenizer

To achieve completion based on letters, documents must be tokenized by pinyin. There is a pinyin analysis plugin for elasticsearch on GitHub: the pinyin analyzer plugin.
insert image description here
The installation package of the pinyin analyzer is also provided in the pre-class materials:
insert image description here
The installation method is the same as for the IK analyzer, in three steps:
① Decompress the package
② Upload it to the virtual machine, into the elasticsearch plugin directory:
/var/lib/docker/volumes/es-plugins/_data
insert image description here

​③Restart elasticsearch

docker restart es

insert image description here

④ Test
For detailed installation steps, please refer to the installation process of the IK analyzer.
Test the pinyin analyzer as follows:

POST /_analyze
{
  "text": "如家酒店还不错",
  "analyzer": "pinyin"
}

Result:
insert image description here
You can see that the pinyin analyzer has problems:
1. It outputs only pinyin, no Chinese characters; pinyin should be the icing on the cake, not the entire result
2. It does not segment words first; each character is simply converted to pinyin one by one

Based on these problems, we need to customize the pinyin analyzer.

2.2. Custom tokenizer

The default pinyin analyzer converts each individual Chinese character into pinyin, but what we want is for each term to form a group of pinyin, so we need to customize the pinyin analyzer as part of a custom analyzer.

An analyzer in elasticsearch consists of three parts:

  • character filters: process the text before the tokenizer, e.g. deleting or replacing characters
  • tokenizer: cuts the text into terms according to certain rules, e.g. keyword (no segmentation) or ik_smart
  • tokenizer filter: further processes the terms output by the tokenizer, e.g. case conversion, synonyms, pinyin conversion

These three parts process the document in turn during analysis:
insert image description here
The syntax for declaring a custom analyzer is as follows:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { // custom analyzer
        "my_analyzer": { // analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": { // custom tokenizer filter
        "py": { // filter name
          "type": "pinyin", // filter type, here pinyin
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Test:
insert image description here
Another way to test:

# Test the analyzer
POST /test/_doc/1
{
  "id": 1,
  "name": "狮子"
}

POST /test/_doc/2
{
  "id": 2,
  "name": "虱子"
}

GET /test/_search
{
  "query": {
    "match": {
      "name": "shizi"
    }
  }
}

Test result:
insert image description here
When we search with the Chinese characters 狮子 (lion), the homophone 虱子 (lice) is also returned, which is obviously wrong
insert image description here

Summary:

How to use the pinyin analyzer?

  • ① Download the pinyin analyzer plugin

  • ② Decompress it into the elasticsearch plugin directory

  • ③ Restart elasticsearch

How to customize an analyzer?

  • ① When creating the index library, configure it in settings; it can contain three parts:

  • ② character filter

  • ③ tokenizer

  • ④ filter

Precautions for the pinyin analyzer?

  • To avoid matching homophones, do not use the pinyin analyzer at search time.
    insert image description here
    Solution: add a search_analyzer:
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { // custom analyzer
        "my_analyzer": { // analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": { // custom tokenizer filter
        "py": { // filter name
          "type": "pinyin", // filter type, here pinyin
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

Now search again:
insert image description here

2.3. Autocomplete query

Elasticsearch provides the Completion Suggester query to implement auto-completion. This query matches and returns terms that begin with what the user has typed. To make completion queries efficient, there are some constraints on the field used in the documents:

  • The field participating in the completion query must be of type completion.
  • The field content is generally an array formed by the multiple terms to be suggested.

You can delete the previously tested index library

DELETE /test

For example, an index library like this:

// Create the index library
PUT test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}

Then insert the following data:

// Sample data
POST test/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}
POST test/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test/_doc
{
  "title": ["Nintendo", "switch"]
}

The query DSL statement is as follows:

// Auto-completion query
GET /test/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", // keyword prefix typed by the user
      "completion": {
        "field": "title", // field to run the completion query on
        "skip_duplicates": true, // skip duplicate suggestions
        "size": 10 // return the top 10 results
      }
    }
  }
}

Display after query:
insert image description here
Summary:
Requirements for auto-completion fields:
● The field type must be completion
● The field value is an array of multiple terms

2.4. Implement auto-completion for the hotel search box

Now, our hotel index library has not been set up with a pinyin analyzer, so we need to modify the index library configuration. But we know that an index library cannot be modified; it can only be deleted and recreated.
In addition, we need to add a field for auto-completion and put the brand, city, business district, and similar information into it as completion suggestions.

So, to summarize, the things we need to do include:

  1. Modify the structure of the hotel index library and configure a custom pinyin analyzer
  2. Modify the name and all fields of the index library to use the custom analyzer
  3. Add a new suggestion field to the index library, of type completion, using the custom analyzer
  4. Add a suggestion field to the HotelDoc class, containing the brand and business district
  5. Re-import the data into the hotel index

2.4.1. Modify the hotel mapping structure

The code is as follows. First delete the previous index library:

DELETE /hotel

# Hotel data index library
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id":{
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address":{
        "type": "keyword",
        "index": false
      },
      "price":{
        "type": "integer"
      },
      "score":{
        "type": "integer"
      },
      "brand":{
        "type": "keyword",
        "copy_to": "all"
      },
      "city":{
        "type": "keyword"
      },
      "starName":{
        "type": "keyword"
      },
      "business":{
        "type": "keyword",
        "copy_to": "all"
      },
      "location":{
        "type": "geo_point"
      },
      "pic":{
        "type": "keyword",
        "index": false
      },
      "all":{
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion":{
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}

2.4.2. Modify the HotelDoc entity

A field needs to be added to HotelDoc for auto-completion; its content can be information such as the hotel brand, city, and business district. As required for auto-completion fields, it should preferably be an array of these values.

So we add a suggestion field in HotelDoc, the type is List<String>, and then put information such as brand, city, business, etc. into it.

code show as below:

package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        this.suggestion = Arrays.asList(this.brand, this.business);
    }
}

2.4.3. Reimport

Import the data again using the batch-import unit test written earlier.
insert image description here

Re-execute the import, and you can see that the new hotel documents contain a suggestion field. However, if a hotel has two business districts, they are joined by a separator, so we need to split them.
insert image description here

Modify the entity class HotelDoc.java to add the splitting logic:

package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        // Assemble the suggestion field
        if (this.business.contains("/") || this.business.contains("、")) {
            // business has multiple values and needs to be split
            String[] arr = this.business.split("[/、]");
            // add the elements
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            this.suggestion.add(this.city);
            Collections.addAll(this.suggestion, arr);
        } else {
            this.suggestion = Arrays.asList(this.brand, this.business, this.city);
        }
    }
}

Then check the results:
insert image description here
Also test the auto-completion

# Test the suggestion query
GET /hotel/_search
{
  "suggest": {
    "suggestions": {
      "text": "sd",
      "completion": {
        "field": "suggestion",
        "skip_duplicates": true,
        "size": 10
      }
    }
  }
}

Query result: all suggestions start with "sd"
insert image description here

2.4.4. Java API for auto-completion query

Earlier we learned the DSL for the auto-completion query, but not the corresponding Java API. Here is an example:
insert image description here
The result of the auto-completion query is also special, and the parsing code is as follows:
insert image description here
Let's first write a test to try it out, modifying HotelSearchTest.java:

@Test
    public void testSuggestionsSearch() throws IOException {
        // 1. Prepare the SearchRequest
        SearchRequest searchRequest = new SearchRequest("hotel");
        // 2. Prepare the DSL
        searchRequest.source().suggest(new SuggestBuilder().addSuggestion("suggestions",
                SuggestBuilders.completionSuggestion("suggestion")
                        .prefix("sd").skipDuplicates(true).size(10)));

        // 3. Send the request
        SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // 4. Parse the result
        Suggest suggest = response.getSuggest();
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
        List<String> list = new ArrayList<>(options.size());
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
        System.out.println(list);
    }

search result:
insert image description here

2.4.5. Implement auto-completion for the search box

Looking at the front-end page, we can see that when we type in the input box, the front end sends an ajax request:
insert image description here
The return value is a collection of completed terms, of type List<String>.

1) Add a new interface in HotelController under the cn.itcast.hotel.web package to receive the request:

@GetMapping("suggestion")
public List<String> getSuggestions(@RequestParam("key") String prefix) {
    return hotelService.getSuggestions(prefix);
}

2) Add the method in IHotelService under the cn.itcast.hotel.service package:

List<String> getSuggestions(String prefix);

3) Implement the method in cn.itcast.hotel.service.impl.HotelService:

@Override
public List<String> getSuggestions(String prefix) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        request.source().suggest(new SuggestBuilder().addSuggestion(
            "suggestions",
            SuggestBuilders.completionSuggestion("suggestion")
            .prefix(prefix)
            .skipDuplicates(true)
            .size(10)
        ));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the result
        Suggest suggest = response.getSuggest();
        // 4.1. Get the completion result by the suggestion name
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        // 4.2. Get the options
        List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
        // 4.3. Iterate over the options
        List<String> list = new ArrayList<>(options.size());
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
        return list;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

search result:
insert image description here

3. Data synchronization

The hotel data in elasticsearch comes from the mysql database, so when the mysql data changes, the data in elasticsearch must change accordingly. This is data synchronization between elasticsearch and mysql.
insert image description here

3.1. Thinking analysis

There are three common data synchronization schemes:

  • synchronous call
  • asynchronous notification
  • monitor binlog

3.1.1. Synchronous call

Solution 1: Synchronous call
insert image description here

The basic steps are as follows:

  • hotel-demo exposes an interface for modifying the data in elasticsearch
  • After the hotel management service (hotel-admin) completes the database operation, it directly calls the interface provided by hotel-demo, as sketched below
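
A minimal sketch of what this synchronous call could look like in hotel-admin, assuming hotel-demo exposes a hypothetical HTTP endpoint such as PUT /hotel/doc (the class name, endpoint path, port, and use of RestTemplate are illustrative, not part of the course code):

package cn.itcast.hotel.web;

import cn.itcast.hotel.pojo.Hotel;
import cn.itcast.hotel.service.IHotelService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
@RequestMapping("/hotel")
public class HotelSyncController {

    @Autowired
    private IHotelService hotelService;

    @Autowired
    private RestTemplate restTemplate; // plain HTTP client; a Feign client would work as well

    @PutMapping
    public void updateById(@RequestBody Hotel hotel) {
        // 1. Update the database first
        hotelService.updateById(hotel);
        // 2. Then synchronously call the (hypothetical) hotel-demo endpoint that updates elasticsearch;
        //    if this call fails or is slow, the whole request fails or slows down
        restTemplate.put("http://localhost:8089/hotel/doc", hotel);
    }
}

This illustrates the coupling problem of scheme 1: hotel-admin's business code now depends directly on hotel-demo being available.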

3.1.2. Asynchronous notification

Solution 2: Asynchronous notification
insert image description here

The process is as follows:

  • Hotel-admin sends MQ message after adding, deleting and modifying mysql database data
  • Hotel-demo listens to MQ and completes elasticsearch data modification after receiving the message

3.1.3. Monitor binlog

Solution 3:
insert image description here
The process of monitoring binlog is as follows:

  • Enable the binlog function for mysql
  • The addition, deletion, and modification operations of mysql will be recorded in the binlog
  • Hotel-demo monitors binlog changes based on canal, and updates the content in elasticsearch in real time

3.1.4. Selection

Method 1: Synchronous call

  • Advantages: simple and crude to implement
  • Disadvantages: high degree of business coupling

Method 2: Asynchronous notification

  • Advantages: low coupling, moderately difficult to implement
  • Disadvantages: relies on the reliability of MQ

Method 3: Monitor binlog

  • Advantages: complete decoupling between services
  • Disadvantages: enabling binlog adds load to the database, and the implementation complexity is high

3.2. Realize data synchronization

3.2.1. Ideas

Use the hotel-admin project provided in the pre-class materials as the hotel management microservice. When hotel data is added, deleted, or modified, the same operation must be performed on the data in elasticsearch.
Steps:

  • Import the hotel-admin project provided by the pre-course materials, start and test the CRUD of hotel data
  • Declare exchange, queue, RoutingKey
  • Complete message sending in the add, delete, and change business in hotel-admin
  • Complete message monitoring in hotel-demo and update data in elasticsearch
  • Start and test the data sync function

3.2.2. Import demo

Import the hotel-admin project provided by the pre-course materials:
insert image description here

After running it, visit http://localhost:8099
insert image description here
which contains the CRUD functions for hotels:
insert image description here

3.2.3. Declare the exchange and queues

The MQ structure is as shown in the figure:
insert image description here
Start MQ:

docker start mq

insert image description here

1) Introduce dependencies

Introduce the dependency of rabbitmq in hotel-admin and hotel-demo:

<!--amqp-->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-amqp</artifactId>
</dependency>

2) Declare the exchange and queue names

Create a new MqConstants class under the cn.itcast.hotel.constants package in both hotel-admin and hotel-demo:

package cn.itcast.hotel.constants;

public class MqConstants {
    /**
     * Exchange
     */
    public final static String HOTEL_EXCHANGE = "hotel.topic";
    /**
     * Queue listening for insert and update messages
     */
    public final static String HOTEL_INSERT_QUEUE = "hotel.insert.queue";
    /**
     * Queue listening for delete messages
     */
    public final static String HOTEL_DELETE_QUEUE = "hotel.delete.queue";
    /**
     * RoutingKey for insert or update
     */
    public final static String HOTEL_INSERT_KEY = "hotel.insert";
    /**
     * RoutingKey for delete
     */
    public final static String HOTEL_DELETE_KEY = "hotel.delete";
}

3) Declare the queues and exchange

Define a configuration class in both hotel-demo and hotel-admin, declaring the queues and exchange:

package cn.itcast.hotel.config;

import cn.itcast.hotel.constants.MqConstants;
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.TopicExchange;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MqConfig {
    @Bean
    public TopicExchange topicExchange(){
        return new TopicExchange(MqConstants.HOTEL_EXCHANGE, true, false);
    }

    @Bean
    public Queue insertQueue(){
        return new Queue(MqConstants.HOTEL_INSERT_QUEUE, true);
    }

    @Bean
    public Queue deleteQueue(){
        return new Queue(MqConstants.HOTEL_DELETE_QUEUE, true);
    }

    @Bean
    public Binding insertQueueBinding(){
        return BindingBuilder.bind(insertQueue()).to(topicExchange()).with(MqConstants.HOTEL_INSERT_KEY);
    }

    @Bean
    public Binding deleteQueueBinding(){
        return BindingBuilder.bind(deleteQueue()).to(topicExchange()).with(MqConstants.HOTEL_DELETE_KEY);
    }
}

3.2.4. Send MQ message

Send MQ messages separately in the add, delete, and modify services in hotel-admin:
insert image description here
the code is as follows:

 @PostMapping
    public void saveHotel(@RequestBody Hotel hotel) {
        hotelService.save(hotel);
        // rabbitTemplate is an injected RabbitTemplate bean
        rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_INSERT_KEY, hotel.getId());
    }

    @PutMapping()
    public void updateById(@RequestBody Hotel hotel) {
        if (hotel.getId() == null) {
            throw new InvalidParameterException("id cannot be null");
        }
        hotelService.updateById(hotel);
        rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_INSERT_KEY, hotel.getId());
    }

    @DeleteMapping("/{id}")
    public void deleteById(@PathVariable("id") Long id) {
        hotelService.removeById(id);
        rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_DELETE_KEY, id);
    }

3.2.5. Receive MQ message

Things to do when hotel-demo receives MQ messages include:

  • New message: Query hotel information according to the passed hotel id, and then add a piece of data to the index library
  • Delete message: Delete a piece of data in the index library according to the passed hotel id

1) First, add insert and delete methods to IHotelService under the cn.itcast.hotel.service package of hotel-demo:

void deleteById(Long id);
void insertById(Long id);

2) Implement the business logic in HotelService under the cn.itcast.hotel.service.impl package of hotel-demo:

@Override
public void deleteById(Long id) {
    try {
        // 1. Prepare the request
        DeleteRequest request = new DeleteRequest("hotel", id.toString());
        // 2. Send the request
        client.delete(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

@Override
public void insertById(Long id) {
    try {
        // 0. Query the hotel data by id
        Hotel hotel = getById(id);
        // Convert it to the document type
        HotelDoc hotelDoc = new HotelDoc(hotel);

        // 1. Prepare the IndexRequest
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
        // 2. Prepare the JSON document
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // 3. Send the request
        client.index(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

3) Write a listener

Add a new class under the cn.itcast.hotel.mq package in hotel-demo:

package cn.itcast.hotel.mq;

import cn.itcast.hotel.constants.MqConstants;
import cn.itcast.hotel.service.IHotelService;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class HotelListener {

    @Autowired
    private IHotelService hotelService;

    /**
     * Listen for hotel insert or update messages
     * @param id hotel id
     */
    @RabbitListener(queues = MqConstants.HOTEL_INSERT_QUEUE)
    public void listenHotelInsertOrUpdate(Long id){
        hotelService.insertById(id);
    }

    /**
     * Listen for hotel delete messages
     * @param id hotel id
     */
    @RabbitListener(queues = MqConstants.HOTEL_DELETE_QUEUE)
    public void listenHotelDelete(Long id){
        hotelService.deleteById(id);
    }
}

Start the SpringBoot applications first, then check the MQ management console, where you can see the exchange:
insert image description here


insert image description here
You can see that the binding relationship of the queue is as follows:
insert image description here
Let's test the function.
Download the Vue.js browser plugin: open the browser's extensions page
insert image description here
and add a new extension.

insert image description here
Search for Vue and install Vue.js Devtools.
insert image description here
First check the hotel id,
insert image description here
then go to hotel management and change the price to 334.

insert image description here
Then open the MQ management page to see whether a message was sent; there is indeed 1 message.
insert image description here

Looking at the hotel query page, the modification was indeed successful.
insert image description here

Now test deletion. Let's delete the Shanghai Hilton hotel: first copy its information from Vue Devtools,
insert image description here
then go to hotel management and delete the Hilton hotel.
insert image description here
After the deletion, check the MQ message interface: there is a new delete message.
insert image description here
Searching the hotels, the Hilton is indeed gone (there were originally 13 items).
insert image description here
Next, add the Hilton back. Check it out:
insert image description here
paste the values copied just now,
insert image description here
and it is added successfully.
insert image description here

After the addition, check the MQ console again and find that 1 new message was sent.
insert image description here
Finally, check the hotel search: the new hotel appears, so the addition was successful.
insert image description here

4. Cluster

Storing data on a stand-alone elasticsearch instance inevitably faces two problems: massive data storage and single points of failure.

  • Massive data storage problem: logically split the index library into N shards and store them on multiple nodes
  • Single point of failure problem: back up shard data on different nodes (replicas)

ES cluster related concepts:

  • Cluster (cluster): A group of nodes with a common cluster name.

  • Node (node): an Elasticsearch instance in the cluster

  • Shard: an index can be split into different parts for storage, called shards. In a cluster environment, the different shards of an index can be distributed across different nodes

    This solves the problem of data volumes that are too large for the storage capacity of a single node.
    insert image description here

    Here, we divide the data into 3 pieces: shard0, shard1, shard2

  • Primary shard: defined relative to replica shards.

  • Replica shard: each primary shard can have one or more replicas, whose data is the same as the primary shard's.

Data backup can ensure high availability, but if each shard is backed up, the number of nodes required will double, and the cost is too high!

In order to find a balance between high availability and cost, we can do this:

  • First shard the data and store it in different nodes
  • Then back up each shard and place the backup on a different node, so the nodes back each other up

This can greatly reduce the number of required service nodes. As shown in the figure, we take 3 shards, each with one replica, as an example:
insert image description here

Now, each shard has 1 replica, stored across 3 nodes:

  • node0: holds shards 0 and 1
  • node1: holds shards 0 and 2
  • node2: holds shards 1 and 2

4.1. Building an ES cluster

Refer to the documentation of the pre-class materials:
insert image description here
specifically the fourth chapter:
insert image description here

1. Deploy the es cluster

We will use the docker container to run multiple es instances on a single machine to simulate the es cluster. However, in the production environment, it is recommended that you only deploy one es instance on each service node.

Deploying an es cluster can be done directly with docker-compose, but this requires your Linux virtual machine to have at least 4 GB of memory.

1.1. Create es cluster

First write a docker-compose file with the following content:

version: '2.2'
services:
  es01:
    image: elasticsearch:7.12.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: elasticsearch:7.12.1
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data02:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
    networks:
      - elastic
  es03:
    image: elasticsearch:7.12.1
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic
    ports:
      - 9202:9200
volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  elastic:
    driver: bridge

Running es requires modifying some Linux system settings. Edit the /etc/sysctl.conf file:

vi /etc/sysctl.conf

Add the following content:

vm.max_map_count=262144

insert image description here

Then execute the command to make the configuration take effect:

sysctl -p

insert image description here

Start the cluster via docker-compose:

docker-compose up -d

insert image description here
View container status

docker ps

insert image description here

1.2. Cluster status monitoring

Kibana can monitor es clusters, but the new version needs to rely on the x-pack function of es, and the configuration is more complicated.

It is recommended to use cerebro to monitor the status of es cluster, official website: https://github.com/lmenezes/cerebro

The pre-class materials have provided the installation package:
insert image description here
it can be used after decompression, which is very convenient.
The decompressed directory is as follows:
insert image description here
Enter the corresponding bin directory:
insert image description here

Double-click the cerebro.bat file to start the service.
insert image description here
Visit http://localhost:9000 to enter the management interface:
insert image description here

Enter the address and port of any node of your elasticsearch, and click connect:
insert image description here

A green bar indicates that the cluster is green (healthy).
insert image description here
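
As an aside (not part of the course materials), the same health status can also be checked without cerebro, using elasticsearch's standard cluster health API from DevTools or curl:

GET /_cluster/health

The "status" field in the response reports green, yellow, or red.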

1.3. Create an index library

1) Use kibana's DevTools to create an index library

Enter the command in DevTools:

PUT /itcast
{
  "settings": {
    "number_of_shards": 3, // number of shards
    "number_of_replicas": 1 // number of replicas
  },
  "mappings": {
    "properties": {
      // mapping definitions ...
    }
  }
}

2) Use cerebro to create an index library

You can also create an index library with cerebro:
insert image description here

Fill in the index library information:
insert image description here

Click the create button in the lower right corner:

insert image description here

1.4. View the shard distribution

Go back to the home page, where you can view how the index library's shards are distributed:

insert image description here

4.2. Cluster split-brain problem

4.2.1. Division of Cluster Responsibilities

Cluster nodes in elasticsearch have different responsibilities:
insert image description here

By default, any node in the cluster has all four of the above roles at the same time.
But a real cluster must separate cluster responsibilities:

  • master node: high CPU requirements, but low memory requirements
  • data node: high requirements for both CPU and memory
  • coordinating node: high requirements for network bandwidth and CPU

Separation of duties allows us to allocate suitable hardware for each type of node and avoids mutual interference between services.

A typical es cluster responsibility division is shown in the figure:
insert image description here

4.2.2. Split brain problem

A split-brain is caused by the disconnection of nodes in the cluster.

For example, in a cluster, the master node loses connection with other nodes:
insert image description here
at this time, node2 and node3 think that node1 is down, and they will re-elect the master:
insert image description here

After node3 is elected, the cluster continues to provide external services. Node2 and node3 form a cluster, and node1 forms a cluster. The data of the two clusters is not synchronized, and data differences occur.

When the network is restored, because there are two master nodes in the cluster, the status of the cluster is inconsistent, and a split-brain situation occurs:
insert image description here

The solution to split-brain is to require that a node receive votes from more than (number of master-eligible nodes + 1) / 2 nodes to be elected master, so the number of master-eligible nodes should preferably be odd. The corresponding configuration item is discovery.zen.minimum_master_nodes; since es 7.0 this has become the default behavior, so split-brain problems generally no longer occur.

For example: for a cluster formed by 3 nodes, the votes must exceed (3 + 1) / 2, which is 2 votes. node3 gets the votes of node2 and node3, and is elected as the master. node1 has only 1 vote for itself and was not elected. There is still only one master node in the cluster, and there is no split brain.

4.2.3. Summary

What is the role of the master eligible node?

  • Participate in master election

  • The master node can manage the cluster state, manage shard information, and handle requests to create and delete index libraries

What is the role of the data node?

  • CRUD of data

What is the role of the coordinator node?

  • Route requests to other nodes

  • Combine the query results and return them to the user

4.3. Cluster distributed storage

When a new document is added, it should be saved in different shards to ensure data balance, so how does the coordinating node determine which shard the data should be stored in?

4.3.1. Shard storage test

Insert three pieces of data:
insert image description here

insert image description here
insert image description here

You can see from the test that the three pieces of data are in different shards:
insert image description here

result:
insert image description here

4.3.2. Shard storage principle

Elasticsearch will use the hash algorithm to calculate which shard the document should be stored in:
insert image description here

illustrate:

  • _routing defaults to the id of the document
  • The algorithm is related to the number of shards, so once the index library is created, the number of shards cannot be modified!
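
The routing formula used here is the standard elasticsearch one:

shard = hash(_routing) % number_of_shards

Because number_of_shards appears in the formula, changing the shard count after documents have been written would route existing documents to the wrong shards, which is why the number of shards cannot be modified once the index library is created.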

The process of adding new documents is as follows:
insert image description here

Interpretation:

  • 1) Add a document with id=1
  • 2) Do a hash operation on the id, if the result is 2, it should be stored in shard-2
  • 3) The primary shard of shard-2 is on node3, and the data is routed to node3
  • 4) Save the document
  • 5) Synchronize to replica-2 of shard-2, on the node2 node
  • 6) Return the result to the coordinating-node node

4.4. Cluster distributed query

The elasticsearch query is divided into two stages:

  • scatter phase: the coordinating node distributes the request to every shard

  • gather phase: the coordinating node gathers the search results from the data nodes, merges them into the final result set, and returns it to the user

insert image description here

4.5. Cluster failover

The master node of the cluster monitors the status of the nodes in the cluster. If it finds that a node has gone down, it immediately migrates that node's shard data to other nodes to ensure data safety. This is called failover.

1) For example, a cluster structure is shown in the figure:
insert image description here
Now node1 is the master node and the other two nodes are slave nodes.

2) Suddenly, node1 fails:
insert image description here
The first thing after the outage is to re-elect a master; for example, node2 is elected:
insert image description here
After node2 becomes the master node, it checks the cluster status and finds that shard-1 and shard-0 have no replicas. Therefore, the data on node1 needs to be migrated to node2 and node3:
insert image description here

Origin blog.csdn.net/sinat_38316216/article/details/129745012