SpringCloud Learning [6]: Distributed Search Engine (3)


Foreword

  • This article is based on the Heima (黑马) course; it was carefully studied, organized, and summarized, not copied!
  • For the hands-on parts, learners are encouraged to practice and analyze carefully!
  • This part is somewhat demanding on your machine, so please make sure your computer is configured accordingly!

One: Data aggregation

insert image description here

  • Aggregations make it possible to compute statistics over and analyze document data.
  • There are three common types of aggregation:
    • Bucket aggregation: used to group documents
      • Term aggregation: group by a document field value
      • Date Histogram: group by date intervals, for example one week or one month per group
    • Metric aggregation: used to calculate values such as the maximum, minimum, average, etc.
      • Avg: average
      • Max: maximum value
      • Min: minimum value
      • Stats: max, min, avg, sum, etc. at the same time
    • Pipeline aggregation: aggregation based on the results of other aggregations
  • **Note:** the fields participating in an aggregation must be keyword, date, numeric, or Boolean types

1.1 Implementing aggregations with DSL

1.1.1 Bucket aggregation syntax

GET /hotel/_search
{
  "size": 0,  // set size to 0 so the result contains only the aggregation, not the documents
  "aggs": {   // define the aggregations
    "brandAgg": {  // give the aggregation a name
      "terms": {   // aggregation type; we group by brand values, so terms is used
        "field": "brand", // field to aggregate on
        "size": 20        // number of aggregation results to return
      }
    }
  }
}
  • Example:
    # bucket aggregation
    GET /hotel/_search
    {
      "size": 0,
      "aggs": {
        "brandAgg": {
          "terms": {
            "field": "brand",
            "size": 20
          }
        }
      }
    }
    
  • result:
    insert image description here

1.1.2 Sorting aggregation results

  • By default, Bucket aggregation will count the number of documents in the Bucket, record it as _count, and sort in descending order of _count.
  • Specify the order attribute to customize how the aggregation results are sorted
  • Demo:
    # customize the sort order of the aggregation
    # sort by _count in ascending order
    GET /hotel/_search
    {
      "size": 0,
      "aggs": {
        "brandAgg": {
          "terms": {
            "field": "brand",
            "order": {
              "_count": "asc"
            },
            "size": 20
          }
        }
      }
    }
    
    • result:
      insert image description here

1.1.3 Limit aggregation scope

  • By default, a Bucket aggregation runs over all documents in the index. In real scenarios, however, users enter search conditions, so the aggregation should only cover the search results; that means the aggregation scope has to be limited.
  • To limit the range of documents to be aggregated, just add query conditions:
  • Demo:
    # limit the aggregation scope
    # only aggregate documents priced at 200 or less
    GET /hotel/_search
    {
      "query": {
        "range": {
          "price": {
            "lte": 200
          }
        }
      },
      "size": 0,
      "aggs": {
        "brandAgg": {
          "terms": {
            "field": "brand",
            "size": 20
          }
        }
      }
    }
    
  • result:
    insert image description here

1.2 Metric aggregation syntax

  • Metric aggregation: used to calculate values such as the maximum, minimum, average, etc.
    • Avg: average
    • Max: maximum value
    • Min: minimum value
    • Stats: max, min, avg, sum, etc. at the same time
  • Demo:
GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20
      },
      "aggs": {  // sub-aggregation of brandAgg, computed separately for each bucket
        "score_stats": {  // aggregation name
          "stats": {      // aggregation type; stats computes min, max, avg, etc.
            "field": "score" // field to aggregate on, here score
          }
        }
      }
    }
  }
}
  • result:
    insert image description here
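  • For reference, the same stats sub-aggregation can also be built and read back with the Java RestHighLevelClient. The following is a minimal sketch, assuming a RestHighLevelClient named client and the hotel index used throughout this article (the article's own Java case code follows in section 1.7):

    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.bucket.terms.Terms;
    import org.elasticsearch.search.aggregations.metrics.Stats;

    // build: terms aggregation on brand, with a stats sub-aggregation on score
    SearchRequest request = new SearchRequest("hotel");
    request.source().size(0);
    request.source().aggregation(
            AggregationBuilders.terms("brandAgg").field("brand").size(20)
                    .subAggregation(AggregationBuilders.stats("score_stats").field("score")));

    // execute, then read the stats result out of each brand bucket
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    Terms brandTerms = response.getAggregations().get("brandAgg");
    for (Terms.Bucket bucket : brandTerms.getBuckets()) {
        Stats scoreStats = bucket.getAggregations().get("score_stats");
        System.out.println(bucket.getKeyAsString() + " avg score: " + scoreStats.getAvg());
    }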

1.3 Summary

  • aggs stands for aggregations and sits at the same level as query. What is the role of query here?

    • It limits the scope of the documents being aggregated
  • The three elements required for an aggregation:

    • aggregation name
    • aggregation type
    • aggregation field
  • Configurable aggregation properties include:

    • size: the number of aggregation results to return
    • order: how the aggregation results are sorted
    • field: the field to aggregate on

1.4 Implementing aggregations with the RestAPI

1.5 API Syntax

  • Aggregation conditions sit at the same level as query conditions, so request.source() is used to specify them.
  • Syntax for aggregate conditions:
    insert image description here
  • The aggregation result also differs from the query result, and the API for reading it is a little special; the JSON is still parsed layer by layer, though:
    insert image description here
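  • Since the two screenshots are not reproduced here, the following is a minimal sketch of the same idea (imports as in the sketch under section 1.2, again assuming a RestHighLevelClient named client): the aggregation is attached with request.source().aggregation(...), and the response is unwrapped layer by layer, Aggregations → Terms → buckets → key:

    // build the request: size 0 plus a terms aggregation named "brandAgg"
    SearchRequest request = new SearchRequest("hotel");
    request.source().size(0);
    request.source().aggregation(
            AggregationBuilders.terms("brandAgg").field("brand").size(20));

    // execute
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // parse layer by layer: aggregations -> the named terms aggregation -> its buckets -> each key
    Terms brandTerms = response.getAggregations().get("brandAgg");
    for (Terms.Bucket bucket : brandTerms.getBuckets()) {
        System.out.println(bucket.getKeyAsString() + " : " + bucket.getDocCount());
    }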

1.7 Case

  • Requirement: the brand, city, and other filter options on the search page should not be hard-coded; they should be obtained by aggregating the hotel data in the index:
    insert image description here
  • Analysis:
    • Use the aggregation feature with Bucket aggregations to group the documents in the search results by brand and city, which tells us which brands and cities are included.
    • Because it aggregates over search results, this is a limited-scope aggregation; that is, the aggregation's limiting conditions are the same as the search conditions.
  • The return value type is the final result to be displayed on the page:
    insert image description here
  • The result is a Map structure:
    • key is a string: city, star rating, brand, price
    • value is a collection, such as the names of multiple cities

  • Key implementation code

  • Add a method in the Controller that meets the following requirements:

    • Request method: POST
    • Request path: /hotel/filters
    • Request parameters: RequestParams, the same as for the search document endpoint
    • Return value type: Map<String, List<String>>
    @PostMapping("filters")
    public Map<String, List<String>> getFilters(@RequestBody RequestParams params){
        return hotelService.filters(params);
    }
    
  • Define the new method in the Service:

    Map<String, List<String>> filters(RequestParams params);
    
  • Implement this method in the HotelService implementation class:

    @Override
    public Map<String, List<String>> filters(RequestParams params) {
        try {
            // 1. prepare the Request
            SearchRequest request = new SearchRequest("hotel");
            // 2. prepare the DSL
            // 2.1. query (buildBasicQuery comes from the previous article; a placeholder sketch follows after this code)
            buildBasicQuery(params, request);
            // 2.2. set size to 0
            request.source().size(0);
            // 2.3. aggregations
            buildAggregation(request);
            // 3. send the request
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            // 4. parse the result
            Map<String, List<String>> result = new HashMap<>();
            Aggregations aggregations = response.getAggregations();
            // 4.1. get the brand results by aggregation name
            List<String> brandList = getAggByName(aggregations, "brandAgg");
            result.put("品牌", brandList);
            // 4.2. get the city results by aggregation name
            List<String> cityList = getAggByName(aggregations, "cityAgg");
            result.put("城市", cityList);
            // 4.3. get the star-rating results by aggregation name
            List<String> starList = getAggByName(aggregations, "starAgg");
            result.put("星级", starList);

            return result;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
    
    private void buildAggregation(SearchRequest request) {
        request.source().aggregation(AggregationBuilders
                                     .terms("brandAgg")
                                     .field("brand")
                                     .size(100));
        request.source().aggregation(AggregationBuilders
                                     .terms("cityAgg")
                                     .field("city")
                                     .size(100));
        request.source().aggregation(AggregationBuilders
                                     .terms("starAgg")
                                     .field("starName")
                                     .size(100));
    }
    
    private List<String> getAggByName(Aggregations aggregations, String aggName) {
        // 4.1. get the aggregation result by its name
        Terms brandTerms = aggregations.get(aggName);
        // 4.2. get the buckets
        List<? extends Terms.Bucket> buckets = brandTerms.getBuckets();
        // 4.3. iterate over the buckets
        List<String> brandList = new ArrayList<>();
        for (Terms.Bucket bucket : buckets) {
            // 4.4. get the key
            String key = bucket.getKeyAsString();
            brandList.add(key);
        }
        return brandList;
    }
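  • Note: buildBasicQuery comes from the search case in the earlier article of this series and is not repeated above. For readers following along with only this article, a minimal placeholder is sketched below; the getKey() getter on RequestParams and the "all" copy_to field are assumptions carried over from the earlier parts, not something defined in this section.

    // Hedged sketch of buildBasicQuery: only the keyword condition is handled here;
    // the city/brand/star/price filters from the earlier article would be added the same way.
    private void buildBasicQuery(RequestParams params, SearchRequest request) {
        String key = params.getKey(); // assumption: RequestParams exposes the search keyword via getKey()
        if (key == null || key.isEmpty()) {
            request.source().query(QueryBuilders.matchAllQuery());
        } else {
            // assumption: the index defines an "all" field that brand/city/business are copied into, as in the earlier articles
            request.source().query(QueryBuilders.matchQuery("all", key));
        }
    }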
    
  • Notice:

    • In the case part, you do not necessarily have to hand-type every line of code, but you do need to run it yourself and verify the final result!
    • At the same time, work through the problems you run into on your own!

Two: Auto-completion

  • The effect is as shown in the figure:
    insert image description here
    • Auto-completion is the feature that suggests complete entries based on the letters the user has typed
    • Because the suggestions need to be inferred from pinyin letters, the pinyin analysis feature is used

2.1 Installing the pinyin analyzer plugin

  • To complete entries from typed letters, documents need to be tokenized by pinyin. Conveniently, there is a pinyin analysis plugin for elasticsearch on GitHub. address
  1. Download and unzip
    insert image description here

  2. Upload it to the elasticsearch plugin directory on the virtual machine

    • To install the plugin you need to know where the elasticsearch plugins directory is. Since we mounted it as a data volume, look up the elasticsearch data volume directory with the following command:
    docker volume inspect es-plugins
    

    insert image description here

    • Then upload the unzipped files to this directory, for example the pinyin analyzer folder renamed to py after unzipping
      insert image description here
  3. Restart elasticsearch

    docker restart es
    
  4. Test

    POST /_analyze
    {
      "text": "如家酒店还不错",
      "analyzer": "pinyin"
    }
    

    insert image description here

2.2 Custom analyzer

  • The default pinyin analyzer turns each Chinese character into pinyin separately, but we want each term to produce one group of pinyin, so we need to customize how the pinyin step behaves by defining a custom analyzer.

  • An analyzer in elasticsearch is made up of three parts:

    • character filters: process the text before the tokenizer, e.g. deleting or replacing characters
    • tokenizer: cuts the text into terms according to certain rules, e.g. keyword (no tokenization) or ik_smart
    • token filter: further processes the terms output by the tokenizer, e.g. case conversion, synonyms, pinyin conversion, etc.
  • When a document is analyzed, it is processed by these three parts in turn:
    insert image description here

  • Demo:

# custom analyzer using the pinyin token filter
DELETE /test

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}


POST /test/_analyze
{
  "text": ["如家酒店还不错"],
  "analyzer": "my_analyzer"
}
  • result:
    insert image description here
  • Notes on the pinyin analyzer
    • To avoid matching homophones, do not use the pinyin analyzer at search time (hence "search_analyzer": "ik_smart" in the mapping above)

2.3 Autocomplete query

  • Elasticsearch provides the Completion Suggester query to implement auto-completion. The query matches and returns terms that begin with the user's input. To make completion queries efficient, there are some constraints on the field types in the documents:

    • Fields participating in a completion query must be of the completion type.
    • The field content is generally an array made up of the entries to be completed.
  • Demo:

# auto-completion query
DELETE /test02
## create the index
PUT /test02
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}
## sample data
POST test02/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}
POST test02/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test02/_doc
{
  "title": ["Nintendo", "switch"]
}
## auto-completion query
GET /test02/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", # the keyword typed so far
      "completion": {
        "field": "title", # field to run completion on
        "skip_duplicates": true, # skip duplicates
        "size": 10 # return the top 10 results
      }
    }
  }
}

2.4 Java API for auto-completion query

  • First, look at the API for building the request:
    insert image description here
  • Then, look at how the result is parsed:
    insert image description here
  • Key code:
    @Override
    public List<String> getSuggestions(String prefix) {
        try {
            // 1. prepare the Request
            SearchRequest request = new SearchRequest("hotel");
            // 2. prepare the DSL
            request.source().suggest(new SuggestBuilder().addSuggestion(
                "suggestions",
                SuggestBuilders.completionSuggestion("suggestion")
                .prefix(prefix)
                .skipDuplicates(true)
                .size(10)
            ));
            // 3. send the request
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            // 4. parse the result
            Suggest suggest = response.getSuggest();
            // 4.1. get the completion result by the suggestion name
            CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
            // 4.2. get the options
            List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
            // 4.3. iterate over the options
            List<String> list = new ArrayList<>(options.size());
            for (CompletionSuggestion.Entry.Option option : options) {
                String text = option.getText().toString();
                list.add(text);
            }
            return list;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
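  • To expose this to the front end, a controller endpoint along the following lines can be added; the request path and parameter name used here are assumptions, since they are not shown in this section:

    @GetMapping("suggestion")
    public List<String> getSuggestions(@RequestParam("key") String prefix) {
        // delegate to the service method implemented above
        return hotelService.getSuggestions(prefix);
    }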
    

Three: Data synchronization

  • Introduction:
    insert image description here

3.1 Approach analysis

  • There are three common data synchronization schemes:
    • Solution 1: Synchronous call
    • Solution 2: Asynchronous notification
    • Solution 3: Monitor binlog

3.2 Solution 1: Synchronous call

insert image description here

  • The basic steps are as follows:
    • hotel-demo provides an interface to modify the data in elasticsearch
    • After the hotel management service completes the database operation, it directly calls the interface provided by hotel-demo

3.3 Solution 2: Asynchronous notification

insert image description here

  • The process is as follows:
    • hotel-admin sends an MQ message after adding, deleting, or modifying data in the mysql database
    • hotel-demo listens to MQ and updates the elasticsearch data after receiving the message

3.4 Solution 3: Monitor binlog

insert image description here

  • The process is as follows:
    • Enable the binlog feature in mysql
    • mysql insert, update, and delete operations are then recorded in the binlog
    • hotel-demo listens for binlog changes via canal and updates the content in elasticsearch in real time

3.5 Comparison and summary of three schemes

  • Synchronous call — Advantage: simple and straightforward to implement. Disadvantage: high business coupling.
  • Asynchronous notification — Advantage: low coupling, moderate implementation difficulty. Disadvantage: depends on the reliability of MQ.
  • Monitor binlog — Advantage: complete decoupling between services. Disadvantage: enabling binlog adds load on the database, and the implementation is more complex.

3.6 Data synchronization case

  • When hotel data is added, deleted, or modified, the same operation must be performed on the corresponding data in elasticsearch.

  • Steps:

    • Import the hotel-admin project provided with the course materials, start it, and test CRUD on hotel data
    • In hotel-admin, send a message in the add, delete, and update business logic
    • In hotel-demo, declare the exchange, queues, and RoutingKeys with annotations, listen for the messages, and update the data in elasticsearch
    • Start everything and test the data synchronization feature
  • The MQ structure is shown in the figure:
    insert image description here

  • Dependency:

    <!--amqp-->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-amqp</artifactId>
    </dependency>
    
  • Declare the queue and exchange names

    public class MqConstants {
        /**
         * exchange
         */
        public final static String HOTEL_EXCHANGE = "hotel.topic";
        /**
         * queue listening for insert and update messages
         */
        public final static String HOTEL_INSERT_QUEUE = "hotel.insert.queue";
        /**
         * queue listening for delete messages
         */
        public final static String HOTEL_DELETE_QUEUE = "hotel.delete.queue";
        /**
         * RoutingKey for insert or update
         */
        public final static String HOTEL_INSERT_KEY = "hotel.insert";
        /**
         * RoutingKey for delete
         */
        public final static String HOTEL_DELETE_KEY = "hotel.delete";
    }
    
  • Send MQ messages

    @PostMapping
    public void saveHotel(@RequestBody Hotel hotel){
        hotelService.save(hotel);
        rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_INSERT_KEY, hotel.getId());
    }

    @PutMapping()
    public void updateById(@RequestBody Hotel hotel){
        if (hotel.getId() == null) {
            throw new InvalidParameterException("id不能为空");
        }
        hotelService.updateById(hotel);
        rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_INSERT_KEY, hotel.getId());
    }

    @DeleteMapping("/{id}")
    public void deleteById(@PathVariable("id") Long id) {
        hotelService.removeById(id);
        rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_DELETE_KEY, id);
    }
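  • The controller snippets above assume a RabbitTemplate has already been injected (the field name matches the usage above); with spring-boot-starter-amqp on the classpath and the spring.rabbitmq.* connection properties configured, it can simply be autowired:

    import org.springframework.amqp.rabbit.core.RabbitTemplate;
    import org.springframework.beans.factory.annotation.Autowired;

    // injected into the hotel-admin controller that sends the messages above
    @Autowired
    private RabbitTemplate rabbitTemplate;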
    
  • Receive MQ messages

    • When hotel-demo receives an MQ message, it needs to:
      • Insert/update message: query the hotel information by the hotel id passed in, then write a document to the index
      • Delete message: delete the document from the index by the hotel id passed in
    • Define the insert and delete methods in the Service:
    void deleteById(Long id);

    void insertById(Long id);

    • Implement the business logic in the implementation class
    @Override
    public void deleteById(Long id) {
        try {
            // 1. prepare the Request
            DeleteRequest request = new DeleteRequest("hotel", id.toString());
            // 2. send the request
            client.delete(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void insertById(Long id) {
        try {
            // 0. query the hotel data by id
            Hotel hotel = getById(id);
            // convert it to the document type
            HotelDoc hotelDoc = new HotelDoc(hotel);

            // 1. prepare the Request object
            IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
            // 2. prepare the JSON document
            request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
            // 3. send the request
            client.index(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
    
  • Write the listener

    import cn.itcast.hotel.constants.MqConstants;
    import cn.itcast.hotel.service.IHotelService;
    import org.springframework.amqp.core.ExchangeTypes;
    import org.springframework.amqp.rabbit.annotation.Exchange;
    import org.springframework.amqp.rabbit.annotation.Queue;
    import org.springframework.amqp.rabbit.annotation.QueueBinding;
    import org.springframework.amqp.rabbit.annotation.RabbitListener;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Component;

    @Component
    public class HotelListener {

        @Autowired
        private IHotelService hotelService;

        /**
         * listen for hotel insert or update messages
         * @param id hotel id
         */
        @RabbitListener(bindings = @QueueBinding(value = @Queue(name = MqConstants.HOTEL_INSERT_QUEUE),
                        exchange = @Exchange(name = MqConstants.HOTEL_EXCHANGE, type = ExchangeTypes.TOPIC, autoDelete = "false", durable = "true"),
                        key = {MqConstants.HOTEL_INSERT_KEY}
                        )
        )
        public void listenHotelInsertOrUpdate(Long id){
            hotelService.insertById(id);
        }

        /**
         * listen for hotel delete messages
         * @param id hotel id
         */
        @RabbitListener(bindings = @QueueBinding(value = @Queue(name = MqConstants.HOTEL_DELETE_QUEUE),
                exchange = @Exchange(name = MqConstants.HOTEL_EXCHANGE, type = ExchangeTypes.TOPIC, autoDelete = "false", durable = "true"),
                key = {MqConstants.HOTEL_DELETE_KEY}
        )
        )
        public void listenHotelDelete(Long id){
            hotelService.deleteById(id);
        }
    }
    

3.7 Testing the data synchronization case

  • In RabbitMQ you can see that the queues have been registered
    insert image description here
  • The exchange bindings confirm that the project is running normally
    insert image description here

  • Modify the price of 上海希尔顿酒店 (Shanghai Hilton Hotel)
    insert image description here
  • View the document id of 上海希尔顿酒店 with the vue devtools plugin
    insert image description here
  • Then edit its price
    insert image description here
    insert image description here
  • Check the message records of the queue
    insert image description here
  • Check the price
    insert image description here

3.8 Supplement: Installing the vue devtools plugin

  • I do not recommend downloading the source code and compiling/installing it manually yourself! You will run into many errors, and the effort is thankless!

3.8.1 Installing in the Edge browser

  • Search for it in the Edge extension store and install it
    • The current version is the stable 6.5.0
      insert image description here

3.8.2 Installing in the Chrome browser

  1. Download Vue Devtools and unzip it
  2. Turn on Chrome's developer mode: Settings -> Extensions -> Developer mode
    insert image description here
  3. Drag the .crx file from the unzipped folder onto Chrome's extensions page, then click Add extension.

3.9 Notes on the vue devtools tool

  • Post-installation testing
    • The plugin only appears in the console when a vue front-end page is running locally; it will not show up on other web pages
    • So the correct way to check whether the installation succeeded is: start the local vue project and open the console to look for vue devtools
      insert image description here
  • The persistent value in the plugin's default configuration file is already true, so no modification is required

Four: Elasticsearch clusters

  • A single-node elasticsearch deployment used for data storage inevitably faces two problems: massive data storage and single points of failure.
    • Massive data storage: logically split the index into N shards and store them across multiple nodes
    • Single point of failure: back up shard data on different nodes (replicas)
      insert image description here

4.1 ES cluster concepts

  • Cluster: a group of nodes that share a common cluster name.
  • Node: an Elasticsearch instance in the cluster
  • Shard: an index can be split into different parts for storage; each part is called a shard. In a cluster environment, the different shards of an index can be distributed across different nodes
    • This solves the problem of data volumes that exceed the storage capacity of a single node.
    insert image description here
    • Primary shard: defined relative to replica shards.
    • Replica shard: each primary shard can have one or more replicas, whose data is identical to the primary shard.

  • To strike a balance between high availability and cost, we can do this:
    • First shard the data and store the shards on different nodes
    • Then back up each shard and place the copy on another node, so the nodes back each other up
      insert image description here
  • Now each shard has one backup, stored across 3 nodes:
    • node0: holds shards 0 and 1
    • node1: holds shards 0 and 2
    • node2: holds shards 1 and 2

4.2 Building an ES cluster

  • We use docker containers to run multiple es instances on one machine to simulate an es cluster. In a production environment, however, it is recommended to deploy only one es instance per server node.

4.3.1 Create es cluster

  1. First write a docker-compose.yml file with the following content:
    version: '2.2'
    services:
      es01:
        image: elasticsearch:7.12.1
        container_name: es01
        environment:
          - node.name=es01
          - cluster.name=es-docker-cluster
          - discovery.seed_hosts=es02,es03
          - cluster.initial_master_nodes=es01,es02,es03
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        volumes:
          - data01:/usr/share/elasticsearch/data
        ports:
          - 9200:9200
        networks:
          - elastic
      es02:
        image: elasticsearch:7.12.1
        container_name: es02
        environment:
          - node.name=es02
          - cluster.name=es-docker-cluster
          - discovery.seed_hosts=es01,es03
          - cluster.initial_master_nodes=es01,es02,es03
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        volumes:
          - data02:/usr/share/elasticsearch/data
        ports:
          - 9201:9200
        networks:
          - elastic
      es03:
        image: elasticsearch:7.12.1
        container_name: es03
        environment:
          - node.name=es03
          - cluster.name=es-docker-cluster
          - discovery.seed_hosts=es01,es02
          - cluster.initial_master_nodes=es01,es02,es03
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        volumes:
          - data03:/usr/share/elasticsearch/data
        networks:
          - elastic
        ports:
          - 9202:9200
    volumes:
      data01:
        driver: local
      data02:
        driver: local
      data03:
        driver: local
    
    networks:
      elastic:
        driver: bridge
    
  2. Running es requires adjusting some linux kernel settings; edit the /etc/sysctl.conf file (the setting typically added here is vm.max_map_count=262144, the minimum elasticsearch requires)
    vi /etc/sysctl.conf
    
    insert image description here
  3. Then execute the command to make the configuration take effect:
    sysctl -p
    
  4. Start the cluster through docker-compose
    docker-compose up -d
    
    [root@kongyue tmp]# docker-compose up -d
    Starting es01 ... done
    Creating es03 ... done
    Creating es02 ... done
    

4.4 Cluster status monitoring

4.4.1 Installing cerebro on Windows [not recommended]

  • Kibana can monitor es clusters, but newer versions depend on the x-pack feature of es and the configuration is rather complicated.

  • It is recommended to use cerebro to monitor the status of the es cluster. The directory after unzipping the official release looks like this:
    insert image description here

  • Enter the bin directory:
    insert image description here

  • Double-click the cerebro.bat file to start the service.

4.4.2 Startup crash problem [unresolved]

Oops, cannot start the server.
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: Unable to load cache item
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
        at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
        at com.google.inject.internal.FailableCache.get(FailableCache.java:54)
        at com.google.inject.internal.ConstructorInjectorStore.get(ConstructorInjectorStore.java:49)
        at com.google.inject.internal.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:155)
        at com.google.inject.internal.InjectorImpl.initializeBinding(InjectorImpl.java:592)
        at com.google.inject.internal.AbstractBindingProcessor$Processor.initializeBinding(AbstractBindingProcessor.java:173)
        at com.google.inject.internal.AbstractBindingProcessor$Processor.lambda$scheduleInitialization$0(AbstractBindingProcessor.java:160)
        at com.google.inject.internal.ProcessedBindingData.initializeBindings(ProcessedBindingData.java:49)
        at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:124)
        at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:108)
        at com.google.inject.Guice.createInjector(Guice.java:87)
        at com.google.inject.Guice.createInjector(Guice.java:78)
        at play.api.inject.guice.GuiceBuilder.injector(GuiceInjectorBuilder.scala:200)
        at play.api.inject.guice.GuiceApplicationBuilder.build(GuiceApplicationBuilder.scala:155)
        at play.api.inject.guice.GuiceApplicationLoader.load(GuiceApplicationLoader.scala:21)
        at play.core.server.ProdServerStart$.start(ProdServerStart.scala:54)
        at play.core.server.ProdServerStart$.main(ProdServerStart.scala:30)
        at play.core.server.ProdServerStart.main(ProdServerStart.scala)
Caused by: java.lang.IllegalStateException: Unable to load cache item
        at com.google.inject.internal.cglib.core.internal.$LoadingCache.createEntry(LoadingCache.java:79)
        at com.google.inject.internal.cglib.core.internal.$LoadingCache.get(LoadingCache.java:34)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator$ClassLoaderData.get(AbstractClassGenerator.java:119)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator.create(AbstractClassGenerator.java:294)
        at com.google.inject.internal.cglib.reflect.$FastClass$Generator.create(FastClass.java:65)
        at com.google.inject.internal.BytecodeGen.newFastClassForMember(BytecodeGen.java:258)
        at com.google.inject.internal.BytecodeGen.newFastClassForMember(BytecodeGen.java:207)
        at com.google.inject.internal.DefaultConstructionProxyFactory.create(DefaultConstructionProxyFactory.java:49)
        at com.google.inject.internal.ProxyFactory.create(ProxyFactory.java:156)
        at com.google.inject.internal.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:94)
        at com.google.inject.internal.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:30)
        at com.google.inject.internal.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:38)
        at com.google.inject.internal.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:34)
        at com.google.inject.internal.FailableCache$1.load(FailableCache.java:43)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
        ... 21 more
Caused by: java.lang.ExceptionInInitializerError
        at com.google.inject.internal.cglib.core.$DuplicatesPredicate.evaluate(DuplicatesPredicate.java:104)
        at com.google.inject.internal.cglib.core.$CollectionUtils.filter(CollectionUtils.java:52)
        at com.google.inject.internal.cglib.reflect.$FastClassEmitter.<init>(FastClassEmitter.java:69)
        at com.google.inject.internal.cglib.reflect.$FastClass$Generator.generateClass(FastClass.java:77)
        at com.google.inject.internal.cglib.core.$DefaultGeneratorStrategy.generate(DefaultGeneratorStrategy.java:25)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator.generate(AbstractClassGenerator.java:332)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator$ClassLoaderData$3.apply(AbstractClassGenerator.java:96)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator$ClassLoaderData$3.apply(AbstractClassGenerator.java:94)
        at com.google.inject.internal.cglib.core.internal.$LoadingCache$2.call(LoadingCache.java:54)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at com.google.inject.internal.cglib.core.internal.$LoadingCache.createEntry(LoadingCache.java:61)
        ... 38 more
Caused by: com.google.inject.internal.cglib.core.$CodeGenerationException: java.lang.reflect.InaccessibleObjectException-->Unable to make protected final java.lang.Class java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain) throws java.lang.ClassFormatError accessible: module java.base does not "opens java.lang" to unnamed module @6a988392
        at com.google.inject.internal.cglib.core.$ReflectUtils.defineClass(ReflectUtils.java:464)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator.generate(AbstractClassGenerator.java:339)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator$ClassLoaderData$3.apply(AbstractClassGenerator.java:96)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator$ClassLoaderData$3.apply(AbstractClassGenerator.java:94)
        at com.google.inject.internal.cglib.core.internal.$LoadingCache$2.call(LoadingCache.java:54)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at com.google.inject.internal.cglib.core.internal.$LoadingCache.createEntry(LoadingCache.java:61)
        at com.google.inject.internal.cglib.core.internal.$LoadingCache.get(LoadingCache.java:34)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator$ClassLoaderData.get(AbstractClassGenerator.java:119)
        at com.google.inject.internal.cglib.core.$AbstractClassGenerator.create(AbstractClassGenerator.java:294)
        at com.google.inject.internal.cglib.core.$KeyFactory$Generator.create(KeyFactory.java:221)
        at com.google.inject.internal.cglib.core.$KeyFactory.create(KeyFactory.java:174)
        at com.google.inject.internal.cglib.core.$KeyFactory.create(KeyFactory.java:157)
        at com.google.inject.internal.cglib.core.$KeyFactory.create(KeyFactory.java:149)
        at com.google.inject.internal.cglib.core.$KeyFactory.create(KeyFactory.java:145)
        at com.google.inject.internal.cglib.core.$MethodWrapper.<clinit>(MethodWrapper.java:23)
        ... 49 more
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make protected final java.lang.Class java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain) throws java.lang.ClassFormatError accessible: module java.base does not "opens java.lang" to unnamed module @6a988392
        at java.base/java.lang.reflect.AccessibleObject.throwInaccessibleObjectException(AccessibleObject.java:387)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:363)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:311)
        at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:201)
        at java.base/java.lang.reflect.Method.setAccessible(Method.java:195)
        at com.google.inject.internal.cglib.core.$ReflectUtils$1.run(ReflectUtils.java:61)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
        at com.google.inject.internal.cglib.core.$ReflectUtils.<clinit>(ReflectUtils.java:52)
        at com.google.inject.internal.cglib.reflect.$FastClassEmitter.<init>(FastClassEmitter.java:67)
        ... 46 more
  • The jdk version is too new to be compatible with cerebro; there seems to be no solution at the moment!
  • Of course, the author's level is limited! If you have a solution, please let me know, thank you very much

4.4.3 Installing cerebro on linux

  1. Download cerebro
    insert image description here
    • Suggestion: install a github accelerator plugin in the browser [skip this if you already have a proxy/VPN]
  2. Then upload the compressed package to linux for installation
    rpm -ivh cerebro-0.9.4-1.noarch.rpm
    
    insert image description here
  3. Modify the configuration file
    vim /usr/share/cerebro/conf/application.conf
    
    insert image description here
  4. Start, stop, and check cerebro via systemctl: [not recommended]
    • A service started this way cannot be accessed from external devices
    # stop
    systemctl stop cerebro
    # start
    systemctl start cerebro
    # check status
    systemctl status cerebro
    
    insert image description here
  5. Start command:
    • To make troubleshooting easier, you can start cerebro directly from the command line
    /usr/share/cerebro/bin/cerebro
    
    • Output after a successful start:
    [info] play.api.Play - Application started (Prod) (no global state)
    [info] p.c.s.AkkaHttpServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
    
  6. Visit ip:9000:
    insert image description here
  7. Enter the address and port of any elasticsearch node and click connect
    insert image description here

insert image description here
  • A green bar means the cluster status is green (healthy)

4.4.5 Create an index library

4.4.6 Creating an index library with kibana's DevTools [not the approach used here]

  • Enter the command in DevTools:
    PUT /itcast
    {
      "settings": {
        "number_of_shards": 3, // number of shards
        "number_of_replicas": 1 // number of replicas
      },
      "mappings": {
        "properties": {
          // mapping definitions ...
        }
      }
    }
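  • For reference, the same index could also be created from Java with the RestHighLevelClient; this is a minimal sketch (assuming the client variable from the earlier sections), with the mapping omitted just as in the DSL above:

    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.indices.CreateIndexRequest;
    import org.elasticsearch.common.settings.Settings;

    // create the "itcast" index with 3 primary shards and 1 replica per shard
    CreateIndexRequest request = new CreateIndexRequest("itcast");
    request.settings(Settings.builder()
            .put("index.number_of_shards", 3)
            .put("index.number_of_replicas", 1));
    client.indices().create(request, RequestOptions.DEFAULT);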
    

4.4.7 Creating an index library with cerebro [the approach used here]

  • You can also create an index library with cerebro:

insert image description here

  • Fill in the index library information:
    insert image description here

  • Click the create button in the lower right corner:
    insert image description here

    insert image description here

    insert image description here

4.4.8 Viewing the shard distribution

  • Go back to the home page, where you can see how the index's shards are distributed:
    insert image description here

4.9 Cluster split-brain problem

4.9.1 Division of Cluster Responsibilities

  • Cluster nodes in elasticsearch have different responsibilities:
    insert image description here
  • A cluster should separate these responsibilities:
    • master node: high CPU requirements, but low memory requirements
    • data node: high CPU and memory requirements
    • coordinating node: high network bandwidth and CPU requirements
  • Separating duties lets you allocate different hardware for deployment according to the needs of each node type, and avoids interference between the different workloads.
  • Each node role in elasticsearch has its own responsibilities, so it is recommended that each node play a single, independent role when deploying a cluster.
    insert image description here

4.9.2 Split-brain problem

  • By default, every node is a master-eligible node, so once the master node goes down, the other candidate nodes elect one of themselves as the new master. A split-brain problem can occur when the network between the master node and the other nodes fails.
    insert image description here

  • After node3 is elected, the cluster continues to provide services to the outside world; node2 and node3 form one cluster while node1 forms a cluster on its own. The data of the two clusters is not synchronized, which leads to data discrepancies: a split-brain situation.

  • To avoid split-brain, a new master must win the votes of more than half of the master-eligible nodes, i.e. at least (number of eligible nodes + 1) / 2 votes, so the number of eligible nodes is preferably odd. For example, with 3 eligible nodes the quorum is (3 + 1) / 2 = 2 votes, so after a network partition only the side holding at least 2 eligible nodes can elect a master.

  • The corresponding configuration item is discovery.zen.minimum_master_nodes; since es 7.0 this has become the default behavior, so split-brain problems generally no longer occur.

4.9.3 Summary

  • The role of a master-eligible node:

    • Participate in master elections
    • The elected master manages the cluster state, manages shard information, and handles requests to create and delete indexes
  • The role of a data node:

    • CRUD on document data
  • The role of a coordinating node:

    • Route requests to other nodes
    • Merge the query results and return them to the user

4.10 Cluster distributed storage

  • When new documents are added, they should be distributed across different shards to keep the data balanced. So how does the coordinating node decide which shard a document should be stored in?

4.10.1 Shard storage test

  • The tool used for testing is Insomnia. Like postman, Insomnia is a free cross-platform desktop application for API testing
  • Insomnia official website; if you want to download it, it is recommended to use another mirror, as the official site is quite slow
  • Here the author provides the latest version, "Insomnia.Core-2023.1.0.exe"
    insert image description here
    insert image description here
    insert image description here
  • The test shows that the three documents ended up in different shards:
    insert image description here

4.10.2 Shard storage principle

  • Elasticsearch uses a hash algorithm to compute which shard a document should be stored in, i.e. shard = hash(_routing) % number_of_shards:
    insert image description here
  • Notes:
    • _routing defaults to the document id
    • The algorithm depends on the number of shards, so once the index is created, the number of shards cannot be changed

  • The process of adding a new document is as follows
    insert image description here
  • Interpretation:
    • 1) Add a document with id=1
    • 2) Hash the id; if the result is 2, the document should be stored in shard-2
    • 3) The primary shard of shard-2 is on node3, so the data is routed to node3
    • 4) Save the document
    • 5) Synchronize it to the replica of shard-2 (replica-2), which is on node2
    • 6) Return the result to the coordinating node

4.10.3 Cluster distributed query

  • An elasticsearch query runs in two phases:
    • scatter phase: the coordinating node distributes the request to every shard
    • gather phase: the coordinating node collects the search results from the data nodes, merges them into the final result set, and returns it to the user

insert image description here

4.10.4 Cluster failover

  • Failover: the master node monitors the status of the nodes in the cluster. If it finds that a node has gone down, it immediately migrates that node's shard data to other nodes to keep the data safe.
  • For example, take the cluster structure shown in the figure: node1 is the master node and the other two are slave nodes
    insert image description here
  • node1 fails
    insert image description here
  • The first thing after the failure is to re-elect a master, for example node2
    insert image description here
  • After node2 becomes the master, it checks the cluster health and finds that shard-1 and shard-0 are missing replicas. The data that was on node1 therefore needs to be migrated to node2 and node3
    insert image description here
  • Animation demo:
    insert image description here


Origin blog.csdn.net/yang2330648064/article/details/129870103