Microservices--Data Aggregation

Dark Horse-Spring Cloud Microservice Technology Stack Data Aggregation.

Technologies involved in the project

  1. The knowledge points are ordered by video episode number, which makes them easy to look up later.
  2. The network environment is not fixed (sometimes WiFi, sometimes a hotspot), and a static IP would have to be reconfigured after every network change, so the virtual machine uses dynamic addressing. Whenever the VM's IP changes, update the IP in the yml configuration files, the test classes, and the startup class before running the related programs.
  3. The code paths are listed mainly for later review.
  4. Code path for operating the hotel index with RestClient: E:\微服务\实用篇\day05-Elasticsearch01\资料\hotel-demo.
  5. Code path for database + MQ data synchronization: E:\微服务\实用篇\day07-Elasticsearch03\资料\hotel-admin.
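For point 2 above, the VM's IP typically appears in fragments like the following; the property names here are illustrative Spring Boot ones and may differ in the actual project.

```yaml
# application.yml – update host entries like this one when the VM's IP changes;
# the elasticsearch address may also be hard-coded in the test and startup
# classes and must be changed there too
spring:
  rabbitmq:
    host: 192.168.150.101   # IP of the virtual machine
```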

Practical section

  1. Aggregations – statistics, analysis, and computation over document data. (P120)
  2. Common kinds of aggregation:
  3. Bucket (bucket aggregations): group documents. TermAggregation: group by a field's value; Date Histogram: group into date intervals, such as one week or one month.
  4. Metric (metric aggregations, also usable as nested sub-aggregations): compute over document data, e.g. avg, min, max, stats (computes sum, min, max, avg, etc. at once).
  5. Pipeline (pipeline aggregations): aggregate over the results of other aggregations.
  6. Fields participating in an aggregation must be of type keyword, numeric, date, or Boolean.
  7. Implementing bucket aggregations with DSL. (P121)
  8. aggs defines the aggregations and sits at the same level as query; the role of query is to limit the scope of the documents being aggregated.
  9. Every aggregation requires three elements: a name, a type, and a field.
  10. Configurable attributes of an aggregation: size – the number of aggregation results to return; order – how the aggregation results are sorted; field – the field to aggregate on.
  11. Implementing metric aggregations with DSL. (P122)
  12. RestClient implementation of aggregations. (P123)
# Count how many hotel brands appear in the data by aggregating on the brand name
# size - set size to 0 so the response contains only aggregation results, no documents
# aggs - defines the aggregation    brandAgg - a name for this aggregation
# terms - the aggregation type; we group by the brand value, so terms is chosen
# field - the field to aggregate on  size - how many aggregation results to return
GET /hotel/_search
{
	"size": 0,
	"aggs": {
		"brandAgg": {
			"terms": {
				"field":"brand",
				"size": 10
			}
		}
	}
}
# A bucket aggregation counts the documents in each bucket as _count and sorts by _count in descending order by default; order overrides this
GET /hotel/_search
{
	"size": 0,
	"aggs": {
		"brandAgg": {
			"terms": {
				"field":"brand",
				"size": 10,
				"order": {
					"_count": "asc"
				}
			}
		}
	}
}
# A bucket aggregation covers all documents in the index by default; add a query clause to limit the scope of the documents being aggregated
GET /hotel/_search
{
	"query": {
		"range": {
			"price": {
				"lte": 200
			}
		}
	},
	"size": 0,
	"aggs": {
		"brandAgg": {
			"terms": {
				"field":"brand",
				"size": 10
			}
		}
	}
}

# Get the min, max, avg, etc. of the user score for each brand.
# aggs - a sub-aggregation of the brand aggregation, computed separately for each bucket
# scoreAgg - the sub-aggregation's name
# stats - the aggregation type; stats computes min, max, avg, and more
# field - the field to aggregate on, here score
GET /hotel/_search
{
	"size": 0,
	"aggs": {
		"brandAgg": {
			"terms": {
				"field":"brand",
				"size": 10,
				"order": {
					"scoreAgg.avg": "desc"
				}
			},
			"aggs": {
				"scoreAgg": {
					"stats": {
						"field": "score"
					}
				}
			}
		}
	}
}
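The hotel-demo code path listed earlier builds the same brandAgg request with the Java RestClient and then walks the response buckets. The response shape that code parses can be sketched language-agnostically; here is a minimal Python sketch, with invented sample values standing in for real query results:

```python
# Shape of the response returned by the brandAgg terms aggregation above,
# with the scoreAgg stats sub-aggregation nested in each bucket.
# All bucket values are invented sample data, not real query results.
response = {
    "aggregations": {
        "brandAgg": {
            "buckets": [
                {"key": "huazhu", "doc_count": 12,
                 "scoreAgg": {"count": 12, "min": 35.0, "max": 49.0,
                              "avg": 44.2, "sum": 530.4}},
                {"key": "jinjiang", "doc_count": 10,
                 "scoreAgg": {"count": 10, "min": 38.0, "max": 48.0,
                              "avg": 43.9, "sum": 439.0}},
            ]
        }
    }
}

def brand_scores(resp):
    """Extract (brand, doc_count, avg score) from each bucket."""
    buckets = resp["aggregations"]["brandAgg"]["buckets"]
    return [(b["key"], b["doc_count"], b["scoreAgg"]["avg"]) for b in buckets]

print(brand_scores(response))
# [('huazhu', 12, 44.2), ('jinjiang', 10, 43.9)]
```

The Java RestClient version does the same walk via `Aggregations.get("brandAgg")` and the bucket getters; the nesting of the response mirrors the nesting of the request.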
  1. Auto-completion. (P126)
  2. Install the pinyin analyzer plugin and test it.
  3. Custom analyzers – an analyzer in elasticsearch consists of three parts:
  4. character filters: process the text before it reaches the tokenizer, e.g. deleting or replacing characters.
  5. tokenizer: splits the text into terms according to certain rules, e.g. keyword (no splitting) or ik_smart.
  6. token filters: further process the terms output by the tokenizer, e.g. lowercasing, synonym handling, pinyin conversion.
# Install the pinyin analyzer
# Find the location of the plugins directory in the elasticsearch data volume
docker volume inspect es-plugins
# Go to that directory
cd /var/lib/docker/volumes/es-plugins/_data
# Use FileZilla to upload the pinyin analyzer folder unzipped on Windows; this works
# Restart the es container
docker restart es
# Check the es logs
docker logs -f es
# Test the pinyin analyzer
GET /_analyze
{
  "text": ["如家酒店还不错"],
  "analyzer": "pinyin"
}

# Delete the index
DELETE /test

# Custom pinyin analyzer: configure a custom analyzer via settings when creating the index. The pinyin analyzer is suitable when building the inverted index, but must not be used at search time – otherwise all homophones of the search terms would match.
# Use the my_analyzer analyzer when building the inverted index -- analyzer;
# Use the ik_smart analyzer when searching the field -- search_analyzer;
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { 
        "my_analyzer": { 
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": { 
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings":{
  	"properties":{
  		"name": {
  			"type": "text",
  			"analyzer": "my_analyzer",
  			"search_analyzer": "ik_smart"
  		}
  	}
  }
}
# Test the custom analyzer (run against the test index, where my_analyzer is defined)
GET /test/_analyze
{
  "text": ["如家酒店还不错"],
  "analyzer": "my_analyzer"
}
  1. Auto-completion – the completion suggester query implements the auto-completion feature. (P128)
  2. Requirements on the field used for auto-completion: its type must be completion, and its value is an array of entries to suggest.
  3. Case: auto-completion for hotel data – implement auto-completion and pinyin search for the hotel index. (P130)
# Index for auto-completion
PUT test1
{
	"mappings":{
        "properties":{
            "title": {
                "type": "completion"
            }
        }
  }
}
# Sample data
POST test1/_doc
{
	"title":["Sony", "WH-1000XM3"]
}
POST test1/_doc
{
	"title":["SK-II", "PITERA"]
}
POST test1/_doc
{
	"title":["Nintendo", "switch"]
}
# Auto-completion query
POST /test1/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", # the keyword to complete
      "completion": {
        "field": "title", # the completion field
        "skip_duplicates": true, # skip duplicate suggestions
        "size": 10 # return the top 10 results
      }
    }
  }
}
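The suggester response groups its completions under the suggester's name; given the sample documents above, a query for "s" would surface the entries starting with s. A minimal Python sketch of walking that response (the option values are written out by hand to mirror the sample data, not taken from a real run):

```python
# Shape of a completion-suggester response for the title_suggest query above.
# The options are hand-written to match the sample documents.
response = {
    "suggest": {
        "title_suggest": [
            {"text": "s", "options": [
                {"text": "SK-II"},
                {"text": "Sony"},
                {"text": "switch"},
            ]}
        ]
    }
}

def suggestions(resp, name="title_suggest"):
    """Collect the completed texts from every option of the named suggester."""
    return [opt["text"]
            for entry in resp["suggest"][name]
            for opt in entry["options"]]

print(suggestions(response))
# ['SK-II', 'Sony', 'switch']
```

The Java RestClient walks the same structure via `response.getSuggest().getSuggestion("title_suggest")` and its options.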
  1. Data synchronization – keeping data in sync between mysql and elasticsearch. (P132)
  2. Problem: in a microservice architecture, the hotel-management business (writing to mysql) and the hotel-search business (querying elasticsearch) may live in two different microservices. How is data kept in sync? Solutions:
  3. Method 1: synchronous calls. Advantage: simple, direct implementation; disadvantage: strong coupling between the businesses.
  4. Method 2: asynchronous notification. Advantages: low coupling, moderate implementation difficulty; disadvantage: depends on the reliability of MQ.
  5. Method 3: listening to the mysql binlog. Advantage: complete decoupling between the services; disadvantages: enabling the binlog adds load to the database, and the implementation is complex – usually done with the canal middleware.
  6. ES cluster structure. (P138)
  7. Stand-alone elasticsearch inevitably faces two problems when storing data:
  8. Massive data storage – logically split the index into N shards and store them across multiple nodes.
  9. Single point of failure – keep copies of each shard (replicas) on different nodes.
  10. The number of shards and replicas of each index is specified when the index is created; the number of shards cannot be changed once set.
  11. Cluster nodes in elasticsearch have different responsibilities:
  12. master eligible (master-eligible node) – candidate master node: the master manages and records the cluster state, decides which node each shard lives on, and handles requests to create and delete indices.
  13. data (data node) – stores data and performs search, aggregation, and CRUD operations.
  14. ingest – pre-processing before data is stored.
  15. coordinating (coordinating node) – routes requests to other nodes, merges the results they produce, and returns them to the user.
  16. Split-brain in ES clusters – when the network between the master and the other nodes fails, a split-brain problem can arise, with two masters elected in separate partitions.
  17. The role of the coordinating node.
  18. How the target shard of a new document is determined in a distributed write: the coordinating node hashes the document id, takes the result modulo the number of shards, and the remainder identifies the target shard.
  19. The two phases of a distributed query:
  20. Scatter phase: the coordinating node distributes the query request to the different shards.
  21. Gather phase: the query results are collected at the coordinating node, merged, and returned to the user.
  22. Failover – after the master goes down, one of the master-eligible nodes is elected as the new master; the master monitors shard and node status and moves the shards of a failed node to healthy nodes to keep data safe.
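Method 2 above (asynchronous notification) is the pattern the hotel-admin code path implements with MQ. A toy Python sketch of that pattern, with an in-memory queue standing in for RabbitMQ and a dict standing in for the elasticsearch index (all names here are illustrative, not taken from the project):

```python
import queue

mq = queue.Queue()   # stands in for RabbitMQ
es_index = {}        # stands in for the elasticsearch hotel index

def save_hotel(hotel_id, doc):
    """hotel-admin side: after writing to mysql (omitted), publish an event to MQ."""
    mq.put(("insert_or_update", hotel_id, doc))

def delete_hotel(hotel_id):
    mq.put(("delete", hotel_id, None))

def consume():
    """hotel-demo side: drain the queue and mirror each change into elasticsearch."""
    while not mq.empty():
        op, hotel_id, doc = mq.get()
        if op == "insert_or_update":
            es_index[hotel_id] = doc
        else:
            es_index.pop(hotel_id, None)

save_hotel(1, {"name": "如家酒店", "brand": "homeinn"})
delete_hotel(2)   # deleting an id that was never indexed is a no-op
consume()
print(es_index)
```

The point of the pattern is that hotel-admin never calls hotel-demo directly; the two sides only share the message format, which is the low coupling the note describes.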
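A common mitigation for the split-brain problem mentioned above is to require more than half of the master-eligible nodes to agree before a master can be elected, so two partitions can never both elect one (before es7.0 this was configured via discovery.zen.minimum_master_nodes; es7.0 and later handle it automatically). The quorum size is simply:

```python
def quorum(master_eligible_nodes: int) -> int:
    # Smallest number of votes that is strictly more than half,
    # so only one partition can ever reach it.
    return master_eligible_nodes // 2 + 1

print([quorum(n) for n in [1, 2, 3, 4, 5]])  # [1, 2, 2, 3, 3]
```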
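The shard-routing rule above (hash of the id, modulo the number of shards) can be sketched in a few lines; elasticsearch actually murmur3-hashes the _routing value (the document id by default), and crc32 merely stands in for it here:

```python
import zlib

def target_shard(routing: str, number_of_shards: int) -> int:
    # elasticsearch uses murmur3 on the _routing value; crc32 is a
    # deterministic stand-in for this sketch.
    return zlib.crc32(routing.encode()) % number_of_shards

ids = [f"hotel-{i}" for i in range(8)]
with_3_shards = [target_shard(i, 3) for i in ids]
with_5_shards = [target_shard(i, 5) for i in ids]

# With a different shard count the remainders change, so the same ids
# would map to different shards -- which is why the shard count is fixed
# once the index is created.
print(with_3_shards)
print(with_5_shards)
```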


Origin blog.csdn.net/qq_51601665/article/details/129720825