Microservices - Practical Part - Notes

Course project

Dark Horse - Spring Cloud microservice technology stack (practical part), completed 2023-03-22.

Project notes

  1. The knowledge points are ordered by episode number (the pXX markers), which makes them easy to look up later.
  2. Since the networking method is not fixed (sometimes WiFi, sometimes a hotspot), a static IP would have to be reconfigured after every network change, so the virtual machine uses dynamic addressing; when its IP changes, it is enough to update the address in the yml configuration files and in the test and startup classes before running the related programs.
  3. Enumerating code paths is primarily for follow-up review.
  4. Path of the code written along with the course: E:\微服务\实用篇\day01-SpringCloud01\资料\cloud-demo.
  5. Path of the project packaged and uploaded to Linux for cluster deployment: E:\微服务\实用篇\day03-Docker\资料\cloud-demo.
  6. Path of the MQ code: E:\微服务\实用篇\day04-MQ\资料\mq-demo.
  7. Path of the RestClient code that operates the hotel index library: E:\微服务\实用篇\day05-Elasticsearch01\资料\hotel-demo.
  8. Path of the code that operates the database + MQ to implement data synchronization: E:\微服务\实用篇\day07-Elasticsearch03\资料\hotel-admin.

Practical part

background

  1. Guide to microservice technology stack.
  2. Microservice architecture, background, technology comparison, etc.
  3. Service splitting – microservice calling method.

registration center

  1. Eureka Registration Center – Service Principle, Construction, Registration, Discovery -> (p9).
  2. Ribbon load balancing – principles, round-robin and random balancing strategies, eager loading -> (p14).
  3. Nacos registry – installation, registration, discovery, tiered service storage (region-based clusters), NacosRule load balancing strategy, weight control, environment isolation via namespaces -> (P17).
  4. The similarities and differences between Nacos and Eureka.
  5. Nacos startup – open cmd in the bin directory and run startup.cmd -m standalone.

configuration center

  1. Nacos configuration management (ps: in video p26, check whether the namespace is correct).
  2. Nacos hot update – (1) @Value plus @RefreshScope to refresh; (2) @ConfigurationProperties injection, refreshed automatically. See the sketch after this list.
  3. Multi-service shared configuration – priority: configuration in nacos ([service name]-[profile].yaml > [service name].yaml) > local configuration.
  4. Nacos cluster construction – (1) Build a MySQL cluster and initialize the database table. (2) Download and decompress nacos, modify the cluster configuration and database configuration. (3) Start multiple nacos nodes separately. (4) nginx reverse proxy.
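
A minimal Java sketch of the two hot-update approaches from item 2, assuming a pattern.dateformat property in the Nacos config (class and endpoint names are illustrative):

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// (1) @Value + @RefreshScope: the bean is rebuilt when the Nacos config changes
@RefreshScope
@RestController
class DateController {
    @Value("${pattern.dateformat}")
    private String dateformat;

    @GetMapping("/now")
    public String now() {
        return LocalDateTime.now().format(DateTimeFormatter.ofPattern(dateformat));
    }
}

// (2) @ConfigurationProperties: refreshed automatically, no @RefreshScope needed
@Component
@ConfigurationProperties(prefix = "pattern")
class PatternProperties {
    private String dateformat;
    public String getDateformat() { return dateformat; }
    public void setDateformat(String dateformat) { this.dateformat = dateformat; }
}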

remote call

  1. The http client Feign – definition and usage steps -> (p30).
  2. The difference and replacement between Feign and RestTemplate.
  3. Customizing Feign configuration – log level (via configuration file or Java code; the one usually worth configuring, explained in detail), response result parser, request parameter encoding, supported annotation format, failure retry mechanism.
  4. Feign performance – underlying clients: URLConnection (default, no connection pooling), Apache HttpClient (supports connection pooling), OKHttp (supports connection pooling).
  5. Optimization – prefer the basic log level; use a connection pool.
  6. Feign best practices – have the controller and FeignClient inherit the same interface; extract the FeignClient interfaces, POJOs, and default Feign configuration into one module for consumers to use.
  7. Extracting Feign – autowiring UserClient fails here because its package is not scanned; the following video gives two ways to import the package, shown in the sketch below.
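
A minimal sketch of an extracted Feign client; the User POJO and the cn.itcast.feign.clients package name are assumptions for illustration:

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// Declarative HTTP client: Spring generates the implementation at runtime.
// User is the shared POJO extracted into the same module (assumed).
@FeignClient("userservice")
public interface UserClient {

    @GetMapping("/user/{id}")
    User findById(@PathVariable("id") Long id);
}

// When UserClient lives in an extracted module outside the consumer's base
// package, enable scanning on the startup class in one of two ways:
//   @EnableFeignClients(basePackages = "cn.itcast.feign.clients")  // scan the clients package
//   @EnableFeignClients(clients = {UserClient.class})              // import specific client classes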

service gateway

  1. Unified gateway Gateway – functions: identity authentication, permission verification, load balancing, and rate limiting. (p35)
  2. Gateway setup and route configuration – route id, route target (uri), route predicates, route filters.
  3. Route predicate factories – 11 basic predicates (consult the Spring Cloud Gateway official documentation for introductions and usage samples); function: read user-defined predicate conditions and judge requests against them.
  4. Route filters – 31 filter factories; a filter processes the request or response of a route; a filter configured under a route applies only to that route's requests, while defaultFilters apply to all routes.
  5. Global filter – a filter that applies to all routes, with customizable processing logic; implementation steps: implement the GlobalFilter interface, add @Order or implement the Ordered interface, and write the processing logic (see the sketch after this list).
  6. Filter execution order – the smaller the order value, the higher the priority; for equal order values, defaultFilters run first, then route-local filters, then global filters.
  7. CORS – configuration parameters: allowed origins, allowed request headers, allowed request methods, whether cookies are allowed, and how long the preflight result stays valid.
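
A sketch of the global-filter steps from item 5; the authorization=admin rule is an assumed example, not a real authentication scheme:

import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.annotation.Order;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

// Applies to every route; @Order(-1) gives it high priority (smaller = earlier).
@Order(-1)
@Component
public class AuthorizeFilter implements GlobalFilter {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // Assumed rule for illustration: only pass requests carrying ?authorization=admin
        String auth = exchange.getRequest().getQueryParams().getFirst("authorization");
        if ("admin".equals(auth)) {
            return chain.filter(exchange); // let the request continue
        }
        exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
        return exchange.getResponse().setComplete(); // end the exchange
    }
}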

project deployment

  1. Docker solves the compatibility problem of different component dependencies – it packages the application together with its Libs (function libraries), Deps (dependencies), and configuration into a portable image, and runs each application in an isolated container using a sandbox mechanism. (P42)
  2. Docker solves the problem of differences in development, testing, and production environments – the Docker image contains a complete operating environment, including system function libraries, and only relies on the kernel of the Linux system, so it can run on any Linux operating system.
  3. Docker is a technology for quickly delivering applications and running applications – startup and removal can be completed with a single command, which is convenient and fast.
  4. The difference between Docker and a virtual machine – docker is a system process; a virtual machine is an operating system in an operating system; docker is small in size, fast in startup speed, and has good performance; the virtual machine is large in size, slow in startup speed, and has average performance.
  5. Image: Package the application and its dependencies, environment, and configuration together.
  6. Container: When an image runs, it is a container, and one image can run multiple containers.
  7. Docker structure – server: accept commands or remote requests, operate images or containers; client: send commands or requests to the Docker server.
  8. DockerHub: an image hosting service, similar to Alibaba Cloud's image service; such services are collectively called a DockerRegistry.
  9. On Linux: uninstall old versions, install and start docker, and configure an image accelerator. Docker applications need many ports, and modifying firewall rules one by one is troublesome, so temporarily disable the firewall before starting.
# Stop the firewall
systemctl stop firewalld
# Disable the firewall at boot
systemctl disable firewalld
# Check the firewall status
systemctl status firewalld

# Start the docker service
systemctl start docker
# Check the docker service status
systemctl status docker
# Check the version
docker -v
# Stop the docker service
systemctl stop docker
# Restart the docker service
systemctl restart docker
  1. Docker basic operation – image command (p47)
  2. docker --help views the help documentation; for example, docker images --help shows the image-listing command with explanations and instructions for its parameters.
  3. Install nginx and redis in turn.
  4. Below are some image commands; there is no need to memorize them, just check the help documentation when you need them.
# View the help documentation
docker --help
# Pull the nginx image
docker pull nginx
# List images
docker images

# View the help for save
docker save --help
# Export an image to disk: -o output file name, then name:tag
docker save -o nginx.tar nginx:latest
# Delete an image: rmi name:tag, or rmi image-id
docker rmi nginx:latest
# Import an image
docker load -i nginx.tar
  1. Basic operation of docker – container command (p49)
# Run a container
docker run
# docker run: run a container  --name: give it a name  -p: host port (changeable):container port (fixed)  -d: run in the background  nginx: the image name
docker run --name name -p 80:80 -d nginx
# redis
docker run --name mr -p 6379:6379 -d redis redis-server --appendonly yes

# Pause
docker pause <container-name>
# Resume from pause
docker unpause <container-name>

# Stop
docker stop <container-name>
# Start a stopped container
docker start <container-name>

# List all running containers and their status
docker ps

# View a container's logs
docker logs <container-name>
# Follow the log output
docker logs -f <container-name>

# Execute a command inside a container
docker exec
# docker exec: run a command inside a container  -it: attach a standard input/output terminal so we can interact with the container  name: the container name  bash: the command executed after entering; bash is a Linux interactive shell
docker exec -it name bash
# Remove the specified container
docker rm <container-name>
  1. Basic operation of docker - data volume (p53)
  2. A data volume (volume) is a virtual directory that points to a certain directory in the host system.
  3. The role of data volumes: to separate and decouple the container from the data, to facilitate the operation of the data in the container, and to ensure data security.
  4. Basic commands for data volumes.
# Data volume syntax
docker volume [command]
# Available commands:
create  # create a volume (+ name)
inspect # show information about one or more volumes (+ name)
ls      # list all volumes
prune   # delete unused volumes
rm      # delete one or more specified volumes (+ name)
  1. Basic operation of docker - mount data volume (P54)
  2. If the volume does not exist when the container is running, it will be created automatically.
  3. Volume mounting is loosely coupled, and the directory is managed by docker, but it is deeply nested and hard to find.
  4. Directory mounting is tightly coupled, and we manage the directory ourselves, but it is easy to find.
# In docker run, mount a file or directory into the container with -v:
# (1) -v volume-name:directory-in-container
# (2) -v host-file:file-in-container
# (3) -v host-directory:directory-in-container

# docker run: run the container  --name: give it a name  -v html:/root/html: mount the html volume  -p 8080:80: map host port 8080 to container port 80  -d: run in the background  nginx: the image name
docker run --name mn -v html:/root/html -p 8080:80 -d nginx
# mysql
docker run --name mysql -e MYSQL_ROOT_PASSWORD=123456 -p 3306:3306 -v /tmp/mysql/conf/hmy.cnf:/etc/mysql/conf.d/hmy.cnf -v /tmp/mysql/data:/var/lib/mysql -d mysql:5.7.25
  1. Dockerfile – custom image (P56)
  2. An image has a layered structure, and each layer is called a Layer. BaseImage layer: contains the basic system function libraries, environment variables, and file system; Entrypoint: the entry point, i.e. the command that starts the application in the image; others: add dependencies, install programs, and complete the application's installation and configuration on top of the BaseImage.
  3. A Dockerfile is a text file containing instructions that describe what to do to build the image.
  4. The first line of a Dockerfile must be FROM, building from a base image (either a basic operating system or an image someone else has built).
  5. Some commonly used commands are introduced as follows:
# Each instruction forms one Layer
FROM   # specify the base image
ENV    # set environment variables, usable by later instructions
COPY   # copy local files into a directory in the image
RUN    # execute Linux shell commands, usually installation steps
EXPOSE # the port the container listens on at runtime
ENTRYPOINT  # the image's application start command, invoked when the container runs
# Build an image from the Dockerfile; the trailing . means the Dockerfile is in the current directory
docker build -t javaweb:1.0 .
# Run the resulting image
docker run --name web -p 8090:8090 -d javaweb:1.0
  1. DockerCompose – Microservice Cluster Deployment (P58).
  2. Docker Compose quickly deploys distributed applications based on Compose files, without manually creating and running containers one by one.
  3. The Compose file is a text file that defines, through instructions, how each container in the cluster runs (it declares what the docker run parameters and image builds would otherwise specify).
  4. CentOS7 installs Docker Compose.
  5. Use Docker Compose to deploy the previous project cluster to Linux. (ps: folders cannot be transferred directly with xshell; compress first, upload to Linux, then decompress.)
  6. Since nacos deploys relatively slowly and the other microservices depend on it, some runtime errors occur. Solution: deploy nacos first, then restart the other microservices.
# View the Docker Compose help documentation
docker-compose --help
# View the created containers
docker ps
# View logs; append a microservice name to see one service's startup log
docker-compose logs -f
# nacos deploys slowly, so restart the other microservices afterwards
docker-compose restart gateway userservice orderservice
  1. Docker – image registry (P60)
  2. Build an image registry – use Docker Compose to deploy a Docker Registry with a graphical UI; Docker's trusted (insecure) registry address must be configured first.
  3. To push to or pull from a private image registry, first tag the image with the private registry's address prefix.
# Open the file to modify
vi /etc/docker/daemon.json
# Add:
"insecure-registries":["http://192.168.226.134:8080"]
# Reload
systemctl daemon-reload
# Restart docker
systemctl restart docker

# docker-compose yaml that deploys a DockerRegistry with a graphical UI
version: '3.0'
services:
  registry:
    image: registry
    volumes:
      - ./registry-data:/var/lib/registry
  ui:
    image: joxit/docker-registry-ui:static
    ports:
      - 8080:80
    environment:
      - REGISTRY_TITLE=传智教育私有仓库
      - REGISTRY_URL=http://registry:5000
    depends_on:
      - registry
# Run in the background
docker-compose up -d

# List local images
docker images
# Re-tag a local image with the private registry address as prefix
docker tag nginx:latest 192.168.226.134:8080/nginx:1.0
# Push the image
docker push 192.168.226.134:8080/nginx:1.0
# Delete the image
docker rmi 192.168.226.134:8080/nginx:1.0
# Pull the image
docker pull 192.168.226.134:8080/nginx:1.0

asynchronous communication

  1. MQ–RabbitMQ–SpringAMQP(P61)
  2. Synchronous call – Advantages: strong timeliness, results can be obtained immediately; Disadvantages: high coupling, reduced performance and throughput, additional resource consumption, and cascading failure problems.
  3. Implementation of asynchronous calls - event-driven advantages, event-driven architecture - Broker.
  4. Asynchronous communication – advantages: low coupling, improved throughput, fault isolation, traffic peak shaving; disadvantages: relies on the Broker's reliability, security, and throughput; more complex architecture with no obvious business flow line, which is hard to track and manage.
  5. MQ (MessageQueue): Message queue, a queue for storing messages.
  6. Comparative analysis of RabbitMQ, ActiveMQ, RocketMQ, and Kafka. (P64)
  7. RabbitMQ – Deployment and installation, page introduction, structure and concepts.
# Pull online
docker pull rabbitmq:3-management
# Or upload the tar package and load the image from it
docker load -i mq.tar
# Run the MQ container
docker run \
 -e RABBITMQ_DEFAULT_USER=itcast \
 -e RABBITMQ_DEFAULT_PASS=123321 \
 --name mq \
 --hostname mq1 \
 -p 15672:15672 \
 -p 5672:5672 \
 -d \
 rabbitmq:3-management
# List all containers
docker ps -a
# Restart the mq container after a reboot
docker start mq
  1. Page elements after a successful RabbitMQ deployment – channel: the tool for operating MQ; exchange: routes messages to queues; queue: buffers messages; virtual host: a logical grouping of resources such as queues and exchanges.
  2. RabbitMQ entry case – the simple queue model, from the official getting-started docs – publisher: the message publisher, sends messages to the queue; queue: the message queue, receives and buffers messages; consumer: subscribes to the queue and processes its messages.
  3. The message sending process of the basic message queue and the message receiving process of the basic message queue. (P67)
  4. SpringAMQP – a set of API specifications based on the AMQP protocol definition, providing templates to send and receive messages; AMQP introduction – a protocol for message communication between applications, independent of language and platform.
  5. Simple queue model – using SpringAMQP to implement the basic message queue function of HelloWorld – introducing the amqp starter dependency; configuring the RabbitMQ address; using the convertAndSend method of the RabbitTemplate to send messages. (P67)
  6. Work queue – improves message processing speed and avoids message accumulation in the queue; by default, messages are prefetched.
  7. Work model - multiple consumers are bound to a queue, and the same message will only be processed by one consumer; the number of messages prefetched by consumers can be controlled by setting prefetch. (P71)
  8. Publish, subscribe model – allows sending the same message to multiple consumers. The implementation method is to add exchange (switch); exchange type-Fanout: broadcast, Direct: routing, Topic: topic. (P72)
  9. The role of exchange (switch) – to receive the message sent by the publisher; to route the message to the queue bound to it according to the rules; to be responsible for message routing, not storage, and if the routing fails, the message will be lost.
  10. Fanout Exchange – routes received messages to each queue bound to it.
  11. Direct Exchange – routing mode will route the received message to the specified Queue according to the rules (each Queue has a BindingKey with Exchange); when the publisher sends a message, specify the RoutingKey of the message; Exchange will route the message to the BindingKey and message RoutingKey consistent queue. (P74)
  12. TopicExchange – similar to Direct Exchange, except the routingKey must be a list of words separated by dots; when Queue and Exchange specify a BindingKey, wildcards can be used: # matches 0 or more words, * matches exactly one word.
  13. Message converter – serialization and deserialization of messages in SpringAMQP is implemented with a MessageConverter; the default is JDK serialization; note that the sender and receiver must use the same MessageConverter. See the sketch below.
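
A minimal SpringAMQP sketch combining the pieces above (queue name assumed; requires the amqp starter and, for the JSON converter, jackson-databind on the classpath):

import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.support.converter.Jackson2JsonMessageConverter;
import org.springframework.amqp.support.converter.MessageConverter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;

@Component
public class SpringAmqpDemo {

    @Autowired
    private RabbitTemplate rabbitTemplate;

    // publisher: send a message to the queue
    public void send() {
        rabbitTemplate.convertAndSend("simple.queue", "hello, spring amqp!");
    }

    // consumer: subscribe to the queue and process its messages
    @RabbitListener(queues = "simple.queue")
    public void listenSimpleQueue(String msg) {
        System.out.println("received: " + msg);
    }

    // JSON message converter replacing the JDK default;
    // sender and receiver must both register it
    @Bean
    public MessageConverter jsonMessageConverter() {
        return new Jackson2JsonMessageConverter();
    }
}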

distributed search

  1. ES==elasticsearch – open source distributed search engine. (P77)
  2. elasticsearch: Used to implement functions such as search, log statistics, analysis, and system monitoring.
  3. elasticsearch + kibana, Logstash, Beats == elastic stack (ELK).
  4. elasticsearch – the core – stores, computes, and searches data; replaceable components – kibana for data visualization; Logstash and Beats for data ingestion.
  5. Lucene – Apache's search-engine class library – easy to extend, high performance (based on inverted indexes) – provides the core search-engine API – supports only the Java language.
  6. document: each piece of data is a document; term: a word obtained by segmenting the document's content.
  7. Forward index – indexed by document id; to query a term you must fetch the documents first and then check whether each contains the term – like a database fuzzy query, judging row by row.
  8. Inverted index – segments document content, creates an index over the terms, and records the ids of the documents containing each term; a query first finds document ids by term, then fetches the documents.
  9. ES – document-oriented storage; document data is serialized into JSON; index: a collection of documents of the same type; mapping: the field constraints of documents in an index, similar to a table's schema.
  10. Conceptual comparison, architecture analysis, and relationship between MySQL and elasticsearch. (P80)
  11. To install and deploy es and kibana, first create a network so the two containers can reach each other; deploy single-node es and kibana by loading the uploaded tar packages and then running the docker commands. The commands are organized below, with notes, for clarity.
# Create a network
docker network create es-net
# Check the network after restarting the VM (already configured; a reboot does not affect it)
docker network ls
# Load the image
docker load -i es.tar
# Run the docker command to deploy single-node es
docker run -d \
	--name es \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    -e "discovery.type=single-node" \
    -v es-data:/usr/share/elasticsearch/data \
    -v es-plugins:/usr/share/elasticsearch/plugins \
    --privileged \
    --network es-net \
    -p 9200:9200 \
    -p 9300:9300 \
elasticsearch:7.12.1
# After a reboot, just restart the container
docker start es
# Access es at the address plus port
http://192.168.226.139:9200

# Load the image
docker load -i kibana.tar
# Run the docker command to deploy kibana
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601  \
kibana:7.12.1
# After a reboot, just restart the container
docker start kibana
# Access kibana at the address plus port
http://192.168.226.139:5601

ps: (1) After a reboot, you only need to restart the containers. (2) If you do not delete a container, re-running the docker run command reports a duplicate-name error. (3) If you re-create a container under a different name, the connection addresses in es and kibana must be kept in sync, otherwise a "Kibana server is not ready yet" error is reported.

  1. The role of the tokenizer – segment the document when creating the inverted index, and segment the input content when the user searches. (P83)
  2. Test the tokenizer and install the IK tokenizer.
# Test a tokenizer in kibana
# english: the default analyzer; standard: the standard analyzer
POST /_analyze
{
	"text": "好好学习,天天向上",
	"analyzer": "english"
}

# Install the ik tokenizer
# Find the plugins directory of the elasticsearch volume
docker volume inspect es-plugins
# Go to that directory
cd /var/lib/docker/volumes/es-plugins/_data
# Upload elasticsearch-analysis-ik-7.12.1.zip, then unzip it
unzip elasticsearch-analysis-ik-7.12.1.zip
# Not really recommended: when I tried it, es kept failing to start afterwards (some config file got changed somewhere), so I restored a snapshot and started over
# Transferring the folder unzipped on Windows directly with FileZilla worked
# Restart the es container
docker restart es
# View the es logs
docker logs -f es

# Test the ik tokenizer
# The IK tokenizer has two modes
# ik_smart: coarsest segmentation -- lower probability of being matched, coarse-grained
# ik_max_word: finest segmentation -- higher memory usage, fine-grained
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "好好学习天天向上,奥利给,噢噢点赞"
}
  1. ik tokenizer – extended dictionary and stopword dictionary.

Note that the encoding of the current file must be in UTF-8 format, and editing with Windows Notepad is strictly prohibited

# Open the IK tokenizer's config directory and add to the IKAnalyzer.cfg.xml configuration:
# Users can configure their own extended dictionary here
<entry key="ext_dict">ext.dic</entry>
# Users can configure their own extended stopword dictionary here
<entry key="ext_stopwords">stopword.dic</entry>

# Create a new ext.dic (copy a configuration file from the config directory as a starting point)
奥利给
# Add stopwords in stopword.dic
噢噢
# Restart the es container to see the effect of the changes
docker restart es
# View the logs
docker logs -f es
  1. Index library operation (P85)
  2. mapping attributes – a mapping is the set of constraints on documents in an index library.
  3. The mapping attributes include – type: the field's data type; index: whether to index the field (default true); analyzer: which tokenizer to use; properties: the field's subfields.
  4. Simple types for type – string: text (segmentable text), keyword (exact value); numeric: long, integer, short, byte, double, float; boolean; date; object.
  5. Create an index library, view an index library, delete an index library, and prohibit modification of an index library. (P86)
# DSL syntax
# Create an index library
PUT /<index-name>
# Example DSL for creating an index library
PUT /a
{
	"mappings": {
		"properties": {
			"info": {
				"type": "text",
				"analyzer": "ik_smart"
			},
			"name":{
				"type": "object",
				"properties": {
					"firstName": {
						"type": "keyword",
						"index": false
					}
				}
			}
		}
	}
}

# View an index library
GET /<index-name>
# Delete an index library
DELETE /<index-name>
# An index library and its mapping cannot be modified once created, but new fields can be added
PUT /<index-name>/_mapping
{
	"properties":{
		"<new-field-name>":{
			"type":"integer"
		}
	}
}
  1. Document operation – insert document, view document, delete document (P88)
  2. Modify documents – full modification will delete old documents and add new documents; partial modification will modify the specified field value.
# Insert a document
POST /<index-name>/_doc/<doc-id>
# View a document
GET /<index-name>/_doc/<doc-id>
# Delete a document
DELETE /<index-name>/_doc/<doc-id>

# Example insert DSL -- same index library as created above
POST /a/_doc/1
{
	"info": "好好学习天天向上",
	"name": {
		"firstName": "小",
		"lastName": "盈"
	}
}

# Modify a document -- full replacement: deletes the old document and adds a new one
PUT /<index-name>/_doc/<doc-id>
PUT /a/_doc/1
{
	"info": "好好学习天天向上",
	"email": "45543563.qq.com",
	"name": {
		"firstName": "小",
		"lastName": "盈"
	}
}
# Partial modification: updates the specified field values only
POST /<index-name>/_update/<doc-id>
POST /a/_update/1
{
	"doc":{
		"email": "[email protected]"
	}
}
  1. RestClient operation index library (P90)
  2. Import hotel-demo, analyze the mapping data structure of hotel - field name, data type, whether to participate in search, whether to segment words, word segmenter.
  3. tip: Two geographic coordinate data types are supported in ES – geo_point: a point determined by latitude and longitude; geo_shape: a complex geometric figure composed of multiple geo_points.
  4. Field copy can use the copy_to attribute to copy the current field to the specified field.
  5. Create an index library, delete an index library, and determine whether an index library exists.
  6. Basic steps of index-library operations: initialize the RestHighLevelClient; create an XxxIndexRequest, where Xxx is Create, Get, or Delete; prepare the DSL (needed for Create); send the request by calling the RestHighLevelClient#indices().xxx() method. A Java sketch follows the mapping below.
# The hotel mapping
PUT /hotel
{
	"mappings":{
		"properties":{
			"id":{
				"type": "keyword"
			},
			"name":{
				"type": "text",
				"analyzer": "ik_max_word",
				"copy_to": "all"
			},
			"address":{
				"type": "keyword",
				"index": false
			},
			"price":{
				"type": "integer"
			},
			"score":{
				"type": "integer"
			},
			"brand":{
				"type": "keyword",
				"copy_to": "all"
			},
			"city":{
				"type": "keyword"
			},
			"starName":{
				"type": "keyword"
			},
			"business":{
				"type": "keyword",
				"copy_to": "all"
			},
			"location":{
				"type": "geo_point"
			},
			"pic":{
				"type": "keyword",
				"index": false
			},
			"all":{
				"type": "text",
				"analyzer": "ik_max_word"
			}
		}
	}
}
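
A hedged Java sketch of the index-library steps from item 6, using the es address from the deployment notes; MAPPING_TEMPLATE is a shortened stand-in for the full hotel mapping above:

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

public class HotelIndexDemo {

    public static void main(String[] args) throws Exception {
        // initialize the client with the es address from the deployment notes
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://192.168.226.139:9200")));

        // create: request + the mapping DSL as a JSON string
        CreateIndexRequest create = new CreateIndexRequest("hotel");
        create.source(MAPPING_TEMPLATE, XContentType.JSON);
        client.indices().create(create, RequestOptions.DEFAULT);

        // exists
        boolean exists = client.indices().exists(new GetIndexRequest("hotel"), RequestOptions.DEFAULT);
        System.out.println(exists ? "index exists" : "index missing");

        // delete
        client.indices().delete(new DeleteIndexRequest("hotel"), RequestOptions.DEFAULT);

        client.close();
    }

    // shortened stand-in for the full PUT /hotel mapping body above
    private static final String MAPPING_TEMPLATE =
            "{\"mappings\":{\"properties\":{\"all\":{\"type\":\"text\"}}}}";
}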
  1. RestClient Manipulating Documents – Use JavaRestClient to implement CRUD of documents. (P95)
  2. Query the hotel data in the database, import it into the hotel index library, and realize the CRUD of the hotel data.
  3. Add document-index, query document according to id-get, modify document according to id-update, delete document according to id-delete.
  4. Basic steps of document operations: initialize the RestHighLevelClient; create an XxxRequest; prepare the parameters (needed for Index and Update); send the request by calling the RestHighLevelClient#xxx() method; parse the result (needed for Get).
  5. Use JavaRestClient to batch-import the hotel data into ES. See the sketch below.
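
A hedged sketch of the document CRUD steps from item 4; HotelDoc is the assumed index-shaped POJO, fastjson handles the JSON, and the client is initialized as in the index sketch above:

import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class HotelDocumentDemo {

    private final RestHighLevelClient client;

    public HotelDocumentDemo(RestHighLevelClient client) {
        this.client = client;
    }

    // index: insert one document
    public void addDocument(HotelDoc doc) throws Exception {
        IndexRequest request = new IndexRequest("hotel").id(doc.getId().toString());
        request.source(JSON.toJSONString(doc), XContentType.JSON);
        client.index(request, RequestOptions.DEFAULT);
    }

    // get: query by id and deserialize _source
    public HotelDoc getDocumentById(String id) throws Exception {
        GetResponse response = client.get(new GetRequest("hotel", id), RequestOptions.DEFAULT);
        return JSON.parseObject(response.getSourceAsString(), HotelDoc.class);
    }

    // update: partial modification of selected fields (values are illustrative)
    public void updateDocument(String id) throws Exception {
        UpdateRequest request = new UpdateRequest("hotel", id);
        request.doc("price", 952, "starName", "四钻");
        client.update(request, RequestOptions.DEFAULT);
    }

    // delete by id
    public void deleteDocument(String id) throws Exception {
        client.delete(new DeleteRequest("hotel", id), RequestOptions.DEFAULT);
    }

    // bulk: batch-import many documents in one request
    public void bulkImport(Iterable<HotelDoc> docs) throws Exception {
        BulkRequest bulk = new BulkRequest();
        for (HotelDoc doc : docs) {
            bulk.add(new IndexRequest("hotel").id(doc.getId().toString())
                    .source(JSON.toJSONString(doc), XContentType.JSON));
        }
        client.bulk(bulk, RequestOptions.DEFAULT);
    }
}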

DSL syntax

  1. DSL Query Syntax (P101)
  2. Query all : Query all data, for general testing. For example: match_all.
  3. Full-text query: segments the user's input with the tokenizer, then matches it against the inverted index. For example: match, multi_match. The difference: match queries one field, multi_match queries multiple fields; the more fields a query involves, the worse its performance.
  4. Precise query : Find data based on precise entry values, generally searching for keyword, numeric, date, boolean and other types of fields. For example: ids, range, term;
  5. Geographic (geo) query : query based on latitude and longitude. For example: geo_distance, geo_bounding_box.
  6. Compound (compound) query : compound query can combine the above-mentioned various query conditions and combine query conditions. For example: Boolean Query, function_score.
  7. Relevance Scoring Algorithm (P105)
  8. TF-IDF: used before elasticsearch 5.0; the score keeps growing as term frequency increases.
  9. BM25: used since elasticsearch 5.0; the score grows with term frequency, but the growth curve flattens out.
# Query all
GET /hotel/_search
{
	"query": {
		"match_all": {}
	}
}

# Full-text search -- match query (more efficient)
GET /hotel/_search
{
	"query": {
		"match": {
			"all": "外滩如家"
		}
	}
}
# Full-text search -- multi_match
GET /hotel/_search
{
	"query": {
		"multi_match": {
			"query": "外滩如家",
			"fields": ["brand", "name", "business"]
		}
	}
}

# Precise query -- term query, exact match
GET /hotel/_search
{
	"query": {
		"term": {
			"city": {
				"value": "上海"
			}
		}
	}
}
# Precise query -- range query, range filtering
GET /hotel/_search
{
	"query": {
		"range": {
			"price": {
				"gte": 100,
				"lte": 300
			}
		}
	}
}

# Geo query -- geo_distance
GET /hotel/_search
{
	"query": {
		"geo_distance": {
			"distance": "2km",
			"location": "31.21, 121.5"
		}
	}
}

# Compound query -- function_score, takes part in scoring
# Push the "如家" brand hotels toward the top
GET /hotel/_search
{
	"query": {
		"function_score": {
			"query": {
				"match": {
					"all": "外滩"
				}
			},
			"functions": [    //算分函数
				{
					"filter": {    //条件
						"term": {
							"brand": "如家"
						}
					},
					"weight": 10   //算分权重
				}
			],
			"boost_mode": "sum"  //加权分式
		}
	}
}
# Compound query -- Boolean Query
# must: conditions that must match, i.e. "AND"
# should: optional matching conditions, i.e. "OR"
# must_not: conditions that must not match; does not score -- improves efficiency
# filter: conditions that must match; does not score -- improves efficiency
# Search hotels whose name contains "如家", priced at most 400, within 20km of coordinates 31.21,121.5
GET /hotel/_search
{
	"query": {
		"bool": {
			"must": [
				{"match":{"name": "如家"}}
			],
			"must_not": [
				{"range":{"price":{"gt": 400}}}
			],
			"filter":[
				{"geo_distance": {
					"distance": "20km",
					"location": {
						"lat": 31.21,
						"lon": 121.5
					}
				}}
			]
		}
	}
}
  1. Search result processing (P108)
  2. Sorting – After sorting, no correlation scoring is performed, which improves query efficiency.
  3. Pagination – queries documents and then slices from the current offset for the page size; defaults to the top 10; to see more, modify the from and size parameters.
  4. Deep pagination problem – an ES cluster aggregates all nodes' results, sorts them in memory, and picks the documents for the page; the deeper the page, or the larger the result set (from + size), the higher the memory and CPU cost.
  5. es sets the upper limit of the result set to 10000.
  6. Pagination method (P109)
  7. from + size – advantages: supports random page access; disadvantages: the deep pagination problem. Scenario: random page-flipping searches such as Baidu, Google, JD.com.
  8. search after – advantages: no query upper limit (a single query's size still must not exceed 10,000); disadvantages: can only page forward one page at a time, no random page access. Scenario: searches without random page-flipping, such as scrolling down on a mobile phone.
  9. scroll – advantages: no query upper limit (a single query's size still must not exceed 10,000); disadvantages: extra memory consumption, results are not real-time. Scenario: bulk data fetching and migration. (deprecated)
  10. Highlight – Highlight the search keywords in the search results.
# Sort hotel data by user rating descending; equal ratings sort by price ascending
GET /hotel/_search
{
	"query": {
		"match_all": {}
	},
	"sort": [
		{
			"score": "desc"
		},
		{
			"price": "asc"
		}
	]
}
# Sort hotel data by distance from your coordinates, ascending
GET /hotel/_search
{
	"query": {
		"match_all": {}
	},
	"sort": [
		{
			"_geo_distance": {
				"location": {
					"lat": 31.034661,
					"lon": 121.612282
				},
				"order": "asc",
				"unit": "km"
			}
		}
	]
}

# Paged query -- from: the current page's offset, size: the number of documents shown
GET /hotel/_search
{
	"query": {
		"match_all": {}
	},
	"sort":[
		{
			"price": "asc"
		}
	],
	"from": 0,
	"size": 10
}

# Highlight query; by default the searched field must match the highlighted field; set "require_field_match": "false" to disable that check
GET /hotel/_search
{
	"query": {
		"match": {
			"all": "如家"
		}
	},
	"highlight":{
		"fields":{
			"name":{
				"require_field_match":"false"
			}
		}
	}
}
  1. RestClient Querying Documents – Querying documents using JavaRestClient. (P111)
  2. Basic steps – create a SearchRequest object; prepare Request.source(), using QueryBuilders to build the query conditions passed into query(); send the request and get the result; parse the result (follow the JSON structure from the outside in, layer by layer).
  3. Full-text search – only the QueryBuilders condition changes.
  4. Highlighting – parse the highlight result by following the JSON, layer by layer. See the sketch below.
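
A hedged sketch of the build-send-parse steps from item 2, combining a match query with paging and highlighting (HotelDoc as before):

import com.alibaba.fastjson.JSON;
import java.util.Map;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;

public class HotelSearchDemo {

    public static void search(RestHighLevelClient client) throws Exception {
        // 1. build the request: match query + paging + highlight
        SearchRequest request = new SearchRequest("hotel");
        request.source()
                .query(QueryBuilders.matchQuery("all", "如家"))
                .from(0).size(10)
                .highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));

        // 2. send it
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // 3. parse from the outside in: hits -> total, then each hit's _source
        long total = response.getHits().getTotalHits().value;
        System.out.println("total hits: " + total);
        for (SearchHit hit : response.getHits().getHits()) {
            HotelDoc hotelDoc = JSON.parseObject(hit.getSourceAsString(), HotelDoc.class);
            // replace the name with its highlighted fragment when present
            Map<String, HighlightField> fields = hit.getHighlightFields();
            HighlightField name = fields.get("name");
            if (name != null) {
                hotelDoc.setName(name.getFragments()[0].string());
            }
            System.out.println(hotelDoc);
        }
    }
}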

data aggregation

  1. Aggregation – statistics, analysis and calculation of document data. (P120)
  2. Common Kinds of Aggregation
  3. Bucket (bucket aggregations): group document data; TermAggregation: group by a document field; Date Histogram: group by date ladder, such as one week or one month.
  4. Metric (metric aggregations): compute over document data, such as avg, min, max, stats (computes sum, min, max, avg at once).
  5. Pipeline (pipeline aggregations): aggregate based on the results of other aggregations.
  6. The fields taking part in an aggregation must be of type keyword, numeric, date, or boolean.
  7. Implementing Bucket Aggregation with DSL (P121)
  8. aggs stands for aggregation, which is at the same level as query; the role of query: to limit the scope of aggregated documents.
  9. Aggregation must have three elements: aggregation name, aggregation type, and aggregation field.
  10. The configurable attributes of the aggregation are: size: specify the number of aggregation results; order: specify the sorting method of the aggregation results; field: specify the aggregation field.
  11. DSL Realizes Metrics Aggregation (P122)
  12. RestClient implements aggregation (P123); a Java sketch follows the DSL below.
# Count how many hotel brands exist in the data by aggregating on the brand name
# size: set size to 0 so the result contains only the aggregation, no documents
# aggs: defines the aggregation    brandAgg: a name for this aggregation
# terms: the aggregation type; we aggregate by brand value, so terms is chosen
# field: the field to aggregate on  size: the number of aggregation results wanted
GET /hotel/_search
{
	"size": 0,
	"aggs": {
		"brandAgg": {
			"terms": {
				"field":"brand",
				"size": 10
			}
		}
	}
}
# Bucket aggregation counts the documents in each bucket as _count and sorts by _count descending by default
GET /hotel/_search
{
	"size": 0,
	"aggs": {
		"brandAgg": {
			"terms": {
				"field":"brand",
				"size": 10,
				"order": {
					"_count": "asc"
				}
			}
		}
	}
}
# Bucket aggregation covers all documents in the index library; limit the scope by adding a query condition
GET /hotel/_search
{
	"query": {
		"range": {
			"price": {
				"lte": 200
			}
		}
	},
	"size": 0,
	"aggs": {
		"brandAgg": {
			"terms": {
				"field":"brand",
				"size": 10
			}
		}
	}
}

# Get each brand's min, max, avg, etc. of user scores.
# aggs: a sub-aggregation of brands, i.e. computed per group after grouping
# scoreAgg: the aggregation name
# stats: the aggregation type; stats computes min, max, avg, etc.
# field: the aggregation field, here score
GET /hotel/_search
{
	"size": 0,
	"aggs": {
		"brandAgg": {
			"terms": {
				"field":"brand",
				"size": 10,
				"order": {
					"scoreAgg.avg": "desc"
				}
			},
			"aggs": {
				"scoreAgg": {
					"stats": {
						"field": "score"
					}
				}
			}
		}
	}
}
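
A hedged Java sketch of the terms aggregation via the RestClient; the request mirrors the first DSL example above:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

public class HotelAggregationDemo {

    public static void brandAggregation(RestHighLevelClient client) throws Exception {
        // size(0): aggregation results only, no documents -- same as the DSL above
        SearchRequest request = new SearchRequest("hotel");
        request.source().size(0);
        request.source().aggregation(
                AggregationBuilders.terms("brandAgg").field("brand").size(10));

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // parse: fetch the aggregation by its name, then walk the buckets
        Terms brandTerms = response.getAggregations().get("brandAgg");
        for (Terms.Bucket bucket : brandTerms.getBuckets()) {
            System.out.println(bucket.getKeyAsString() + " : " + bucket.getDocCount());
        }
    }
}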
  1. Auto-completion (P126)
  2. Install the pinyin word breaker and test it.
  3. Custom analyzer – an analyzer in elasticsearch consists of three parts:
  4. character filters: process the text before the tokenizer, e.g. deleting or replacing characters.
  5. tokenizer: cuts the text into terms by certain rules, e.g. keyword (no segmentation) or ik_smart.
  6. tokenizer filter: further processes the terms output by the tokenizer, e.g. case conversion, synonyms, pinyin conversion.
# Install the pinyin tokenizer
# Find the plugins directory of the elasticsearch volume
docker volume inspect es-plugins
# Go to that directory
cd /var/lib/docker/volumes/es-plugins/_data
# Transfer the pinyin tokenizer folder unzipped on Windows directly with FileZilla (this worked)
# Restart the es container
docker restart es
# View the es logs
docker logs -f es
# Test the pinyin tokenizer
GET /_analyze
{
  "text": ["如家酒店还不错"],
  "analyzer": "pinyin"
}

# Delete the index library
DELETE /test

# Custom pinyin analyzer, configured via settings when creating the index library; the pinyin tokenizer is suitable when building the inverted index, but must not be used at search time -- otherwise all homophones get matched
# Use the my_analyzer analyzer when building the inverted index -- analyzer
# Use the ik_smart analyzer when searching the field -- search_analyzer
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { 
        "my_analyzer": { 
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": { 
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings":{
  	"properties":{
  		"name": {
  			"type": "text",
  			"analyzer": "my_analyzer",
  			"search_analyzer": "ik_smart"
  		}
  	}
  }
}
# Test the custom analyzer
GET /test/_analyze
{
  "text": ["如家酒店还不错"],
  "analyzer": "pinyin"
}
  1. Auto-completion – the completion suggester query implements the auto-complete feature. (P128)
  2. Requirements on the field for auto-completion: its type must be completion; the field value is an array of multiple entries.
  3. Case: auto-completion for the hotel data – implement auto-completion and pinyin search for the hotel index library. (P130) A Java sketch follows the DSL below.
# Index library for auto-completion
PUT test1
{
	"mappings":{
        "properties":{
            "title": {
                "type": "completion"
            }
        }
  }
}
# Sample data
POST test1/_doc
{
	"title":["Sony", "WH-1000XM3"]
}
POST test1/_doc
{
	"title":["SK-II", "PITERA"]
}
POST test1/_doc
{
	"title":["Nintendo", "switch"]
}
# Auto-completion query
POST /test1/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", # 关键字
      "completion": {
        "field": "title", # 补全字段
        "skip_duplicates": true, # 跳过重复的
        "size": 10 # 获取前10条结果
      }
    }
  }
}
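
A hedged Java sketch of the completion suggester via the RestClient; the completion-typed field name "suggestion" in the hotel index is an assumption:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;

public class HotelSuggestDemo {

    public static void suggest(RestHighLevelClient client, String prefix) throws Exception {
        // build a suggest request named "suggestions" against the completion field
        SearchRequest request = new SearchRequest("hotel");
        request.source().suggest(new SuggestBuilder().addSuggestion(
                "suggestions",
                SuggestBuilders.completionSuggestion("suggestion")
                        .prefix(prefix)
                        .skipDuplicates(true)
                        .size(10)));

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);

        // parse by the suggestion name, then read each option's text
        CompletionSuggestion suggestions =
                response.getSuggest().getSuggestion("suggestions");
        for (CompletionSuggestion.Entry.Option option : suggestions.getOptions()) {
            System.out.println(option.getText().toString());
        }
    }
}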
  1. Data synchronization – keeping data in sync between mysql and elasticsearch. (P132)
  2. Question : In microservices, the business responsible for hotel management (operating mysql) and the business responsible for hotel search (operating elasticsearch) may be on two different microservices. How to achieve data synchronization? Solution :
  3. Method 1: synchronous calls; advantages: simple, direct implementation; disadvantages: high business coupling.
  4. Method 2: asynchronous notification; advantages: low coupling, moderately difficult to implement; disadvantages: depends on the reliability of mq. A listener sketch follows this list.
  5. Method 3: monitoring the binlog; advantages: complete decoupling between services; disadvantages: enabling the binlog increases the database's load, and the implementation is complex – use the canal middleware.
  6. ES cluster structure (P138)
  7. A stand-alone elasticsearch that stores data inevitably faces two problems:
  8. Massive data storage problem – logically split the index library into N shards and store them on multiple nodes.
  9. Single point of failure problem – back up fragmented data on different nodes (replica).
  10. The number of shards and copies of each index library is specified when the index library is created, and the number of shards cannot be modified once set.
  11. Cluster nodes in elasticsearch have different responsibilities:
  12. master eligible (candidate master node) – the master node manages and records cluster state, decides which node each shard lives on, and handles requests to create and delete index libraries.
  13. data (data node) – stores data; performs search, aggregation, and CRUD.
  14. ingest – preprocessing before data is stored.
  15. coordinating (coordinating node) – routes requests to other nodes, merges the results they return, and responds to the user.
  16. Split-brain of ES clusters – when the network between the master node and other nodes fails, a split-brain problem may occur.
  17. The role of the coordinating node
  18. How a distributed insert picks its shard – the coordinating node hashes the id and takes the remainder modulo the number of shards; the remainder identifies the target shard.
  19. Two phases of distributed query.
  20. Scattering phase : The coordinating node distributes query requests to different shards.
  21. Collection phase : Summarize the query results to the coordinating node, organize and return to the user.
  22. Failover – after the master goes down, a new master is elected from the EligibleMaster (candidate) nodes; the master monitors shard and node status and moves the shards on a failed node to healthy nodes to keep the data safe.
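
A hedged sketch of the MQ-based synchronization from method 2; the queue names and IHotelService methods are assumptions for illustration (hotel-admin publishes the hotel id after every MySQL write or delete, hotel-demo consumes it):

import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// Listens on the search side (hotel-demo) and mirrors MySQL changes into es.
@Component
public class HotelListener {

    @Autowired
    private IHotelService hotelService;

    // insert or update: re-read the row from MySQL by id, write the doc into es
    @RabbitListener(queues = "hotel.insert.queue")
    public void listenHotelInsertOrUpdate(Long id) {
        hotelService.insertById(id);
    }

    // delete: remove the doc from es by id
    @RabbitListener(queues = "hotel.delete.queue")
    public void listenHotelDelete(Long id) {
        hotelService.deleteById(id);
    }
}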

dark horse travel

  1. The Dark Horse Travel case.
  2. Basic search and pagination – Case 1: implement the hotel search function of Dark Horse Travel, completing keyword search and pagination. (P115)
  3. Conditional filtering – Case 2 : Add filter functions such as brand, city, star rating, price, etc.
  4. Nearby Hotels – Case 3 : Hotels near me.
  5. Ad sticking to the top – Case 4 : Let the specified hotel rank at the top of the search results.
  6. Sorting – Added sorting functionality to Dark Horse Tours.
  7. Highlighting – add search keyword highlighting effect to Dark Horse Travel.
  8. Aggregation with RestClient – define methods in IUserService to aggregate brands, cities, and star ratings. (P124)
  9. Hotel search page auto-completion – realize the auto-completion of the hotel search page input box. (P131)
  10. Message Synchronization – Use MQ to synchronize data between mysql and elasticsearch. (P133)

-> Microservice technology stack course video

https://www.bilibili.com/video/BV1LQ4y127n4?p=1

<-

Record every learning moment.
