[Elasticsearch] zen discovery cluster discovery mechanism


1 Overview

Reprinted: https://blog.csdn.net/yangshangwei/article/details/103996803

Continue to learn ES from Teacher Zhonghua Shishan, Chapter 64

Course address: https://www.roncoo.com/view/55

2 zen discovery cluster discovery mechanism

2.1 Problem

Q: Suppose there are multiple hosts, and each machine runs one es process. How do the es processes on different machines find each other and form an es cluster suitable for a production environment?

By default, the es process binds to its own loopback address, 127.0.0.1, scans ports 9300~9305 on the local machine, tries to communicate with other es processes started on those ports, and then forms a cluster with them. This is very convenient for building a (pseudo) cluster development environment on a single machine.

However, this does not work for a cluster in a production environment. Each es process must be bound to a non-loopback ip address before it can communicate with other nodes, and a cluster discovery mechanism is needed to communicate with the es nodes on other machines.

Recall that when we experimented on Windows, starting several es processes directly on the same machine was enough for them to form a cluster on their own. That is the default behavior described above.

Deploying an es cluster across multiple machines in production involves the es discovery mechanism, that is, the mechanism by which the nodes in a cluster discover each other and then form a cluster. The discovery mechanism is also responsible for the master election of the es cluster. The master is discussed further below.

By default, each node in the elasticsearch cluster is eligible to become the master node, stores data, and can also provide query services. These functions are controlled by two attributes.

node.master
node.data
By default, both attributes are set to true.

There is also the coordinating node.

  1. node.master: This attribute indicates whether the node is eligible to become the master node. Note: a value of true does not mean that this node is the master node. The actual master is elected from among the nodes that are master eligible, so this attribute only indicates whether the node may take part in the master election.
  2. node.data: This attribute indicates whether the node stores data.

2.2 master, data and client nodes

es is a peer-to-peer (p2p) distributed system architecture, not the master-slave distributed architecture commonly used in the hadoop ecosystem. Each node in the cluster communicates directly with the other nodes.

Almost all API operations, such as index, delete, and search, do not go through the master. Instead, the client communicates with any node, and that node forwards the request to the corresponding node for execution.

There are two roles, master node and data node. Under normal circumstances there is only one master node. The master node is responsible for maintaining the state of the entire cluster, that is, the cluster metadata. It also reallocates shards when a node joins or leaves the cluster, and handles the creation or deletion of an index. Whenever the cluster state changes, the master is responsible for synchronizing the new cluster state to all nodes.

The master node receives all changes related to the cluster state and then pushes the latest cluster state to all nodes in the cluster. Every node in the cluster holds a complete copy of the cluster state; only the master node is responsible for maintaining it. The other nodes are data nodes, responsible for storing data and for reading and writing it: writing to indexes and searching.

2.3 cluster.name

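To form an es cluster from multiple nodes, the first parameter to set is cluster.name. Nodes can form a cluster only if their cluster.name values are the same; this is the basic condition.

The default value of cluster.name is elasticsearch. In a production environment you must change this value; otherwise unknown nodes may join the cluster unexpectedly and cause the cluster to misbehave.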

The default discovery mechanism in es is the zen discovery mechanism

The zen discovery mechanism provides unicast discovery as its cluster discovery mechanism. The communication between nodes during cluster discovery depends on the transport module, which is the network communication module and protocol at the bottom of es.

By default es is configured to use the unicast cluster discovery mechanism, so that only specifically configured nodes can form a cluster, rather than any node being able to join.

However, under the default configuration the unicast host is the local machine, that is, localhost, so a cluster can only be formed by starting multiple nodes on one machine.

Although es still provides a multicast plugin as a discovery mechanism, it is not recommended for production environments. The simplicity of multicast is tempting, since all nodes can join the cluster automatically as soon as they receive a multicast ping. But the multicast mechanism has many problems and is very fragile. For example, after a slight adjustment to the network, nodes may no longer be able to find each other.

Therefore, it is now recommended to use the unicast mechanism in the production environment and provide an es seed node as a transit routing node.

(0) master node, data node, network.host

Plan dedicated master eligible nodes and data nodes for the cluster.

There are three terms to distinguish: master node, master eligible node (master candidate), and data node.

You configure multiple nodes as master eligible nodes, but only one of them is elected as the master node. The other master eligible nodes only hold the qualification to take over when the current master fails. In the meantime, a master eligible node that also has node.data: true still works as a data node.

It is generally recommended to give 3 master eligible nodes:

node.master: true
node.data: false

The remaining nodes are set to data nodes:

node.master: false
node.data: true

However, for a small cluster of no more than 10 nodes, all nodes can act as both master eligible nodes and data nodes. If the cluster has more than 10 nodes, separate the master nodes from the data nodes.

A small cluster like this needs no additional configuration, because by default every node is both a master eligible node and a data node.
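The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the helper name plan_roles, the threshold of 10, and the choice of the first 3 nodes as dedicated masters are assumptions made here, not something es computes for you.

```python
def plan_roles(node_names, small_cluster_limit=10, dedicated_masters=3):
    """Return {node_name: {"node.master": bool, "node.data": bool}}.

    Hypothetical helper: the thresholds follow the rule of thumb in the text.
    """
    roles = {}
    small = len(node_names) <= small_cluster_limit
    for i, name in enumerate(node_names):
        if small:
            # Small cluster: every node is master eligible and stores data.
            roles[name] = {"node.master": True, "node.data": True}
        elif i < dedicated_masters:
            # Larger cluster: the first few nodes become dedicated masters.
            roles[name] = {"node.master": True, "node.data": False}
        else:
            # All remaining nodes are data-only nodes.
            roles[name] = {"node.master": False, "node.data": True}
    return roles
```

Each resulting dictionary corresponds to the two lines you would put in that node's configuration.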

By default, es will bind itself to 127.0.0.1, which is ok for running es in a single-node development mode.

However, for nodes to communicate with each other and form a cluster, each node must be bound to a non-loopback ip address. It is generally configured like this:

network.host: 192.168.1.10

Once we have configured network.host, es will think that we are migrating from development mode to production mode and will enable a series of bootstrap checks.
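To make the loopback versus non-loopback distinction concrete, here is a small Python check built on the standard ipaddress module. The function name is_non_loopback is made up for this example, and the sketch only handles literal ip addresses, not hostnames or other values that network.host may accept.

```python
import ipaddress


def is_non_loopback(network_host):
    """Return True if the given literal ip address is not a loopback address."""
    return not ipaddress.ip_address(network_host).is_loopback


print(is_non_loopback("127.0.0.1"))     # False: the development default
print(is_non_loopback("192.168.1.10"))  # True: reachable by other nodes
```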

2.4 ping

Ping is a process in which a node uses the discovery mechanism to discover other nodes

2.5 unicast

The unicast discovery cluster discovery mechanism requires the configuration of a host list to be used as a router for the gossip communication protocol. If these machines are specified by hostname, they will be resolved to ip addresses during ping.

The two most important configurations of unicast discovery mechanism are as follows:

hosts: a comma-separated list of hosts
hosts.resolve_timeout: how long to wait for DNS to resolve a hostname to an ip address

In simple terms, for multiple nodes to find each other and form a cluster, there must be some intermediate common nodes. The different nodes send requests to these common nodes, the common nodes exchange their information, every node becomes aware of the others, and they communicate to finally form a cluster. This is the unicast cluster discovery mechanism based on the gossip communication protocol.

When a node communicates with a member in the unicast node list, it will receive a complete cluster status, where all nodes in the cluster will be listed.

The node then uses the cluster state to contact the master and join the cluster. This means that the unicast host list does not need to contain every node in the cluster. Just provide a few nodes, such as 3, so that new nodes can connect.

If we assign several nodes to the cluster as dedicated master nodes, then just list our three dedicated master nodes. Use the following configuration:

discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
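As an illustration of the two entry forms in that list, the following Python sketch splits each entry into a host and a port. The function name parse_unicast_hosts is hypothetical, the fallback port 9300 is an assumption taken from the 9300~9305 range mentioned earlier, and IPv6 literals are not handled.

```python
DEFAULT_TRANSPORT_PORT = 9300  # assumed fallback when an entry has no port


def parse_unicast_hosts(hosts):
    """Turn ["host1", "host2:9301"] into [("host1", 9300), ("host2", 9301)]."""
    parsed = []
    for entry in hosts:
        host, sep, port = entry.partition(":")
        # sep is empty when the entry contains no ":", so use the fallback.
        parsed.append((host, int(port) if sep else DEFAULT_TRANSPORT_PORT))
    return parsed
```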

Several important configuration items:

cluster.name
node.name
network.host
discovery.zen.ping.unicast.hosts

The main steps:

  1. Each node is first bound to a non-loopback ip address through network.host, so that it can communicate with other nodes
  2. discovery.zen.ping.unicast.hosts configures a set of nodes that act as unicast intermediate routers
  3. All nodes can send a ping message to the routing nodes, and then get the cluster state back from them
  4. Then all nodes elect a master
  5. All nodes will communicate with the master and then join the master's cluster
  6. The cluster.name must be the same to form a cluster
  7. node.name identifies a name set by ourselves for each node
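Steps 3 to 5 can be pictured with a toy model in Python. Nothing here talks to a real cluster: discover_master and the ping callable are hypothetical names, and the cluster state is reduced to a plain dictionary.

```python
def discover_master(seed_hosts, ping):
    """Ping each seed host in order and return the master named in the first
    cluster state received, or None if no seed host answers.

    ping is a hypothetical callable: host -> cluster state dict, or None.
    """
    for host in seed_hosts:
        state = ping(host)
        if state is not None:
            return state["master"]
    return None


# Toy stand-in for the network: only host2 is reachable.
fake_cluster = {"host2": {"master": "host3", "nodes": ["host2", "host3"]}}
print(discover_master(["host1", "host2"], fake_cluster.get))  # host3
```

The new node would then send its join request to the node returned here.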

2.6 Master election

During ping discovery it is also very important to elect a master for the cluster, and the es cluster completes this automatically. It is recommended to set the discovery.zen.ping_timeout parameter (the default is 3s). If master election times out because the network is slow or congested, increase this parameter to keep cluster startup stable.

After the master election of a cluster is complete, each time a new node joins the cluster it sends a join request to the master node. You can set discovery.zen.join_timeout to increase the time a join waits, so that nodes join the cluster reliably. By default the join timeout is 20 times the ping timeout.

If the master node is stopped or goes down, the nodes in the cluster run the ping process again and elect a new master. If discovery.zen.master_election.ignore_non_master_pings is set to true, master candidate nodes are strictly distinguished: pings from nodes with node.master set to false are ignored during master election, because those nodes are not eligible to participate.

The discovery.zen.minimum_master_nodes parameter sets how many master candidate nodes must connect to a newly elected master for the election to complete. It also sets how many master candidate nodes must be active in the cluster. If these requirements are not met, the master steps down and a new election begins. This parameter must be set to the quorum of the master candidate nodes. Avoid having only two master candidate nodes, because the quorum of 2 is still 2; in that situation, if either master candidate node goes down, the cluster cannot function properly.
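The quorum mentioned here is usually computed as (number of master candidate nodes / 2) + 1, using integer division. A short Python sketch shows why two candidates give no fault tolerance; the function name is chosen here for readability only.

```python
def minimum_master_nodes(master_eligible_count):
    """Quorum of the master candidate nodes: floor(n / 2) + 1."""
    return master_eligible_count // 2 + 1


for n in (2, 3, 5):
    quorum = minimum_master_nodes(n)
    # The last column is how many candidates can fail while a quorum remains.
    print(n, quorum, n - quorum)
# 2 2 0
# 3 2 1
# 5 3 2
```

With 3 candidates the cluster survives the loss of one; with 2 it survives none, which is why 3 is the usual minimum.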

2.7 Detection of cluster failure

es has two cluster fault detection mechanisms:

  1. The first is through the master: the master pings all other nodes in the cluster to check that they are alive
  2. Second, each node pings the master node to check that it is alive; otherwise it starts an election process.

The following three parameters are used to configure the cluster fault detection process:

ping_interval: how often a node is pinged, default 1s
ping_timeout: how long to wait for each ping response, default 30s
ping_retries: how many failed pings it takes before a node is considered failed, default 3
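How ping_retries turns individual ping results into a failure decision can be sketched as a simple counter in Python. This is a simplified model under stated assumptions: the class FaultDetector is invented for illustration, it assumes that only consecutive failures count, and it leaves out ping_interval and ping_timeout, which govern when each result arrives.

```python
class FaultDetector:
    """Toy model: a node counts as failed after ping_retries failures in a row."""

    def __init__(self, ping_retries=3):
        self.ping_retries = ping_retries
        self.failures = 0

    def record(self, ping_ok):
        """Record one ping result; return True once the node counts as failed."""
        # Assumption: a successful ping resets the failure counter.
        self.failures = 0 if ping_ok else self.failures + 1
        return self.failures >= self.ping_retries
```

For example, the sequence fail, fail, ok, fail, fail, fail only reports a failure on the last ping, because the success in the middle resets the count.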

2.8 Cluster status update

The master node is the only node in the cluster that can update the cluster state.

The master node processes one cluster state update event at a time, applies the update, and then publishes the updated state to all nodes in the cluster.

Each node receives the publish message and acks it, but does not yet apply the update.

If the master does not receive ack responses from at least discovery.zen.minimum_master_nodes nodes within the time specified by discovery.zen.commit_timeout (30s by default), the cluster state change is rejected and not applied.

But once the specified number of nodes have returned acks within the specified time, the cluster state is committed, and a commit message is sent to all nodes.

After all nodes receive the commit message, they apply the previously received cluster state to their local state copy.

The master then waits for all nodes to respond again, confirming that they updated their local copies successfully. Once the responses arrive or the wait times out, it moves on to the next state update in its in-memory queue.

discovery.zen.publish_timeout defaults to 30s. This timeout is measured from the moment publishing of the cluster state starts.
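The commit decision in this two-phase process can be written down as a small Python function. It is a sketch of the rule as described in this section only: the function name and the representation of acks as a list of delays in seconds are assumptions, and no real timing or networking is involved.

```python
def publish_cluster_state(ack_delays, minimum_master_nodes, commit_timeout=30.0):
    """Decide whether a published cluster state is committed or rejected.

    ack_delays: seconds each node took to ack the publish message
    (hypothetical input for this toy model).
    """
    # Only acks that arrive within commit_timeout count toward the quorum.
    on_time = sum(1 for delay in ack_delays if delay <= commit_timeout)
    return "commit" if on_time >= minimum_master_nodes else "reject"


print(publish_cluster_state([0.1, 0.2, 45.0], minimum_master_nodes=2))   # commit
print(publish_cluster_state([0.1, 45.0, 50.0], minimum_master_nodes=2))  # reject
```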

2.9 Do not block cluster operations due to master downtime

For the cluster to operate normally, there must be an active master, and the number of running master candidate nodes must reach the value of discovery.zen.minimum_master_nodes. discovery.zen.no_master_block controls which operations are rejected when there is no active master. There are two options:

all: once the master is down, all operations are rejected
write: this is the default option; all write operations are rejected, but read operations are allowed
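The effect of the two options can be summarized in a few lines of Python. The function is_rejected and the classification of operations into "read" and "write" are simplifications made for this example.

```python
def is_rejected(operation, no_master_block="write"):
    """Return True if the operation is rejected while there is no active master.

    operation: "read" or "write" (simplified classification for this sketch).
    """
    if no_master_block == "all":
        return True
    if no_master_block == "write":
        return operation == "write"
    raise ValueError("unknown no_master_block value: %r" % no_master_block)


print(is_rejected("read"))                         # False: reads still allowed
print(is_rejected("write"))                        # True
print(is_rejected("read", no_master_block="all"))  # True
```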


Origin blog.csdn.net/qq_21383435/article/details/109300046