Elasticsearch Study Notes (4): Basic Concepts of an ES Cluster, the Setup Process, and How the Cluster Works

Make it a habit: like first, then read!!!

1. Introduction

I previously tested ES on my own Alibaba Cloud server and Tencent Cloud server. The earlier operations on ES and Kibana all worked fine, but this time I kept running into a problem while configuring the ES cluster. Although ES was able to start normally on both machines, each side reported that it could not find the other node and stayed stuck pinging it. Because both sides were in this state, Kibana on the two servers could not connect properly to its ES, and the subsequent operations could not be performed.

[Screenshot: the two cloud ES nodes unable to discover each other]

After consulting our technical director, he said the specific cause was probably that the two servers are not on the same LAN, i.e. they have to communicate over the WAN. He suggested first opening two virtual machines on local VMware and testing whether the cluster can start normally on the local machine; if it starts normally, we can then analyze whether the problem is a communication barrier between Alibaba Cloud and Tencent Cloud.

So today I tried testing on this machine to see whether it was a problem with my previous configuration. After the local test, I found that the ES cluster on the local machine can be configured and started normally:
[Screenshot: the local ES cluster starting normally]

So the cause can basically be pinned down to a communication problem between the Alibaba Cloud server and the Tencent Cloud server, which will be discussed later.

Next, let's talk about the main content of this article:

  1. Learn about some basic concepts in an ES cluster
  2. How to build an ES cluster and how to set it to start automatically at boot
  3. How does an ES cluster work? Here I will compare the ES cluster with a MySQL cluster, so we can better understand why the ES cluster does things this way

2. Basic concepts of a cluster

Before we look at how an ES cluster is configured, we need to understand some of the concepts inside a cluster along the way. These concepts have been mentioned before, but we never explained them in depth, so today we will explain them in detail.

  • Cluster
    A cluster is a group made up of two or more nodes. There is only one master node in the cluster; the other nodes follow the master node's coordination, and all of the nodes together store the ES data, which is distributed across them.
  • Node
    A node is a virtual machine with ES installed, and each node can store data. Nodes are divided into a master node and slave nodes; the master node coordinates all of the slave nodes.
  • Index
    We have talked about the concept of an index before. An index in ES is roughly equivalent to a database, but not exactly the same, because an index in ES is only a logical storage unit, not a real physical storage unit. The real physical storage unit is the next concept we will talk about: the shard.
  • Shard
    We have also talked about the concept of a shard before. All of our data is stored in shards. This needs to be understood together with the index above: generally we create the index first, and after the index is created we add data into it. That data is actually stored in shards, and the data added to one index may be spread across multiple shards. This makes the relationship between an index and its shards look like the figure below:
    [Figure: relationship between an index and its shards]
  • Replica
    Finally, we need to understand the concept of a replica. As the name suggests, by default the ES cluster creates replicas of each shard on nodes other than the node that holds the shard itself, but it does not create a copy on every other node. Doing it this way not only guarantees fault tolerance for the data, it also effectively reduces the storage pressure on each node: if every other node created a copy of the shard, fault tolerance would improve further, but the amount of data stored on each node would increase drastically. A small, concrete example of choosing shard and replica counts follows right after this list.
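
To make the relationship between an index, its shards, and their replicas concrete, here is a minimal sketch of creating an index while choosing how many shards and replicas it gets. The index name user and the numbers are purely illustrative, not values used elsewhere in this article:

#create an index named "user" with 3 primary shards, each with 1 replica;
#ES spreads the primaries and their replicas across the nodes of the cluster
curl -X PUT "http://localhost:9200/user" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'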

3. How to build an ES cluster

If you don’t know how to install a virtual machine on VMware, you can read my blog: How to install a virtual machine in VMware; I have also shared the software and the Linux system image there via a Baidu Netdisk link.

Building an ES cluster is very simple: we just need to configure the elasticsearch.yml file in the config directory.

Here are the main configuration items:

#name of the cluster
cluster.name: es-cluster
#name of this node
node.name: es-1
#whether this node is eligible to be the master node
node.master: true
#whether this node stores data
node.data: true
#directory where data is stored
path.data: /opt/es/data
#directory where logs are stored
path.logs: /opt/es/logs
#the IP of this machine
network.host: <the IP of this machine>
#the port used to access ES externally
http.port: 9200
#the port for internal communication within the ES cluster
transport.tcp.port: 9300
#IP addresses of the other machines in the cluster
discovery.zen.ping.unicast.hosts: ["<IPs of the other nodes>"]
#to avoid split brain, the minimum number of master-eligible machines in the cluster
discovery.zen.minimum_master_nodes: 2

Now that we have glanced over these settings, let's explain each of them in detail.

  • cluster.name: es-cluster
    This defines the name of our ES cluster. The name must be the same on every node, otherwise the nodes will not be treated as one cluster.

  • node.name: es-1
    This is the name of the node. The name must be different for each node, otherwise the nodes cannot be distinguished.

  • node.master: true
    This defines whether the node is eligible to become the master node. Generally every node adds this line, because setting it here does not mean the node will definitely be the master; after startup, the most suitable node is elected master from all of the master-eligible nodes.

  • node.data: true
    This defines whether the node stores data, which depends on your situation. Generally, companies run ES and data storage on the same machine, in which case you need to set this; if the ES server and the storage server are not the same machine, you can leave it out, depending on your own needs.

  • path.data: /opt/es/data
    As everyone can tell from the name, this defines the directory where ES data is stored.

  • path.logs: /opt/es/logs
    As everyone can tell from the name, this defines the directory where ES logs are stored.

  • network.host: <the IP of this machine>
    This defines the IP address through which our ES is ultimately accessed.

  • http.port: 9200
    This defines the port through which our ES is ultimately accessed; it mainly refers to the port we use when accessing ES in the browser.

  • transport.tcp.port: 9300
    This defines the port number used for communication between the nodes inside the ES cluster.

  • discovery.zen.ping.unicast.hosts: ["<IPs of the other nodes>"]
    This is like an address book: the main thing is to fill in the IPs of the other nodes in the cluster, besides the current node's own IP, so that each node knows about the other nodes.

  • discovery.zen.minimum_master_nodes: 2
    Before understanding this property, we first need to understand the concept of split brain.

    Taken literally, it means the brain splits: the original single brain becomes two or more. In general the cluster works fine, and the normal workflow looks like this:

    [Figure: normal workflow, all nodes following one master]

    But as everyone knows, there are plenty of abnormal situations. Suppose the communication between node 3, node 4 and the master node is cut off by the network or some other reason; the cluster then becomes the following:

    [Figure: node 3 and node 4 lose contact with the master node]

    In this situation, node 3 and node 4 naturally think the master node is down, so they elect a new master among themselves, and the following situation appears:

    [Figure: node 3 and node 4 elect their own master]

    Now there are two master nodes in the cluster, each thinking there is no other master and each happily working with its own slave nodes. The outside world does not know this either, which leads to the following situation:

    [Figure: client requests distributed across the two masters]

    When the first request arrives, it is dispatched to node 1, because at that moment the ES cluster considers node 1 to be the master node, so the operation is performed on node 1.

    When the second request arrives, the ES cluster considers node 3 to be the master node, so the modification is performed on node 3.

    The key point is that if, when the third request arrives, the ES cluster again regards node 1 as the master node, the query will show that the data has not changed. This is the problem that appears after split brain: the client's requests may be distributed to multiple master nodes, but those master nodes have lost communication with each other, so after repeated requests the data on the nodes becomes inconsistent.

    So we need to deal with the split-brain problem. The property we are setting here can help us solve this problem to a large extent; note that it only mitigates it to a large extent and cannot solve it completely.

    The property we set here means the minimum number of master-eligible machines in the cluster, and its value is defined as (total number of master-eligible machines / 2) + 1. Once ES detects that the number of nodes it can see is less than this value, it considers that the ES cluster has split and stops the service.

    Let me give an example to illustrate, using the scenario described above:
    [Figure: the split-brain example with minimum_master_nodes applied]
    The figure above already explains the process well.
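
    As a quick worked example (assuming a cluster in which three nodes are master-eligible, i.e. have node.master: true; this is just an illustration, not the exact node count in the figures):

    #master-eligible nodes = 3, so: 3 / 2 + 1 = 1 + 1 = 2   (integer division)
    discovery.zen.minimum_master_nodes: 2
    #with this value, a side of the split that can only see one master-eligible node
    #(1 < 2) refuses to elect its own master, so at most one side keeps serving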

Now that the concepts behind these properties have been explained, let's do it in practice and configure the ES cluster on the two machines.

Here I choose to directly clone my other machine. You can either configure two virtual machines from scratch, or, like me, configure one virtual machine and clone it to get the other.

The cloning process is as follows:

  1. Create a snapshot of the virtual machine

[Screenshot: taking a snapshot of the virtual machine in VMware]

  2. Clone the virtual machine from the snapshot
    [Screenshot: cloning the virtual machine from the snapshot]

  3. Modify the IP of the cloned machine and restart the network.
    Because the second machine is a clone, both machines have the same IP address, so we need to change it:
    [Screenshot: opening the network configuration file]
    then modify the IPADDR parameter:
    [Screenshot: changing the IPADDR value]
    and restart the network:
    [Screenshot: restarting the network service]

Then we just need to configure the elasticsearch.yml file of ES on each of the two virtual machines.
192.168.94.128:

[Screenshot: elasticsearch.yml on 192.168.94.128]

192.168.94.129:

[Screenshot: elasticsearch.yml on 192.168.94.129]
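
Since the two screenshots are hard to read here, below is a minimal sketch of what the cluster-related part of each machine's elasticsearch.yml might look like, filled in from the template above with the two IPs used in this example. It is not a verbatim copy of the screenshots (in particular, the node name es-2 for the second machine is my assumption), so adjust names and paths to your own setup:

#on 192.168.94.128 (node es-1)
cluster.name: es-cluster
node.name: es-1
node.master: true
node.data: true
path.data: /opt/es/data
path.logs: /opt/es/logs
network.host: 192.168.94.128
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["192.168.94.129"]
discovery.zen.minimum_master_nodes: 2

#on 192.168.94.129 (node es-2, name assumed) only node.name, network.host and the peer list differ
cluster.name: es-cluster
node.name: es-2
node.master: true
node.data: true
path.data: /opt/es/data
path.logs: /opt/es/logs
network.host: 192.168.94.129
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["192.168.94.128"]
discovery.zen.minimum_master_nodes: 2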

One more thing to note: we need to open up the permissions on the data and log directories we just created, otherwise ES cannot use them.

Switch to the root user and execute the following commands:

chmod 777 /opt/es/data #adjust this to the data directory you defined
chmod 777 /opt/es/logs #adjust this to the log directory you defined

After that, we can restart our ES.
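
A minimal sketch of restarting ES from the command line on each node (assuming the install directory /opt/es/elasticsearch-6.3.1 used later in this article, and a non-root user named es, since ES refuses to start as root):

su - es
cd /opt/es/elasticsearch-6.3.1
./bin/elasticsearch -d   #-d starts ES as a background daemon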
[Screenshot: ES started on 192.168.94.128]

[Screenshot: ES started on 192.168.94.129]
ES has started successfully on both machines, but this alone does not prove that the two machines are really in the same cluster. We also need a tool called Cerebro, which is used to manage ES nodes and can help us check whether the nodes really formed one cluster. I am sharing the software as well.
Link: https://pan.baidu.com/s/1QdQrcD19EfHm6esLWXYzJw
Extraction code: qilh

After downloading, unzip it and run the file as an administrator.

[Screenshot: running the Cerebro startup file as administrator]
After it finishes starting, we can see an interface like this:

[Screenshot: the Cerebro console after startup]
Then we visit the address http://localhost:9000/
and fill in our ES address to manage our ES cluster:
[Screenshot: Cerebro showing the two nodes in the es-cluster]
You can see that a cluster has indeed been formed, and es-1 is our master node.
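
If you prefer the command line over Cerebro, you can also check the cluster membership directly against either node; a quick sketch using the IPs from this example:

#list the nodes that have joined the cluster; the elected master is marked with *
curl "http://192.168.94.128:9200/_cat/nodes?v"
#overall cluster health; number_of_nodes should be 2 and cluster_name should be es-cluster
curl "http://192.168.94.128:9200/_cluster/health?pretty"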

4. Setting ES to start automatically at boot

Because this time I am not building the ES cluster on a cloud server, I have to manually start Elasticsearch myself every time I boot the virtual machines. After a few days of this I found it too annoying, so let's set it to start automatically at boot.

We already covered the steps for setting up auto-start at boot back when we talked about the distributed file system; the main steps are as follows:

  • Write the boot auto-start script
    We add the Elasticsearch startup script under the /etc/init.d directory:
cd /etc/init.d
vi elasticsearch

Then paste in the following script, but pay attention to the three places I marked with comments below!!!!!

#chkconfig: 345 63 37
#description: elasticsearch
#processname: elasticsearch-6.3.1

#fill in your own ES installation directory here; change it if yours is different
export ES_HOME=/opt/es/elasticsearch-6.3.1

case $1 in
        start)
                #the user here must be the user you start ES with; change it if it is not es
                su es<<!
                cd $ES_HOME
                ./bin/elasticsearch -d -p pid
                exit
!
                echo "elasticsearch is started"
                ;;
        stop)
                pid=`cat $ES_HOME/pid`
                kill -9 $pid
                echo "elasticsearch is stopped"
                ;;
        restart)
                pid=`cat $ES_HOME/pid`
                kill -9 $pid
                echo "elasticsearch is stopped"
                sleep 1
                 #the user here must be the user you start ES with; change it if it is not es
                su es<<!     
                cd $ES_HOME
                ./bin/elasticsearch -d -p pid
                exit
!
                echo "elasticsearch is started"
        ;;
    *)
        echo "start|stop|restart"
        ;;  
esac
exit 0

  • Grant the boot script permissions
    After saving and exiting, we need to run the following command:
chmod 777 elasticsearch
  • Add it to the system services and enable start at boot
chkconfig --add elasticsearch
chkconfig elasticsearch on
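
If you want to double-check that the service was registered, you can list it (an optional sanity check):

#should show elasticsearch with runlevels 3, 4 and 5 set to "on"
chkconfig --list elasticsearch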

With this, ES auto-start at boot is done. Not only that, we can now also start, stop, and restart the es service directly with the following commands:

#start the es service
service elasticsearch start
#stop the es service
service elasticsearch stop
#restart the es service
service elasticsearch restart

5. How the ES cluster works (compared with a MySQL cluster)

Before talking about how an ES cluster works, let's first take a look at how a MySQL cluster works.
A MySQL cluster works mainly through master-slave replication: when the master database changes, the changes are replicated to the slave databases, and when the master database crashes, a slave database takes its place and continues to carry the load.

As you can see, MySQL's solution is mainly built on the concept of replication, but it has a couple of problems. First, once a few machines fail, the data on those machines is simply unavailable, and all of the subsequent traffic is shifted onto the remaining machines, as shown in the figure below:

[Figure: load concentrating on the remaining MySQL machines after failures]
Performance will definitely decrease.

Second, there may be several query and update operations still executing right before the master database crashes. If those operations have not yet been replicated to the slave databases, i.e. the replication process has not finished when the master suddenly crashes, then that part of the data is never synchronized, and the data seen afterwards will obviously diverge, as shown in the figure below:

[Figure: writes that were not yet replicated are lost when the MySQL master crashes]
This mainly causes the data to become inconsistent.

With that in mind, let's take a look at how the ES cluster handles this.

The scheme adopted by the ES cluster is sharding + replication.

In fact we have already covered sharding + replicas in the basic cluster concepts above.

The main point is that the ES cluster, unlike the MySQL cluster, does not start out by putting the data on a single server. Instead, it stores all of the data across multiple shards, and these shards are distributed over all of the nodes.

And its replication process is exactly the replica concept we mentioned above. It is not like a MySQL cluster, where every slave database has to communicate with the master database and each slave has to replicate the master's data; in the ES cluster, replication only needs to happen between each shard and its own replicas.
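
You can actually observe this shard + replica layout on the cluster we just built; for example (a sketch, queried against one of the nodes from this article):

#lists every shard in the cluster: "p" marks a primary, "r" marks a replica,
#and the last column shows which node each copy lives on
curl "http://192.168.94.128:9200/_cat/shards?v"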

It is precisely because of the two points above that the fault tolerance of ES data is very high, although even so, the cluster can still be brought down in extreme cases.

That is everything this article set out to explain. Original writing is not easy, and neither is typing it all up!!

If you think the article is good or helpful to you, you can follow my official account. A newcomer needs your support!!!

[Image: QR code for my official account]

If you only read without liking, that's not a good look!

Keep following, and you'll look even better!


Origin blog.csdn.net/lovely__RR/article/details/112313237