[ElasticSearch from entry to abandonment, part 5] Building a distributed ElasticSearch cluster

The previous post in this series introduced ES principles and basic usage. The most powerful part of ES is its near-real-time search over PB-scale data; since data at that scale is far too much for a single server to store and query, ES is generally deployed as a distributed cluster.

Distributed related concepts

Cluster: A cluster is made up of one or more nodes that together hold the entire data set and jointly provide indexing and search. A cluster is identified by a unique name, which is "elasticsearch" by default. This name is important, because a node can only join a cluster by specifying that cluster's name.

Node: A node is a single server in the cluster; it stores data and takes part in the cluster's indexing and search. Like a cluster, a node is identified by a name, which is how you tell, during administration, which servers in the network correspond to which nodes in the Elasticsearch cluster. A node joins a specific cluster by configuring the cluster name; by default each node joins a cluster called "elasticsearch", which means that if you start several nodes in your network and they can discover each other, they will automatically form and join a cluster named "elasticsearch". A cluster may contain as many nodes as you like, and if no Elasticsearch node is currently running in your network, starting one will by default create and join a cluster named "elasticsearch".

Shard: An index can store an amount of data that exceeds the hardware limits of a single node; for example, an index with one billion documents may occupy 1 TB of disk space, and no single node has that much disk, or a single node may simply be too slow to serve search requests on its own. To solve this, Elasticsearch can split an index into multiple pieces called shards. When you create an index, you can specify the number of shards you want. Each shard is itself a fully functional, independent "index" that can be placed on any node in the cluster. Sharding is important for two reasons: it lets you split and scale your content volume horizontally, and it lets you run distributed, parallel operations across shards, improving performance and throughput. How a shard is placed, and how its documents are aggregated back into search results, is managed entirely by Elasticsearch and is transparent to you as a user.

Replica: In a network/cloud environment, failure can happen at any time; a shard or node may go offline or disappear for any reason, and in that case a failover mechanism is very useful and highly recommended. For this purpose, Elasticsearch lets you create one or more copies of a shard, called replica shards, or simply replicas. Replication is important for two reasons. First, it provides high availability in case a shard or node fails; for this reason, a replica shard is never placed on the same node as its original/primary shard. Second, it scales your search volume and throughput, because searches can run on all replicas in parallel. In short, each index can be split into multiple shards, and an index can be replicated zero times (meaning no replicas) or more. Once replicated, an index distinguishes primary shards (the originals that replicas are copied from) from replica shards (copies of the primaries). The number of shards and replicas can be specified when the index is created; after creation, you can change the number of replicas dynamically at any time, but you cannot change the number of shards. By default, each index in Elasticsearch gets 5 primary shards and 1 replica, which means that if your cluster has at least two nodes, an index will have 5 primary shards and 5 replica shards (one complete copy), 10 shards in total.
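
A minimal sketch of how these numbers map onto the API (the index name test_index and the host/port are illustrative, matching the node-1 settings used later): the shard count is fixed at creation time, while the replica count can still be changed afterwards.

# create an index with an explicit shard/replica layout; the number of
# shards cannot be changed after creation
curl -X PUT "http://127.0.0.1:9200/test_index" -H 'Content-Type: application/json' -d '
{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1 }
}'
# the replica count can be changed dynamically at any time
curl -X PUT "http://127.0.0.1:9200/test_index/_settings" -H 'Content-Type: application/json' -d '
{
  "index": { "number_of_replicas": 2 }
}'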

The overall cluster architecture looks like this: each server is a node; an index is a single logical unit, but physically its shards and replicas can be distributed across different machines.

Distributed cluster construction

A distributed ES cluster can be built through the following steps. Suppose our cluster needs three nodes: these can be three ES instances on the same machine, or instances on three separate machines. Here we run all three instances on one machine for demonstration; in production you would not put all the nodes of a cluster on a single machine, since that defeats the failover that replication is meant to provide.

Copy the nodes

Copy the ES installation folder three times, one copy per node; each copy plays the role of one server.

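A minimal command-line sketch of this step, assuming the unpacked ES distribution sits in a folder named elasticsearch (on Windows you can simply copy the folder in Explorer):

# make three independent copies of the distribution, one per node;
# each copy keeps its own config, data and logs
mkdir elasticsearch-cluster
cp -r elasticsearch elasticsearch-cluster/elasticsearch-node1
cp -r elasticsearch elasticsearch-cluster/elasticsearch-node2
cp -r elasticsearch elasticsearch-cluster/elasticsearch-node3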

Modify node configuration

Modify each node's configuration file separately: elasticsearch-cluster\elasticsearch-node*\config\elasticsearch.yml.

# Configuration for node 1:
# cluster name; must be unique, and identical on every node of this cluster
cluster.name: elasticsearch-tml
# node name; must be different for each node
node.name: node-1
# must be the IP address of the local machine
network.host: 127.0.0.1
# HTTP service port; must differ between nodes on the same machine
http.port: 9200
# transport port for inter-node communication; must differ between nodes on the same machine
transport.tcp.port: 9300
# set of host:port addresses used for cluster discovery
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300","127.0.0.1:9301","127.0.0.1:9302"]

# Configuration for node 2:
# cluster name; must be unique, and identical on every node of this cluster
cluster.name: elasticsearch-tml
# node name; must be different for each node
node.name: node-2
# must be the IP address of the local machine
network.host: 127.0.0.1
# HTTP service port; must differ between nodes on the same machine
http.port: 9201
# transport port for inter-node communication; must differ between nodes on the same machine
transport.tcp.port: 9301
# set of host:port addresses used for cluster discovery
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300","127.0.0.1:9301","127.0.0.1:9302"]

# Configuration for node 3:
# cluster name; must be unique, and identical on every node of this cluster
cluster.name: elasticsearch-tml
# node name; must be different for each node
node.name: node-3
# must be the IP address of the local machine
network.host: 127.0.0.1
# HTTP service port; must differ between nodes on the same machine
http.port: 9202
# transport port for inter-node communication; must differ between nodes on the same machine
transport.tcp.port: 9302
# set of host:port addresses used for cluster discovery
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300","127.0.0.1:9301","127.0.0.1:9302"]
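
One extra setting worth considering on a three-node cluster like this (an addition not in the original walkthrough): with zen discovery it is common to also require a majority of master-eligible nodes, here 2 of 3, on every node to reduce the risk of split brain:

# minimum number of master-eligible nodes that must be visible to elect a master
discovery.zen.minimum_master_nodes: 2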

Start three nodes

Start the three nodes one by one. The first node starts successfully and forms the cluster; the second node starts successfully and joins the cluster with node-1 as the master; the third node also starts successfully and joins the cluster with node-1 as the master.
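
To confirm that the cluster really formed, you can query any node over HTTP; a quick check against node-1, using the ports configured above, might look like this:

# list the nodes in the cluster; the elected master is marked with *
curl -X GET "http://127.0.0.1:9200/_cat/nodes?v"
# overall cluster health: green = all primary and replica shards
# allocated, yellow = some replicas unassigned
curl -X GET "http://127.0.0.1:9200/_cluster/health?pretty"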

Add an index

We create an index and view it in the head plugin: it has 5 primary shards and 1 replica. We then create a piece of data, and the document is created successfully.
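
The same steps can also be done from the command line instead of the head plugin; a minimal sketch, where the index name blog and the document fields are placeholders:

# create an index; with the default 5 primaries and 1 replica, the
# three-node cluster spreads 10 shards across the nodes
curl -X PUT "http://127.0.0.1:9200/blog"
# index a document with id 1
curl -X PUT "http://127.0.0.1:9200/blog/_doc/1" -H 'Content-Type: application/json' -d '
{ "title": "hello elasticsearch", "content": "first document" }'
# show which node each shard (p = primary, r = replica) landed on
curl -X GET "http://127.0.0.1:9200/_cat/shards/blog?v"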

Origin: blog.csdn.net/sinat_33087001/article/details/108092358