Elasticsearch is a distributed search and analytics engine. Its cluster management is built on a distributed architecture of shards and replicas.
In Elasticsearch, each index is divided into one or more primary shards, and each shard is an independent Lucene index. Shards are distributed across nodes for horizontal scaling and high availability. To improve redundancy and fault tolerance, each primary shard can have one or more replicas. A replica is a full copy of its primary and is never allocated to the same node as that primary, so losing a single node cannot take out every copy of a shard.
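To make the shard and replica arithmetic concrete, here is a minimal Python sketch. It assumes the simplified routing rule "shard = hash(routing key) mod number of primary shards" and uses zlib.crc32 as a stand-in hash; Elasticsearch itself uses a Murmur3-based hash, so the shard numbers produced here will not match a real cluster. The function names are hypothetical.

```python
import zlib


def total_shard_copies(num_primaries: int, num_replicas: int) -> int:
    """Each primary plus its replicas: primaries * (1 + replicas)."""
    return num_primaries * (1 + num_replicas)


def route_to_shard(routing_key: str, num_primaries: int) -> int:
    """Pick a primary shard for a document (stand-in hash, not Murmur3)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primaries


if __name__ == "__main__":
    # 5 primaries with 1 replica each means 10 shard copies in total.
    print(total_shard_copies(5, 1))
    for doc_id in ("doc-1", "doc-2", "doc-3"):
        print(doc_id, "-> shard", route_to_shard(doc_id, 5))
```

Because the target shard depends on the number of primaries, that number is fixed when the index is created, while the number of replicas can be changed later.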
The primary goal of Elasticsearch cluster management is to allocate and rebalance primary and replica shards automatically so that the cluster stays available and evenly loaded. When a new node joins the cluster, Elasticsearch relocates some shard copies to it. If a node goes down or fails, Elasticsearch promotes replicas of any lost primaries and rebuilds the missing copies on the remaining nodes to keep the data available.
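The allocation idea can be sketched in a few lines of Python. This is a toy model, not the real allocator: it assumes the only rules are "never put two copies of the same shard on one node" and "prefer the node with the fewest shards", and the allocate function is hypothetical.

```python
def allocate(nodes, num_primaries, num_replicas):
    """Place shard copies on nodes; return (placement, unassigned).

    placement maps node name -> list of (shard_id, role) tuples.
    unassigned lists copies that could not be placed on any node.
    """
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        holders = set()  # nodes that already hold a copy of this shard
        for copy in range(1 + num_replicas):
            role = "primary" if copy == 0 else "replica"
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                unassigned.append((shard, role))
                continue
            # Balance by shard count; ties go to the first node listed.
            target = min(candidates, key=lambda n: len(placement[n]))
            placement[target].append((shard, role))
            holders.add(target)
    return placement, unassigned


if __name__ == "__main__":
    placement, unassigned = allocate(["node1", "node2", "node3"], 3, 1)
    for node, shards in placement.items():
        print(node, shards)
    print("unassigned:", unassigned)
```

With a single node and one replica configured, the replica ends up in the unassigned list, which mirrors why a one-node cluster with replicas cannot reach full health.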
The following is a simple Elasticsearch cluster management architecture diagram, showing an Elasticsearch cluster consisting of three nodes:
+----------+          +----------+          +----------+
|  Node 1  |          |  Node 2  |          |  Node 3  |
|          |          |          |          |          |
|   Data   |          |   Data   |          |   Data   |
|  Master  +----------+  Master  |          |  Master  |
|   Node   |          |   Node   +----------+   Node   |
|          |          |          |          |          |
+----------+          +----------+          +----------+
In this architecture, every node is both a data node, holding some primary and replica shards, and a master-eligible node. At any time exactly one master-eligible node is the elected master (Master Node), which is responsible for coordinating shard allocation and rebalancing and for managing cluster-wide state. When a new node joins the cluster, the master relocates some shard copies to it. If a node goes down or fails, the master promotes replicas and reallocates the missing copies to keep the data available. If the elected master itself fails, the remaining master-eligible nodes elect a new one.
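The master's reaction to a node failure can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: a lost primary is replaced by promoting a surviving replica, and the missing copy is rebuilt on the least-loaded node that does not already hold that shard. The handle_node_failure function and the placement layout are hypothetical. In a real cluster the rebuild of missing replicas is also delayed by index.unassigned.node_left.delayed_timeout (one minute by default).

```python
def handle_node_failure(placement, failed):
    """Return (new_placement, lost_shards) after node `failed` leaves.

    placement maps node name -> {shard_id: "primary" | "replica"}.
    """
    survivors = {n: dict(c) for n, c in placement.items() if n != failed}
    lost = []
    for shard, role in placement[failed].items():
        if role == "primary":
            # Promote a surviving replica, if any copy is left.
            holder = next(
                (n for n, c in survivors.items() if c.get(shard) == "replica"),
                None,
            )
            if holder is None:
                lost.append(shard)  # no copy left: data is unavailable
                continue
            survivors[holder][shard] = "primary"
        # Rebuild the missing copy on a node that lacks this shard.
        candidates = [n for n, c in survivors.items() if shard not in c]
        if candidates:
            target = min(candidates, key=lambda n: len(survivors[n]))
            survivors[target][shard] = "replica"
    return survivors, lost


if __name__ == "__main__":
    cluster = {
        "node1": {0: "primary", 1: "replica"},
        "node2": {0: "replica", 2: "primary"},
        "node3": {1: "primary", 2: "replica"},
    }
    after, lost = handle_node_failure(cluster, "node1")
    print(after)
    print("lost shards:", lost)
```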
The following steps walk through a simple example of Elasticsearch cluster management:
- Start the Elasticsearch cluster
First, start at least two Elasticsearch nodes to form a multi-node cluster. On a single host, both nodes can be started from the same installation with the following commands:
bin/elasticsearch -E node.name=node1 -E cluster.name=my_cluster -E path.data=data1 -E path.logs=log1
bin/elasticsearch -E node.name=node2 -E cluster.name=my_cluster -E path.data=data2 -E path.logs=log2
Here, node.name sets the node name, cluster.name sets the cluster name, and path.data and path.logs set the storage paths for data and logs, respectively.
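When several nodes are started this way, the command lines differ only in their per-node settings. The following Python sketch generates them from a settings dictionary; build_start_command is a hypothetical helper, and the default binary path is simply the one used in the commands above.

```python
import shlex


def build_start_command(settings, binary="bin/elasticsearch"):
    """Turn {"node.name": "node1", ...} into an argv list of -E flags."""
    args = [binary]
    for key, value in settings.items():
        args += ["-E", f"{key}={value}"]
    return args


if __name__ == "__main__":
    for i in (1, 2):
        settings = {
            "node.name": f"node{i}",
            "cluster.name": "my_cluster",
            "path.data": f"data{i}",
            "path.logs": f"log{i}",
        }
        # shlex.join renders the argv list as a shell-safe command line.
        print(shlex.join(build_start_command(settings)))
```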
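- Add a node
No API call is needed to add a node: a node joins when it starts with the same cluster name and can discover the existing nodes. Shard allocation is enabled by default, but if it was restricted earlier (for example, during maintenance), re-enable it so that the new node can receive shards: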
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.enable": "all"
}
}
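This sets shard allocation to all, which is also the default, so that new nodes can receive shards. Then start the new node with the same cluster.name; if it runs on a different host, it also needs discovery.seed_hosts pointing at an existing node. The new node joins the cluster automatically, and the master begins relocating shards to it.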
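The request body above can also be built programmatically. The Python sketch below validates the allocation mode before producing the body; allocation_settings is a hypothetical helper, and the sketch defaults to the persistent scope as an assumption, since transient cluster settings are deprecated in recent releases.

```python
import json

# Values accepted by cluster.routing.allocation.enable.
VALID_MODES = {"all", "primaries", "new_primaries", "none"}


def allocation_settings(mode, scope="persistent"):
    """Build the body for PUT /_cluster/settings."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown allocation mode: {mode!r}")
    if scope not in {"persistent", "transient"}:
        raise ValueError(f"unknown settings scope: {scope!r}")
    return {scope: {"cluster.routing.allocation.enable": mode}}


if __name__ == "__main__":
    # Restrict allocation before maintenance, then restore it afterwards.
    print(json.dumps(allocation_settings("primaries"), indent=2))
    print(json.dumps(allocation_settings("all"), indent=2))
```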
- View cluster status
To view cluster status and information, you can use the following command:
GET /_cluster/health
GET /_cluster/stats
The first request returns the cluster health status (green, yellow, or red) together with node and shard counts. The second returns cluster-wide statistics such as the number of indices, documents, and nodes.
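The health status follows a simple rule: red if any primary shard is unassigned, yellow if all primaries are assigned but some replica is not, and green otherwise. The Python sketch below encodes that rule and summarizes a health response. It is a simplification that only distinguishes STARTED from other states; fetch_health assumes an unsecured node at http://localhost:9200, and all function names are hypothetical.

```python
import json
import urllib.request


def fetch_health(base_url="http://localhost:9200"):
    """GET /_cluster/health (assumes no authentication is required)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def derive_status(shards):
    """shards: iterable of (role, state), e.g. ("primary", "STARTED")."""
    status = "green"
    for role, state in shards:
        if state != "STARTED":
            if role == "primary":
                return "red"  # a missing primary means missing data
            status = "yellow"  # data is complete but not fully redundant
    return status


def summarize(health):
    """One-line summary built from fields of the health response."""
    return (
        f"{health['cluster_name']}: {health['status']}, "
        f"{health['number_of_nodes']} nodes, "
        f"{health['unassigned_shards']} unassigned shards"
    )
```

For example, summarize(fetch_health()) prints a one-line summary when a node is reachable at the assumed address.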
- Manage indices
To manage indices, you can use the following commands:
- Create an index:
PUT /my_index
- Delete the index:
DELETE /my_index
- Get index information:
GET /my_index/_stats
- Change index settings:
PUT /my_index/_settings
{
"index": {
"refresh_interval": "30s"
}
}
- Add documents to the index:
POST /my_index/_doc
{
"title": "Elasticsearch Tutorial",
"content": "This is a tutorial on Elasticsearch",
"tags": ["elasticsearch", "tutorial"]
}
- Search index:
GET /my_index/_search
{
"query": {
"match": {
"title": "Elasticsearch"
}
}
}
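To show what happens between adding a document and matching it, here is a toy inverted index in Python. It is only a sketch under simplifying assumptions: the analyzer just lowercases and splits on non-word characters, a match query returns every document that contains at least one query term, and there is no relevance scoring. TinyIndex and analyze are hypothetical names, not part of any Elasticsearch client.

```python
import re
from collections import defaultdict


def analyze(text):
    """Lowercase and split into word tokens (a crude stand-in analyzer)."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)  # (field, term) -> doc ids

    def add(self, doc_id, doc):
        """Store the document and index every term of every field."""
        self.docs[doc_id] = doc
        for field, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                for term in analyze(str(item)):
                    self.postings[(field, term)].add(doc_id)

    def match(self, field, query):
        """Return ids of documents containing any analyzed query term."""
        hits = set()
        for term in analyze(query):
            hits |= self.postings.get((field, term), set())
        return sorted(hits)


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, {
        "title": "Elasticsearch Tutorial",
        "content": "This is a tutorial on Elasticsearch",
        "tags": ["elasticsearch", "tutorial"],
    })
    print(index.match("title", "Elasticsearch"))
```

The query term matches regardless of case because both the document and the query pass through the same analyzer.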
- Manage nodes
To manage nodes, the following commands can be used:
- View node information:
GET /_nodes
- View specific node information:
GET /_nodes/node1
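- Shut down the node:
Releases before 2.0 exposed POST /_cluster/nodes/node1/_shutdown for this, but that API has been removed. In current versions, stop the Elasticsearch process on the node named node1 through the operating system (for example, by sending it SIGTERM or by using the service manager). After the node leaves the cluster, the master promotes replicas of any primaries it held and, after a short delay, rebuilds the missing copies on the remaining nodes.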
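One common way to move shards off a node before stopping it is allocation filtering with the cluster.routing.allocation.exclude._name setting, sent through PUT /_cluster/settings. The Python sketch below only builds the request body; exclude_nodes is a hypothetical helper, and the persistent scope is an assumption.

```python
import json


def exclude_nodes(names):
    """Body for PUT /_cluster/settings that drains the named nodes.

    An empty list yields null, which resets the setting.
    """
    value = ",".join(names) if names else None
    return {"persistent": {"cluster.routing.allocation.exclude._name": value}}


if __name__ == "__main__":
    # Drain node1 before stopping it, then clear the filter afterwards.
    print(json.dumps(exclude_nodes(["node1"]), indent=2))
    print(json.dumps(exclude_nodes([]), indent=2))
```

Once the excluded node holds no shards, it can be stopped without leaving any shard short of copies.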
Elasticsearch cluster management is commonly used in the following scenarios:
- Handle large-scale data: Elasticsearch can handle a large amount of structured and unstructured data, and is suitable for application scenarios that need to process large-scale data.
- High availability: Through the automatic allocation and reallocation of shards and replicas, Elasticsearch achieves high availability and fault tolerance, which suits applications that must stay online when a node fails.
- Load balancing: Any node can coordinate a request and spread the work across the nodes that hold the relevant shard copies, which balances load and improves performance.
Here are some links to literature on Elasticsearch cluster management:
- Elasticsearch official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html
- Introduction to the principles of Elasticsearch cluster management: https://www.elastic.co/cn/blog/a-deep-dive-into-elasticsearch-cluster-management
- Elasticsearch cluster management best practices: https://www.elastic.co/cn/blog/elasticsearch-cluster-management-best-practices
- Elasticsearch cluster size and performance optimization: https://www.elastic.co/cn/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
- Elasticsearch cluster monitoring and debugging: https://www.elastic.co/cn/blog/monitoring-and-debugging-elasticsearch-performance-and-health