Table of contents
Step 1: Stop shard allocation
Step 2: Graceful stop
Step 3: Start a new node
Step 4: Restart shard allocation
Other articles in this series:
A Preliminary Study of CrateDB (1): Docker Deployment of a CrateDB Cluster
A Preliminary Study of CrateDB (2): Partitioning, Sharding, and Replication
A Preliminary Study of CrateDB (3): JDBC
A Preliminary Study of CrateDB (4): Optimistic Concurrency Control
This article covers graceful shutdown and rolling upgrades. Suppose a node in the cluster (node01) needs to be shut down for maintenance, and a new node will be started to join the cluster in its place.
Initial cluster state and shard distribution of table 'staff1'
Shards of table 'staff1' before stopping node01
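The shard distribution can also be inspected directly from SQL via the `sys.shards` system table; a minimal sketch, assuming the table name `staff1` from this example:

```sql
-- Show which node holds each shard of table 'staff1'
SELECT table_name, id AS shard_id, "primary", node['name'] AS node_name, state
FROM sys.shards
WHERE table_name = 'staff1'
ORDER BY id, "primary" DESC;
```

Running this before and after each step makes the shard movements described below easy to follow.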
Step 1: Stop shard allocation
First, change the shard allocation parameter ( cluster.routing.allocation.enable ) to new_primaries:
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'new_primaries';
Note the parameter cluster.graceful_stop.min_availability : with its default value primaries, only primary shards are relocated during a graceful stop.
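To confirm the setting took effect, the current value can be read back from the `sys.cluster` table; a sketch:

```sql
-- Read the effective allocation setting from the cluster
SELECT settings['cluster']['routing']['allocation']['enable'] AS allocation_enable
FROM sys.cluster;
```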
Step 2: Graceful stop
Perform the decommission operation on node01:
ALTER CLUSTER DECOMMISSION 'node01';
Regarding the decommission operation, the official documentation explains:
To initiate a graceful shutdown that behaves as described in the introduction of this document, the Decommission Statement must be used.
After the decommission, CrateDB moved primary shard 0 from node01 to node03.
Since cluster.graceful_stop.min_availability is set to primaries, only the primary shards on node01 are moved; any replica shards on node01 are simply dropped. For details on this parameter, see the official documentation:
cluster.graceful_stop.min_availability
Default: primaries
Runtime: yes
Allowed Values: none | primaries | full

none: No minimum data availability is required. The node may shut down even if records are missing after shutdown.
primaries: At least all primary shards need to be available after the node has shut down. Replicas may be missing.
full: All records and all replicas need to be available after the node has shut down. Data availability is full.
Since node01 was the master node, node03 is elected as the new master.
After the decommission completes, the node01 process is terminated.
Note the cluster status at this point:
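The cluster state can be checked from SQL as well; a sketch using the `sys.health` and `sys.nodes` system tables:

```sql
-- Per-table health after node01 is gone; with min_availability = 'primaries',
-- tables with dropped replicas will report underreplicated shards
SELECT table_name, health, missing_shards, underreplicated_shards
FROM sys.health;

-- Nodes remaining in the cluster
SELECT name FROM sys.nodes ORDER BY name;
```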
Step 3: Start a new node
After starting a new node (node04), primary shard 0 is relocated from node03 to node04.
Step 4: Restart shard allocation
Finally, change the shard allocation parameter back to all:
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'all';
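Equivalently, since the setting was applied as TRANSIENT, it can simply be reset to its default; a sketch:

```sql
-- Restore cluster.routing.allocation.enable to its default ('all')
RESET GLOBAL "cluster.routing.allocation.enable";
```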