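Other articles in this series:
A First Look at CrateDB (1): Docker Deployment of a CrateDB Cluster
A First Look at CrateDB (2): PARTITION, SHARDING AND REPLICATION
A First Look at CrateDB (4): Optimistic Concurrency Control
This article covers graceful shutdown and rolling upgrades.
Suppose one node in the cluster (node01) has to be taken down for maintenance, and a new node is started and joins the cluster to take its place.
Initial cluster state and shard distribution of table 'staff1'
Shards of table 'staff1' before node01 is stopped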
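The shard distribution described here can also be read from the system tables. The query below is a minimal sketch, assuming the table is named staff1 as in this walkthrough; the column aliases are only illustrative.

```sql
-- One row per shard copy of staff1: which node holds it
-- and whether that copy is the primary or a replica.
SELECT id,
       "primary",
       state,
       routing_state,
       node['name'] AS node_name
FROM sys.shards
WHERE table_name = 'staff1'
ORDER BY id, "primary" DESC;
```

Running it before and after each step makes it easy to see which shards moved.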
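Step 1: Stop shard allocation
First, set the shard allocation setting (cluster.routing.allocation.enable) to new_primaries, so that only primary shards of newly created tables or partitions are allocated: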
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'new_primaries';
Next, check the setting cluster.graceful_stop.min_availability. Its value here is primaries, which means only primary shards are relocated during a graceful stop.
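Both settings can be checked from sys.cluster. This is a sketch rather than output from the original cluster; the column aliases are hypothetical names chosen for readability.

```sql
-- Read the current cluster-wide values of the two settings used in this walkthrough.
SELECT settings['cluster']['routing']['allocation']['enable']   AS allocation_enable,
       settings['cluster']['graceful_stop']['min_availability'] AS min_availability
FROM sys.cluster;
```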
Step 2: Graceful stop
Decommission node01:
ALTER CLUSTER decommission 'node01';
The official documentation says the following about the decommission operation:
To initiate a graceful shutdown that behaves as described in the introduction of this document, the Decommission Statement must be used.
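After the decommission, CrateDB moved primary shard 0, which had been on node01, to node03.
Because cluster.graceful_stop.min_availability is primaries, only primary shards are moved. Even if node01 held a replica of some shard, that replica is dropped rather than relocated. For details on cluster.graceful_stop.min_availability, see the official documentation: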
cluster.graceful_stop.min_availability
Default: primaries
Runtime: yes
Allowed Values: none | primaries | full
none: No minimum data availability is required. The node may shut down even if records are missing after shutdown.
primaries: At least all primary shards need to be available after the node has shut down. Replicas may be missing.
full: All records and all replicas need to be available after the node has shut down. Data availability is full.
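Since the setting is changeable at runtime, it could be raised before decommissioning if replicas also have to survive the shutdown. The statement below is only a sketch of that option; it was not run in this walkthrough, and whether it completes depends on the remaining nodes being able to take the extra shard copies.

```sql
-- Require primaries and replicas to stay available after the node stops.
-- Not part of the original walkthrough, which kept the default 'primaries'.
SET GLOBAL TRANSIENT "cluster.graceful_stop.min_availability" = 'full';
```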
Because node01 was the master, a new election took place and node03 became the master.
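The current master can be confirmed by joining sys.cluster with sys.nodes. A minimal sketch (the alias master_name is illustrative):

```sql
-- sys.cluster.master_node holds the node id of the elected master;
-- look up its name in sys.nodes.
SELECT n.name AS master_name
FROM sys.nodes n
WHERE n.id = (SELECT master_node FROM sys.cluster);
```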
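Once the decommission completed, the node01 process was terminated.
Note the cluster state at this point:
Step 3: Start a new node
After the new node node04 was started, primary shard 0 was relocated from node03 to node04.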
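To see how the shard copies of staff1 are spread across the nodes once node04 has joined, a per-node count is usually enough. This is a sketch assuming the same table name as above:

```sql
-- Count shard copies of staff1 per node; node04 should appear
-- here once a shard has been relocated to it.
SELECT node['name'] AS node_name,
       count(*)     AS shard_copies
FROM sys.shards
WHERE table_name = 'staff1'
GROUP BY node['name']
ORDER BY node_name;
```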
Step 4: Re-enable shard allocation
Finally, set the shard allocation setting back to all:
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'all';
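As an alternative sketch, the setting can be reset instead of set explicitly. Because all is the default value of cluster.routing.allocation.enable, this should have the same effect here; it was not part of the original steps.

```sql
-- Drop the override so the setting falls back to its default ('all').
RESET GLOBAL "cluster.routing.allocation.enable";
```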