I. Background
Non-stop migration of an ES cluster: the migration process must not affect business use. Cluster version: 6.3.0.
II. Plan
1. The business accesses the cluster through a domain name;
2. Build a new cluster on the new machines;
3. Snapshot the original cluster, so that any lost data can be restored from the snapshot;
4. Merge the old and new clusters, and force data to migrate from the old cluster to the new one through shard allocation filtering and rebalancing;
5. Take the original old cluster offline.
III. Implementation
1. Building the cluster on the new machines
1) Prepare the machines (as root): see the official documentation.
Edit /etc/security/limits.conf to lift the open-file and memory-lock limits:
* soft memlock unlimited
* hard memlock unlimited
* - nofile 65536
* - core unlimited
Takes effect after logging out and back in.

Edit /etc/sysctl.conf and add:
vm.max_map_count = 262144
vm.swappiness = 1
Takes effect with: sysctl -p
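To verify the limits after logging back in, something like the following can be used (expected values match the settings above):

ulimit -l    (memory lock, expect: unlimited)
ulimit -n    (open files, expect: 65536)
ulimit -c    (core size, expect: unlimited)
sysctl vm.max_map_count vm.swappiness    (expect: 262144 and 1)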
2) Node configuration: see the official documentation.
The order of precedence for cluster settings is: transient cluster settings > persistent cluster settings > settings in the elasticsearch.yml configuration file.
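For illustration, a minimal elasticsearch.yml for a data node in the new cluster might look like the following; the cluster name, node name, host list, and the box_type attribute (used for hot/cold separation later) are placeholders, not values from the original:

cluster.name: my-es-cluster
node.name: new-data-01
node.master: false
node.data: true
path.data: /data/es
path.logs: /var/log/es
network.host: 0.0.0.0
bootstrap.memory_lock: true
discovery.zen.ping.unicast.hosts: ["master-01", "master-02", "master-03"]
discovery.zen.minimum_master_nodes: 2
node.attr.box_type: hot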
Precautions:
1) The ES heap must stay below 32 GB (to keep compressed object pointers); around 26 GB is a good choice.
See the official documentation and community discussion: Reference 1, Reference 2.
2) The more shards, the lower the QPS (more details). The officially recommended shard size is 20-40 GB (details), and each GB of heap memory should carry roughly 20-25 shards, so a 26 GB heap can hold about 520 shards (details);
The number of shards a node can hold is proportional to its available heap memory, but Elasticsearch does not enforce a hard limit. A good rule of thumb: keep the number of shards per GB of configured heap at or below 20. A node with a 30 GB heap could therefore hold up to 600 shards, but the fewer shards within that limit, the better (more details). See also: how to set the shard count per node, and how to set the cluster shard allocation strategy.
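For example, the per-node shard cap mentioned above can be set at index level or cluster-wide; the index name and values here are illustrative:

Per index (at most 2 shards of this index per node):
PUT test/_settings
{ "index.routing.allocation.total_shards_per_node": 2 }

Cluster-wide (hard cap on shards per node):
PUT _cluster/settings
{ "persistent": { "cluster.routing.allocation.total_shards_per_node": 520 } }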
3) Master and client (coordinating) nodes need relatively little memory and CPU, so their corresponding parameters can be set smaller.
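As a sketch, a dedicated master node on 6.x could be declared in elasticsearch.yml with a deliberately small heap in jvm.options; the 4g figure is an assumption, not a recommendation from the original:

elasticsearch.yml:
node.master: true
node.data: false
node.ingest: false

jvm.options (small heap for a non-data role; assumed size):
-Xms4g
-Xmx4g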
2. Snapshotting the cluster: done via a self-built Hadoop cluster (the repository-hdfs plugin must be installed).
curl -XPUT http://XXX/_snapshot/hdfs_repo -H 'Content-Type: application/json' -d '
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://XXX:8800/",
    "path": "hdfs_repo",
    "compress": true
  }
}'
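Once the repository is registered, snapshots can be created, checked, and restored; the snapshot name snapshot_1 is illustrative:

Take a snapshot of all indices (runs in the background):
curl -XPUT 'http://XXX/_snapshot/hdfs_repo/snapshot_1?wait_for_completion=false'

Check progress:
curl -XGET 'http://XXX/_snapshot/hdfs_repo/snapshot_1/_status'

Restore an index from the snapshot:
curl -XPOST 'http://XXX/_snapshot/hdfs_repo/snapshot_1/_restore' -H 'Content-Type: application/json' -d '
{ "indices": "XXX" }'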
3. Merging the old and new clusters: configure the masters of the two clusters so they can connect to each other; the two clusters then join into a single cluster.
Note: indices with the same name will be overwritten. If part of the cluster's indices turn RED, shards can be reallocated via reroute after confirming there is no impact:
POST /_cluster/reroute?retry_failed=true&pretty
{ }
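For reference, the merge in step 3 relies on the nodes of both clusters discovering a single master quorum. A minimal sketch, assuming zen discovery on 6.x (hostnames are placeholders; cluster.name must also be identical on both sides):

discovery.zen.ping.unicast.hosts: ["old-master-01", "old-master-02", "new-master-01"]
discovery.zen.minimum_master_nodes: 2    (a majority of the combined master-eligible nodes)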
4. Migrating the cluster data: configure shard allocation filtering so that data migrates off the old machines.
curl -XPUT http://XXX/_cluster/settings -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "XXX"
  }
}'
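While the exclude filter drains the old machines, progress can be watched with the cat APIs, e.g.:

Ongoing shard relocations:
curl -XGET 'http://XXX/_cat/recovery?v&active_only=true'

Overall health (relocating_shards should fall to 0):
curl -XGET 'http://XXX/_cluster/health?pretty'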
IV. Other
1. Hot/cold data separation
1) Keep all shards off the SSD (hot) nodes:
curl -XPUT "http://XXX/*/_settings?master_timeout=120s" -H 'Content-Type: application/json' -d '
{
  "index.routing.allocation.exclude.box_type": "hot"
}'

2) Allow the test index to be allocated to SSD:
PUT test/_settings
{
  "index.routing.allocation.include.box_type": "hot"
}

3) Enable the attribute-awareness policy (optional):
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.awareness.attributes": "box_type"
  }
}
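The box_type used above is a custom node attribute; each node declares its own value in elasticsearch.yml. A sketch (the hot value mirrors the commands above; cold is illustrative):

node.attr.box_type: hot     (on SSD machines)
node.attr.box_type: cold    (on HDD machines)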
2. Load testing with esrally
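A minimal esrally invocation against an existing cluster might look like this (the track and target host are placeholders; check the flags against the esrally documentation for your version):

esrally --track=geonames --target-hosts=XXX:9200 --pipeline=benchmark-only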
3. Kibana
4. Rolling restart
1) Pause cluster data rebalancing:
PUT _cluster/settings
{ "transient": { "cluster.routing.rebalance.enable": "none" } }

2) Disable shard allocation:
PUT _cluster/settings
{ "transient": { "cluster.routing.allocation.enable": "none" } }

3) Synced-flush the indices (optional):
curl -XPOST http://XXX/_flush/synced

4) Perform the update on the node, then restart it.

5) Re-enable shard allocation:
PUT _cluster/settings
{ "transient": { "cluster.routing.allocation.enable": null } }

6) After the cluster recovers to green, repeat steps 2-5 until all nodes have been updated and restarted.

7) Restore cluster data rebalancing.
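The command for step 7 is not spelled out above; mirroring step 1, restoring rebalancing should look like:

PUT _cluster/settings
{ "transient": { "cluster.routing.rebalance.enable": null } }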