Dynamically adding nodes to a Hadoop+Spark+HBase cluster

If the cluster's resources are no longer sufficient and it needs to be expanded, nodes can be added dynamically without shutting the cluster down. The specific steps are as follows.

Existing nodes:

192.168.111.11 lyy1 --- master node
192.168.111.12 lyy2
192.168.111.13 lyy3
192.168.111.14 lyy4

Added nodes:

192.168.111.15 lyy5
192.168.111.16 lyy6
1. Clone two virtual machines from the lyy1 node so that the software and configuration are identical, then modify each clone's IP and hostname.
(This cluster is a virtual cluster running on Proxmox, which makes it easy to clone and migrate virtual machines. On a physical cluster, you can instead copy the master node's image to the new nodes.)

vim /etc/network/interfaces
vim /etc/hostname
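For example, on the clone that becomes lyy5 the two files might end up looking like the sketch below (the interface name ens18, netmask, and gateway are assumptions here; adjust them to the actual network):

# /etc/network/interfaces -- give the clone its own static address
auto ens18
iface ens18 inet static
    address 192.168.111.15
    netmask 255.255.255.0
    gateway 192.168.111.1

# /etc/hostname -- a single line with the new hostname
lyy5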
2. Edit /etc/hosts to add the IP-to-hostname mappings for the new nodes (the appended lines are shown after the sync command below), then use a batch command to sync the file to all machines:
for i in $(seq 1 6); do echo lyy$i; scp /etc/hosts root@lyy$i:/etc/;done
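For reference, the two lines appended to /etc/hosts are just the mappings of the new nodes:

192.168.111.15 lyy5
192.168.111.16 lyy6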
At the same time, the new hostnames must be added to Hadoop's workers file, Spark's slaves file, and HBase's regionservers file (the appended lines are sketched after the sync loops below), and the files synced to every node:
for i in $(seq 1 6); do echo lyy$i; scp /opt/hadoop-3.0.0/etc/hadoop/workers root@lyy$i:/opt/hadoop-3.0.0/etc/hadoop;done
 
 
for i in $(seq 2 6); do echo lyy$i; scp /opt/hbase-1.2.4/conf/regionservers root@lyy$i:/opt/hbase-1.2.4/conf;done
 
 
for i in $(seq 2 6);do echo lyy$i; scp /opt/spark-2.2.0-bin-hadoop2.7/conf/slaves root@lyy$i:/opt/spark-2.2.0-bin-hadoop2.7/conf;done
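In each of the three files (Hadoop's workers, Spark's slaves, HBase's regionservers) the change itself is simply appending the two new hostnames, roughly:

# appended to /opt/hadoop-3.0.0/etc/hadoop/workers,
# /opt/spark-2.2.0-bin-hadoop2.7/conf/slaves and /opt/hbase-1.2.4/conf/regionservers
lyy5
lyy6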
 
 

 
 
Also sync the hbase-site.xml configuration file, and copy it into the Spark and Hadoop configuration directories on every node as well:
 
 
for i in $(seq 2 6); do echo lyy$i; scp /opt/hbase-1.2.4/conf/hbase-site.xml root@lyy$i:/opt/hbase-1.2.4/conf;done
 
 
for i in $(seq 1 6); do echo lyy$i; ssh lyy$i "cp /opt/hbase-1.2.4/conf/hbase-site.xml /opt/spark-2.2.0-bin-hadoop2.7/conf && cp /opt/hbase-1.2.4/conf/hbase-site.xml /opt/hadoop-3.0.0/etc/hadoop";done
Note: the following processes can be started directly on the newly added nodes; the cluster does not need to be restarted.

3. Hadoop adds DataNode nodes

hadoop-daemon.sh start datanode      # starts the DataNode process
yarn-daemon.sh start nodemanager     # starts the NodeManager process
4. Spark adds Worker nodes

start-slave.sh spark://lyy1:7077     # starts the Worker process
5. HBase adds a RegionServer

hbase-daemon.sh start regionserver   # starts the HRegionServer process
hbase-daemon.sh start zookeeper      # starts the HQuorumPeer process
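To confirm that the daemons actually came up on a new node, jps (shipped with the JDK) can be run on lyy5 or lyy6; a hedged sketch of what it should list once all of the steps above have been done:

jps
# expected processes on a fully added node:
# DataNode
# NodeManager
# Worker
# HRegionServer
# HQuorumPeer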
 
 
Enter status in the hbase shell to view the cluster status
6. Load Balancing

If the data is never rebalanced, the cluster ends up concentrating new data on the new nodes, which reduces efficiency:

View HDFS node status: hdfs dfsadmin -report

Two settings control the HDFS balancer:

# The bandwidth allowed for copying data between nodes is limited; the default is 1 MB/s (1048576). Raising it, e.g. to 100 MB/s (104857600), speeds up balancing.

# The threshold: if a DataNode's disk usage is more than 1% above the cluster average, blocks are transferred to DataNodes below the average, so the usage difference between nodes stays within 1%.

or simply:

start-balancer.sh
stop-balancer.sh
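A minimal command-line sketch of applying those two settings, assuming the bandwidth is raised with hdfs dfsadmin -setBalancerBandwidth and the threshold is passed to the balancer script (both chosen to match the values and descriptions above):

hdfs dfsadmin -setBalancerBandwidth 104857600    # raise the balancer bandwidth from the 1 MB/s default to 100 MB/s
start-balancer.sh -threshold 1                   # move blocks until every node is within 1% of the average usage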

(Figure: node utilization before load balancing)

After load balancing, the hard disk usage of each node tends to even out:

(Figure: node utilization after load balancing)
In addition, HBase also needs load balancing. Enter in the hbase shell: balance_switch true
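A short hbase shell sequence for this step might look like the following sketch (the balancer command, which triggers an immediate balancing run, is an addition beyond what the original mentions):

balance_switch true    # enable the HBase balancer
balancer               # optionally trigger a balancing run right away
status                 # check the cluster state afterwards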
At this point the node expansion is complete and the cluster now has six nodes. The new nodes can be seen on the Hadoop, Spark, and HBase monitoring pages respectively.
