How to build a Hadoop high-availability cluster

1. Cluster configuration diagram

        Before building the cluster, we need to plan which roles run on each machine. Here we take four machines as an example; the layout is as follows:

ant151                   ant152                   ant153                   ant154
NameNode                 NameNode
DataNode                 DataNode                 DataNode                 DataNode
NodeManager              NodeManager              NodeManager              NodeManager
                                                  ResourceManager          ResourceManager
JournalNode              JournalNode              JournalNode
DFSZKFailoverController  DFSZKFailoverController
zk0                      zk1                      zk2

        ant151, ant152, ant153 and ant154 are the four hostnames.

        ant151 and ant152 act as the active NameNode and the standby NameNode, respectively.

        Every node runs a DataNode and a NodeManager.

        I place the ResourceManagers on ant153 and ant154, as active and standby.

        I place JournalNodes on ant151, ant152 and ant153. A JournalNode is a lightweight daemon that stores the shared edit log of the NameNodes; the usual guidance is at least 3 JournalNodes for clusters of up to 100 nodes, and at least 5 beyond that. For details, see the JournalNode documentation.

        In an HA setup, the DFSZKFailoverController (ZKFC) is responsible for monitoring the state of the NameNode (NN) and writing status information to ZooKeeper (ZK) in a timely manner. It obtains the NN's health by periodically calling a health-check interface on the NN from a dedicated thread. The ZKFC also takes part in electing the active NN; since there are at most two NameNodes, the current election strategy is simple (essentially first come, first served).

        zk0, zk1 and zk2 are the server IDs in the ZooKeeper ensemble.

2. Configure each host

        First create the virtual machines and configure the network.

        Then set each hostname, and run the bash command to open a new shell so the change takes effect:

hostnamectl set-hostname ant151
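
        The other machines get their own names the same way:

hostnamectl set-hostname ant152   # on the second machine
hostnamectl set-hostname ant153   # on the third machine
hostnamectl set-hostname ant154   # on the fourth machine
bash                              # open a new shell so the new name takes effect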

        Next, turn off the firewall on each host:

systemctl stop firewalld
systemctl disable firewalld

        Next, synchronize the clocks of the hosts. First install the time-synchronization tool:

# synchronize the time
[root@ant151 ~]# yum install -y ntpdate
[root@ant151 ~]# ntpdate time.windows.com
[root@ant151 ~]# date

        Once the tool is installed, open the crontab editor:

crontab -e

        Enter the following line, then save and exit.

# resynchronize the clock every 5 minutes
*/5 * * * * /usr/sbin/ntpdate -u time.windows.com

        Then reload and start the cron service:

# reload the configuration
[root@ant151 ~]# service crond reload
# start the cron service
[root@ant151 ~]# service crond start
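
        To check that the hosts agree on the time, you can compare their clocks (a small sketch; run it after setting up the passwordless login below, or type passwords when prompted):

# print each node's clock side by side
for h in ant151 ant152 ant153 ant154; do
	echo -n "$h: "
	ssh $h date
done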

        Next, set up passwordless login. First generate the local key pair:

# set up passwordless login (RSA key pair with an empty passphrase)
ssh-keygen -t rsa -P ""

        Once generated, copy the public key to the other hosts:

# copy the local public key to every machine that should accept passwordless login
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant151
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant152
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant153
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant154
# test
ssh -p22 <hostname>

        Finally, install the package required by the active/standby failover mechanism:

yum install psmisc -y
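
        psmisc provides the fuser command, which the sshfence fencing method configured later in hdfs-site.xml relies on to kill the old active NameNode. A quick check that it is in place:

# fuser ships with psmisc
which fuser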

3. Install the JDK

        Install the JDK on every machine and configure the environment variables.
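
        The post does not list the JDK commands; here is a minimal sketch, assuming the JDK 8 tarball sits in /opt/install and is installed as /opt/software/jdk180, the path JAVA_HOME points to later in hadoop-env.sh (the exact tarball and extracted directory names are assumptions):

# extract the JDK and rename the directory to jdk180
tar -zxf /opt/install/jdk-8u321-linux-x64.tar.gz -C /opt/software/
mv /opt/software/jdk1.8.0_321 /opt/software/jdk180

# append the environment variables to /etc/profile and reload
cat >> /etc/profile <<'EOF'
# JDK
export JAVA_HOME=/opt/software/jdk180
export PATH=$PATH:$JAVA_HOME/bin
EOF
source /etc/profile
java -version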

4. Build the ZooKeeper cluster

        A script can be used here. I keep the tarball in the /opt/install folder, extract it to the /opt/software folder, and rename the result to zk345; you can adjust the installation paths to your needs. The script is as follows:

#!/bin/bash
zk=true
hostname=`hostname`
if [ "$zk" = true ];then
	echo 'ZK install starting'
	# extract the distribution and rename it to zk345
	tar -zxf /opt/install/zookeeper-3.4.5-cdh5.14.2.tar.gz -C /opt/software/
	mv /opt/software/zookeeper-3.4.5-cdh5.14.2 /opt/software/zk345
	# create zoo.cfg from the sample and point dataDir at a local folder
	cp /opt/software/zk345/conf/zoo_sample.cfg /opt/software/zk345/conf/zoo.cfg
	mkdir -p /opt/software/zk345/datas
	sed -i '12c\dataDir=/opt/software/zk345/datas' /opt/software/zk345/conf/zoo.cfg
	# register this host as server.0 and write the matching myid
	echo 'server.0='$hostname':2287:3387' >> /opt/software/zk345/conf/zoo.cfg
	echo "0" > /opt/software/zk345/datas/myid
	# insert the environment variables after line 73 of /etc/profile
	sed -i '73a\export PATH=$PATH:$ZOOKEEPER_HOME/bin' /etc/profile
	sed -i '73a\export ZOOKEEPER_HOME=/opt/software/zk345/' /etc/profile
	sed -i '73a\#ZK' /etc/profile
	source /etc/profile
	echo 'ZK install finished'
fi

        After the installation completes, edit zoo.cfg in the zk345/conf directory so that it lists every member of the ensemble:

vim /opt/software/zk345/conf/zoo.cfg
# ZooKeeper ensemble configuration
server.0=ant151:2287:3387
server.1=ant152:2287:3387
server.2=ant153:2287:3387
server.3=ant154:2287:3387

        Then transfer the zk345 directory and the profile file to the other hosts. The commands for ant152 are shown below; the others are analogous (see the loop after this block):

# transfer the installation directory to the other hosts
scp -r /opt/software/zk345 root@ant152:/opt/software/
# transfer the profile file to the other hosts
scp /etc/profile root@ant152:/etc/
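
        A small loop covers the remaining hosts in one go:

# push the ZooKeeper directory and the profile to every other node
for h in ant152 ant153 ant154; do
	scp -r /opt/software/zk345 root@$h:/opt/software/
	scp /etc/profile root@$h:/etc/
done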

        Next, modify /opt/software/zk345/datas/myid on each host so that it matches the server.N entries in zoo.cfg: as above, ant151 is 0, ant152 is 1, ant153 is 2 and ant154 is 3.
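
        One way to set the remaining IDs in one go from ant151 (a sketch relying on the passwordless login configured earlier):

# ant151 already has myid 0 from the install script
id=1
for h in ant152 ant153 ant154; do
	ssh root@$h "echo $id > /opt/software/zk345/datas/myid"
	id=$((id + 1))
done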

        To make operating the cluster easier, we write a cluster operation script:

#!/bin/bash
# start/stop/query the ZooKeeper ensemble (per the diagram, ZooKeeper runs on ant151-ant153)
case $1 in
"start")
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh start"
	done
	;;
"stop")
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh stop"
	done
	;;
"status")
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh status"
	done
	;;
esac
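
        Saved as, say, zkop.sh (the filename is ours) and made executable, the script is used like this:

chmod +x zkop.sh
./zkop.sh start    # start ZooKeeper on ant151-ant153
./zkop.sh status   # each node should report Mode: leader or Mode: follower
./zkop.sh stop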

5. Build the Hadoop cluster

        First install Hadoop:

  tar -zxf /opt/install/hadoop-3.1.3.tar.gz -C /opt/software/
  mv /opt/software/hadoop-3.1.3 /opt/software/hadoop313
  chown -R root:root /opt/software/hadoop313/

        Go to the hadoop313/etc/hadoop directory and edit the configuration files below.

        core-site.xml configuration:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://gky</value>
		<description>Logical name; it must match the dfs.nameservices value in hdfs-site.xml</description>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/software/hadoop313/tmpdata</value>
		<description>Local Hadoop temporary directory on the namenode</description>
	</property>
	<property>
		<name>hadoop.http.staticuser.user</name>
		<value>root</value>
		<description>Default static user</description>
	</property>
	<property>
		<name>hadoop.proxyuser.root.hosts</name>
		<value>*</value>
		<description></description>
	</property>
	<property>
		<name>hadoop.proxyuser.root.groups</name>
		<value>*</value>
		<description></description>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
		<description>Read/write file buffer size: 128 KB</description>
	</property>
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>ant151:2181,ant152:2181,ant153:2181</value>
		<description></description>
	</property>
	<property>
		<name>ha.zookeeper.session-timeout.ms</name>
		<value>10000</value>
		<description>Timeout for Hadoop's ZooKeeper connection, set to 10 s</description>
	</property>
</configuration>

        hadoop-env.sh configuration:

export JAVA_HOME=/opt/software/jdk180
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

        hdfs-site.xml configuration:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
		<description>Number of replicas kept for each block in Hadoop</description>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/opt/software/hadoop313/data/dfs/name</value>
		<description>Directory on the namenode that stores the HDFS namespace metadata</description>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/opt/software/hadoop313/data/dfs/data</value>
		<description>Physical storage location of data blocks on the datanode</description>
	</property>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>ant151:9869</value>
		<description></description>
	</property>
	<property>
		<name>dfs.nameservices</name>
		<value>gky</value>
		<description>HDFS nameservice; it must stay consistent with core-site.xml</description>
	</property>
	<property>
		<name>dfs.ha.namenodes.gky</name>
		<value>nn1,nn2</value>
		<description>gky is the cluster's logical name, mapped to the two namenode logical names</description>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.gky.nn1</name>
		<value>ant151:9000</value>
		<description>RPC address of namenode1</description>
	</property>
	<property>
		<name>dfs.namenode.http-address.gky.nn1</name>
		<value>ant151:9870</value>
		<description>HTTP address of namenode1</description>
	</property>
	
	<property>
		<name>dfs.namenode.rpc-address.gky.nn2</name>
		<value>ant152:9000</value>
		<description>RPC address of namenode2</description>
	</property>
	<property>
		<name>dfs.namenode.http-address.gky.nn2</name>
		<value>ant152:9870</value>
		<description>HTTP address of namenode2</description>
	</property>
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://ant151:8485;ant152:8485;ant153:8485/gky</value>
		<description>Shared storage location (the JournalNode list) for the NameNode edits metadata</description>
	</property>
	<property>
		<name>dfs.journalnode.edits.dir</name>
		<value>/opt/software/hadoop313/data/journaldata</value>
		<description>Where the JournalNode stores its data on local disk</description>
	</property>	
	<!-- fault tolerance -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
		<description>Enable automatic NameNode failover</description>
	</property>
	<property>
		<name>dfs.client.failover.proxy.provider.gky</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
		<description>Implementation used for automatic failover</description>
	</property>
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
		<description>Fencing method used to prevent split-brain</description>
	</property>
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/root/.ssh/id_rsa</value>
		<description>The sshfence mechanism requires passwordless SSH login</description>
	</property>	
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
		<description>Disable HDFS permission checking</description>
	</property>
	<property>
		<name>dfs.image.transfer.bandwidthPerSec</name>
		<value>1048576</value>
		<description></description>
	</property>	
	<property>
		<name>dfs.block.scanner.volume.bytes.per.second</name>
		<value>1048576</value>
		<description></description>
	</property>
</configuration>

        mapred-site.xml configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
		<description>Job execution framework: local, classic or yarn</description>
		<final>true</final>
	</property>
	<property>
		<name>mapreduce.application.classpath</name>
		<value>/opt/software/hadoop313/etc/hadoop:/opt/software/hadoop313/share/hadoop/common/lib/*:/opt/software/hadoop313/share/hadoop/common/*:/opt/software/hadoop313/share/hadoop/hdfs/*:/opt/software/hadoop313/share/hadoop/hdfs/lib/*:/opt/software/hadoop313/share/hadoop/mapreduce/*:/opt/software/hadoop313/share/hadoop/mapreduce/lib/*:/opt/software/hadoop313/share/hadoop/yarn/*:/opt/software/hadoop313/share/hadoop/yarn/lib/*</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>ant151:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>ant151:19888</value>
	</property>
	
	<property>
		<name>mapreduce.map.memory.mb</name>
		<value>1024</value>
		<description>Working memory for map-stage tasks</description>
	</property>
	<property>
		<name>mapreduce.reduce.memory.mb</name>
		<value>2048</value>
		<description>Working memory for reduce-stage tasks</description>
	</property>
	
</configuration>

        yarn-site.xml configuration:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
		<description>Enable ResourceManager high availability</description>
	</property>
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>yrcabc</value>
		<description>ID of the YARN cluster</description>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
		<description>Logical names of the ResourceManagers</description>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm1</name>
		<value>ant153</value>
		<description>Hostname for rm1</description>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm2</name>
		<value>ant154</value>
		<description>Hostname for rm2</description>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>ant153:8088</value>
		<description></description>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>ant154:8088</value>
		<description></description>
	</property>	
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>ant151:2181,ant152:2181,ant153:2181</value>
		<description>ZooKeeper ensemble address</description>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
		<description>Auxiliary service required to run MapReduce programs</description>
	</property>
	<property>
		<name>yarn.nodemanager.local-dirs</name>
		<value>/opt/software/hadoop313/tmpdata/yarn/local</value>
		<description>NodeManager local storage directory</description>
	</property>
	<property>
		<name>yarn.nodemanager.log-dirs</name>
		<value>/opt/software/hadoop313/tmpdata/yarn/log</value>
		<description>NodeManager local log directory</description>
	</property>
	
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>2048</value>
		<description>Memory available to containers on this NodeManager</description>
	</property>
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>2</value>
		<description>Number of CPU cores this NodeManager can use</description>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>256</value>
		<description></description>
	</property>
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
		<description></description>
	</property>
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>86400</value>
		<description>How many seconds to retain aggregated logs</description>
	</property>
	<property>
		<name>yarn.nodemanager.vmem-check-enabled</name>
		<value>false</value>
		<description></description>
	</property>
	<property>
		<name>yarn.application.classpath</name>
		<value>/opt/software/hadoop313/etc/hadoop:/opt/software/hadoop313/share/hadoop/common/lib/*:/opt/software/hadoop313/share/hadoop/common/*:/opt/software/hadoop313/share/hadoop/hdfs/*:/opt/software/hadoop313/share/hadoop/hdfs/lib/*:/opt/software/hadoop313/share/hadoop/mapreduce/*:/opt/software/hadoop313/share/hadoop/mapreduce/lib/*:/opt/software/hadoop313/share/hadoop/yarn/*:/opt/software/hadoop313/share/hadoop/yarn/lib/*</value>
		<description></description>
	</property>
	<property>
		<name>yarn.nodemanager.env-whitelist</name>
		<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
		<description></description>
	</property>
</configuration>

        workers file configuration:

ant151
ant152
ant153
ant154

        System environment variable configuration (/etc/profile):

#hadoop
export JAVA_LIBRARY_PATH=/opt/software/hadoop313/lib/native
export HADOOP_HOME=/opt/software/hadoop313
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
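
        After saving, reload the profile and check that Hadoop resolves on the PATH:

source /etc/profile
hadoop version   # should report Hadoop 3.1.3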

        Finally, transfer the configuration files and the Hadoop directory to the other hosts (the commands for ant152 are shown; repeat for ant153 and ant154, or adapt the loop from the ZooKeeper section):

# transfer the installation directory to the other hosts
scp -r /opt/software/hadoop313 root@ant152:/opt/software/
# transfer the profile file to the other hosts
scp /etc/profile root@ant152:/etc/

6. Start the cluster for the first time

Steps and related commands for the first cluster start-up:
1. Start the ZooKeeper ensemble.
2. Start the journalnode service on ant151, ant152 and ant153: hdfs --daemon start journalnode
3. Format the HDFS namenode on ant151: hdfs namenode -format
4. Start the namenode service on ant151: hdfs --daemon start namenode
5. Synchronize the namenode metadata to ant152: [root@ant152 ~]# hdfs namenode -bootstrapStandby
6. Start the namenode service on ant152: hdfs --daemon start namenode
   Check the namenode states: hdfs haadmin -getServiceState nn1 (or nn2)
7. Stop all dfs-related services: [root@ant151 ~]# stop-dfs.sh
8. Format ZKFC: [root@ant151 ~]# hdfs zkfc -formatZK
9. Start dfs: [root@ant151 ~]# start-dfs.sh
10. Start yarn: [root@ant151 ~]# start-yarn.sh
   Check the resourcemanager states: yarn rmadmin -getServiceState rm1 (or rm2)
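
        Once everything is up, you can verify the daemons and the HA states; a small sketch (the jps output on each node should match the diagram in section 1):

# list the running Java daemons on every node
for h in ant151 ant152 ant153 ant154; do
	echo "===== $h ====="
	ssh $h "source /etc/profile; jps"
done

# one namenode should report active, the other standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# likewise for the two resourcemanagers
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2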

Origin: blog.csdn.net/Alcaibur/article/details/129061281