1. Cluster configuration diagram
Before building a cluster, we need to plan the configuration of each machine. Here we take four machines as an example; the layout is as follows:
Service                  ant151  ant152  ant153  ant154
NameNode                   x       x
DataNode                   x       x       x       x
NodeManager                x       x       x       x
ResourceManager                            x       x
JournalNode                x       x       x
DFSZKFailoverController    x       x
ZooKeeper                  zk0     zk1     zk2
ant151, ant152, ant153, and ant154 are the four hostnames.
ant151 and ant152 act as the active and standby NameNodes, respectively.
All four machines run a DataNode and a NodeManager.
ResourceManager runs on ant153 and ant154 as an active/standby pair.
JournalNodes run on ant151, ant152, and ant153. A JournalNode is a daemon that stores the NameNode's edit log so the standby NameNode can stay synchronized; use at least 3 JournalNodes for clusters of up to 100 nodes, and at least 5 for more than 100 nodes. See the JournalNode documentation for details.
In the HA setup, the DFSZKFailoverController (ZKFC) is responsible for monitoring the state of the NameNode (NN) and promptly recording that state in ZooKeeper (ZK). It obtains the NN's health status by periodically calling a dedicated interface on the NN from an independent thread. The ZKFC also takes part in electing the active NN; since there are at most two NNs, the current election strategy is simple (essentially first come, first served).
zk0, zk1, and zk2 are the server IDs in the ZooKeeper cluster.
2. Configure each host
First create the virtual machines and configure the network.
Then set each hostname, and run bash so the change takes effect in the current shell.
hostnamectl set-hostname ant151
Next, disable the firewall on every host:
systemctl stop firewalld
systemctl disable firewalld
Next, synchronize the clocks of all hosts. First install the time synchronization service:
# synchronize time
[root@ant151 ~]# yum install -y ntpdate
[root@ant151 ~]# ntpdate time.windows.com
[root@ant151 ~]# date
After installing the service, run
crontab -e
add the following lines, then save and exit:
# sync the clock every five minutes
*/5 * * * * /usr/sbin/ntpdate -u time.windows.com
Then reload and start the cron service:
# reload
[root@ant151 ~]# service crond reload
# start the scheduled task
[root@ant151 ~]# service crond start
Next, set up passwordless login. First generate the local key pair:
# configure passwordless login
ssh-keygen -t rsa -P ""
After generating it, copy the public key to the other hosts:
# copy the local public key to the target machines for passwordless login
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant151
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant152
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant153
ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 root@ant154
# test
ssh -p22 <hostname>
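The four ssh-copy-id invocations above can be wrapped in a loop. A minimal sketch, shown here in dry-run form (it echoes each command instead of running it, so it can be checked without live hosts; drop the leading echo to actually distribute the keys):

```shell
# Distribute the local public key to every host in the cluster.
# Dry-run version: prints each command instead of executing it.
distribute_keys() {
    for host in ant151 ant152 ant153 ant154
    do
        echo ssh-copy-id -i /root/.ssh/id_rsa.pub -p22 "root@$host"
    done
}
distribute_keys
```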
Finally, install the package required for active/standby failover (the sshfence mechanism relies on fuser, which is provided by psmisc):
yum install psmisc -y
3. Install the JDK
Install the JDK on every machine and configure the environment variables.
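As a sketch of this step, assuming the tarball sits in /opt/install and the JDK ends up in /opt/software/jdk180 (the path that hadoop-env.sh below expects; the exact archive name depends on the JDK build you downloaded and is left elided here):

```shell
# Hypothetical extraction step; adjust the archive name to your JDK build:
#   tar -zxf /opt/install/jdk-8u....tar.gz -C /opt/software/
#   mv /opt/software/jdk1.8.0_... /opt/software/jdk180

# Environment variables to append to /etc/profile:
export JAVA_HOME=/opt/software/jdk180
export PATH=$PATH:$JAVA_HOME/bin
```

After editing /etc/profile, run `source /etc/profile` so the variables take effect.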
4. Build the ZooKeeper cluster
A script can be used here. In this example the tarball is kept in the /opt/install folder, extracted into /opt/software, and renamed zk345; you can adjust the installation path to your needs. The script is as follows:
#! /bin/bash
zk=true
hostname=`hostname`
if [ "$zk" = true ];then
echo 'Starting ZooKeeper installation'
tar -zxf /opt/install/zookeeper-3.4.5-cdh5.14.2.tar.gz -C /opt/software/
mv /opt/software/zookeeper-3.4.5-cdh5.14.2 /opt/software/zk345
cp /opt/software/zk345/conf/zoo_sample.cfg /opt/software/zk345/conf/zoo.cfg
mkdir -p /opt/software/zk345/datas
sed -i '12c\dataDir=/opt/software/zk345/datas' /opt/software/zk345/conf/zoo.cfg
echo 'server.0='$hostname':2287:3387' >> /opt/software/zk345/conf/zoo.cfg
echo "0" > /opt/software/zk345/datas/myid
sed -i '73a\export PATH=$PATH:$ZOOKEEPER_HOME/bin' /etc/profile
sed -i '73a\export ZOOKEEPER_HOME=/opt/software/zk345/' /etc/profile
sed -i '73a\#ZK' /etc/profile
source /etc/profile
echo 'ZooKeeper installation finished'
fi
After the installation finishes, configure the zoo.cfg file in the zk345/conf directory:
vim /opt/software/zk345/conf/zoo.cfg
# ZooKeeper cluster configuration
server.0=ant151:2287:3387
server.1=ant152:2287:3387
server.2=ant153:2287:3387
server.3=ant154:2287:3387
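These four server lines can also be appended with a small loop instead of hand-editing; a sketch (it writes to a scratch file here so it can be tried safely — point ZOO_CFG at /opt/software/zk345/conf/zoo.cfg on a real node):

```shell
# Append the cluster member list to zoo.cfg.
# ZOO_CFG points at a scratch file for safe testing; on a real node use
# ZOO_CFG=/opt/software/zk345/conf/zoo.cfg instead.
ZOO_CFG=/tmp/zoo.cfg.test
id=0
for host in ant151 ant152 ant153 ant154
do
    echo "server.$id=$host:2287:3387" >> "$ZOO_CFG"
    id=$((id + 1))
done
```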
Then copy the installation directory and the profile to the other hosts. The commands for ant152 are shown below; the other hosts are analogous:
# copy the installation directory to the other hosts
scp -r /opt/software/zk345 root@ant152:/opt/software/
# copy the profile to the other hosts
scp /etc/profile root@ant152:/etc/
Next, set /opt/software/zk345/datas/myid on each host according to the mapping in the ZooKeeper configuration file. As above, ant151 is 0, ant152 is 1, ant153 is 2, and ant154 is 3.
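The hostname-to-myid mapping can be scripted so the same command works on every node; a sketch (it writes to a scratch path here so it can be tried safely — use /opt/software/zk345/datas/myid on the real nodes):

```shell
# Derive this node's server ID from its hostname, per the zoo.cfg mapping.
myid_for() {
    case $1 in
        ant151) echo 0 ;;
        ant152) echo 1 ;;
        ant153) echo 2 ;;
        ant154) echo 3 ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# On a real node: myid_for "$(hostname)" > /opt/software/zk345/datas/myid
myid_for ant152 > /tmp/myid.test
```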
To make operating the cluster easier, we write a cluster control script:
#! /bin/bash
case $1 in
"start"){
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh start"
	done
};;
"stop"){
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh stop"
	done
};;
"status"){
	for i in ant151 ant152 ant153
	do
		ssh $i "source /etc/profile;/opt/software/zk345/bin/zkServer.sh status"
	done
};;
esac
5. Build the Hadoop cluster
First install Hadoop:
tar -zxf /opt/install/hadoop-3.1.3.tar.gz -C /opt/software/
mv /opt/software/hadoop-3.1.3 /opt/software/hadoop313
chown -R root:root /opt/software/hadoop313/
Go to the hadoop313/etc/hadoop directory and edit the configuration files.
core-site.xml configuration
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://gky</value>
<description>Logical name; must match the dfs.nameservices value in hdfs-site.xml</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/software/hadoop313/tmpdata</value>
<description>Local Hadoop temporary directory on the NameNode</description>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
<description>Default user</description>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
<description></description>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
<description></description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Read/write file buffer size: 128 KB</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>ant151:2181,ant152:2181,ant153:2181</value>
<description></description>
</property>
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>10000</value>
<description>Timeout for Hadoop's connection to ZooKeeper, set to 10 s</description>
</property>
</configuration>
hadoop-env.sh configuration
export JAVA_HOME=/opt/software/jdk180
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
hdfs-site.xml configuration
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Number of replicas of each block in Hadoop</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/software/hadoop313/data/dfs/name</value>
<description>Directory on the NameNode that stores the HDFS namespace metadata</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/software/hadoop313/data/dfs/data</value>
<description>Physical storage location of the data blocks on the DataNode</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ant151:9869</value>
<description></description>
</property>
<property>
<name>dfs.nameservices</name>
<value>gky</value>
<description>HDFS nameservice; must match core-site.xml</description>
</property>
<property>
<name>dfs.ha.namenodes.gky</name>
<value>nn1,nn2</value>
<description>gky is the logical cluster name, mapped to the two logical NameNode names</description>
</property>
<property>
<name>dfs.namenode.rpc-address.gky.nn1</name>
<value>ant151:9000</value>
<description>RPC address of namenode1</description>
</property>
<property>
<name>dfs.namenode.http-address.gky.nn1</name>
<value>ant151:9870</value>
<description>HTTP address of namenode1</description>
</property>
<property>
<name>dfs.namenode.rpc-address.gky.nn2</name>
<value>ant152:9000</value>
<description>RPC address of namenode2</description>
</property>
<property>
<name>dfs.namenode.http-address.gky.nn2</name>
<value>ant152:9870</value>
<description>HTTP address of namenode2</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://ant151:8485;ant152:8485;ant153:8485/gky</value>
<description>Shared storage location (the JournalNode list) for the NameNode's edits metadata</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/software/hadoop313/data/journaldata</value>
<description>Location on local disk where the JournalNode stores its data</description>
</property>
<!-- fault tolerance -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>Enable automatic NameNode failover</description>
</property>
<property>
<name>dfs.client.failover.proxy.provider.gky</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>Implementation used to fail over automatically after a failure</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
<description>Fencing method used to prevent split-brain</description>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
<description>The sshfence mechanism requires passwordless SSH login</description>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>Disable HDFS permission checking</description>
</property>
<property>
<name>dfs.image.transfer.bandwidthPerSec</name>
<value>1048576</value>
<description></description>
</property>
<property>
<name>dfs.block.scanner.volume.bytes.per.second</name>
<value>1048576</value>
<description></description>
</property>
</configuration>
mapred-site.xml configuration
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Job execution framework: local, classic or yarn</description>
<final>true</final>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>/opt/software/hadoop313/etc/hadoop:/opt/software/hadoop313/share/hadoop/common/lib/*:/opt/software/hadoop313/share/hadoop/common/*:/opt/software/hadoop313/share/hadoop/hdfs/*:/opt/software/hadoop313/share/hadoop/hdfs/lib/*:/opt/software/hadoop313/share/hadoop/mapreduce/*:/opt/software/hadoop313/share/hadoop/mapreduce/lib/*:/opt/software/hadoop313/share/hadoop/yarn/*:/opt/software/hadoop313/share/hadoop/yarn/lib/*</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>ant151:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ant151:19888</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
<description>Working memory of a map-stage task</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
<description>Working memory of a reduce-stage task</description>
</property>
</configuration>
yarn-site.xml configuration
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
<description>Enable ResourceManager high availability</description>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrcabc</value>
<description>ID of the YARN cluster</description>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
<description>Logical names of the ResourceManagers</description>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>ant153</value>
<description>Host for rm1</description>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>ant154</value>
<description>Host for rm2</description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>ant153:8088</value>
<description></description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>ant154:8088</value>
<description></description>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>ant151:2181,ant152:2181,ant153:2181</value>
<description>ZooKeeper cluster address</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Auxiliary service required to run MapReduce programs</description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/opt/software/hadoop313/tmpdata/yarn/local</value>
<description>NodeManager local storage directory</description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/opt/software/hadoop313/tmpdata/yarn/log</value>
<description>NodeManager local log directory</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
<description>Memory available to this NodeManager for containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
<description>Number of CPU cores this NodeManager may use</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>256</value>
<description></description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description></description>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
<description>How many seconds to retain logs</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description></description>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/opt/software/hadoop313/etc/hadoop:/opt/software/hadoop313/share/hadoop/common/lib/*:/opt/software/hadoop313/share/hadoop/common/*:/opt/software/hadoop313/share/hadoop/hdfs/*:/opt/software/hadoop313/share/hadoop/hdfs/lib/*:/opt/software/hadoop313/share/hadoop/mapreduce/*:/opt/software/hadoop313/share/hadoop/mapreduce/lib/*:/opt/software/hadoop313/share/hadoop/yarn/*:/opt/software/hadoop313/share/hadoop/yarn/lib/*</value>
<description></description>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
<description></description>
</property>
</configuration>
workers file configuration
ant151
ant152
ant153
ant154
System environment variable configuration (/etc/profile)
#hadoop
export JAVA_LIBRARY_PATH=/opt/software/hadoop313/lib/native
export HADOOP_HOME=/opt/software/hadoop313
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
Finally, copy the configuration file and the Hadoop directory to the other hosts:
# copy the installation directory to the other hosts
scp -r /opt/software/hadoop313 root@ant152:/opt/software/
# copy the profile to the other hosts
scp /etc/profile root@ant152:/etc/
6. Start the cluster for the first time
Steps and notes for the first start of the cluster:
1. Start the ZooKeeper cluster.
2. Start the journalnode service on ant151, ant152, and ant153: hdfs --daemon start journalnode
3. Format the HDFS namenode on ant151: hdfs namenode -format
4. Start the namenode service on ant151: hdfs --daemon start namenode
5. Sync the namenode metadata onto ant152: [root@ant152 soft]# hdfs namenode -bootstrapStandby
6. Start the namenode service on ant152: hdfs --daemon start namenode
Check the namenode state: hdfs haadmin -getServiceState nn1|nn2
7. Stop all dfs-related services: [root@ant151 soft]# stop-dfs.sh
8. Format ZK: [root@ant151 soft]# hdfs zkfc -formatZK
9. Start dfs: [root@ant151 soft]# start-dfs.sh
10. Start yarn: [root@ant151 soft]# start-yarn.sh
Check the ResourceManager state: yarn rmadmin -getServiceState rm1|rm2