Qi Space HDFS Distributed File System Deployment Solution

1 Background

At present, every file in the Qi Space architecture has to be synchronized to each server node by hand with rsync, which makes maintenance troublesome. I therefore plan to deploy a Hadoop HDFS distributed file system cluster, and I have now worked out the deployment plan below through experiments on virtual machines.

2 Deployment environment

Item: Value
Operating system: CentOS Linux
Software environment: zookeeper + hadoop 3.1.1 + autofs
Java environment required: yes
Hadoop version: 3.1.1
Hadoop startup modules: namenode + datanode + journalnode + secondarynamenode + portmap + nfs3 + zkfc
HDFS basic modules: namenode + datanode + journalnode + secondarynamenode + zkfc
Number of namenode nodes: 3 KVMs
Number of datanode nodes: 3 KVMs
Number of journalnode nodes: 3 KVMs
Number of secondarynamenode nodes: 3 KVMs
Automatic failover: supported
autofs automounter used: yes
autofs multi-IP domain-name access: supported
Server node DNS1: service1.kvm.qhjack.cn
Server node DNS2: service2.kvm.qhjack.cn
Server node DNS3: service3.kvm.qhjack.cn
NFS domain of service1.kvm.qhjack.cn: nfs.service1.kvm.qhjack.cn
NFS domain of service2.kvm.qhjack.cn: nfs.service2.kvm.qhjack.cn
NFS domain of service3.kvm.qhjack.cn: nfs.service3.kvm.qhjack.cn
A records of nfs.service1.kvm.qhjack.cn: point to the IPs of service2.kvm.qhjack.cn and service3.kvm.qhjack.cn
A records of nfs.service2.kvm.qhjack.cn: point to the IPs of service1.kvm.qhjack.cn and service3.kvm.qhjack.cn
A records of nfs.service3.kvm.qhjack.cn: point to the IPs of service1.kvm.qhjack.cn and service2.kvm.qhjack.cn
autofs auto-mount directory: /hdfs_autofs
Symlinks required: yes
/www/root: links to /hdfs_autofs/www
Nginx cache directory under /www/server/nginx/: links to /hdfs_autofs/nginx
Hadoop HDFS cluster starts on boot: yes (required)
autofs starts on boot: yes
zookeeper starts on boot: yes (required)
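The cross-pointing A records above are the failover trick of this plan: each node's nfs subdomain resolves to the other two nodes, so a stale mount can recover through DNS. As a hypothetical sketch (the nfs_peers helper is illustrative, not part of the deployment), the expected peers for each node are:

```shell
# Illustrative helper: given a node's short name, print the two peer
# FQDNs whose IPs its nfs.* A record should carry (per the table above).
nfs_peers() {
  node=$1   # short name, e.g. service1
  for n in 1 2 3; do
    [ "service$n" = "$node" ] || echo "service$n.kvm.qhjack.cn"
  done
}

nfs_peers service1
```

Comparing this expectation against a live `dig nfs.service1.kvm.qhjack.cn A` is a quick way to validate the DNS setup.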

3 ZooKeeper Deployment

3.1 Installing ZooKeeper

Download zookeeper-3.4.13.tar.gz from the official site and unpack it:

wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz
tar xvf zookeeper-3.4.13.tar.gz
mv zookeeper-3.4.13 /usr/local/zookeeper

Run vi /etc/profile and append the following:

export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
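A missing `$` in the PATH line is easy to overlook, so it is worth confirming in a new shell that the exports resolve as intended. A minimal sketch (not part of the deployment itself):

```shell
# Re-create the exports and verify zkServer.sh's directory is on PATH.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH="$PATH:$ZOOKEEPER_HOME/bin"

case ":$PATH:" in
  *":$ZOOKEEPER_HOME/bin:"*) echo "ok: $ZOOKEEPER_HOME/bin is on PATH" ;;
  *) echo "error: PATH is missing $ZOOKEEPER_HOME/bin" >&2; exit 1 ;;
esac
```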

3.2 Configuring ZooKeeper

First enter the configuration directory and copy the sample file:

cd /usr/local/zookeeper/conf
cp zoo_sample.cfg zoo.cfg

Run vi zoo.cfg and edit it as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/zookeeper/dataDir
dataLogDir=/usr/local/zookeeper/dataLogDir
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
4lw.commands.whitelist=*
server.1=service1.kvm.qhjack.cn:2888:3888
server.2=service2.kvm.qhjack.cn:2888:3888
server.3=service3.kvm.qhjack.cn:2888:3888
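ZooKeeper needs a majority quorum, so the ensemble declared by the server.N lines should be odd-sized (three here, tolerating the loss of one member). A small sanity-check sketch (the ensemble_size helper is illustrative, not a ZooKeeper tool):

```shell
# Count the server.N lines declared in a zoo.cfg.
ensemble_size() {
  grep -c '^server\.' "$1"
}

# e.g.: ensemble_size /usr/local/zookeeper/conf/zoo.cfg
```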

On service1.kvm.qhjack.cn, set up dataDir as follows:

mkdir -p /usr/local/zookeeper/dataDir /usr/local/zookeeper/dataLogDir
cd /usr/local/zookeeper/dataDir
vi myid

with the following content:

1

On service2.kvm.qhjack.cn, set up dataDir as follows:

mkdir -p /usr/local/zookeeper/dataDir /usr/local/zookeeper/dataLogDir
cd /usr/local/zookeeper/dataDir
vi myid

with the following content:

2

On service3.kvm.qhjack.cn, set up dataDir as follows:

mkdir -p /usr/local/zookeeper/dataDir /usr/local/zookeeper/dataLogDir
cd /usr/local/zookeeper/dataDir
vi myid

with the following content:

3
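The myid value must match the N of the node's own server.N line in zoo.cfg, which is easy to get wrong when repeating the step on three machines. A hedged sketch of deriving it automatically (myid_for_host is illustrative, not a ZooKeeper command):

```shell
# Print the id from the server.N line whose hostname matches $2.
myid_for_host() {
  cfg=$1; host=$2
  sed -n "s/^server\.\([0-9]\{1,\}\)=$host:.*/\1/p" "$cfg"
}

# e.g. on each node:
#   myid_for_host /usr/local/zookeeper/conf/zoo.cfg "$(hostname -f)" \
#     > /usr/local/zookeeper/dataDir/myid
```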

3.3 Starting ZooKeeper

Run the following on all three servers to start it:

zkServer.sh start

Once all three are up, check the status with:

zkServer.sh status

4 Hadoop HDFS Deployment

4.1 Installing Hadoop

Install the Java environment, create the hadoop user and group, then download hadoop-3.1.1.tar.gz from the official site and unpack it:

yum install java-1.8.0-openjdk-devel java-1.8.0-openjdk
groupadd hadoop
useradd -g hadoop hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
tar xvf hadoop-3.1.1.tar.gz
mv hadoop-3.1.1 /usr/local/hadoop
chown hadoop:hadoop -R /usr/local/hadoop

Run vi /etc/profile and append the following:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64
export CLASSPATH=$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
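The JAVA_HOME above is tied to one exact OpenJDK build and breaks after a yum update renames the directory. One way to derive it instead (a sketch, assuming the java on PATH is the intended JDK; java_home_from_bin is an illustrative helper):

```shell
# Strip a trailing bin/java or jre/bin/java from a resolved java path.
java_home_from_bin() {
  printf '%s\n' "$1" | sed -e 's|/jre/bin/java$||' -e 's|/bin/java$||'
}

# e.g.:
#   export JAVA_HOME=$(java_home_from_bin "$(readlink -f "$(command -v java)")")
```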

4.2 Passwordless SSH Between Nodes

Run the following on each of the three nodes in turn:

ssh-keygen
ssh-copy-id hadoop@service1.kvm.qhjack.cn
ssh-copy-id hadoop@service2.kvm.qhjack.cn
ssh-copy-id hadoop@service3.kvm.qhjack.cn

4.3 Configuring Hadoop

Enter /usr/local/hadoop/etc/hadoop/ and begin configuring:

cd /usr/local/hadoop/etc/hadoop/
vi hadoop-env.sh

and make the following changes:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64
export CLASSPATH=$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/local/hadoop
export HDFS_NAMENODE_USER=hadoop
export HDFS_DATANODE_USER=hadoop
export HDFS_JOURNALNODE_USER=hadoop
export HDFS_ZKFC_USER=hadoop
export HDFS_DATANODE_SECURE_USER=hadoop
export HDFS_NFS3_SECURE_USER=hadoop
export HADOOP_PID_DIR=/usr/local/hadoop/tmp

Run vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml and configure it as follows:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>Number of replicas to keep</description>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>hdfs_cluster</value>
        <description>Logical name of the fully distributed cluster</description>
    </property>
    <property>
        <name>dfs.ha.namenodes.hdfs_cluster</name>
        <value>service1,service2,service3</value>
        <description>NameNode IDs in the cluster</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.hdfs_cluster.service1</name>
        <value>service1.kvm.qhjack.cn:8020</value>
        <description>RPC address of service1.kvm.qhjack.cn</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.hdfs_cluster.service2</name>
        <value>service2.kvm.qhjack.cn:8020</value>
        <description>RPC address of service2.kvm.qhjack.cn</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.hdfs_cluster.service3</name>
        <value>service3.kvm.qhjack.cn:8020</value>
        <description>RPC address of service3.kvm.qhjack.cn</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.hdfs_cluster.service1</name>
        <value>service1.kvm.qhjack.cn:50070</value>
        <description>HTTP address of service1.kvm.qhjack.cn</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.hdfs_cluster.service2</name>
        <value>service2.kvm.qhjack.cn:50070</value>
        <description>HTTP address of service2.kvm.qhjack.cn</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.hdfs_cluster.service3</name>
        <value>service3.kvm.qhjack.cn:50070</value>
        <description>HTTP address of service3.kvm.qhjack.cn</description>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://service1.kvm.qhjack.cn:8485;service2.kvm.qhjack.cn:8485;service3.kvm.qhjack.cn:8485/hdfs_cluster</value>
        <description>Where the NameNode edit log is stored on the JournalNodes</description>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
        <description>Fencing method, so that only one NameNode serves clients at any moment</description>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
        <description>sshfence needs passwordless ssh login; this is the location of the private key</description>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/hadoop/datanode/jl</value>
        <description>JournalNode storage directory</description>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
        <description>Whether permission checking is enabled</description>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.hdfs_cluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        <description>Client-side failover proxy implementation</description>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>service1.kvm.qhjack.cn:50090</value>
        <description>HTTP address of the secondary namenode</description>
    </property>
    <property>
        <name>nfs.dump.dir</name>
        <value>/usr/local/hadoop/tmp/hdfs-nfs</value>
        <description>NFS gateway dump directory</description>
    </property>
    <property>
        <name>nfs.exports.allowed.hosts</name>
        <value>* rw</value>
        <description>Hosts allowed to access NFS, and their permissions</description>
    </property>
    <property>
        <name>nfs.rtmax</name>
        <value>1048576</value>
        <description>Maximum bytes per read request supported by the NFS gateway; if changed, also update the rsize option of the nfs mount</description>
    </property>
    <property>
        <name>nfs.wtmax</name>
        <value>65536</value>
        <description>Maximum bytes per write request supported by the NFS gateway; if changed, also update the wsize option of the nfs mount</description>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
        <description>Enable automatic failover</description>
    </property>
</configuration>
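After editing, a quick grep can confirm a property landed in the file as intended. This get_prop helper is a rough sketch (it assumes the <value> line directly follows the <name> line, as in the file above, and is no substitute for `hdfs getconf -confKey`):

```shell
# Print the <value> that follows a given <name> in a Hadoop *-site.xml.
get_prop() {
  file=$1; name=$2
  grep -A1 "<name>$name</name>" "$file" | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

# e.g.: get_prop /usr/local/hadoop/etc/hadoop/hdfs-site.xml dfs.replication
```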

Run vi /usr/local/hadoop/etc/hadoop/core-site.xml and configure it as follows:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hdfs_cluster</value>
        <description>NameNode access URI</description>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>service1.kvm.qhjack.cn:2181,service2.kvm.qhjack.cn:2181,service3.kvm.qhjack.cn:2181</value>
        <description>ZooKeeper connection addresses</description>
    </property>
</configuration>

Run vi masters to list the namenode nodes:

service1.kvm.qhjack.cn
service2.kvm.qhjack.cn
service3.kvm.qhjack.cn

Run vi workers to list the worker (datanode) nodes:

service1.kvm.qhjack.cn
service2.kvm.qhjack.cn
service3.kvm.qhjack.cn

On service1.kvm.qhjack.cn, sync the full configuration to the other nodes:

scp -r /usr/local/hadoop/etc/hadoop/* hadoop@service2.kvm.qhjack.cn:/usr/local/hadoop/etc/hadoop/
scp -r /usr/local/hadoop/etc/hadoop/* hadoop@service3.kvm.qhjack.cn:/usr/local/hadoop/etc/hadoop/

4.4 Initializing HDFS

On each of service1.kvm.qhjack.cn, service2.kvm.qhjack.cn and service3.kvm.qhjack.cn, create the JournalNode directory and start the journalnode:

mkdir -p /usr/local/hadoop/datanode/jl
hdfs --daemon start journalnode

On service1.kvm.qhjack.cn, run the format commands and start dfs:

hdfs namenode -format
sbin/start-dfs.sh
hdfs haadmin -transitionToActive service1
hdfs zkfc -formatZK
hdfs --daemon start zkfc

On service2.kvm.qhjack.cn, sync the metadata and start the namenode and zkfc:

hdfs namenode -bootstrapStandby
hdfs --daemon start namenode
hdfs --daemon start zkfc

On service3.kvm.qhjack.cn, sync the metadata and start the namenode and zkfc:

hdfs namenode -bootstrapStandby
hdfs --daemon start namenode
hdfs --daemon start zkfc

4.5 Normal Startup

Subsequent normal startups only need:

sbin/start-dfs.sh

4.6 Starting the Hadoop NFS Gateway

First install the base NFS utilities:

yum install nfs-utils

Stop and disable the system's default services (the HDFS portmap and nfs3 daemons replace them):

systemctl stop rpcbind
systemctl disable rpcbind
systemctl stop nfs
systemctl disable nfs

Start the NFS gateway on all three KVM nodes:

hdfs --daemon start portmap
hdfs --daemon start nfs3

5 Configuring the autofs Automounter

5.1 Installing autofs

Install it with:

yum install autofs

5.2 Configuring autofs

First edit /etc/autofs.conf: uncomment the following setting and change no to yes:

use_hostname_for_mounts = "yes"

Create /etc/auto.master.d/hdfs.autofs with vi, with the following content:

/hdfs_autofs /etc/auto.master.d/hdfs.misc

On service1.kvm.qhjack.cn, edit /etc/auto.master.d/hdfs.misc with vi so it contains:

www -fstype=nfs,proto=tcp,noacl,sync,rw,nolock,vers=3 nfs.service1.kvm.qhjack.cn:/root
nginx -fstype=nfs,proto=tcp,noacl,sync,rw,nolock,vers=3 nfs.service1.kvm.qhjack.cn:/nginx

On service2.kvm.qhjack.cn, edit /etc/auto.master.d/hdfs.misc with vi so it contains:

www -fstype=nfs,proto=tcp,noacl,sync,rw,nolock,vers=3 nfs.service2.kvm.qhjack.cn:/root
nginx -fstype=nfs,proto=tcp,noacl,sync,rw,nolock,vers=3 nfs.service2.kvm.qhjack.cn:/nginx

On service3.kvm.qhjack.cn, edit /etc/auto.master.d/hdfs.misc with vi so it contains:

www -fstype=nfs,proto=tcp,noacl,sync,rw,nolock,vers=3 nfs.service3.kvm.qhjack.cn:/root
nginx -fstype=nfs,proto=tcp,noacl,sync,rw,nolock,vers=3 nfs.service3.kvm.qhjack.cn:/nginx
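The three hdfs.misc files differ only in the node name, so they can be generated rather than hand-edited on each machine. A sketch (render_hdfs_misc is an illustrative helper; paths and mount options are taken verbatim from the maps above):

```shell
# Emit the two-line autofs map for a given node FQDN.
render_hdfs_misc() {
  node=$1   # e.g. service1.kvm.qhjack.cn
  cat <<EOF
www -fstype=nfs,proto=tcp,noacl,sync,rw,nolock,vers=3 nfs.$node:/root
nginx -fstype=nfs,proto=tcp,noacl,sync,rw,nolock,vers=3 nfs.$node:/nginx
EOF
}

# e.g. on each node:
#   render_hdfs_misc "$(hostname -f)" > /etc/auto.master.d/hdfs.misc
```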

5.3 Starting autofs

Start and enable it with:

systemctl stop autofs
systemctl start autofs
systemctl enable autofs

5.4 Creating Symlinks

Create the symlinks with:

ln -s /hdfs_autofs/www /www/root
ln -s /hdfs_autofs/nginx /www/server/nginx
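A symlink that points at the wrong target fails silently until the web root or nginx cache is actually read, so it is worth verifying after creation. A minimal sketch (assert_link is an illustrative helper):

```shell
# Succeed only when symlink $1 resolves (literally) to target $2.
assert_link() {
  [ "$(readlink "$1")" = "$2" ]
}

# e.g.:
#   assert_link /www/root /hdfs_autofs/www || echo "bad /www/root link" >&2
```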

6 Notes

  • Use the jps command to see which Hadoop processes are running on a node
  • The Hadoop version on every node must be identical
  • The cluster depends on the ZooKeeper ensemble network
This article is also published on the blog “起航天空”.
