Environment Setup: Linux, ZooKeeper, and Hadoop

Linux version: Ubuntu 16.04 Server LTS

1. Install Linux, set the initial username to hadoop, and use the following hostnames:

Lead1, Lead2, Register1, Register2, Register3, Follower1, Follower2, Follower3, Follower4, Follower5

Lead1 and Lead2 host the NameNode and ResourceManager HA pairs

Register1, Register2, and Register3 run the ZooKeeper ensemble and the qjournal (JournalNode) service

Follower1, Follower2, Follower3, Follower4, and Follower5 run the DataNode and NodeManager daemons

2. Install software: OpenJDK 1.8, openssh-server, vim
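On Ubuntu 16.04 all three can come from the stock repositories; a minimal sketch, assuming the default package names for this release:

sudo apt-get update
sudo apt-get install -y openjdk-8-jdk openssh-server vim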

3. Configure passwordless SSH login:

a. ssh-keygen -t rsa

b. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

c. scp ~/.ssh/authorized_keys xxx(hostname):~/.ssh/authorized_keys, repeating until the authorized_keys file on every host contains every host's id_rsa.pub (a looped sketch is shown below)
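Steps a through c can be scripted once every machine has generated its own key pair. A minimal sketch, run from any one host, assuming the hadoop account from step 1, that password logins are still allowed, and that the hostnames already resolve (otherwise substitute IP addresses until step 4 is done):

hosts="Lead1 Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5"
# gather every host's public key into one authorized_keys file
for h in $hosts; do
    ssh hadoop@"$h" "cat ~/.ssh/id_rsa.pub"
done > ~/.ssh/authorized_keys
# push the combined file back to every host and tighten its permissions
for h in $hosts; do
    scp ~/.ssh/authorized_keys hadoop@"$h":~/.ssh/authorized_keys
    ssh hadoop@"$h" "chmod 600 ~/.ssh/authorized_keys"
done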

4. Use ifconfig to look up each machine's IP address, then edit /etc/hosts on every host accordingly, and remove the 127.0.0.1-style entries that map the hostname back to the loopback address
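For illustration, /etc/hosts might end up looking like the lines below; the 192.168.1.x addresses are placeholders and must be replaced with the addresses ifconfig actually reports:

192.168.1.11 Lead1
192.168.1.12 Lead2
192.168.1.13 Register1
192.168.1.14 Register2
192.168.1.15 Register3
192.168.1.16 Follower1
192.168.1.17 Follower2
192.168.1.18 Follower3
192.168.1.19 Follower4
192.168.1.20 Follower5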

5. Run sudo chmod 777 /opt and apply it on every host
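The permission change can be pushed to every host in one pass; a minimal sketch, where ssh -t keeps a terminal attached so sudo can ask for the password:

for h in Lead1 Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5; do
    ssh -t hadoop@"$h" "sudo chmod 777 /opt"
done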

 

ZooKeeper version: 3.4.10

1. Extract zookeeper-3.4.10 into /opt on the three hosts Register1, Register2, and Register3
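A minimal sketch for copying and unpacking the release on the three hosts, assuming the archive is named zookeeper-3.4.10.tar.gz, sits in the current directory, and that the same account is used on every host:

for h in Register1 Register2 Register3; do
    scp zookeeper-3.4.10.tar.gz "$h":/opt/
    ssh "$h" "tar -xzf /opt/zookeeper-3.4.10.tar.gz -C /opt"
done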

2. Create /etc/profile.d/zookeeper.sh with the following content:

export ZOOKEEPER_HOME=/opt/zookeeper-3.4.10

export PATH=$ZOOKEEPER_HOME/bin:$PATH

3. Copy zoo_sample.cfg in zookeeper-3.4.10/conf/ to zoo.cfg and edit it as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# adjust dataDir and dataLogDir as needed
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=Register1:2888:3888
server.2=Register2:2888:3888
server.3=Register3:2888:3888
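The server.N entries above only take effect if each Register host also has a myid file in dataDir containing its own id. A minimal sketch, assuming the dataDir configured above and the yrf account used by the scripts below:

ssh yrf@Register1 "mkdir -p /opt/zookeeper/data && echo 1 > /opt/zookeeper/data/myid"
ssh yrf@Register2 "mkdir -p /opt/zookeeper/data && echo 2 > /opt/zookeeper/data/myid"
ssh yrf@Register3 "mkdir -p /opt/zookeeper/data && echo 3 > /opt/zookeeper/data/myid"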

4. Start ZooKeeper

ssh yrf@Register1 << Function
zkServer.sh start
exit
Function

ssh yrf@Register2 << Function
zkServer.sh start
exit
Function

ssh yrf@Register3 << Function
zkServer.sh start
exit
Function

5. Verify that ZooKeeper started successfully

ssh yrf@Register1 << Function
zkServer.sh status
exit
Function

ssh yrf@Register2 << Function
zkServer.sh status
exit
Function

ssh yrf@Register3 << Function
zkServer.sh status
exit
Function
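If the ensemble is healthy, zkServer.sh status should report Mode: leader on exactly one of the three hosts and Mode: follower on the other two.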

6. Stop ZooKeeper

ssh yrf@Register1 << Function
zkServer.sh stop
exit
Function

ssh yrf@Register2 << Function
zkServer.sh stop
exit
Function

ssh yrf@Register3 << Function
zkServer.sh stop
exit
Function

 

Hadoop version: 2.7.3

1. Move the downloaded hadoop-2.7.3 into /opt

2. Edit the following configuration files under hadoop-2.7.3/etc/hadoop/ (a sketch for copying the edited files to the other hosts follows the last file)

a. core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://NAMENODE/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/temp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>Register1:2181,Register2:2181,Register3:2181</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>

b. hadoop-env.sh

Find the JAVA_HOME line and change it to the path where Java is installed; if Java was installed through the system package manager, export JAVA_HOME=/usr is usually sufficient
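If the exact install location is unclear, resolving the java binary on each host shows it; a minimal sketch, where the /usr/lib/jvm path in the comments is only an example of typical Ubuntu output:

# prints something like /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
readlink -f "$(which java)"
# strip the trailing /jre/bin/java (or /bin/java) and use the rest, e.g.:
# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64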

c. hdfs-site.xml

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>NAMENODE</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.NAMENODE</name>
        <value>namenode1,namenode2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.NAMENODE.namenode1</name>
        <value>Lead1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.NAMENODE.namenode2</name>
        <value>Lead2:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.NAMENODE.namenode1</name>
        <value>Lead1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.NAMENODE.namenode2</name>
        <value>Lead2:50070</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://Register1:8485;Register2:8485;Register3:8485/NAMENODE</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/hadoop/journal/data</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.NAMENODE</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>10000</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
sshfence
shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/yrf/.ssh/id_rsa</value>
    </property>
</configuration>

d. mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

e. slaves

Follower1
Follower2
Follower3
Follower4
Follower5

f. yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>YARN</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>yarn1,yarn2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.yarn1</name>
        <value>Lead1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.yarn2</name>
        <value>Lead2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>Register1:2181,Register2:2181,Register3:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
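Once the files above are edited on one machine, the same etc/hadoop directory has to reach every other host. A minimal sketch, assuming hadoop-2.7.3 is already unpacked under /opt on each host and rsync is available:

for h in Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5; do
    rsync -a /opt/hadoop-2.7.3/etc/hadoop/ "$h":/opt/hadoop-2.7.3/etc/hadoop/
done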

3. Start Hadoop:

a. First-time startup:

ssh yrf@Register1 << Function
hadoop-daemon.sh start journalnode
exit
Function

ssh yrf@Register2 << Function
hadoop-daemon.sh start journalnode
exit
Function

ssh yrf@Register3 << Function
hadoop-daemon.sh start journalnode
exit
Function

ssh yrf@Lead1 << Function
hdfs zkfc -formatZK
hdfs namenode -format
start-dfs.sh
start-yarn.sh
exit
Function

ssh yrf@Lead2 << Function
yarn-daemon.sh start resourcemanager
exit
Function
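Two hedged extras for the first startup, assuming the HA layout above: an HDFS HA standby normally needs a one-time hdfs namenode -bootstrapStandby on Lead2 (run after Lead1 has been formatted, before Lead2's NameNode starts for the first time), and jps is a quick way to confirm which daemons are up on any host:

ssh yrf@Lead2 << Function
# one-time metadata copy for the standby NameNode (assumes Lead1 is already formatted)
hdfs namenode -bootstrapStandby
exit
Function

ssh yrf@Lead1 << Function
# expect NameNode, DFSZKFailoverController and ResourceManager in the listing
jps
exit
Function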

b. Regular startup:

ssh yrf@Lead1 << Function
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start resourcemanager
exit
Function

ssh yrf@Lead2 << Function
yarn-daemon.sh start resourcemanager
exit
Function

4. Stop Hadoop:

ssh yrf@Lead1 << Function
stop-yarn.sh
exit
Function

ssh yrf@Lead2 << Function
yarn-daemon.sh stop resourcemanager
exit
Function

ssh yrf@Lead1 << Function
stop-dfs.sh
exit
Function

5. Add /etc/profile.d/hadoop.sh so that the hadoop commands can be used from anywhere:

export HADOOP_HOME=/opt/hadoop-2.7.3

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
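These /etc/profile.d entries take effect at the next login; to use them in the current shell, source them directly, e.g. source /etc/profile.d/hadoop.sh (and zookeeper.sh on the Register hosts).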
