Environment to build: Linux, ZooKeeper, Hadoop

Linux version: Ubuntu 16.04 Server LTS

1. Install Linux and set the initial user name to hadoop. The hosts are:

Lead1, Lead2, Register1, Register2, Register3, Follower1, Follower2, Follower3, Follower4, Follower5

Lead1 and Lead2 are used to place the HA NameNodes and the ResourceManagers.

Register1, Register2, Register3 are used to run the ZooKeeper cluster service and the qjournal (JournalNode) service.

Follower1, Follower2, Follower3, Follower4, Follower5 are used to run the DataNode and NodeManager services.

2. Install the software: OpenJDK 1.8, vim, openssh-server.
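
A minimal install sketch for Ubuntu 16.04, assuming the apt package names openjdk-8-jdk, vim and openssh-server:

sudo apt-get update
sudo apt-get install -y openjdk-8-jdk vim openssh-server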

3. Configure passwordless SSH login:

a. ssh-keygen -t rsa

b. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

c. scp ~/.ssh/authorized_keys xxx(hostname):~/.ssh/authorized_keys, repeating until the authorized_keys file on every host contains the id_rsa.pub of every host.

4. Check each host's IP address with ifconfig and update /etc/hosts on all hosts accordingly, removing any entries that map the host's own name to 127.0.0.1 and the like; an example is shown below.
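
As an illustration, /etc/hosts on every host could end up looking like the block below; the 192.168.1.x addresses are hypothetical placeholders for whatever ifconfig reports:

127.0.0.1    localhost
192.168.1.11 Lead1
192.168.1.12 Lead2
192.168.1.21 Register1
192.168.1.22 Register2
192.168.1.23 Register3
192.168.1.31 Follower1
192.168.1.32 Follower2
192.168.1.33 Follower3
192.168.1.34 Follower4
192.168.1.35 Follower5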

5. Run sudo chmod 777 /opt; apply this on all hosts.

 

ZooKeeper Version: 3.4.10

1. Extract zookeeper-3.4.10 to /opt on the three hosts Register1, Register2, Register3.
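
A minimal sketch, assuming the release tarball zookeeper-3.4.10.tar.gz has already been copied to the home directory on each Register host:

tar -xzf ~/zookeeper-3.4.10.tar.gz -C /opt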

2. Add /etc/profile.d/zookeeper.sh with the following content:

export ZOOKEEPER_HOME=/opt/zookeeper-3.4.10

export PATH=$ZOOKEEPER_HOME/bin:$PATH

3. Copy zookeeper-3.4.10/conf/zoo_sample.cfg to zoo.cfg and change it as follows:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just

# example sakes.

# set dataDir as needed

dataDir=/opt/zookeeper/data

dataLogDir=/opt/zookeeper/log

# the port at which the clients will connect

clientPort=2181

# the maximum number of client connections.

# increase this if you need to handle more clients

#maxClientCnxns=60

#

# Be sure to read the maintenance section of the

# administrator guide before turning on autopurge.

#

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

#autopurge.purgeInterval=1

server.1=Register1:2888:3888

server.2=Register2:2888:3888

server.3=Register3:2888:3888
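
Each server.N entry must be matched by a myid file under dataDir on that host, and the data and log directories have to exist. A minimal sketch for Register1 (write 2 on Register2 and 3 on Register3):

mkdir -p /opt/zookeeper/data /opt/zookeeper/log
echo 1 > /opt/zookeeper/data/myid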

4. Start ZooKeeper

ssh yrf@Register1 << Function

zkServer.sh start

exit

Function

ssh yrf@Register2 << Function

zkServer.sh start

exit

Function

ssh yrf@Register3 << Function

zkServer.sh start

exit

Function

5. Verify that ZooKeeper started successfully:

ssh yrf@Register1 << Function

zkServer.sh status

exit

Function

ssh yrf@Register2 << Function

zkServer.sh status

exit

Function

ssh yrf@Register3 << Function

zkServer.sh status

exit

Function
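
If the cluster came up correctly, zkServer.sh status should report one of the three hosts as leader and the other two as followers.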

6. Stop ZooKeeper:

ssh yrf@Register1 << Function

zkServer.sh stop

exit

Function

ssh yrf@Register2 << Function

zkServer.sh stop

exit

Function

ssh yrf@Register3 << Function

zkServer.sh stop

exit

Function

 

Hadoop Version: 2.7.3

1. Move the downloaded hadoop-2.7.3 to the /opt directory.
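
A minimal sketch, assuming the release tarball hadoop-2.7.3.tar.gz is in the home directory:

tar -xzf ~/hadoop-2.7.3.tar.gz -C /opt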

2. Configure the following files under hadoop-2.7.3/etc/hadoop/:

a. core-site.xml

<configuration>

         <property>

                <name>fs.defaultFS</name>

                <value>hdfs://NAMENODE/</value>

        </property>

        <property>

                <name>hadoop.tmp.dir</name>

                <value>/opt/hadoop/temp</value>

        </property>

        <property>

                <name>ha.zookeeper.quorum</name>

                <value>Register1:2181,Register2:2181,Register3:2181</value>

        </property>

        <property>

                <name>io.file.buffer.size</name>

                <value>4096</value>

        </property>

</configuration>

b. hadoop-env.sh

Locate the JAVA_HOME setting and change it to the path where Java is installed; if Java was installed through the system package manager, this is generally export JAVA_HOME=/usr.
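
For example, with the Ubuntu OpenJDK 8 packages the line could also point at the JVM directory; the exact path below is an assumption, check /usr/lib/jvm on your machine:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64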

c. hdfs-site.xml

<configuration>

<property>

                      <name>dfs.nameservices</name>

                      <value>NAMENODE</value>

            </property>

            <property>

                      <name>dfs.ha.namenodes.NAMENODE</name>

                      <value>namenode1,namenode2</value>

            </property>

            <property>

                      <name>dfs.namenode.rpc-address.NAMENODE.namenode1</name>

                    <value>Lead1:9000</value>

            </property>

            <property>

                      <name>dfs.namenode.rpc-address.NAMENODE.namenode2</name>

                      <value>Lead2:9000</value>

            </property>

            <property>

                      <name>dfs.namenode.http-address.NAMENODE.namenode1</name>

                      <value>Lead1:50070</value>

            </property>

            <property>

                      <name>dfs.namenode.http-address.NAMENODE.namenode2</name>

                      <value>Lead2:50070</value>

            </property>

            <property> 

                      <name>dfs.ha.automatic-failover.enabled</name> 

                      <value>true</value> 

            </property>

            <property>

                      <name>dfs.namenode.shared.edits.dir</name>

                      <value>qjournal://Register1:8485;Register2:8485;Register3:8485/NAMENODE</value>

            </property>

            <property>

                    <name>dfs.journalnode.edits.dir</name>

                    <value>/opt/hadoop/journal/data</value>

            </property>

            <property>

                      <name>dfs.client.failover.proxy.provider.NAMENODE</name>

                      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

            </property>

            <property>

                <name>dfs.replication</name>

                <value>3</value>

            </property>

            <property>

                <name>dfs.namenode.name.dir</name>     

                <value>/opt/hadoop/hdfs/name</value>

            </property>

            <property>

                <name>dfs.datanode.data.dir</name>

                <value>/opt/hadoop/hdfs/data</value>

            </property>

            <property>

                <name>dfs.ha.fencing.ssh.connect-timeout</name>

                <value>10000</value>

            </property>

            <property>

                        <name>dfs.ha.fencing.methods</name>

                <value>

sshfence

shell(/bin/true)

</value>

            </property>

            <property>

                      <name>dfs.ha.fencing.ssh.private-key-files</name>

                      <value>/home/yrf/.ssh/id_rsa</value>

            </property>

</configuration>

d. mapred-site.xml

<configuration>

<property>

                <name>mapreduce.framework.name</name>

                <value>yarn</value>

            </property>

</configuration>

e. slaves

Follower1

Follower2

Follower3

Follower4

Follower5

f. yarn-site.xml

<configuration>

        <property>

                <name>yarn.resourcemanager.ha.enabled</name>

                <value>true</value>

        </property>

        <property>

                <name>yarn.resourcemanager.cluster-id</name>

                <value>YARN</value>

        </property>

        <property>

                <name>yarn.resourcemanager.ha.rm-ids</name>

                <value>yarn1,yarn2</value>

        </property>

        <property>

                <name>yarn.resourcemanager.hostname.yarn1</name>

                <value>Lead1</value>

        </property>

        <property>

                <name>yarn.resourcemanager.hostname.yarn2</name>

                <value>Lead2</value>

        </property>

        <property>

                <name>yarn.resourcemanager.recovery.enabled</name>

                <value>true</value>

        </property>

        <property>

                <name>yarn.resourcemanager.store.class</name>

                <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

        </property>

        <property>

                <name>yarn.resourcemanager.zk-address</name>

                <value>Register1:2181,Register2:2181,Register3:2181</value>

        </property>

        <property>

                <name>yarn.nodemanager.aux-services</name>

                <value>mapreduce_shuffle</value>

        </property>

</configuration>
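
The same hadoop-2.7.3 directory, including the configuration edited above, has to be present under /opt on every host. A minimal sketch that distributes it from Lead1, assuming the yrf user and the host names listed earlier:

for h in Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5; do
    scp -rq /opt/hadoop-2.7.3 yrf@$h:/opt/
done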

3. Start Hadoop:

a. First start:

ssh yrf@Register1 << Function

hadoop-daemon.sh start journalnode

exit

Function

ssh yrf@Register2 << Function

hadoop-daemon.sh start journalnode

exit

Function

ssh yrf@Register3 << Function

hadoop-daemon.sh start journalnode

exit

Function

ssh yrf@Lead1 << Function

hdfs zkfc -formatZK

hdfs namenode -format

start-dfs.sh

start-yarn.sh

exit

Function

ssh yrf@Lead2 << Function

yarn-daemon.sh start resourcemanager

exit

Function
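
If the NameNode on Lead2 does not come up as standby after start-dfs.sh, its metadata usually has to be initialized from the active NameNode first. A hedged sketch, run while the NameNode on Lead1 is up:

ssh yrf@Lead2 << Function
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
exit
Function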

b. General start:

ssh yrf@Lead1 << Function

start-dfs.sh

start-yarn.sh

yarn-daemon.sh start resourcemanager

exit

Function

ssh yrf@Lead2 << Function

yarn-daemon.sh start resourcemanager

exit

Function
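
After a start, jps can be used to check which daemons are running on each host; a minimal sketch over the host list above:

for h in Lead1 Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5; do
    echo "== $h =="
    ssh yrf@$h jps
done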

4. Stop Hadoop:

ssh yrf@Lead1 << Function

stop-yarn.sh

exit

Function

ssh yrf@Lead2 << Function

yarn-daemon.sh stop resourcemanager

exit

Function

ssh yrf@Lead1 << Function

stop-dfs.sh

exit

Function

5. Add /etc/profile.d/hadoop.sh so that the hadoop commands can be used globally:

export HADOOP_HOME=/opt/hadoop-2.7.3

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
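
The variables take effect at the next login; to use the hadoop commands in the current shell, the file can be sourced directly:

source /etc/profile.d/hadoop.sh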

 
