Build a Hadoop Cluster Step by Step

First, preparation

Suppose there are four machines: 192.168.1.101, 192.168.1.102, 192.168.1.103, and 192.168.1.104.

This guide uses CentOS 7 as an example (some commands differ between Linux distributions and versions, so look them up for your system as needed). All four nodes have a login account named hadoop.

We will set up the cluster as follows.

Turn off the firewall on each node:

  1. systemctl stop firewalld.service

  2. systemctl stop iptables.service
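The two commands above only stop the services for the current boot. On CentOS 7 you may also want to disable them so they stay off after a reboot; a minimal sketch (skip the iptables line if the iptables service is not installed):

  1. systemctl disable firewalld.service

  2. systemctl disable iptables.service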

Then check that port 22 is open on each machine.
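One way to check, assuming sshd is managed by systemd as on a stock CentOS 7 install:

  1. systemctl status sshd    # should show active (running)

  2. ss -tnl | grep ':22'     # port 22 should appear in the LISTEN list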

Next, map the IPs to hostnames. Execute the following command:

  1. vi /etc/hosts

Add the following entries to the hosts file:

  1. 192.168.1.101 node1

  2. 192.168.1.102 node2

  3. 192.168.1.103 node3

  4. 192.168.1.104 node4

Save and exit

Also check whether port 3306 is open on node1.

Then send the hosts file to the other nodes:

  1. scp /etc/hosts hadoop@node2:/etc/

  2. scp /etc/hosts hadoop@node3:/etc/

  3. scp /etc/hosts hadoop@node4:/etc/

You will need to enter the hadoop user's password for each transfer. (Note that writing to /etc/ on the target machines normally requires root privileges, so you may need to copy the file as root or move it into place with sudo afterwards.)

Second, configure passwordless SSH

In the home directory, execute the command ssh-keygen -t rsa. Don't worry about the prompts; just press Enter until the command completes.

Then go to the other nodes and perform the same operation.

For example: ssh node2 (to jump to node2).

At this point, each node's home directory contains a .ssh directory with two files: id_rsa (the private key) and id_rsa.pub (the public key). Copy the id_rsa.pub files of node2, node3, and node4 to node1's home directory:

  1. scp .ssh/id_rsa.pub hadoop@node1:~/pub2

  2. scp .ssh/id_rsa.pub hadoop@node1:~/pub3

  3. scp .ssh/id_rsa.pub hadoop@node1:~/pub4

On node1, also copy its own public key into the home directory:

  1. cp .ssh/id_rsa.pub pub1

Create a new file in node1's home directory:

  1. touch authorized_keys

Append the contents of all four public key files to authorized_keys:

  1. cat pub1 >> authorized_keys

  2. cat pub2 >> authorized_keys

  3. cat pub3 >> authorized_keys

  4. cat pub4 >> authorized_keys

Then send the authorized_keys file to each node's .ssh folder (on node1 a local copy is enough; for the other nodes, see the follow-up after the list):

  1. cp authorized_keys .ssh/

  2. scp authorized_keys hadoop@node2:~/

  3. scp authorized_keys hadoop@node3:~/

  4. scp authorized_keys hadoop@node4:~/
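Note that the three scp commands above drop authorized_keys into the home directory of node2, node3, and node4, not into .ssh. A minimal follow-up to run on each of those three nodes (assuming ~/.ssh already exists from the earlier ssh-keygen step):

  1. mv ~/authorized_keys ~/.ssh/

  2. chmod 700 ~/.ssh    # sshd is strict about these permissions

  3. chmod 600 ~/.ssh/authorized_keys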

Now add the following lines to the /etc/ssh/ssh_config file on every node:

  1. Host *

  2. StrictHostKeyChecking no

  3. UserKnownHostsFile=/dev/null

After this, you can switch between nodes without entering a password.
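To verify, run the following from node1; each command should print the remote hostname without prompting for a password:

  1. ssh node2 hostname

  2. ssh node3 hostname

  3. ssh node4 hostname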

Third, install the JDK

Download JDK 1.8.

After downloading, upload the archive to node1's home directory and extract it with the following command:

  1. tar -zxvf jdk-8u131-linux-x64.tar.gz

After extracting, delete the archive and rename the extracted folder to jdk:

  1. mv jdk1.8.0_131 jdk

Configure the environment variables:

  1. vi .bashrc

Add the following lines to the file:

  1. export JAVA_HOME=/home/hadoop/jdk

  2. export PATH=$PATH:$JAVA_HOME/bin

Save and exit

Execute the following command to reload the .bashrc file so the changes take effect:

  1. source .bashrc

Send the jdk folder and the .bashrc file to the other nodes, then source .bashrc on each of them:

  1. scp -r jdk/ hadoop@node2:~/
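The command above only covers node2. A fuller sketch that also sends .bashrc and covers node3 and node4:

  1. scp .bashrc hadoop@node2:~/

  2. scp -r jdk/ .bashrc hadoop@node3:~/

  3. scp -r jdk/ .bashrc hadoop@node4:~/

After the copies finish, log in to each node and run source .bashrc (new login shells will pick it up automatically).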

Fourth, install ZooKeeper

After downloading ZooKeeper, upload it to node1's home directory and extract it with the following command:

  1. tar -zxvf zookeeper-3.4.8.tar.gz

After extracting, delete the archive and rename the folder:

  1. mv zookeeper-3.4.8 zookeeper

Configure the environment variable in the .bashrc file by adding the following line above the export PATH line:

  1. export ZOOKEEPER_HOME=/home/hadoop/zookeeper

Then append the following to the end of the export PATH line:

  1. :$ZOOKEEPER_HOME/bin

Save and exit, then run source .bashrc.

Enter the ZooKeeper configuration directory:

  1. cd zookeeper/conf

Rename the sample configuration file:

  1. mv zoo_sample.cfg zoo.cfg

Edit the configuration file:

  1. vi zoo.cfg

Add the following lines to the configuration file:

  1. server.1=node1:2888:3888

  2. server.2=node2:2888:3888

  3. server.3=node3:2888:3888

Then find dataDir and modify its value:

  1. dataDir=/home/hadoop/tmp/zookeeper

Save and exit

If the /home/hadoop/tmp/zookeeper folder you just configured does not exist, create it manually.

Then create a file named myid in that directory and write the corresponding number into it. The number must match the server number in zoo.cfg: for example, node1's myid file contains 1, and so on for the other nodes. Save and exit.
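Concretely, on node1 this amounts to something like the following (run the same two commands on node2 and node3 as well, writing 2 and 3 into their myid files respectively):

  1. mkdir -p /home/hadoop/tmp/zookeeper

  2. echo 1 > /home/hadoop/tmp/zookeeper/myid    # 1 on node1, 2 on node2, 3 on node3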

Send the zookeeper folder and the .bashrc file to the hadoop home directory on node2 and node3, then source .bashrc on each of them.

On node1, node2, and node3, run zkServer.sh start.

Each node should report that it started successfully.

After all three nodes have started, run zkServer.sh status on one of them.

If it reports follower or leader, the ZooKeeper installation was successful.

Fifth, install Hadoop

Download Hadoop 2.7.3 from the following address: http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Then upload it to node1's home directory and extract it with the following command:

  1. tar -zxvf hadoop-2.7.3.tar.gz

After extracting, delete the archive and rename the folder:

  1. mv hadoop-2.7.3 hadoop

Configure the environment variable in the .bashrc file by adding the following line above the export PATH line:

  1. export HADOOP_HOME=/home/hadoop/hadoop

Then append the following to the end of the export PATH line:

  1. :$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Save and exit, then run source .bashrc.
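At this point the environment section of ~/.bashrc should look roughly like this, with the paths used earlier in this guide:

  1. export JAVA_HOME=/home/hadoop/jdk

  2. export ZOOKEEPER_HOME=/home/hadoop/zookeeper

  3. export HADOOP_HOME=/home/hadoop/hadoop

  4. export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin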

Enter Hadoop's configuration directory (hadoop/etc/hadoop in Hadoop 2.x) and modify the following configuration files.

1. vi hadoop-env.sh

  1. export JAVA_HOME=/home/hadoop/jdk

2. vi hdfs-site.xml

  1. <configuration>

  2. <!-- Number of replicas stored for each block -->

  3. <property>

  4. <name>dfs.replication</name>

  5. <value>3</value>

  6. </property>

  7. <!-- Name of the NameNode cluster (the nameservice) -->

  8. <property>

  9. <name>dfs.nameservices</name>

  10. <value>hwua</value>

  11. </property>

  12. <!-- The NameNodes that belong to this nameservice; several can be listed here -->

  13. <property>

  14. <name>dfs.ha.namenodes.hwua</name>

  15. <value>nn1,nn2</value>

  16. </property>

  17. <!-- RPC address for this NameNode -->

  18. <property>

  19. <name>dfs.namenode.rpc-address.hwua.nn1</name>

  20. <value>node1:8020</value>

  21. </property>

  22. <!-- RPC address for this NameNode -->

  23. <property>

  24. <name>dfs.namenode.rpc-address.hwua.nn2</name>

  25. <value>node2:8020</value>

  26. </property>

  27. <!-- HTTP address for this NameNode -->

  28. <property>

  29. <name>dfs.namenode.http-address.hwua.nn1</name>

  30. <value>node1:50070</value>

  31. </property>

  32. <!-- HTTP address for this NameNode -->

  33. <property>

  34. <name>dfs.namenode.http-address.hwua.nn2</name>

  35. <value>node2:50070</value>

  36. </property>

  37. <!-- The machines that run the JournalNode service -->

  38. <property>

  39. <name>dfs.namenode.shared.edits.dir</name>

  40. <value>qjournal://node2:8485;node3:8485;node4:8485/hwua</value>

  41. </property>

  42. <!-- The class (API) clients use when the NameNode switches between active and standby -->

  43. <property>

  44. <name>dfs.client.failover.proxy.provider.hwua</name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

  45. </property>

  46. <!-- When a NameNode fails, the standby takes over; during failover it SSHes into the old active NameNode and kills its namenode process (i.e. when nn2 takes over, it logs in over SSH and kills the old namenode process) to prevent split-brain -->

  47. <property>

  48. <name>dfs.ha.fencing.methods</name>

  49. <value>sshfence</value>

  50. </property>

  51. <!-- Private key used to log in to other servers over SSH and start other services, such as the DataNode -->

  52. <property>

  53. <name>dfs.ha.fencing.ssh.private-key-files</name>

  54. <value>/home/hadoop/.ssh/id_rsa</value>

  55. </property>

  56. <!-- Where the JournalNodes store their edits files -->

  57. <property>

  58. <name>dfs.journalnode.edits.dir</name>

  59. <value>/usr/hadoopsoft/journal</value>

  60. </property>

  61. <!-- Allow automatic failover when a NameNode goes down -->

  62. <property>

  63. <name>dfs.ha.automatic-failover.enabled</name>

  64. <value>true</value>

  65. </property>

  66. </configuration>

3. vi core-site.xml

  1. <configuration>

  2. <!-- Specify the nameservice that HDFS uses -->

  3. <property>

  4. <name>fs.defaultFS</name>

  5. <value>hdfs://hwua</value>

  6. </property>

  7. <!-- Hadoop temporary directory; create it manually if it does not exist -->

  8. <property>

  9. <name>hadoop.tmp.dir</name>

  10. <value>/home/hadoop/tmp/hadoop</value>

  11. </property>

  12. <!-- Specify the machines where ZooKeeper runs -->

  13. <property>

  14. <name>ha.zookeeper.quorum</name>

  15. <value>node1:2181,node2:2181,node3:2181</value>

  16. </property>

  17. </configuration>
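As the hadoop.tmp.dir comment says, the directory must exist. One way to create it on every node from node1, relying on the passwordless SSH configured earlier:

  1. mkdir -p /home/hadoop/tmp/hadoop

  2. ssh node2 'mkdir -p /home/hadoop/tmp/hadoop'

  3. ssh node3 'mkdir -p /home/hadoop/tmp/hadoop'

  4. ssh node4 'mkdir -p /home/hadoop/tmp/hadoop'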

4. mv mapred-site.xml.template mapred-site.xml

Then vi mapred-site.xml and add the following:

  1. <property>

  2. <name>mapreduce.framework.name</name>

  3. <value>yarn</value>

  4. </property>

5. vi yarn-site.xml

  1. <property>

  2. <name>yarn.nodemanager.aux-services</name>

  3. <value>mapreduce_shuffle</value>

  4. </property>

  5. <property>

  6. <name>yarn.resourcemanager.hostname</name>

  7. <value>node1</value>

  8. </property>

6. vi slaves

  1. node2

  2. node3

  3. node4

List the nodes that will run DataNodes: node2, node3, node4.

Delete the masters file used in an ordinary fully distributed setup. That file holds the IP of the SecondaryNameNode, but this HA setup has no SecondaryNameNode; nn2 takes its place.

7. Copy the entire hadoop directory to the other nodes:

  1. scp -r ~/hadoop hadoop@node2:~/

  2. scp -r ~/hadoop hadoop@node3:~/

  3. scp -r ~/hadoop hadoop@node4:~/

After modifying all the files, start the JournalNodes on node2, node3, and node4 (the hosts listed in the qjournal URI in hdfs-site.xml):

  1. hadoop-daemon.sh start journalnode
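One way to start all three from node1, using full paths so the commands do not depend on the remote shell loading .bashrc (a sketch assuming the hadoop folder sits in each node's home directory as set up above):

  1. ssh node2 '/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode'

  2. ssh node3 '/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode'

  3. ssh node4 '/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode'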

Format the NameNode on one of the two NameNode hosts: hdfs namenode -format

Immediately after formatting, copy the metadata to the other NameNode (a command sketch follows the list below):

  • Start the NameNode you just formatted

  • On the NameNode that was not formatted, execute: hdfs namenode -bootstrapStandby

  • Start the second NameNode
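Given that hdfs-site.xml maps nn1 to node1 and nn2 to node2, the sequence sketched above looks roughly like this:

  1. hdfs namenode -format                # on node1

  2. hadoop-daemon.sh start namenode      # on node1

  3. hdfs namenode -bootstrapStandby      # on node2

  4. hadoop-daemon.sh start namenode      # on node2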

On one of the NameNodes, initialize the ZKFC state in ZooKeeper: hdfs zkfc -formatZK

Then stop the daemons started above: stop-dfs.sh

Finally, start Hadoop: start-all.sh
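To verify, check the running Java processes on each node and open the NameNode web UIs configured earlier (http://node1:50070 and http://node2:50070); one NameNode should report active and the other standby.

  1. jps    # on each node; expect NameNode and DFSZKFailoverController on node1 and node2, DataNode/JournalNode/NodeManager on node2-node4, QuorumPeerMain on node1-node3, and ResourceManager on node1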

 


Origin: www.cnblogs.com/weiwei-python/p/11972662.html