Service planning for 3 servers:

Server (IP)                            | nameNode | dataNode | resourceManager | nodeManager | jobHistoryServer | secondaryNameNode
hadoop.node.server01 (192.168.159.130) |    x     |    x     |                 |      x      |        x         |
hadoop.node.server02 (192.168.159.128) |          |    x     |        x        |      x      |                  |
hadoop.node.server03 (192.168.159.129) |          |    x     |                 |      x      |                  |         x
1. First configure all properties on server01:
1. Download the Hadoop package.
I am using the mature 2.5.0 release. There are two ways to download it: one is the CDH-packaged version, from http://archive.cloudera.com/cdh5/cdh/5/ ; the other is from the Apache website. The latest version on the official site is already 2.7.5, so to find the older 2.5.0 release you have to go to https://archive.apache.org/dist/hadoop/common/
2. Unpack the Hadoop package: tar -zxvf hadoop.tar.gz
3. Set the JAVA_HOME environment variable in the following files.
The configuration files live in ${HADOOP_HOME}/etc/hadoop:
hadoop-env.sh
mapred-env.sh
yarn-env.sh
In each of the three files above, set: export JAVA_HOME=/opt/modules/jdk1.8
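The edit in step 3 can be sketched as a small loop. The snippet below runs against a scratch directory created with mktemp so it is safe to try anywhere; on a real node, point CONF at ${HADOOP_HOME}/etc/hadoop instead. The JDK path /opt/modules/jdk1.8 is the one used in this guide; adjust it to your install.

```shell
# Scratch copy so this is safe to demo; on a real node use:
#   CONF=${HADOOP_HOME}/etc/hadoop
CONF=$(mktemp -d)
touch "$CONF/hadoop-env.sh" "$CONF/mapred-env.sh" "$CONF/yarn-env.sh"

for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  # Append the export only if the file does not already set JAVA_HOME.
  grep -q '^export JAVA_HOME=' "$CONF/$f" \
    || echo 'export JAVA_HOME=/opt/modules/jdk1.8' >> "$CONF/$f"
done

grep JAVA_HOME "$CONF/hadoop-env.sh"
```

Because of the grep guard, running the loop twice does not duplicate the export line.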
4. Modify core-site.xml and add the following property configuration:
====core-site.xml====
<!-- Specify server01 as the namenode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop.node.server01:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-2.5.0/data</value>
</property>
5. Modify the configuration hdfs-site.xml and add the following properties:
<!-- The number of distributed replicas is set to 3 -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- secondarynamenode hostname-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop.node.server03:50090</value>
</property>
<!-- namenode's web access host name: port number -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop.node.server01:50070</value>
</property>
<!-- Disable permission checks for users and groups -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
6. Modify yarn-site.xml and add the following properties:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop.node.server02</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
7. Modify mapred-site.xml
First rename mapred-site.xml.template to mapred-site.xml, and then add the following properties:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop.node.server01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop.node.server01:19888</value>
</property>
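With all four files edited, a quick way to eyeball the result is to dump every name/value pair. The grep/sed pipeline below is a sketch; it is shown here against a scratch copy of core-site.xml so it can be run anywhere, but on a real node you would set CONF to /usr/local/hadoop/hadoop-2.5.0/etc/hadoop.

```shell
# Scratch copy so the pipeline is safe to demo; use the real conf dir on a node.
CONF=$(mktemp -d)
cat > "$CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop.node.server01:8020</value>
  </property>
</configuration>
EOF

# Print each property as "name=" followed by its indented value.
grep -h -A1 '<name>' "$CONF"/*-site.xml \
  | sed -n 's/.*<name>\(.*\)<\/name>.*/\1=/p; s/.*<value>\(.*\)<\/value>.*/  \1/p'
```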
2. Send the modified configuration files to the other two servers:
Execute the following two commands on server01 to push the configuration directory to server02 and server03:
scp /usr/local/hadoop/hadoop-2.5.0/etc/hadoop/* hadoop.node.server02:/usr/local/hadoop/hadoop-2.5.0/etc/hadoop/
scp /usr/local/hadoop/hadoop-2.5.0/etc/hadoop/* hadoop.node.server03:/usr/local/hadoop/hadoop-2.5.0/etc/hadoop/
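The two scp commands can also be written as a loop. The echo in front makes this a dry run that only prints each command; remove the echo to actually copy (note that until the passwordless login of the next step is set up, scp will prompt for a password per host).

```shell
# Dry run: print the scp command for each target host. Remove `echo` to copy.
CONF=/usr/local/hadoop/hadoop-2.5.0/etc/hadoop
for host in hadoop.node.server02 hadoop.node.server03; do
  echo scp "$CONF"/* "$host:$CONF/"
done
```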
3. Configure passwordless login
On server01, where the namenode runs, and on server02, where the resourceManager runs, configure passwordless ssh login to the other servers.
1) Execute the following commands on server01:
ssh-keygen -t rsa
ssh-copy-id hadoop.node.server02
ssh-copy-id hadoop.node.server03
2) Execute the following commands on server02:
ssh-keygen -t rsa
ssh-copy-id hadoop.node.server01
ssh-copy-id hadoop.node.server03
4. Format the namenode on server01:
/usr/local/hadoop/hadoop-2.5.0/bin/hdfs namenode -format
5. Start the HDFS services
On server01, execute: /usr/local/hadoop/hadoop-2.5.0/sbin/start-dfs.sh
6. Start the YARN services
On server02, execute: /usr/local/hadoop/hadoop-2.5.0/sbin/start-yarn.sh
7. Start the JobHistoryServer
On server01, execute: /usr/local/hadoop/hadoop-2.5.0/sbin/mr-jobhistory-daemon.sh start historyserver
At this point, all services are ready.
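As a final check, these are the web UIs that should now be reachable, with hostnames and ports taken from the configuration above. The one assumption is 8088, which is YARN's default ResourceManager web port, since yarn-site.xml above does not override it.

```shell
# Web UIs for this cluster layout (hostnames/ports from the configs above;
# 8088 is the YARN ResourceManager default, not set explicitly here).
echo "NameNode UI:          http://hadoop.node.server01:50070"
echo "SecondaryNameNode UI: http://hadoop.node.server03:50090"
echo "ResourceManager UI:   http://hadoop.node.server02:8088"
echo "JobHistory UI:        http://hadoop.node.server01:19888"
```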