Hadoop cluster setup: basic configuration and testing

1. First, prepare four machines with network access, with hostnames shizhan01, shizhan02, shizhan03, and shizhan04.

2. Add the hostname-to-IP mappings with vim /etc/hosts, as shown below.

This must be configured on every machine.

192.168.137.200 shizhan01
192.168.137.201 shizhan02
192.168.137.202 shizhan03
192.168.137.203 shizhan04
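Instead of editing all four machines by hand, the file can be pushed out from one of them; a minimal sketch, assuming root ssh access to the other hosts:

for h in shizhan02 shizhan03 shizhan04; do
    #copy this machine's hosts file to each of the others
    scp /etc/hosts root@$h:/etc/hosts
done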

3. Disable the firewall

#check the firewall status
service iptables status
#stop the firewall
service iptables stop
#check whether the firewall starts on boot
chkconfig iptables --list
#disable the firewall on boot
chkconfig iptables off
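For reference, the commands above are CentOS 6 style. On CentOS 7 and later the firewall is managed by firewalld instead, so the equivalents would be:

systemctl status firewalld
systemctl stop firewalld
systemctl disable firewalld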

4. Give the hadoop user sudo (root) privileges

vim /etc/sudoers and add the second (hadoop) line below, after the existing root entry:

root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
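A safer way to edit this file is visudo, which validates the syntax before saving (a broken sudoers file can lock you out of sudo entirely):

visudo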

5. Reboot Linux

reboot

6. Install the JDK

6.1 Upload the tarball: press Alt+P (the sftp shortcut in clients such as SecureCRT) to open an sftp window, then run put d:\xxx\yy\ll\jdk-7u_65-i585.tar.gz (the Windows path). Note that the JDK file and directory names in this walkthrough vary (7u65, 7u55, 7u25); use the name of whichever version you actually downloaded, consistently.

Extract the JDK:
#create the app directory
mkdir /home/hadoop/app
#extract
tar -zxvf jdk-7u55-linux-i586.tar.gz -C /home/hadoop/app
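To confirm the extraction, list the directory (the extracted folder name depends on the exact JDK build you used):

ls /home/hadoop/app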

7. Add Java to the environment variables

vim /etc/profile
#append at the end of the file
export JAVA_HOME=/home/hadoop/app/jdk-7u_65-i585
export PATH=$PATH:$JAVA_HOME/bin

Reload the environment configuration:
source /etc/profile
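A quick check that the JDK is now on the PATH (it should print the version you just installed):

java -version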

8. Install Hadoop 2.6.x

First upload the Hadoop tarball to /home/hadoop/ on the server and extract it into /home/hadoop/app (the steps below assume /home/hadoop/app/hadoop-2.6.4).
Note: in Hadoop 2.x the configuration files live under $HADOOP_HOME/etc/hadoop.

A distributed setup requires changes to five configuration files (a note on the slaves file and on copying the configuration to the other nodes follows the fifth one).
8.1 Configure Hadoop
First: hadoop-env.sh
vim hadoop-env.sh
#line 27
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_25

Second: core-site.xml

<!-- The default filesystem schema (URI) Hadoop uses: the address of the HDFS master, the NameNode -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://shizhan01:9000</value>
</property>
<!-- The directory where Hadoop stores the files it generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hdpdata</value>
</property>

Third: hdfs-site.xml
<!-- The number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>

<!-- Where the SecondaryNameNode runs; once it is up, its page is at http://shizhan02:50090 -->
<!-- (this is the Hadoop 2.x property name; the deprecated dfs.secondary.http.address still works too) -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>shizhan02:50090</value>
</property>


Fourth: mapred-site.xml
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Fifth: yarn-site.xml
<!-- The address of the YARN master, the ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>shizhan01</value>
</property>
<!-- The auxiliary service reducers use to fetch map output: the shuffle -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
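For start-dfs.sh and start-yarn.sh to bring up the DataNode and NodeManager daemons on the other machines, the slaves file must list the worker hostnames, and the configured installation has to be present on every node. A minimal sketch, assuming shizhan02-04 are the workers and passwordless ssh is set up for the hadoop user:

vim /home/hadoop/app/hadoop-2.6.4/etc/hadoop/slaves
shizhan02
shizhan03
shizhan04

#copy the configured installation (JDK and Hadoop) to the other nodes
for h in shizhan02 shizhan03 shizhan04; do
    scp -r /home/hadoop/app hadoop@$h:/home/hadoop/
done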

8.2 Add Hadoop to the environment variables

vim /etc/profile
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_25
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile
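A quick check that the Hadoop scripts are now on the PATH:

hadoop version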

8.3 Format the NameNode (this initializes it). Run it only once; the output should report that the storage directory has been successfully formatted.
hdfs namenode -format 

8.4 Start Hadoop
First start HDFS:
start-dfs.sh

Then start YARN (I forgot this step in my own run; the resulting errors are covered below):
start-yarn.sh
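If a daemon fails to come up, its log under $HADOOP_HOME/logs is the first place to look. The filename below is a hypothetical example following the hadoop-<user>-<daemon>-<host>.log pattern:

tail -100 /home/hadoop/app/hadoop-2.6.4/logs/hadoop-hadoop-namenode-shizhan01.log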



8.5 Verify that everything started
Run jps on each machine. On the master (shizhan01) you should see:

4809 ResourceManager
4670 SecondaryNameNode
4487 NameNode
7075 Jps

On a worker node:

3542 Jps
2779 NodeManager   (missing if YARN has not been started)
2665 DataNode


http://shizhan01:50070 (the HDFS web UI)
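The YARN ResourceManager also has a web UI, which the job-tracking URL in the logs below points at:

http://shizhan01:8088 (the YARN web UI)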

9. Run wordcount

Create the input directory in HDFS; -p is needed because the parent /wordcount does not exist yet:

hadoop fs -mkdir -p /wordcount/input
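The job needs some input, so upload a couple of text files into the directory (the filenames here are hypothetical placeholders):

hadoop fs -put words1.txt words2.txt /wordcount/input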

To run wordcount, cd into /home/hadoop/app/hadoop-2.6.4/share/hadoop/mapreduce

Run: hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wordcount/input /wordcount/output

This failed with the following error:

18/07/22 01:00:46 INFO client.RMProxy: Connecting to ResourceManager at shizhan01/192.168.137.200:8032
18/07/22 01:00:47 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:48 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:50 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:51 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:52 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:53 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 5 time(s); retry

Cause: YARN was not running. Running start-yarn.sh fixed this, but then the job failed with another error:

Application application_1532192694875_0001 failed 2 times due to Error launching appattempt_1532192694875_0001_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1532352626322 found 1532193319198
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

Cause: the cluster clocks were not synchronized, i.e. the NameNode and DataNode machines disagreed about the current time.

Solution

Synchronize the DataNodes with the NameNode by running the following two commands on every server (this uses the Asia/Shanghai timezone):

1) cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

2) ntpdate pool.ntp.org
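A minimal sketch to run both commands on every node from one machine, assuming root ssh access to shizhan01-04 (ntpdate is a one-shot sync; running the ntpd service would keep the clocks aligned permanently):

for h in shizhan01 shizhan02 shizhan03 shizhan04; do
    ssh root@$h "cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && ntpdate pool.ntp.org"
done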

Run the command again:

hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wordcount/input /wordcount/output

This time it succeeded:

18/07/28 14:52:26 INFO client.RMProxy: Connecting to ResourceManager at shizhan01/192.168.137.200:8032
18/07/28 14:52:27 INFO input.FileInputFormat: Total input paths to process : 2
18/07/28 14:52:27 INFO mapreduce.JobSubmitter: number of splits:2
18/07/28 14:52:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532760644619_0001
18/07/28 14:52:28 INFO impl.YarnClientImpl: Submitted application application_1532760644619_0001
18/07/28 14:52:28 INFO mapreduce.Job: The url to track the job: http://shizhan01:8088/proxy/application_1532760644619_0001/
18/07/28 14:52:28 INFO mapreduce.Job: Running job: job_1532760644619_0001
18/07/28 14:52:36 INFO mapreduce.Job: Job job_1532760644619_0001 running in uber mode : false
18/07/28 14:52:36 INFO mapreduce.Job: map 0% reduce 0%
18/07/28 14:52:54 INFO mapreduce.Job: map 50% reduce 0%
18/07/28 14:52:57 INFO mapreduce.Job: map 100% reduce 0%
18/07/28 14:53:03 INFO mapreduce.Job: map 100% reduce 100%
18/07/28 14:53:03 INFO mapreduce.Job: Job job_1532760644619_0001 completed successfully
18/07/28 14:53:03 INFO mapreduce.Job: Counters: 49
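To inspect the result, read the reducer output files under /wordcount/output (MapReduce writes them as part-r-00000 and so on):

hadoop fs -ls /wordcount/output
hadoop fs -cat /wordcount/output/part-r-00000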


Reposted from www.cnblogs.com/wuyl/p/9382208.html