Hadoop 2.7.1 Pseudo-Distributed Environment Setup

System environment:
Ubuntu 15.10
Hadoop: 2.7.1
Java: 1.7.0_79
1. Install SSH and generate a key pair:
sudo apt-get install ssh
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
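To verify that passwordless login works before continuing (it should not prompt for a password):
ssh localhost
exit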

2. Install the rsync synchronization tool:
sudo apt-get install rsync

3. Download jdk1.7.0_79 and extract it to /usr/lib/java/:
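For example (assuming the downloaded tarball is named jdk-7u79-linux-x64.tar.gz; adjust to the actual file name):
sudo mkdir -p /usr/lib/java
sudo tar -zxvf jdk-7u79-linux-x64.tar.gz -C /usr/lib/java/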
4. Download Hadoop 2.7.1 and extract it to /hadoop/:
donald_draper@rain:/hadoop$ tar -zxvf hadoop-2.7.1.tar.gz
5. Configure environment variables:
vim ~/.bashrc

Append the following at the end of the file:
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/hadoop/hadoop-2.7.1
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH} 

:wq
Save and quit.
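Make the new variables take effect in the current shell and verify them:
source ~/.bashrc
java -version
hadoop version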
6. Configure Hadoop
All of Hadoop 2.7.1's configuration files live under /hadoop/hadoop-2.7.1/etc/hadoop:
cd /hadoop/hadoop-2.7.1/etc/hadoop
1) Edit hadoop-env.sh and set JAVA_HOME to the JDK home directory:
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79

2) Edit core-site.xml:
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://rain:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmp</value>
    </property>
</configuration>
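The hostname rain in fs.defaultFS must resolve to this machine, or daemons will later fail with "Unresolved address" errors (see the related error at the end of this article). If needed, map it in /etc/hosts (the IP below is the one that appears in the logs later; substitute your own):
192.168.126.136   rain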

3) Edit hdfs-site.xml (dfs.replication is set to 1 because a pseudo-distributed cluster has only one DataNode):
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4) Edit mapred-site.xml:
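Note: the Hadoop 2.7.1 distribution ships only mapred-site.xml.template, so create mapred-site.xml from it before editing:
cp mapred-site.xml.template mapred-site.xml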
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
            <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
            </property>
            <!-- Enable the job history server -->
            <property>  
                 <name>mapreduce.jobhistory.address</name>  
                 <value>rain:10020</value>  
            </property>  
  
            <property>  
                  <name>mapreduce.jobhistory.webapp.address</name>  
                  <value>rain:19888</value>  
            </property>  
            <!-- This dir lives on the distributed file system; start DFS first, then the history server. -->
            <property>  
                   <name>mapreduce.jobhistory.intermediate-done-dir</name>  
                   <value>/history/indone</value>  
            </property>  
            <!-- This dir also lives on the distributed file system. -->
            <property>  
                  <name>mapreduce.jobhistory.done-dir</name>  
                  <value>/history/done</value>  
           </property>  
</configuration>


5) Edit yarn-site.xml (the mapreduce_shuffle auxiliary service is required to run MapReduce jobs on YARN):
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat yarn-site.xml 
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

6) Edit slaves
The slaves file lists the worker nodes. HDFS is started from the NameNode host and YARN from the ResourceManager host, so the slaves file on the NameNode host names the DataNodes, while the one on the ResourceManager host names the NodeManagers. In this pseudo-distributed setup they are all the same machine:
cd /hadoop/hadoop-2.7.1/etc/hadoop/
vim slaves
rain
7. Format HDFS by running the format command:
hdfs namenode -format
Formatting only needs to be done once; rerunning it re-initializes the NameNode metadata, making any existing HDFS data inaccessible.

8. Start HDFS:
cd  /hadoop/hadoop-2.7.1/sbin/
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./start-dfs.sh
Starting namenodes on [rain]
rain: starting namenode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-namenode-rain.out
localhost: starting datanode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-datanode-rain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-secondarynamenode-rain.out

9. Start the job history server:
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./mr-jobhistory-daemon.sh  start historyserver
starting historyserver, logging to /hadoop/hadoop-2.7.1/logs/mapred-donald_draper-historyserver-rain.out

10. Start YARN:
cd  /hadoop/hadoop-2.7.1/sbin/
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /hadoop/hadoop-2.7.1/logs/yarn-donald_draper-resourcemanager-rain.out
localhost: starting nodemanager, logging to /hadoop/hadoop-2.7.1/logs/yarn-donald_draper-nodemanager-rain.out

11. Check that the HDFS and YARN daemons started:
donald_draper@rain:/hadoop/hadoop-2.7.1/logs$ jps
7114 DataNode
7743 NodeManager
8921 Jps
7607 ResourceManager
7319 SecondaryNameNode
8779 JobHistoryServer
6984 NameNode
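If a daemon is missing from this list, check its log under /hadoop/hadoop-2.7.1/logs/; the file names follow the pattern of the startup messages above, e.g. the NameNode's .log file alongside its .out file:
tail -n 50 /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-namenode-rain.log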

12. Run an example job:
1) hdfs dfs -mkdir /test
2) hdfs dfs -mkdir /test/input
3) hdfs dfs -put etc/hadoop/*.xml /test/input
4) donald_draper@rain:/hadoop/hadoop-2.7.1$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /test/input /test/output 'dfs[a-z.]+'
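Note: the grep example submits two MapReduce jobs in sequence, a search job and then a sort job, which is why two job IDs appear in the log below.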

Execution log:
16/08/15 11:37:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/15 11:37:52 INFO input.FileInputFormat: Total input paths to process : 9
16/08/15 11:37:52 INFO mapreduce.JobSubmitter: number of splits:9
16/08/15 11:37:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471230621598_0001
16/08/15 11:37:53 INFO impl.YarnClientImpl: Submitted application application_1471230621598_0001
16/08/15 11:37:53 INFO mapreduce.Job: The url to track the job: http://rain:8088/proxy/application_1471230621598_0001/
16/08/15 11:37:53 INFO mapreduce.Job: Running job: job_1471230621598_0001
16/08/15 11:38:16 INFO mapreduce.Job: Job job_1471230621598_0001 running in uber mode : false
16/08/15 11:38:16 INFO mapreduce.Job:  map 0% reduce 0%
16/08/15 11:45:11 INFO mapreduce.Job:  map 67% reduce 0%
16/08/15 11:48:06 INFO mapreduce.Job:  map 74% reduce 22%
16/08/15 11:48:22 INFO mapreduce.Job:  map 89% reduce 22%
16/08/15 11:48:23 INFO mapreduce.Job:  map 100% reduce 22%
16/08/15 11:48:49 INFO mapreduce.Job:  map 100% reduce 30%
16/08/15 11:48:51 INFO mapreduce.Job:  map 100% reduce 33%
16/08/15 11:48:54 INFO mapreduce.Job:  map 100% reduce 67%
16/08/15 11:49:03 INFO mapreduce.Job:  map 100% reduce 100%
16/08/15 11:49:25 INFO mapreduce.Job: Job job_1471230621598_0001 completed successfully
16/08/15 11:49:45 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=51
		FILE: Number of bytes written=1156955
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=28205
		HDFS: Number of bytes written=143
		HDFS: Number of read operations=30
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=2
		Launched map tasks=11
		Launched reduce tasks=1
		Data-local map tasks=11
		Total time spent by all maps in occupied slots (ms)=3308143
		Total time spent by all reduces in occupied slots (ms)=227199
		Total time spent by all map tasks (ms)=3308143
		Total time spent by all reduce tasks (ms)=227199
		Total vcore-seconds taken by all map tasks=3308143
		Total vcore-seconds taken by all reduce tasks=227199
		Total megabyte-seconds taken by all map tasks=3387538432
		Total megabyte-seconds taken by all reduce tasks=232651776
	Map-Reduce Framework
		Map input records=781
		Map output records=2
		Map output bytes=41
		Map output materialized bytes=99
		Input split bytes=969
		Combine input records=2
		Combine output records=2
		Reduce input groups=2
		Reduce shuffle bytes=99
		Reduce input records=2
		Reduce output records=2
		Spilled Records=4
		Shuffled Maps =9
		Failed Shuffles=0
		Merged Map outputs=9
		GC time elapsed (ms)=213752
		CPU time spent (ms)=39770
		Physical memory (bytes) snapshot=1636868096
		Virtual memory (bytes) snapshot=7041122304
		Total committed heap usage (bytes)=1388314624
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=27236
	File Output Format Counters 
		Bytes Written=143
16/08/15 11:49:47 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:48 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:49 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:50 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
16/08/15 11:50:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/15 11:50:51 INFO input.FileInputFormat: Total input paths to process : 1
16/08/15 11:50:51 INFO mapreduce.JobSubmitter: number of splits:1
16/08/15 11:50:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471230621598_0002
16/08/15 11:50:53 INFO impl.YarnClientImpl: Submitted application application_1471230621598_0002
16/08/15 11:50:53 INFO mapreduce.Job: The url to track the job: http://rain:8088/proxy/application_1471230621598_0002/
16/08/15 11:50:53 INFO mapreduce.Job: Running job: job_1471230621598_0002
16/08/15 11:51:29 INFO mapreduce.Job: Job job_1471230621598_0002 running in uber mode : false
16/08/15 11:51:29 INFO mapreduce.Job:  map 0% reduce 0%
16/08/15 11:51:39 INFO mapreduce.Job:  map 100% reduce 0%
16/08/15 11:51:48 INFO mapreduce.Job:  map 100% reduce 100%
16/08/15 11:51:51 INFO mapreduce.Job: Job job_1471230621598_0002 completed successfully
16/08/15 11:51:51 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=51
		FILE: Number of bytes written=230397
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=276
		HDFS: Number of bytes written=29
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6533
		Total time spent by all reduces in occupied slots (ms)=8187
		Total time spent by all map tasks (ms)=6533
		Total time spent by all reduce tasks (ms)=8187
		Total vcore-seconds taken by all map tasks=6533
		Total vcore-seconds taken by all reduce tasks=8187
		Total megabyte-seconds taken by all map tasks=6689792
		Total megabyte-seconds taken by all reduce tasks=8383488
	Map-Reduce Framework
		Map input records=2
		Map output records=2
		Map output bytes=41
		Map output materialized bytes=51
		Input split bytes=133
		Combine input records=0
		Combine output records=0
		Reduce input groups=1
		Reduce shuffle bytes=51
		Reduce input records=2
		Reduce output records=2
		Spilled Records=4
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=59
		CPU time spent (ms)=1660
		Physical memory (bytes) snapshot=467501056
		Virtual memory (bytes) snapshot=1429606400
		Total committed heap usage (bytes)=276299776
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=143
	File Output Format Counters 
		Bytes Written=29

View the results:
5) donald_draper@rain:/hadoop/hadoop-2.7.1$ hdfs dfs -get /test/output output
16/08/15 11:52:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/08/15 11:52:19 WARN hdfs.DFSClient: DFSInputStream has been closed already

6) donald_draper@rain:/hadoop/hadoop-2.7.1$ cat output/*
1	dfsadmin
1	dfs.replication


Note: the results can also be viewed directly on HDFS:
hdfs dfs -cat /test/output/*

13. Shut down Hadoop:
stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver
stop-dfs.sh 
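To confirm everything has stopped, run jps again; only the Jps process itself should remain:
jps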

Web UIs:
http://192.168.126.136:50070  (NameNode)
http://192.168.126.136:8088   (ResourceManager)
http://192.168.126.136:19888  (JobHistoryServer)

Related error:
2016-08-15 11:28:50,625 FATAL org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: Error starting JobHistoryServer
org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
	at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:279)
	at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:156)
	at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:121)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:195)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:222)
	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:231)
Caused by: java.net.SocketException: Unresolved address
Fix:
Check the server address (mapreduce.jobhistory.address) and web UI address (mapreduce.jobhistory.webapp.address) settings in mapred-site.xml; the hostname configured there must resolve.
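A quick way to check that the configured hostname resolves (here the hostname rain from this setup):
ping -c 1 rain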



Reposted from donald-draper.iteye.com/blog/2317446