Ubuntu 15.10
Hadoop: 2.7.1
Java: 1.7.0_79
1. Install SSH and generate a DSA key pair
sudo apt-get install ssh
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2. Install the synchronization tool rsync:
sudo apt-get install rsync
3. Download JDK 1.7.0_79 and extract it to /usr/lib/java/.
4. Download Hadoop 2.7.1 and extract it to /hadoop:
donald_draper@rain:/hadoop$ tar -zxvf hadoop-2.7.1.tar.gz
5. Configure environment variables:
vim ~/.bashrc
Append at the end of the file:
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/hadoop/hadoop-2.7.1
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}
Save and quit (:wq), then run source ~/.bashrc so the new variables take effect.
6. Configure Hadoop
All Hadoop 2.7.1 configuration files live under /hadoop/hadoop-2.7.1/etc/hadoop.
cd /hadoop/hadoop-2.7.1/etc/hadoop
1) Edit hadoop-env.sh and set the JDK home directory:
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
2) Edit core-site.xml
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://rain:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmp</value>
    </property>
</configuration>
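To double-check which NameNode URI a configuration directory actually points at, a naive regex-based extraction sketch can help; on a running cluster the proper tool is `hdfs getconf -confKey fs.defaultFS`, so treat this as a stand-in for when the daemons are down:

```shell
#!/bin/sh
# Sketch: pull fs.defaultFS out of a core-site.xml with a naive regex.
# Not a real XML parser; assumes the <name>/<value> pair appears once.
get_default_fs() {
    # flatten the file, then capture the <value> that follows the fs.defaultFS <name>
    tr -d '\n' < "$1" |
        sed -n 's/.*<name>fs.defaultFS<\/name>[^<]*<value>\([^<]*\)<\/value>.*/\1/p'
}
```

Usage: `get_default_fs etc/hadoop/core-site.xml` should print hdfs://rain:9000 for the configuration above.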
3) Edit hdfs-site.xml
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
4) Edit mapred-site.xml
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Enable the history server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>rain:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>rain:19888</value>
    </property>
    <!-- This dir lives in HDFS, so start DFS before starting the history server -->
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/history/indone</value>
    </property>
    <!-- Likewise an HDFS dir: start DFS first, then the history server -->
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/history/done</value>
    </property>
</configuration>
5) Edit yarn-site.xml
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
6) Edit slaves
The slaves file lists the worker nodes: HDFS is started on the NameNode host and YARN on the ResourceManager host, so the slaves file on the NameNode host names the DataNode machines, while the one on the ResourceManager host names the NodeManager machines. In this single-node setup they are all the same host.
cd /hadoop/hadoop-2.7.1/etc/hadoop/
vim slaves
rain
6. Format HDFS by running the format command:
bin/hdfs namenode -format
7. Start HDFS:
cd /hadoop/hadoop-2.7.1/sbin/
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./start-dfs.sh
Starting namenodes on [rain]
rain: starting namenode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-namenode-rain.out
localhost: starting datanode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-datanode-rain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-secondarynamenode-rain.out
8. Start the job history server
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /hadoop/hadoop-2.7.1/logs/mapred-donald_draper-historyserver-rain.out
9. Start YARN
cd /hadoop/hadoop-2.7.1/sbin/
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /hadoop/hadoop-2.7.1/logs/yarn-donald_draper-resourcemanager-rain.out
localhost: starting nodemanager, logging to /hadoop/hadoop-2.7.1/logs/yarn-donald_draper-nodemanager-rain.out
10. Check that HDFS and YARN are up:
donald_draper@rain:/hadoop/hadoop-2.7.1/logs$ jps
7114 DataNode
7743 NodeManager
8921 Jps
7607 ResourceManager
7319 SecondaryNameNode
8779 JobHistoryServer
6984 NameNode
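The check above can be scripted. A minimal sketch that scans jps-style output for the six expected daemons; the function reads from stdin so you can pipe real output in with `jps | check_daemons`:

```shell
#!/bin/sh
# Sketch: verify the expected Hadoop daemons appear in jps output read on stdin.
# Daemon names are exactly as jps prints them.
check_daemons() {
    out=$(cat)            # capture the jps listing once
    missing=""
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
        # -w matches the name as a whole word, so NameNode does not match SecondaryNameNode
        echo "$out" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -z "$missing" ]; then
        echo "all daemons running"
    else
        echo "missing:$missing"
    fi
}
```

If any daemon is listed as missing, its log under /hadoop/hadoop-2.7.1/logs is the place to look.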
11. Run a job
1) hdfs dfs -mkdir /test
2) hdfs dfs -mkdir /test/input
3) hdfs dfs -put etc/hadoop/*.xml /test/input
4) donald_draper@rain:/hadoop/hadoop-2.7.1$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /test/input /test/output 'dfs[a-z.]+'
Execution log:
16/08/15 11:37:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/15 11:37:52 INFO input.FileInputFormat: Total input paths to process : 9
16/08/15 11:37:52 INFO mapreduce.JobSubmitter: number of splits:9
16/08/15 11:37:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471230621598_0001
16/08/15 11:37:53 INFO impl.YarnClientImpl: Submitted application application_1471230621598_0001
16/08/15 11:37:53 INFO mapreduce.Job: The url to track the job: http://rain:8088/proxy/application_1471230621598_0001/
16/08/15 11:37:53 INFO mapreduce.Job: Running job: job_1471230621598_0001
16/08/15 11:38:16 INFO mapreduce.Job: Job job_1471230621598_0001 running in uber mode : false
16/08/15 11:38:16 INFO mapreduce.Job:  map 0% reduce 0%
16/08/15 11:45:11 INFO mapreduce.Job:  map 67% reduce 0%
16/08/15 11:48:06 INFO mapreduce.Job:  map 74% reduce 22%
16/08/15 11:48:22 INFO mapreduce.Job:  map 89% reduce 22%
16/08/15 11:48:23 INFO mapreduce.Job:  map 100% reduce 22%
16/08/15 11:48:49 INFO mapreduce.Job:  map 100% reduce 30%
16/08/15 11:48:51 INFO mapreduce.Job:  map 100% reduce 33%
16/08/15 11:48:54 INFO mapreduce.Job:  map 100% reduce 67%
16/08/15 11:49:03 INFO mapreduce.Job:  map 100% reduce 100%
16/08/15 11:49:25 INFO mapreduce.Job: Job job_1471230621598_0001 completed successfully
16/08/15 11:49:45 INFO mapreduce.Job: Counters: 50
        File System Counters
                FILE: Number of bytes read=51
                FILE: Number of bytes written=1156955
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=28205
                HDFS: Number of bytes written=143
                HDFS: Number of read operations=30
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Killed map tasks=2
                Launched map tasks=11
                Launched reduce tasks=1
                Data-local map tasks=11
                Total time spent by all maps in occupied slots (ms)=3308143
                Total time spent by all reduces in occupied slots (ms)=227199
                Total time spent by all map tasks (ms)=3308143
                Total time spent by all reduce tasks (ms)=227199
                Total vcore-seconds taken by all map tasks=3308143
                Total vcore-seconds taken by all reduce tasks=227199
                Total megabyte-seconds taken by all map tasks=3387538432
                Total megabyte-seconds taken by all reduce tasks=232651776
        Map-Reduce Framework
                Map input records=781
                Map output records=2
                Map output bytes=41
                Map output materialized bytes=99
                Input split bytes=969
                Combine input records=2
                Combine output records=2
                Reduce input groups=2
                Reduce shuffle bytes=99
                Reduce input records=2
                Reduce output records=2
                Spilled Records=4
                Shuffled Maps =9
                Failed Shuffles=0
                Merged Map outputs=9
                GC time elapsed (ms)=213752
                CPU time spent (ms)=39770
                Physical memory (bytes) snapshot=1636868096
                Virtual memory (bytes) snapshot=7041122304
                Total committed heap usage (bytes)=1388314624
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=27236
        File Output Format Counters
                Bytes Written=143
16/08/15 11:49:47 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:48 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:49 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:50 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED.
Redirecting to job history server
16/08/15 11:50:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/15 11:50:51 INFO input.FileInputFormat: Total input paths to process : 1
16/08/15 11:50:51 INFO mapreduce.JobSubmitter: number of splits:1
16/08/15 11:50:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471230621598_0002
16/08/15 11:50:53 INFO impl.YarnClientImpl: Submitted application application_1471230621598_0002
16/08/15 11:50:53 INFO mapreduce.Job: The url to track the job: http://rain:8088/proxy/application_1471230621598_0002/
16/08/15 11:50:53 INFO mapreduce.Job: Running job: job_1471230621598_0002
16/08/15 11:51:29 INFO mapreduce.Job: Job job_1471230621598_0002 running in uber mode : false
16/08/15 11:51:29 INFO mapreduce.Job:  map 0% reduce 0%
16/08/15 11:51:39 INFO mapreduce.Job:  map 100% reduce 0%
16/08/15 11:51:48 INFO mapreduce.Job:  map 100% reduce 100%
16/08/15 11:51:51 INFO mapreduce.Job: Job job_1471230621598_0002 completed successfully
16/08/15 11:51:51 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=51
                FILE: Number of bytes written=230397
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=276
                HDFS: Number of bytes written=29
                HDFS: Number of read operations=7
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6533
                Total time spent by all reduces in occupied slots (ms)=8187
                Total time spent by all map tasks (ms)=6533
                Total time spent by all reduce tasks (ms)=8187
                Total vcore-seconds taken by all map tasks=6533
                Total vcore-seconds taken by all reduce tasks=8187
                Total megabyte-seconds taken by all map tasks=6689792
                Total megabyte-seconds taken by all reduce tasks=8383488
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Map output bytes=41
                Map output materialized bytes=51
                Input split bytes=133
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=51
                Reduce input records=2
                Reduce output records=2
                Spilled Records=4
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=59
                CPU time spent (ms)=1660
                Physical memory (bytes) snapshot=467501056
                Virtual memory (bytes) snapshot=1429606400
                Total committed heap usage (bytes)=276299776
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=143
        File Output Format Counters
                Bytes Written=29
View the results:
5) donald_draper@rain:/hadoop/hadoop-2.7.1$ hdfs dfs -get /test/output output
16/08/15 11:52:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/08/15 11:52:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
6) donald_draper@rain:/hadoop/hadoop-2.7.1$ cat output/*
1 dfsadmin
1 dfs.replication
Note: the results can also be viewed directly in HDFS:
hdfs dfs -cat /test/output/*
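The example grep job above is, in effect, counting occurrences of each string matching dfs[a-z.]+. For a quick local sanity check of the expected output, the same counting can be sketched as a plain shell pipeline (this reproduces only the counting, not the distributed execution):

```shell
#!/bin/sh
# Sketch: count occurrences of each string matching dfs[a-z.]+, like the example job.
# Reads the files given as arguments, or stdin when none are given.
count_dfs_matches() {
    # -o: print each match on its own line; -h: no filename prefixes
    grep -ohE 'dfs[a-z.]+' "$@" | sort | uniq -c | sort -rn
}
```

Running `count_dfs_matches etc/hadoop/*.xml` should surface the same matches the job reported.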
12. Shut down Hadoop
stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver
stop-dfs.sh
Web UIs:
http://192.168.126.136:50070 (NameNode)
http://192.168.126.136:8088 (ResourceManager)
http://192.168.126.136:19888 (JobHistoryServer)
Related error:
2016-08-15 11:28:50,625 FATAL org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: Error starting JobHistoryServer
org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:279)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:156)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:121)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:195)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:222)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:231)
Caused by: java.net.SocketException: Unresolved address
Resolution:
Check the history server's RPC and web UI addresses configured in mapred-site.xml (mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address); the "Unresolved address" error means the configured hostname cannot be resolved.
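For reference, the relevant entries are the two shown earlier in step 4); the hostname in each value must resolve on this machine (here it is this guide's host, rain; substitute your own, and make sure /etc/hosts maps it to the machine's IP):

```xml
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>rain:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>rain:19888</value>
</property>
```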