Running a MapReduce job to count how many times a given word appears

Copyright notice: @抛物线 https://blog.csdn.net/qq_28513801/article/details/89481120

mapper
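The original mapper.py was only shown as a screenshot, which has not survived. As a stand-in, here is a minimal sketch of a Hadoop Streaming mapper. It assumes the input is plain text split on whitespace, and it emits one word<TAB>1 line per word. The helper name map_lines is hypothetical and is not taken from the original.

```python
#!/usr/bin/env python
# Hypothetical mapper.py sketch for Hadoop Streaming (not the author's code).
# Reads raw text on stdin and writes one "word\t1" line per token to stdout.
import sys


def map_lines(lines):
    """Yield "word<TAB>1" for every whitespace-separated token in `lines`."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


if __name__ == "__main__":
    for record in map_lines(sys.stdin):
        print(record)
```

The job invokes the script directly as -mapper ./mapper.py, so it relies on the shebang line and typically on the execute bit (chmod +x mapper.py). Writing -mapper 'python mapper.py' instead avoids depending on either.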
reducer
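reducer.py was likewise only a screenshot. The job counters give a hint about what it did. Nine distinct keys reached the reducer (Reduce input groups=9), but only one line came out (Reduce output records=1), namely like 1. A minimal sketch consistent with that follows. It assumes the target word is hard-coded as like, which is inferred from the output and not confirmed by the source. It sums the counts for that one word and ignores all others.

```python
#!/usr/bin/env python
# Hypothetical reducer.py sketch for Hadoop Streaming (not the author's code).
# Input: "word\t1" lines, already sorted by key by the framework.
# Output: a single "word\tcount" line for TARGET_WORD, if it occurs at all.
import sys

TARGET_WORD = "like"  # assumption: inferred from the job's output "like 1"


def reduce_lines(lines, target=TARGET_WORD):
    """Return ["target<TAB>count"] if `target` appears in `lines`, else []."""
    total = 0
    found = False
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        if word != target:
            continue  # other words are read but not reported
        try:
            total += int(count)
        except ValueError:
            continue  # skip malformed values instead of crashing (non-zero exit)
        found = True
    return ["%s\t%d" % (target, total)] if found else []


if __name__ == "__main__":
    for out in reduce_lines(sys.stdin):
        print(out)
```

Skipping malformed lines keeps the script's exit status at 0. A crash would make Hadoop Streaming mark the reduce task as failed.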

[root@master opt]# hadoop jar /opt/hadoop-2.7.6/share/hadoop/tools/lib/hadoop-streaming-2.7.6.jar -files 'mapper.py,reducer.py' -input /data/word.txt \
-output /data/out6 -mapper ./mapper.py -reducer ./reducer.py

packageJobJar: [/tmp/hadoop-unjar5048570426455366485/] [] /tmp/streamjob2720383777039712169.jar tmpDir=null
19/04/23 08:53:38 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.100.10:8032
19/04/23 08:53:38 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.100.10:8032
19/04/23 08:53:39 INFO mapred.FileInputFormat: Total input paths to process : 1
19/04/23 08:53:39 INFO mapreduce.JobSubmitter: number of splits:2
19/04/23 08:53:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1556020919261_0005
19/04/23 08:53:39 INFO impl.YarnClientImpl: Submitted application application_1556020919261_0005
19/04/23 08:53:39 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1556020919261_0005/
19/04/23 08:53:39 INFO mapreduce.Job: Running job: job_1556020919261_0005
19/04/23 08:53:45 INFO mapreduce.Job: Job job_1556020919261_0005 running in uber mode : false
19/04/23 08:53:45 INFO mapreduce.Job: map 0% reduce 0%
19/04/23 08:53:51 INFO mapreduce.Job: map 100% reduce 0%
19/04/23 08:53:56 INFO mapreduce.Job: map 100% reduce 100%
19/04/23 08:53:57 INFO mapreduce.Job: Job job_1556020919261_0005 completed successfully
19/04/23 08:53:57 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=89
        FILE: Number of bytes written=378090
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=239
        HDFS: Number of bytes written=7
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=9132
        Total time spent by all reduces in occupied slots (ms)=2713
        Total time spent by all map tasks (ms)=9132
        Total time spent by all reduce tasks (ms)=2713
        Total vcore-milliseconds taken by all map tasks=9132
        Total vcore-milliseconds taken by all reduce tasks=2713
        Total megabyte-milliseconds taken by all map tasks=9351168
        Total megabyte-milliseconds taken by all reduce tasks=2778112
    Map-Reduce Framework
        Map input records=3
        Map output records=9
        Map output bytes=65
        Map output materialized bytes=95
        Input split bytes=168
        Combine input records=0
        Combine output records=0
        Reduce input groups=9
        Reduce shuffle bytes=95
        Reduce input records=9
        Reduce output records=1
        Spilled Records=18
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=226
        CPU time spent (ms)=2220
        Physical memory (bytes) snapshot=713166848
        Virtual memory (bytes) snapshot=6310293504
        Total committed heap usage (bytes)=484442112
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=71
    File Output Format Counters
        Bytes Written=7
19/04/23 08:53:57 INFO streaming.StreamJob: Output directory: /data/out6

[root@master opt]# hadoop fs -ls -R /data/out6
-rw-r--r-- 3 root supergroup 0 2019-04-23 08:53 /data/out6/_SUCCESS
-rw-r--r-- 3 root supergroup 7 2019-04-23 08:53 /data/out6/part-00000

[root@master opt]# hadoop fs -text /data/out6/part-00000
like 1
[root@master opt]#
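You can cross-check the result without MapReduce by counting the word directly in the input file, for example after copying it locally with hadoop fs -get /data/word.txt. The following is a minimal sketch using collections.Counter. count_word is a hypothetical helper name.

```python
from collections import Counter


def count_word(text, target):
    """Count whitespace-separated occurrences of `target` in `text`."""
    return Counter(text.split())[target]
```

For example, print(count_word(open("word.txt").read(), "like")) should agree with part-00000. Like the mapper sketch, it does an exact, case-sensitive match, so "like," or "Like" would not count.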

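If the job fails because the mapper or reducer script exits with a non-zero status, you can add the following -D option. It tells Hadoop Streaming not to treat a non-zero exit code from the script as a task failure:
-D stream.non.zero.exit.is.failure=false
Generic options such as -D (and -files) must come before the streaming options. Place the option right after the jar path, before -files and -input. Note that this only suppresses the failure. It does not fix whatever made the script exit abnormally.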
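Before submitting to the cluster, you can dry-run the same scripts locally to catch crashes early. The sketch below uses local_streaming, a hypothetical helper that is not part of Hadoop. It pipes text through a mapper command and sorts the lines as a stand-in for the shuffle. It then pipes the sorted lines through a reducer command. By default it raises on a non-zero exit status; passing fail_on_nonzero=False loosely mirrors what stream.non.zero.exit.is.failure=false does. The sketch assumes Python 3.5+ for subprocess.run.

```python
import subprocess


def run_stage(cmd, input_text):
    """Run one stage with `input_text` on stdin; return (stdout, exit status)."""
    proc = subprocess.run(cmd, input=input_text, stdout=subprocess.PIPE,
                          universal_newlines=True)
    return proc.stdout, proc.returncode


def local_streaming(mapper_cmd, reducer_cmd, input_text, fail_on_nonzero=True):
    """Hypothetical local dry run: map -> sort -> reduce, no HDFS or YARN."""
    map_out, status = run_stage(mapper_cmd, input_text)
    if status != 0 and fail_on_nonzero:
        raise RuntimeError("mapper exited with status %d" % status)
    # Stand-in for the shuffle: Hadoop sorts map output by key; sorting the
    # whole "key\tvalue" lines gives the same grouping for plain-text keys.
    shuffled = "".join(line + "\n" for line in sorted(map_out.splitlines()))
    red_out, status = run_stage(reducer_cmd, shuffled)
    if status != 0 and fail_on_nonzero:
        raise RuntimeError("reducer exited with status %d" % status)
    return red_out
```

For example, local_streaming(["python", "mapper.py"], ["python", "reducer.py"], open("word.txt").read()) runs the whole pipeline on one machine. It raises as soon as either script exits abnormally, so the real cause is visible before the job is submitted.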

Reposted from blog.csdn.net/qq_28513801/article/details/89481120