[Eighteen Palms ● Martial Arts Chapter] Seventh Palm: MapReduce Counters

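Counters are the mechanism Hadoop uses to gather statistics about a job. They are mainly used for data quality control and for collecting application-level statistics, and they let a programmer track how often a particular kind of event occurs. They also help with diagnosing problems: for most events and components inside the Hadoop framework, reading a counter is much easier than digging through log files. The MapReduce framework ships with a set of built-in counters, and you can define your own as well.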

(1) Categories of built-in counters

Group                          Class
MapReduce task counters        org.apache.hadoop.mapreduce.TaskCounter
File system counters           org.apache.hadoop.mapreduce.FileSystemCounter
File input format counters     org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
File output format counters    org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
Job counters                   org.apache.hadoop.mapreduce.JobCounter

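(2) MapReduce task counters

MapReduce task counters collect information about tasks while they run; the values are aggregated over all tasks of the job.

Counter                          Description
MAP_INPUT_RECORDS                Number of input records consumed by map tasks
MAP_SKIPPED_RECORDS              Number of input records skipped by map tasks
MAP_INPUT_BYTES                  Number of bytes of input consumed by map tasks
SPLIT_RAW_BYTES                  Number of bytes of input-split metadata read by map tasks (not the split data itself)
MAP_OUTPUT_RECORDS               Number of records emitted by map tasks
MAP_OUTPUT_BYTES                 Number of bytes emitted by map tasks
MAP_OUTPUT_MATERIALIZED_BYTES    Number of bytes of map output actually written to disk
COMBINE_INPUT_RECORDS            Number of input records processed by combiners
COMBINE_OUTPUT_RECORDS           Number of records emitted by combiners
REDUCE_INPUT_GROUPS              Number of distinct key groups consumed by reduce tasks
REDUCE_INPUT_RECORDS             Number of input records consumed by reduce tasks
REDUCE_OUTPUT_RECORDS            Number of records emitted by reduce tasks
REDUCE_SKIPPED_RECORDS           Number of input records skipped by reduce tasks
REDUCE_SKIPPED_GROUPS            Number of key groups skipped by reduce tasks
CPU_MILLISECONDS                 Cumulative CPU time used by tasks, in milliseconds
GC_TIME_MILLIS                   Time spent in garbage collection, in milliseconds
SHUFFLED_MAPS                    Number of map outputs transferred to reducers during the shuffle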
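
As a rough illustration of how the task counters above can be combined, the sketch below derives two simple ratios from them. The class and method names (`ReduceStats`, `averageGroupSize`, `combinerReduction`) are hypothetical and not part of Hadoop; the inputs are plain numbers, so the code runs without a cluster. In a real driver they would come from calls such as `job.getCounters().findCounter(TaskCounter.REDUCE_INPUT_RECORDS).getValue()`.

```java
// Hypothetical helper, not part of Hadoop: derives ratios from task counter values.
final class ReduceStats {

    // Average number of records per key group seen by the reducers:
    // REDUCE_INPUT_RECORDS / REDUCE_INPUT_GROUPS.
    public static double averageGroupSize(long reduceInputRecords, long reduceInputGroups) {
        if (reduceInputGroups <= 0) {
            return 0.0; // no groups, nothing to average
        }
        return (double) reduceInputRecords / reduceInputGroups;
    }

    // Fraction of records removed by the combiner:
    // 1 - COMBINE_OUTPUT_RECORDS / COMBINE_INPUT_RECORDS.
    // Returns 0 when no combiner ran (COMBINE_INPUT_RECORDS is 0).
    public static double combinerReduction(long combineInputRecords, long combineOutputRecords) {
        if (combineInputRecords <= 0) {
            return 0.0;
        }
        return 1.0 - (double) combineOutputRecords / combineInputRecords;
    }

    public static void main(String[] args) {
        // Values taken from the sample job output shown later in this article.
        System.out.println("records per group: " + averageGroupSize(14223828L, 19289L));
        System.out.println("combiner reduction: " + combinerReduction(0L, 0L));
    }
}
```

Keeping the arithmetic separate from the Hadoop calls makes it easy to check by hand before wiring it into a job.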

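(3) File system counters

Counter          Description
BYTES_READ       Total number of bytes read from the file system
BYTES_WRITTEN    Total number of bytes written to the file system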

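(4) Job counters

Counter                     Description
TOTAL_LAUNCHED_MAPS         Number of map tasks launched
TOTAL_LAUNCHED_REDUCES      Number of reduce tasks launched
TOTAL_LAUNCHED_UBERTASKS    Number of uber tasks launched
NUM_UBER_SUBREDUCES         Number of reduce tasks run inside uber tasks
NUM_FAILED_REDUCES          Number of reduce tasks that failed
NUM_FAILED_MAPS             Number of map tasks that failed
DATA_LOCAL_MAPS             Number of map tasks that ran on the same node as their input data
OTHER_LOCAL_MAPS            Number of map tasks that ran on a node in a different rack from their input data
SLOTS_MILLIS_REDUCES        Total time reduce tasks spent in occupied slots, in milliseconds
SLOTS_MILLIS_MAPS           Total time map tasks spent in occupied slots, in milliseconds
RACK_LOCAL_MAPS             Number of map tasks that ran on a node in the same rack as their input data, but not on the same node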
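
One common use of the job counters is a quick check of data locality. The sketch below is a minimal example under stated assumptions: `MapLocality` and `dataLocalShare` are made-up names, and the counter values are passed in as plain numbers. In a driver they could be read with `job.getCounters().findCounter(JobCounter.DATA_LOCAL_MAPS).getValue()` and the corresponding call for `TOTAL_LAUNCHED_MAPS`.

```java
// Hypothetical helper, not part of Hadoop: share of map tasks that ran data-local.
final class MapLocality {

    // DATA_LOCAL_MAPS / TOTAL_LAUNCHED_MAPS, or 0 when no map task was launched.
    public static double dataLocalShare(long dataLocalMaps, long totalLaunchedMaps) {
        if (totalLaunchedMaps <= 0) {
            return 0.0;
        }
        return (double) dataLocalMaps / totalLaunchedMaps;
    }

    public static void main(String[] args) {
        // The sample job in this article reports "Launched map tasks=1"
        // and "Data-local map tasks=1".
        System.out.println("data-local share: " + dataLocalShare(1L, 1L));
    }
}
```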

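(5) Custom counters

The example below uses custom counters to count how many records in a MapReduce job fall into each of three value ranges.
It also shows a dynamic counter, which is identified by a group name and a counter name instead of an enum.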

/**
 * Created by 鸣宇淳 on 2017/5/23.
 */
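public class MyCounter {
    public static enum PvSoltEnum
    {
        Solt_0_to_1000,
        Solt_1000_to_10000,
        Solt_more_10000
    }
}

// ---- Driver class SortWCMapReduce (excerpt; MyDataTypeWritable and SortWCReducer are not shown) ----
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Group name and counter name of the dynamic counter (fields of the driver class)
static final String NUMGROUP = "NumGroup";
static final String STARTBYONE = "BigPV";

// Mapper class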
public static class SortWCMapper extends
        Mapper<LongWritable, Text, MyDataTypeWritable, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String lineValue = value.toString();

        String[] strs = lineValue.split(",");
        if (2 != strs.length) {
            return;
        }
        Integer keyInt = Integer.valueOf(strs[0]);
        Integer valInt = Integer.valueOf(strs[1]);

        // Custom (enum-based) counters: one counter per value range
        if (valInt < 1000) {
            context.getCounter(MyCounter.PvSoltEnum.Solt_0_to_1000).increment(1);
        } else if (valInt < 10000) {
            context.getCounter(MyCounter.PvSoltEnum.Solt_1000_to_10000).increment(1);
        } else {
            // 10000 and above
            context.getCounter(MyCounter.PvSoltEnum.Solt_more_10000).increment(1);
        }

        // Dynamic counter, identified by group name and counter name
        if (valInt > 20000) {
            context.getCounter(NUMGROUP, STARTBYONE).increment(1);
        }

        MyDataTypeWritable mapOutputKey = new MyDataTypeWritable(keyInt, valInt);
        context.write(mapOutputKey, new IntWritable(mapOutputKey.getSecond()));
    }
}
public int run(String[] args) throws Exception {
    // Get the configuration
    Configuration configuration = this.getConf();
    // Create the job
    Job job = Job.getInstance(configuration, SortWCMapReduce.class.getSimpleName());
    // Set the main class of the MapReduce job
    job.setJarByClass(SortWCMapReduce.class);
    // Set the input path
    Path inpath = new Path(args[0]);
    FileInputFormat.addInputPath(job, inpath);
    // Set the output path
    Path outpath = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, outpath);

    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(SortWCMapper.class);
    job.setMapOutputKeyClass(MyDataTypeWritable.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setReducerClass(SortWCReducer.class);
    // Submit the job and block until it finishes
    boolean isSucces = job.waitForCompletion(true);

    Counters counters = job.getCounters();
    // Read the custom (enum-based) counter
    Counter myc = counters.findCounter(MyCounter.PvSoltEnum.Solt_0_to_1000);
    // Read the dynamic counter by group name and counter name
    Counter c = counters.findCounter(NUMGROUP, STARTBYONE);
    System.out.println("自定义计数器——Solt_0_to_1000:" + myc.getValue() );
    System.out.println("动态计数器——NUMGROUP:" + c.getValue() );
    return isSucces ? 0 : 1;
}
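
The range checks inside the mapper are easier to verify when they are pulled out into a pure function that can be exercised without a cluster. The following is only a sketch under stated assumptions: `PvSlotClassifier`, its `PvSlot` enum, and the `tally` helper are hypothetical stand-ins for `MyCounter.PvSoltEnum` and the mapper logic, and the boundaries (below 1000, 1000 up to but not including 10000, 10000 and above) are inferred from the counter names.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the mapper's range logic; uses no Hadoop classes.
final class PvSlotClassifier {

    // Mirrors MyCounter.PvSoltEnum under different (illustrative) names.
    public enum PvSlot { SLOT_0_TO_1000, SLOT_1000_TO_10000, SLOT_MORE_10000 }

    // Maps a value to exactly one range, so every record is counted once.
    public static PvSlot classify(int value) {
        if (value < 1000) {
            return PvSlot.SLOT_0_TO_1000;
        }
        if (value < 10000) {
            return PvSlot.SLOT_1000_TO_10000;
        }
        return PvSlot.SLOT_MORE_10000;
    }

    // Local equivalent of incrementing one counter per record.
    public static Map<PvSlot, Long> tally(int[] values) {
        Map<PvSlot, Long> counts = new EnumMap<>(PvSlot.class);
        for (PvSlot slot : PvSlot.values()) {
            counts.put(slot, 0L);
        }
        for (int value : values) {
            counts.merge(classify(value), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] sample = {5, 999, 1000, 9999, 10000, 25000};
        System.out.println(tally(sample));
    }
}
```

Inside a mapper, the enum returned by such a function could be passed directly to `context.getCounter(...)`, which accepts any enum constant, followed by `increment(1)`.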

All counters are printed once the job has finished:

17/05/24 06:36:28 INFO mapreduce.Job: Counters: 53
    File System Counters
        FILE: Number of bytes read=199133652
        FILE: Number of bytes written=399356575
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=130421676
        HDFS: Number of bytes written=213744
        HDFS: Number of read operations=33
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=20
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=10
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=66692
        Total time spent by all reduces in occupied slots (ms)=112790
        Total time spent by all map tasks (ms)=66692
        Total time spent by all reduce tasks (ms)=56395
        Total vcore-seconds taken by all map tasks=66692
        Total vcore-seconds taken by all reduce tasks=56395
        Total megabyte-seconds taken by all map tasks=68292608
        Total megabyte-seconds taken by all reduce tasks=115496960
    Map-Reduce Framework
        Map input records=14223828
        Map output records=14223828
        Map output bytes=170685936
        Map output materialized bytes=199133652
        Input split bytes=95
        Combine input records=0
        Combine output records=0
        Reduce input groups=19289
        Reduce shuffle bytes=199133652
        Reduce input records=14223828
        Reduce output records=19289
        Spilled Records=28447656
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=1117
        CPU time spent (ms)=107380
        Physical memory (bytes) snapshot=5327536128
        Virtual memory (bytes) snapshot=31331090432
        Total committed heap usage (bytes)=5032050688
    NumGroup
        BigPV=144
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    mapreduce.counter.MyCounter$PvSoltEnum
        Solt_0_to_1000=14218978
        Solt_1000_to_10000=4532
        Solt_more_10000=317
    File Input Format Counters 
        Bytes Read=130421581
    File Output Format Counters 
        Bytes Written=213744
自定义计数器——Solt_0_to_1000:14218978
动态计数器——NUMGROUP:144
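
If only the console log of a finished job is available, a dump like the one above can be turned back into structured data. The parser below is a minimal sketch, not a Hadoop API: `CounterDumpParser` is a hypothetical name, and it assumes the layout shown here, where group names and counters are indented, counters have the form `name=value`, and any line that does not start with whitespace is unrelated output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses the printed counter dump into group -> (counter -> value).
final class CounterDumpParser {

    public static Map<String, Map<String, Long>> parse(List<String> lines) {
        Map<String, Map<String, Long>> groups = new LinkedHashMap<>();
        Map<String, Long> current = null;
        for (String line : lines) {
            // Skip the log prefix line and any other non-indented output.
            if (line.isEmpty() || !Character.isWhitespace(line.charAt(0))) {
                continue;
            }
            String text = line.trim();
            if (text.isEmpty()) {
                continue;
            }
            int eq = text.lastIndexOf('=');
            if (eq < 0) {
                // An indented line without '=' starts a new counter group.
                current = new LinkedHashMap<>();
                groups.put(text, current);
            } else if (current != null) {
                String name = text.substring(0, eq).trim();
                try {
                    current.put(name, Long.parseLong(text.substring(eq + 1).trim()));
                } catch (NumberFormatException e) {
                    // Not a numeric counter value; ignore the line.
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> dump = Arrays.asList(
                "17/05/24 06:36:28 INFO mapreduce.Job: Counters: 53",
                "    Map-Reduce Framework",
                "        Map input records=14223828",
                "        Shuffled Maps =10",
                "    NumGroup",
                "        BigPV=144");
        System.out.println(parse(dump));
    }
}
```

Inside the driver itself there is no need for this, because `job.getCounters()` already exposes the values programmatically.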
Reposted from blog.csdn.net/chybin500/article/details/79389871