How to understand MapReduce counters? This article will give you the answer

What are MapReduce counters?

A counter records the progress and execution status of a job. Its role can be understood as a kind of log: we can insert a counter somewhere in the program to record changes in the data or in the progress of the job.

What do MapReduce counters do?

MapReduce counters (Counter) give us a window onto all kinds of detailed statistics while a MapReduce job is running. They are very helpful for performance tuning: most MapReduce performance optimization work is evaluated against the values these counters report.

What are the MapReduce built-in counters?

MapReduce ships with a number of default counters. Here we analyze what these counters mean, so that we can observe job results such as byte and record counts: the number of input bytes, the number of output bytes, map-side input/output bytes and records, reduce-side input/output bytes and records, and so on. To use a built-in counter we only need to know its group name (groupName) and counter name (counterName); with those two names the counter can be looked up, as shown in the sketch below.
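For instance, once a job has completed, a built-in counter can be looked up by those two names. A minimal sketch (job is assumed to be a finished org.apache.hadoop.mapreduce.Job, and MAP_OUTPUT_RECORDS is just one example of a built-in counter):

// look up a built-in task counter by groupName and counterName
Counters counters = job.getCounters();
Counter mapOutputRecords =
        counters.findCounter("org.apache.hadoop.mapreduce.TaskCounter", "MAP_OUTPUT_RECORDS");
long value = mapOutputRecords.getValue();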

Task counters
During task execution, a task counter gathers information about that task, and the results from all the tasks of a job are aggregated together. For example, the MAP_INPUT_RECORDS counter counts the input records of each map task; it is aggregated over all the map tasks of a job, so the final value is the total number of input records for the whole job. A task counter is maintained by the task it belongs to and is periodically sent to the TaskTracker, which then forwards it to the JobTracker, so counters can be aggregated globally. A small sketch of reading one such counter follows; after that we go through the individual task counter groups.
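A minimal sketch of reading the aggregated MAP_INPUT_RECORDS value through the TaskCounter enum (again assuming job is a completed org.apache.hadoop.mapreduce.Job):

import org.apache.hadoop.mapreduce.TaskCounter;

// the enum constant maps to groupName "org.apache.hadoop.mapreduce.TaskCounter"
// and counterName "MAP_INPUT_RECORDS"
long mapInputRecords = job.getCounters()
        .findCounter(TaskCounter.MAP_INPUT_RECORDS)
        .getValue();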

  • MapReduce task counters
    • The groupName of the MapReduce task counters is org.apache.hadoop.mapreduce.TaskCounter; the counters it contains are listed in the following table:
      (table of TaskCounter counters: shown as images in the original post)
  • File system counters
    • The groupName of the file system counters is org.apache.hadoop.mapreduce.FileSystemCounter; the counters it contains are listed in the following table:
      (table of FileSystemCounter counters: shown as an image in the original post)
  • FileInputFormat (file input) counters
    • The groupName of the FileInputFormat counters is org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter; the counters it contains are listed in the following table, where the name in parentheses () in the counter-name column is the counterName:
      (table of FileInputFormatCounter counters: shown as an image in the original post)
  • FileOutputFormat (file output) counters
    • The groupName of the FileOutputFormat counters is org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter; the counters it contains are listed in the following table:
      (table of FileOutputFormatCounter counters: shown as an image in the original post)
Job counters
  • Job counters are maintained by the JobTracker (or, under YARN, by the application master), so their values do not need to be transmitted across the network; this is what distinguishes them from the other counters, including user-defined counters. They are job-level statistics whose values do not change while the tasks are running.
    • The groupName of the job counters is org.apache.hadoop.mapreduce.JobCounter; the counters it contains are listed in the following table:
      (table of JobCounter counters: shown as images in the original post)
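As with the task counters above, a job counter can be read after the job finishes. A minimal sketch using the JobCounter enum (job is assumed to be a completed org.apache.hadoop.mapreduce.Job):

import org.apache.hadoop.mapreduce.JobCounter;

// number of map tasks launched for this job, as a job-level statistic
long launchedMaps = job.getCounters()
        .findCounter(JobCounter.TOTAL_LAUNCHED_MAPS)
        .getValue();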

How do we use counters? Let's review how to use them.

  • Declaring counters
    • Declaring a counter with an enum
// declare a counter from a custom enum constant
Counter counter = context.getCounter(Enum<?> enumConstant)
    • Declaring a counter with custom names
// name your own groupName and counterName
Counter counter = context.getCounter(String groupName, String counterName)
  • Assigning values to a counter
    • Initializing the counter
counter.setValue(long value); // set the initial value
    • Incrementing the counter
counter.increment(long incr); // increase the count
  • Getting the value of a counter
    • Getting the value of an enum counter
Configuration conf = new Configuration();
Job job = new Job(conf, "MyCounter");
job.waitForCompletion(true);
Counters counters = job.getCounters();
// look up the enum counter, assuming the enum constant is BAD_RECORDS_LONG
Counter counter = counters.findCounter(LOG_PROCESSOR_COUNTER.BAD_RECORDS_LONG);
long value = counter.getValue(); // get the count
    • Getting the value of a custom counter
Configuration conf = new Configuration();
Job job = new Job(conf, "MyCounter");
job.waitForCompletion(true);
Counters counters = job.getCounters();
// assuming the groupName is ErrorCounter and the counterName is toolong
Counter counter = counters.findCounter("ErrorCounter", "toolong");
long value = counter.getValue(); // get the count
    • Getting the value of a built-in counter
Configuration conf = new Configuration();
Job job = new Job(conf, "MyCounter");
job.waitForCompletion(true);
Counters counters = job.getCounters();
// look up the counter for the number of reduce tasks launched for the job;
// the groupName and counterName can be taken from the built-in counter tables listed above
Counter counter = counters.findCounter("org.apache.hadoop.mapreduce.JobCounter", "TOTAL_LAUNCHED_REDUCES");
long value = counter.getValue(); // get the count
    • Getting the values of all counters
Configuration conf = new Configuration();
Job job = new Job(conf, "MyCounter");
job.waitForCompletion(true);
Counters counters = job.getCounters();
for (CounterGroup group : counters) {
  for (Counter counter : group) {
    System.out.println(counter.getDisplayName() + ": " + counter.getName() + ": " + counter.getValue());
  }
}
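If you only care about one group, you can also iterate over just that group instead of all counters. A minimal sketch, reusing the counters object from the snippet above and assuming the same "ErrorCounter" group name used earlier:

// iterate over a single counter group selected by its groupName
CounterGroup errorGroup = counters.getGroup("ErrorCounter");
for (Counter counter : errorGroup) {
  System.out.println(counter.getName() + ": " + counter.getValue());
}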

Custom counter

Custom counters are widely used, especially for counting invalid data; for example, we often use a counter to record the number of bad log records. Below we define custom counters to count invalid input records.

Data set
Suppose the standard record format for a file is three fields separated by "\t". The file contains two abnormal records: one record has only two fields, and one has four fields. The contents are as follows:
(sample input file: shown as an image in the original post)
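The original data file is shown only as an image and is not reproduced here. Purely as an illustration of the shape described above (not the actual data), such a file could look like this, with tab-separated fields, one line that is too short and one that is too long:

a1	b1	c1
a2	b2	c2
a3	b3
a4	b4	c4	d4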

Implementation

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyCounter {
    // tab separator
    private static String TAB_SEPARATOR = "\t";

    public static class MyCounterMap extends Mapper<LongWritable, Text, Text, Text> {
        // enum backing the enum-based counters
        public static enum LOG_PROCESSOR_COUNTER {
            BAD_RECORDS_LONG, BAD_RECORDS_SHORT
        };

        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String arr_value[] = value.toString().split(TAB_SEPARATOR);
            if (arr_value.length > 3) {
                // custom counter
                context.getCounter("ErrorCounter", "toolong").increment(1);
                // enum counter
                context.getCounter(LOG_PROCESSOR_COUNTER.BAD_RECORDS_LONG).increment(1);
            } else if (arr_value.length < 3) {
                // custom counter
                context.getCounter("ErrorCounter", "tooshort").increment(1);
                // enum counter
                context.getCounter(LOG_PROCESSOR_COUNTER.BAD_RECORDS_SHORT).increment(1);
            }
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        String[] args0 = {
                "hdfs://hadoop2:9000/buaa/counter/counter.txt",
                "hdfs://hadoop2:9000/buaa/counter/out/"
            };

        // read the configuration
        Configuration conf = new Configuration();
        // if the output directory exists, delete it
        Path mypath = new Path(args0[1]);
        FileSystem hdfs = mypath.getFileSystem(conf);
        if (hdfs.isDirectory(mypath)) {
            hdfs.delete(mypath, true);
        }

        // create a new job
        Job job = new Job(conf, "MyCounter");
        // main class
        job.setJarByClass(MyCounter.class);
        // Mapper
        job.setMapperClass(MyCounterMap.class);
        // input path
        FileInputFormat.addInputPath(job, new Path(args0[0]));
        // output path
        FileOutputFormat.setOutputPath(job, new Path(args0[1]));
        // submit the job and exit
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
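If you would also like the driver itself to print the custom counters, rather than only reading them from the job log, the end of main could be rewritten roughly as follows. This is a sketch of a possible extension, not part of the original code; it assumes the same "ErrorCounter" group and the "toolong"/"tooshort" names used above, and additionally requires importing org.apache.hadoop.mapreduce.Counters:

// run the job, then read and print the two custom counters before exiting
boolean success = job.waitForCompletion(true);
Counters counters = job.getCounters();
System.out.println("toolong  = " + counters.findCounter("ErrorCounter", "toolong").getValue());
System.out.println("tooshort = " + counters.findCounter("ErrorCounter", "tooshort").getValue());
System.exit(success ? 0 : 1);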

Run the job and check the counter values in the output log.
(job log with counter values: shown as an image in the original post)
As the log shows, the enum counters and the custom counters, although defined in two different ways, report the same counts for the irregular records.

Origin blog.csdn.net/weixin_44598691/article/details/105011660