MapReduce (MR) Tuning

First, the purpose of tuning
Tuning serves two goals: to take full advantage of the machines' performance so that MR programs finish their computing tasks faster, and to keep MR programs running adequately even on machines with limited resources.
Second, a general overview of tuning
Starting from the inner workings of an MR program: it consists of two phases, mapper and reducer. The mapper phase includes reading data, map processing, and writing output (sort & merge); the reducer phase includes fetching the mapper output, merging the data (sort & merge), reduce processing, and writing output. Of these seven sub-phases, the ones with the most tuning potential are the mapper output, the reducer-side data merge, and the number of reducers. Although performance tuning in general covers four areas (cpu, memory, disk, and network io), the MR execution flow shows that memory, disk, and network io are the ones we can tune here. MR tuning therefore focuses mainly on reducing network traffic and reducing disk IO, and this topic covers five areas: server tuning, code tuning, mapper tuning, reducer tuning, and runner tuning.

Third, MapReduce tuning
1. mapreduce.task.io.sort.factor: the number of files the MR program opens at once during merge sort; default is 10.
2. mapreduce.task.io.sort.mb: the size of the in-memory buffer the mapper uses when sorting and spilling its output; default is 100M.
3. mapreduce.map.sort.spill.percent: the fill threshold of the sort buffer at which the MR program starts spilling to disk; default is 0.80.
4. mapreduce.reduce.shuffle.parallelcopies: the number of threads the reducer uses to copy map output in parallel; default is 5.
5. mapreduce.reduce.shuffle.input.buffer.percent: the percentage of the reducer's heap used to buffer copied map output; default is 0.70. Increasing it appropriately can reduce spills of map data to disk and improve performance.
6. mapreduce.reduce.shuffle.merge.percent: during the reducer's shuffle, the threshold at which merging of in-memory outputs and writing to disk begins; default is 0.66. If memory allows, increasing this ratio can reduce the number of disk writes and improve performance. Used together with mapreduce.reduce.shuffle.input.buffer.percent.
7. mapreduce.task.timeout: the time after which a task that has not reported progress is considered failed; default is 600000 ms (10 min); 0 disables the check.
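As a reference, these parameters can be collected in mapred-site.xml (or set per job). A sketch of such a fragment; the values shown are illustrative adjustments, not universal recommendations:

```xml
<!-- Sketch of a mapred-site.xml fragment; values are illustrative, not tuned recommendations. -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>200</value> <!-- raise the sort buffer above the 100M default -->
</property>
<property>
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.85</value> <!-- spill slightly later than the 0.80 default -->
</property>
<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <value>0.75</value> <!-- buffer more map output in the reducer heap -->
</property>
```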

Fourth, code tuning
Code tuning targets mainly the mapper and reducer: the key rule is to avoid creating the same object more than once in code that runs per record. Otherwise, code tuning follows the same practices as general java programs.
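A minimal sketch of the object-reuse rule. MutableCount here is a hypothetical stand-in for Hadoop's reusable Writable types (IntWritable, Text, etc.): allocate the output object once per task, then set its value per record, instead of constructing a new object on every call.

```java
// Sketch: reuse one mutable output object across calls instead of
// allocating a fresh one per record (the pattern behind IntWritable/Text reuse).
// MutableCount is a hypothetical stand-in, not a Hadoop class.
public class ObjectReuseSketch {
    static final class MutableCount {
        private int value;
        void set(int v) { this.value = v; }
        int get() { return value; }
    }

    // One instance per task, reused for every record.
    private final MutableCount outValue = new MutableCount();

    int emit(int recordValue) {
        outValue.set(recordValue);   // reuse: no "new" in the per-record path
        return outValue.get();
    }

    public static void main(String[] args) {
        ObjectReuseSketch m = new ObjectReuseSketch();
        int sum = 0;
        for (int v : new int[]{1, 2, 3}) {
            sum += m.emit(v);        // the same object carries each value
        }
        System.out.println(sum);     // prints 6
    }
}
```

For a job processing millions of records, this removes millions of short-lived allocations from the hot path.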
Fifth, mapper tuning
Mapper tuning has one main goal: reduce the mapper's output. We can tune the mapper by adding a combine stage and by enabling output compression.
Combine introduction:
A combine is implemented as a custom class that inherits from Reducer. Its characteristics:
It takes the map output key/value pairs as both its input and output key/value types, and reduces the data sent over the network by merging part of the data on the map node.
It is most suitable when the map output value is numeric, which makes local aggregation convenient.
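To illustrate why the combine stage shrinks the mapper output, here is a plain-java simulation of a word-count combine (not the Hadoop Combiner API itself): summing per-key counts locally on the map side reduces the number of records shuffled over the network without changing the final totals.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of a word-count combine: merge part of the data on the map node
// so fewer key/value pairs cross the network, with unchanged final totals.
public class CombineSketch {
    // The combine step: same key/value types in and out, values summed per key.
    static Map<String, Integer> combine(String[] mapOutputKeys) {
        Map<String, Integer> local = new HashMap<>();
        for (String key : mapOutputKeys) {
            local.merge(key, 1, Integer::sum); // numeric value: easy to merge
        }
        return local;
    }

    public static void main(String[] args) {
        String[] mapOutput = {"a", "b", "a", "a", "b"}; // 5 records without combine
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println(combined.size());   // prints 2: records shuffled instead of 5
        System.out.println(combined.get("a")); // prints 3: totals are unchanged
    }
}
```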
Compression settings:
When submitting the job, enable compression and specify the compression codec, respectively.
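For example, the two settings can be passed on the command line at submission time. A sketch only: the jar and runner names are placeholders, and passing -D this way assumes the runner goes through ToolRunner/GenericOptionsParser:

```
hadoop jar my-job.jar MyRunner \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
  /input /output
```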
Sixth, reducer tuning
Reducer tuning is accomplished mainly by tuning the number of reducers and by parameter settings.
Tuning the number of reducers:
Requirement: the result with one reducer must be consistent with the result with multiple reducers; increasing the reducer count must not make the execution result incorrect.
Rule of thumb: for MR programs running on a hadoop cluster, by the time the map phase reaches 100 percent, the reducers should ideally already show about 33 percent progress; this can be checked with the hadoop job -status job_id command or on the web page.
Reason: the number of map tasks is determined by the InputFormat (its input splits and the RecordReader it returns), while the reducer consists of three parts: reading the mapper output, merging the data, and reduce processing plus output. The first part depends on the map output, so when the data volume is large and a single reducer cannot meet the performance requirements, we can solve the problem by increasing the number of reducers.
Advantage: makes full use of the cluster.
Disadvantage: some MR programs cannot take advantage of multiple reducers, such as a program that computes a global top n.
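The consistency requirement above can be illustrated with a plain-java simulation of hash partitioning (the same formula as Hadoop's default HashPartitioner): because every record for a given key lands in the same partition, the merged per-key totals are identical whether 1 or 3 reducers are used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation: hash-partition records across N "reducers", count per key in
// each, then merge the outputs. The result must not depend on N.
public class ReducerCountSketch {
    // Same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static Map<String, Integer> run(String[] records, int numReducers) {
        // Group records into per-reducer buckets, as the shuffle would.
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) buckets.add(new ArrayList<>());
        for (String r : records) buckets.get(partition(r, numReducers)).add(r);

        // Each "reducer" counts its own keys; merge all outputs for comparison.
        Map<String, Integer> out = new HashMap<>();
        for (List<String> bucket : buckets)
            for (String r : bucket) out.merge(r, 1, Integer::sum);
        return out;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "a", "c", "b", "a"};
        System.out.println(run(records, 1).equals(run(records, 3))); // prints true
    }
}
```

A global top-n job is the counter-example: each of the N reducers only sees its own partition, so no single reducer can emit the overall top n without a second pass.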
Seventh, runner tuning
Runner tuning means setting job parameters when submitting the job; this can generally be done in two ways: through code or through xml configuration files.
For items 1-8 see ActiveUserRunner (the before and configure methods); for item 9 see TransformerBaseRunner (the initScans method).

1. mapred.child.java.opts: jvm options for the child task processes; applies to both map and reduce tasks; default: -Xmx200m.
2. mapreduce.map.java.opts: jvm options for the map-stage child processes; default is empty, in which case mapred.child.java.opts is used.
3. mapreduce.reduce.java.opts: jvm options for the reduce-stage child processes; default is empty, in which case mapred.child.java.opts is used.
4. mapreduce.job.reduces: the number of reducers; default is 1. It can also be changed with the job.setNumReduceTasks method.
5. mapreduce.map.speculative: whether to enable speculative execution for the map phase; default is true. In practice it is usually better to set it to false. It can be set with the job.setMapSpeculativeExecution method.
6. mapreduce.reduce.speculative: whether to enable speculative execution for the reduce phase; default is true; in practice false is generally better. It can be set with the job.setReduceSpeculativeExecution method.
7. mapreduce.map.output.compress: whether to compress the map output; default is false. Set it to true when network transfer needs to be reduced.
8. mapreduce.map.output.compress.codec: the codec used to compress the map output; default is org.apache.hadoop.io.compress.DefaultCodec; SnappyCodec is recommended (Snappy must be installed first; installation notes: http://www.cnblogs.com/chengxin1982/p/3862309.html).
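The same settings can be placed in a configuration file instead of code. A sketch of such a fragment; whether these particular values help depends on the job:

```xml
<!-- Sketch: runner-level settings in mapred-site.xml or a per-job config. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value> <!-- raise the 200m default for both map and reduce -->
</property>
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```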

Origin www.cnblogs.com/nacyswiss/p/12627891.html