Hadoop notes: how MapReduce handles failed tasks (reprinted)

1.1 A task in a job hangs, occupying resources for a long time without releasing them

1.2 The MapTasks have finished and the ReduceTasks are running, when a node that ran a MapTask goes down, or the disk storing a MapTask's output is damaged

Each task periodically reports its progress to the TaskTracker. If the reported progress does not change, then once the timeout limit is reached the TaskTracker kills the task and reports its state as KILLED to YARN, so that the task can be rescheduled.

Case 1: If the node is down, the JobTracker learns via the heartbeat mechanism that the TaskTracker has died, and reschedules not only the tasks that were running but also the already-finished MapTasks of running jobs.

Case 2: If the node is not down and only the disk storing the MapTask's output is damaged, there are two possibilities:

# All ReduceTasks have already completed the shuffle phase

# Some ReduceTasks have not completed the shuffle phase and still need to read that MapTask's output, so the MapTask has to be rerun

The task was later made to run by adjusting mapreduce.reduce.java.opts = -Xmx5000m, i.e. giving the shuffle roughly 4 GB of memory.

Adjusted parameters:
mapreduce.reduce.shuffle.merge.percent = 0.4
mapreduce.reduce.shuffle.parallelcopies = 5
mapreduce.reduce.shuffle.input.buffer.percent = 0.6
mapreduce.reduce.shuffle.memory.limit.percent = 0.17
With these settings the shuffle naturally gets about 4 GB of memory. In practice, shuffle.memory.limit.percent did not seem to play any role.
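As a minimal sketch (using the values above; this is not the original author's code), the same reduce-side shuffle settings can be passed per job through Hadoop's Configuration API:

```java
import org.apache.hadoop.conf.Configuration;

public class ShuffleTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Reducer JVM heap of 5000 MB, as in the fix described above.
        conf.set("mapreduce.reduce.java.opts", "-Xmx5000m");
        // Heap-usage threshold that triggers the in-memory merge.
        conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.4f);
        // Number of parallel copier threads fetching map output.
        conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 5);
        // Fraction of the reducer heap used to buffer shuffled map output.
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.6f);
        // Cap on a single in-memory shuffle fetch, as a fraction of the buffer.
        conf.setFloat("mapreduce.reduce.shuffle.memory.limit.percent", 0.17f);
        System.out.println(conf.get("mapreduce.reduce.java.opts"));
    }
}
```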

If physical memory overflows, adjust the mapreduce.reduce.memory.mb parameter; the default is 1024.

If virtual memory overflows, adjust yarn.nodemanager.vmem-pmem-ratio; the default is 2.1 and it can be increased.

Alternatively, the virtual memory check can simply be disabled by setting yarn.nodemanager.vmem-check-enabled to false in yarn-site.xml (the default is true),

and the physical memory check can be disabled by setting yarn.nodemanager.pmem-check-enabled to false (the default is true).
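For orientation, a hedged sketch of where each of these knobs lives (the 4096 MB value is an illustrative assumption): the container size is a per-job client setting, while the two memory checks and the vmem ratio are NodeManager settings that belong in yarn-site.xml.

```java
import org.apache.hadoop.conf.Configuration;

public class MemoryLimits {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Physical memory cap of a Reduce Task container, in MB (default 1024).
        // Raise this if tasks are killed for exceeding physical memory.
        conf.setInt("mapreduce.reduce.memory.mb", 4096); // illustrative value
        // Cluster-side settings, in yarn-site.xml on each node (not per job):
        //   yarn.nodemanager.vmem-pmem-ratio    = 2.1  (raise if vmem overflows)
        //   yarn.nodemanager.vmem-check-enabled = true (false disables the check)
        //   yarn.nodemanager.pmem-check-enabled = true (false disables the check)
        System.out.println(conf.getInt("mapreduce.reduce.memory.mb", 1024) + " MB");
    }
}
```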

Question: each run processes about 100 GB of data, and while the job is running, CPU usage stays at 100% until the task finishes, while memory usage never reaches its limit. On a NodeManager machine, the jps command shows eight YarnChild processes. Can the number of YarnChild processes be reduced through configuration, so as to lower CPU utilization?


There are several approaches. One is to use a multi-tenant resource scheduler, such as the Fair Scheduler or the Capacity Scheduler, with multi-dimensional resource scheduling enabled; the mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores parameters can then specify the number of CPUs each task uses (the default is 1). The other: since by default the scheduler only schedules memory, you can raise each task's memory via the mapreduce.map.memory.mb and mapreduce.reduce.memory.mb parameters, which reduces the number of concurrent tasks on a node. A rough calculation is sketched below.

If yarn.nodemanager.resource.memory-mb is configured, you can raise its value, or simply start from the default and adjust as needed.
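To see why this works, a small illustrative calculation (the 8192 MB node total and the 1024 MB per-task default come from this post; everything else is assumed): the number of concurrent YarnChild containers on a node is roughly the node's YARN memory divided by the per-task memory.

```java
public class ConcurrencyEstimate {
    public static void main(String[] args) {
        // yarn.nodemanager.resource.memory-mb: memory YARN may use on the node.
        long nodeMemoryMb = 8192;   // default, matches the 8-GB node above
        // mapreduce.map.memory.mb / mapreduce.reduce.memory.mb per task.
        long taskMemoryMb = 1024;   // default
        // 8192 / 1024 = 8, matching the eight YarnChild processes seen in jps.
        System.out.println("concurrent tasks ~ " + nodeMemoryMb / taskMemoryMb);
        // Doubling per-task memory halves concurrency (and CPU pressure).
        taskMemoryMb = 2048;
        System.out.println("concurrent tasks ~ " + nodeMemoryMb / taskMemoryMb);
    }
}
```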


Back to the reason for the hung task mentioned at the start: if no progress report is received from a task within a certain period (10 minutes by default; settable through the mapred.task.timeout property, in milliseconds), the task is marked as failed.

###########################################

MapReduce optimization
Optimization (1): resource parameters.
The following parameters can be configured in your MapReduce application itself and take effect per job (see the sketch after this list):

mapreduce.map.memory.mb: the memory limit a Map Task may use (in MB); default 1024. If a Map Task's actual resource usage exceeds this value, it is forcibly killed.
mapreduce.reduce.memory.mb: the resource limit a Reduce Task may use (in MB); default 1024. If a Reduce Task's actual resource usage exceeds this value, it is forcibly killed.
mapreduce.map.cpu.vcores: the maximum number of CPU cores available to each Map Task; default: 1.
mapreduce.reduce.cpu.vcores: the maximum number of CPU cores available to each Reduce Task; default: 1.
mapreduce.map.java.opts: JVM parameters for Map Tasks; you can configure the default Java heap size and other options here, for example: "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc" (the Hadoop framework replaces @taskid@ with the corresponding task id); default: "".
mapreduce.reduce.java.opts: JVM parameters for Reduce Tasks; you can configure the default Java heap size and other options here, for example: "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc"; default: "".
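A minimal sketch of setting these in an application (the specific values are illustrative assumptions, not recommendations from this post):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ResourceParams {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.map.memory.mb", 2048);     // Map container cap, MB
        conf.setInt("mapreduce.reduce.memory.mb", 4096);  // Reduce container cap, MB
        conf.setInt("mapreduce.map.cpu.vcores", 1);       // cores per Map Task
        conf.setInt("mapreduce.reduce.cpu.vcores", 1);    // cores per Reduce Task
        // JVM options; the framework replaces @taskid@ with the real task id.
        conf.set("mapreduce.map.java.opts",
                "-Xmx1536m -verbose:gc -Xloggc:/tmp/@taskid@.gc");
        conf.set("mapreduce.reduce.java.opts",
                "-Xmx3276m -verbose:gc -Xloggc:/tmp/@taskid@.gc");
        Job job = Job.getInstance(conf, "resource-params-demo");
        System.out.println(job.getConfiguration().get("mapreduce.map.java.opts"));
    }
}
```

Note that -Xmx should stay below the corresponding *.memory.mb value (a common rule of thumb is about 80% of it), otherwise the container is killed for exceeding its physical memory limit.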
The following must be configured in the server-side configuration files before YARN is started in order to take effect:

yarn.scheduler.minimum-allocation-mb: the minimum allocation the RM grants for each container request, in MB; default 1024.
yarn.scheduler.maximum-allocation-mb: the maximum allocation the RM grants for each container request, in MB; default 8192.
yarn.scheduler.minimum-allocation-vcores: the minimum vcores allocation per container request; default 1.
yarn.scheduler.maximum-allocation-vcores: the maximum vcores allocation per container request; default 32.
yarn.nodemanager.resource.memory-mb: the total amount of physical memory on the node that YARN may use; default 8192 (MB). Note: if your node has less than 8 GB of memory, you need to lower this value, because YARN does not intelligently detect the node's total physical memory.
Key parameters for shuffle performance tuning; these should also be configured before starting YARN:

mapreduce.task.io.sort.mb: size of the map-side circular (sort) buffer; default 100 MB.
mapreduce.map.sort.spill.percent: threshold at which the circular buffer spills to disk; default 0.8 (80%).
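For completeness, the same two names as they would be set on a Configuration (values here are the defaults just listed; in practice mapreduce.task.io.sort.mb can also be supplied per job):

```java
import org.apache.hadoop.conf.Configuration;

public class SortBufferParams {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Size of the map-side circular (sort) buffer, in MB; default 100.
        conf.setInt("mapreduce.task.io.sort.mb", 100);
        // Buffer fill fraction at which a background spill to disk begins.
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.8f);
        // A larger buffer means fewer spills, but it must fit in the map-task
        // heap configured via mapreduce.map.java.opts.
        System.out.println(conf.get("mapreduce.task.io.sort.mb") + " MB, spill at "
                + conf.get("mapreduce.map.sort.spill.percent"));
    }
}
```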

Optimization (2): fault-tolerance parameters.
mapreduce.map.maxattempts: the maximum number of retries for each Map Task; once the number of retries exceeds this value, the Map Task is considered failed. Default: 4.

mapreduce.reduce.maxattempts: the maximum number of retries for each Reduce Task; once the number of retries exceeds this value, the Reduce Task is considered failed. Default: 4.

mapreduce.map.failures.maxpercent: when the proportion of failed Map Tasks exceeds this value, the entire job fails; the default is 0. If the application can tolerate discarding part of its input data, set this to a value greater than 0, for example 5, meaning that if fewer than 5% of the Map Tasks fail (a Map Task counts as failed once its retry count exceeds mapreduce.map.maxattempts, and its input data then produces no results), the whole job is still considered successful.

mapreduce.reduce.failures.maxpercent: when the proportion of failed Reduce Tasks exceeds this value, the entire job fails; the default is 0.

mapreduce.task.timeout: if a task makes no progress within the given time, i.e. it neither reads new data nor writes any output, it is considered blocked; it may be stuck temporarily, or stuck forever. To prevent user code that never exits from blocking the job permanently, a timeout is enforced (in milliseconds); the default is 600,000, and a value of 0 disables the timeout.
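A sketch tying these fault-tolerance knobs together (the 5% figure reuses the example above; treating it as a per-job choice is an assumption):

```java
import org.apache.hadoop.conf.Configuration;

public class FaultToleranceParams {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.map.maxattempts", 4);     // retries before a Map Task fails
        conf.setInt("mapreduce.reduce.maxattempts", 4);  // retries before a Reduce Task fails
        // Tolerate up to 5% failed Map Tasks without failing the whole job
        // (only sensible if the application can afford to drop some input).
        conf.setInt("mapreduce.map.failures.maxpercent", 5);
        conf.setInt("mapreduce.reduce.failures.maxpercent", 0);
        // Kill a task that reports no progress for 10 minutes; 0 disables.
        conf.setLong("mapreduce.task.timeout", 600_000L);
        System.out.println("timeout = " + conf.get("mapreduce.task.timeout") + " ms");
    }
}
```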

Optimization (3): efficiency and stability parameters (speculative execution of tasks).
Stragglers are tasks that run very slowly but eventually complete successfully. A straggling Map task will prevent Reduce tasks from starting.

Hadoop does not automatically correct stragglers, but it can detect tasks that are running slowly and launch an equivalent backup task that produces the same result. Whichever attempt finishes first is used, and the other is then told to stop. This technique is called speculative execution.

Speculative execution is enabled by default.
Properties and descriptions:
mapreduce.map.speculative: controls speculative execution of Map tasks (default true)
mapreduce.reduce.speculative: controls speculative execution of Reduce tasks (default true)
mapreduce.job.speculative.speculativecap: the proportion of running tasks that may be speculatively executed at any time (default 0.1, range 0–1)
mapreduce.job.speculative.slownodethreshold: threshold for judging whether a TaskTracker is suitable for launching a speculative attempt of a task (default 1)
mapreduce.job.speculative.slowtaskthreshold: threshold for judging whether a task may have a speculative attempt launched (default 1)
mapreduce.input.fileinputformat.split.minsize: the minimum split size when FileInputFormat computes splits; default 1.

mapreduce.input.fileinputformat.split.maxsize: the maximum split size when FileInputFormat computes splits.
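A closing sketch combining the speculation switches with the split-size helpers on FileInputFormat (disabling speculation and the 128 MB cap are illustrative assumptions, not this post's recommendation):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SpeculationAndSplits {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Speculation can be turned off for jobs with side effects,
        // or left at the default (true) for plain batch jobs.
        conf.setBoolean("mapreduce.map.speculative", false);
        conf.setBoolean("mapreduce.reduce.speculative", false);

        Job job = Job.getInstance(conf, "speculation-and-splits-demo");
        // Split bounds for FileInputFormat, in bytes.
        FileInputFormat.setMinInputSplitSize(job, 1L);                  // default
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB
        System.out.println(job.getConfiguration()
                .get("mapreduce.input.fileinputformat.split.maxsize"));
    }
}
```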


Source: www.cnblogs.com/xinfang520/p/10994528.html