How to troubleshoot data skew in HiveQL

Reposted from https://blog.csdn.net/u010010664/article/details/79731044

Symptoms of data skew:

Problems it causes

Data skew may cause the following two problems:

1) One reduce task stays stuck at 99.9% for a very long time.

2) The task is killed after timing out

The reduce task has to process a huge amount of data, so each full GC triggers a long stop-the-world pause. The task stops responding, and once it exceeds the default timeout of 600 seconds it is killed. The error message looks like this:

AttemptID:attempt_1498075186313_242232_r_000021_1 Timed out after 600 secs Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143.
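When many jobs are running, it can help to decode this kind of diagnostic automatically. The following is a minimal Python sketch, not part of the original troubleshooting method, and the helper name explain_timeout is hypothetical. It assumes the message format shown above and relies only on two well-known conventions: MapReduce attempt IDs have the form attempt_<clusterTimestamp>_<jobNumber>_<m|r>_<taskNumber>_<attemptNumber>, and an exit code above 128 means the process was ended by signal (code - 128), so 143 is SIGTERM (15). The 600-second limit corresponds to Hadoop's mapreduce.task.timeout setting, whose default is 600000 ms.

```python
import re
import signal

# attempt_<clusterTimestamp>_<jobNumber>_<m|r>_<taskNumber>_<attemptNumber>
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)")
TIMEOUT_RE = re.compile(r"Timed out after (\d+) secs")
EXIT_RE = re.compile(r"exit code (?:is )?(\d+)", re.IGNORECASE)


def explain_timeout(message: str) -> dict:
    """Decode a timed-out task diagnostic (hypothetical helper, assumes the format above)."""
    attempt = ATTEMPT_RE.search(message)
    if attempt is None:
        raise ValueError("no task attempt ID found in message")
    timeout = TIMEOUT_RE.search(message)
    codes = EXIT_RE.findall(message)
    exit_code = int(codes[-1]) if codes else None

    sig_name = None
    if exit_code is not None and exit_code > 128:
        # By shell convention, an exit code of 128 + N means "terminated by signal N".
        try:
            sig_name = signal.Signals(exit_code - 128).name
        except ValueError:
            sig_name = f"signal {exit_code - 128}"

    return {
        "job_id": f"job_{attempt.group(1)}_{attempt.group(2)}",
        "task_type": "reduce" if attempt.group(3) == "r" else "map",
        "task_number": int(attempt.group(4)),
        "attempt_number": int(attempt.group(5)),
        "timeout_secs": int(timeout.group(1)) if timeout else None,
        "exit_code": exit_code,
        "signal": sig_name,
    }


msg = ("AttemptID:attempt_1498075186313_242232_r_000021_1 Timed out after 600 secs "
       "Container killed by the ApplicationMaster. Container killed on request. "
       "Exit code is 143 Container exited with a non-zero exit code 143.")
print(explain_timeout(msg))
```

Here r_000021 identifies reduce task 21, and the trailing _1 is the attempt number. Attempts are numbered from 0, so this was already a retry or a speculative attempt of that task.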

How to identify it:

Judge by task duration

If one reduce task takes much longer than the others, data skew is likely. (Note: if all reduce tasks take about the same time and all of them are extremely long, the cause is more likely that too few reducers were configured.) For example, in one job most tasks finished within 4 minutes, but task r_000021 still had not finished after 30 minutes.
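This duration comparison can be automated. The following is a minimal Python sketch, assuming you have already collected each reduce task's runtime in minutes (for unfinished tasks, the time elapsed so far), for example from the JobHistory UI. The function name diagnose_reducers, the 3x-median threshold and the slow_minutes cutoff are hypothetical choices, and the sample numbers are made up to mirror the 4-minute vs 30-minute example; tune them for your own jobs.

```python
from statistics import median
from typing import Dict, Optional


def diagnose_reducers(durations: Dict[str, float],
                      ratio: float = 3.0,
                      slow_minutes: Optional[float] = None) -> dict:
    """Flag reduce tasks whose runtime is far above the median (hypothetical thresholds)."""
    med = median(durations.values())
    stragglers = sorted(t for t, d in durations.items() if d > ratio * med)
    if stragglers:
        verdict = "possible data skew"
    elif slow_minutes is not None and med > slow_minutes:
        # Every task is slow by about the same amount: more likely too few reducers.
        verdict = "uniformly slow, possibly too few reducers"
    else:
        verdict = "no obvious skew"
    return {"median": med, "stragglers": stragglers, "verdict": verdict}


# Made-up sample: 21 tasks finish in about 4 minutes, r_000021 is still running at 30.
sample = {f"r_{i:06d}": 3.5 + (i % 3) * 0.2 for i in range(21)}
sample["r_000021"] = 30.0
print(diagnose_reducers(sample))
```

Comparing against the median rather than the mean keeps a single huge straggler from inflating the baseline it is being compared to.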

There is also a special case to rule out. Sometimes the node running a task has a problem that makes the task run especially slowly. In that case, MapReduce speculative execution launches a duplicate attempt of the task. If the new attempt finishes quickly, the slowness was usually caused by the node rather than by the data. If the speculative attempt is also especially slow, that is stronger evidence that the task suffers from data skew.
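This rule of thumb can be written as a small decision function. It is a hypothetical Python sketch, not an API of Hadoop or Hive: it assumes speculative execution is enabled for reduce tasks (in Hadoop 2 and later the mapreduce.reduce.speculative property controls this) and reuses an arbitrary 3x threshold relative to a typical task's runtime. The name classify_slow_task is made up for illustration.

```python
def classify_slow_task(typical_minutes: float,
                       spec_elapsed_minutes: float,
                       spec_finished: bool,
                       ratio: float = 3.0) -> str:
    """Judge a slow task by how its speculative attempt behaves (hypothetical helper)."""
    limit = ratio * typical_minutes
    if spec_finished and spec_elapsed_minutes <= limit:
        # The duplicate attempt finished in normal time: blame the original node.
        return "slow node"
    if spec_elapsed_minutes > limit:
        # The duplicate attempt is just as slow: the data itself is the likely problem.
        return "likely data skew"
    # The duplicate attempt is still running and has not yet exceeded the limit.
    return "inconclusive, keep watching"


print(classify_slow_task(4, 5, spec_finished=True))
print(classify_slow_task(4, 20, spec_finished=False))
```

The "inconclusive" branch matters in practice: a speculative attempt that has only just started says nothing yet about the data.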


Source: blog.csdn.net/qq_24271537/article/details/113360571