Spark tuning: data skew, part 1: characteristics / common causes / consequences / common tuning schemes

Characteristics of data skew: a few individual Tasks process most of the data.
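To make this characteristic concrete, here is a minimal sketch in plain Python (not Spark itself) that simulates hash partitioning of a keyed dataset. The `partition_sizes` helper, the use of `zlib.crc32` as a stand-in for a hash partitioner, and the sample keys are all assumptions made for illustration.

```python
import zlib
from collections import Counter

def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition under hash partitioning."""
    sizes = Counter()
    for key in keys:
        # crc32 is only a deterministic stand-in for a real hash partitioner (assumption).
        pid = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        sizes[pid] += 1
    return [sizes.get(p, 0) for p in range(num_partitions)]

# Hypothetical dataset: one hot key with 9000 records plus 1000 distinct cold keys.
keys = ["hot_user"] * 9000 + [f"user_{i}" for i in range(1000)]
sizes = partition_sizes(keys, 8)

print(sizes)
print("share of the largest partition:", max(sizes) / len(keys))
```

All records with the same key go to the same partition, so the partition that receives the hot key holds at least 90% of the data here, no matter how many partitions are configured.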

Consequences: 1. OOM; 2. the job slows down, sometimes to the point of being unacceptably slow.

 

Common Causes:

Locating data skew:

1. Web UI: check how much data each running Task processes.

2. Logs: check the log to see which line of code throws the OOM, find the Stage that line belongs to, and from that determine which shuffle is causing the data skew.

3. Review the code, mainly join, groupByKey, reduceByKey and other operators that trigger a shuffle.

4. Examine the distribution of the data itself.
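Steps 1 and 4 above can be sketched as two small checks in plain Python. The helper names `skew_ratio` and `top_keys` and all of the numbers are hypothetical. In a real job the task sizes would be read from the Web UI, and the key sample could come from something like `rdd.sample(False, 0.1).countByKey()` on a pair RDD.

```python
from collections import Counter
from statistics import median

def skew_ratio(task_record_counts):
    """Ratio of the largest task to the median task; values far above 1 suggest skew."""
    return max(task_record_counts) / median(task_record_counts)

def top_keys(sample_keys, n=3):
    """Return the n most frequent keys together with their share of the sample."""
    counts = Counter(sample_keys)
    total = sum(counts.values())
    return [(key, count / total) for key, count in counts.most_common(n)]

# Hypothetical per-task record counts, as one might copy them from the Web UI.
task_counts = [1200, 1100, 1300, 98000, 1250]
print("max/median ratio:", skew_ratio(task_counts))

# Hypothetical sample of shuffle keys.
sample = ["a"] * 80 + ["b"] * 15 + ["c"] * 5
print("top keys:", top_keys(sample, 2))
```

Comparing the maximum against the median rather than the mean keeps the hot task from hiding itself, since a single huge task pulls the mean up with it.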


Origin www.cnblogs.com/ywdjx/p/spark-performance1.html