Data skew characteristics: a few individual Tasks process most of the data.
Consequences: 1. OOM (out-of-memory errors); 2. the job slows down, sometimes to the point of being unacceptably slow.
Common Causes:
Locating data skew:
1. Web UI: check how much data each running Task processes.
2. Logs: find the line of code where the OOM occurs, identify the Stage it belongs to, and from that determine which shuffle is skewed.
3. Code: review the shuffle operators, mainly join, groupByKey, reduceByKey, and similar.
4. Data: examine how the keys are distributed across the data.
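Steps 1 and 4 both come down to counting records per partition and per key. The sketch below is a minimal pure-Python simulation, not Spark code. It assumes a hash partitioner of the kind a shuffle for reduceByKey or groupByKey uses, but substitutes zlib.crc32 for the key's Java hashCode so the result is reproducible. The names partition_for, partition_sizes, and skew_report, and the max-to-mean ratio used as a skew indicator, are hypothetical choices for illustration.

```python
from collections import Counter
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Assumption: crc32 stands in for Spark's hashCode-based partitioning,
    # chosen only because it is deterministic across Python processes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    # Count how many records would land in each shuffle partition.
    sizes = [0] * num_partitions
    for k in keys:
        sizes[partition_for(k, num_partitions)] += 1
    return sizes


def skew_report(keys, num_partitions=4, top_n=3):
    # Hypothetical report: per-partition sizes, the ratio of the largest
    # partition to the mean, and the most frequent keys.
    key_counts = Counter(keys)
    sizes = partition_sizes(keys, num_partitions)
    mean = sum(sizes) / num_partitions
    ratio = max(sizes) / mean if mean else 0.0
    return {
        "partition_sizes": sizes,
        "max_to_mean": ratio,
        "hot_keys": key_counts.most_common(top_n),
    }


if __name__ == "__main__":
    # Hypothetical sample data: one key accounts for 90% of the records.
    keys = ["user_1"] * 9000 + [f"user_{i}" for i in range(2, 1002)]
    report = skew_report(keys, num_partitions=4)
    print(report["partition_sizes"])
    print(round(report["max_to_mean"], 2))
    print(report["hot_keys"])
```

A max-to-mean ratio well above 1 means one Task receives far more data than the others, which matches the symptom visible in the Web UI; the hot_keys list points at the keys responsible. On a real RDD, comparable per-key counts could be obtained with something like `sample(...)` followed by `countByKey()`, which avoids counting the full dataset.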