Handling data skew in Spark

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/wangfenghui132/article/details/91494070

1. If the job has no groupBy or reduceByKey, you can call repartition() on the original data to increase the number of tasks, so the records are spread more evenly across them.
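The effect of repartitioning can be illustrated without Spark. The sketch below is a simplified model, not Spark's actual implementation. It assumes a full shuffle that deals records out round-robin, and `repartition_round_robin` is a hypothetical helper name. It shows one oversized partition being split into evenly sized ones.

```python
from itertools import chain

def repartition_round_robin(partitions, num_partitions):
    """Redistribute all records into num_partitions output partitions by
    dealing them out round-robin (a simplified model of a full shuffle)."""
    output = [[] for _ in range(num_partitions)]
    for i, record in enumerate(chain.from_iterable(partitions)):
        output[i % num_partitions].append(record)
    return output

# A skewed input: one partition holds almost all of the records.
skewed = [list(range(1000)), list(range(10)), list(range(10))]
balanced = repartition_round_robin(skewed, 4)
print([len(p) for p in skewed])    # [1000, 10, 10]
print([len(p) for p in balanced])  # [255, 255, 255, 255]
```

This only helps when no later stage regroups records by key. A subsequent groupBy or reduceByKey would send all records of a hot key back to a single task.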

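2. If there is a groupBy or reduceByKey and you only need to count the records per key, you can prepend a random number to each key (key salting). This splits each hot key into several finer-grained keys whose partial counts can be computed in parallel. You then strip the prefix and aggregate once more to get the final count per key. This can significantly improve processing speed.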
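Here is a minimal sketch of the two-stage salted count, written in plain Python rather than against the Spark API. `salted_count`, the `salt_key` string format, and `num_salts` are illustrative assumptions. In Spark, the two stages would roughly correspond to two successive reduceByKey calls: one on the salted keys and one after stripping the salt.

```python
import random
from collections import Counter

def salted_count(keys, num_salts, seed=0):
    """Count records per key in two stages using random salt prefixes."""
    rng = random.Random(seed)
    # Stage 1: prefix each key with a random salt, so one hot key becomes up
    # to num_salts distinct keys that different tasks can count in parallel.
    salted = [f"{rng.randrange(num_salts)}_{key}" for key in keys]
    partial = Counter(salted)
    # Stage 2: strip the salt and add up the partial counts per original key.
    final = Counter()
    for salted_key, count in partial.items():
        original_key = salted_key.split("_", 1)[1]
        final[original_key] += count
    return partial, final

keys = ["hot"] * 1000 + ["a"] * 5 + ["b"] * 3
partial, final = salted_count(keys, num_salts=10)
print(dict(final))            # {'hot': 1000, 'a': 5, 'b': 3}
print(max(partial.values()))  # largest salted group, roughly 1000 / 10
```

The final counts match a direct count. The largest unit of work, however, drops from 1000 records for "hot" to about a tenth of that.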

3. If there is a groupBy and you need to collect all of the data for a key, you can preprocess the data in Hive, or at the very least filter out unneeded data beforehand. I have not found another good approach for this case yet.
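Here is a minimal sketch of filtering before grouping, again in plain Python rather than Spark. It assumes, purely for illustration, that records with null or empty keys carry no information for the job. `filter_then_group` and the filtering rule are hypothetical. In Spark, the equivalent step would be a filter applied before the groupBy, so the dropped records are never shuffled.

```python
from collections import defaultdict

def filter_then_group(records, is_useful):
    """Drop records that are not needed, then group all values per key."""
    groups = defaultdict(list)
    for key, value in records:
        if is_useful(key, value):
            groups[key].append(value)
    return dict(groups)

records = [(None, 1)] * 500 + [("", 2)] * 300 + [("a", 3), ("a", 4), ("b", 5)]
# Hypothetical rule: null or empty keys are useless for this job.
groups = filter_then_group(records, lambda k, v: k not in (None, ""))
print(groups)  # {'a': [3, 4], 'b': [5]}
```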

