Narrow and Wide Transformations

1, in narrow transformations, each child partition of an RDD depends on only a small, fixed set of partitions of the parent RDD (often exactly one), and each parent partition feeds at most one child partition. In life, we inherit many of our ambitions and living locations from our parents. But we all know that at some point in life, we need to perform the wide transformation and take charge. To break through, to change, to make all the amazing things in life that are worthwhile.

Narrow transformation is for children. Wide transformation is for adults. It is time for us to build an overarching purpose in life. It's time for us to search for meaning and the purpose of existence. Time to go. Way ahead.

Wide transformation makes us grow independent of our parents. To search for our purpose and meaning in life. To finally stand on our own feet. It is a slow and long struggle, but we will make it.

2, narrow transformations are faster to compute because they can be pipelined and completed in one pass over the data, with no shuffle in between. Wide transformation is much harder because we need to find a different purpose in life and search for different ways ourselves. Narrow transformation is very bullish and aggressive because of the smoothness of its lineage and background. Wide transformation is much more calm and resolute. It knows it has so many things it hasn't done. It also has to find its own way because it is not blessed with a political or financial legacy. We have to fight our way through the jungle. We have to advance in spite of difficulties. We have to win because we are all in and there is no way to go back. We have risked it all. We know no way of backing out. Time to change the game. Time to change the direction. The old partition is comfy but we know that's not where we belong. We belong to the faraway nation. We belong to the far side of the moon.

I could go away at any second. But I choose to fight it. I choose to fight to the bottom of it. I choose to test my limits and break through.

3, wide transformations (shuffles) are very expensive because they involve data movement across the network and potential disk I/O (for shuffle files). They also limit parallelism because the tasks for different partitions have to be coordinated instead of doing their own thing: the downstream stage cannot start until the upstream stage has finished writing its shuffle output.
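The contrast between narrow and wide dependencies can be sketched without Spark at all. The snippet below is only a toy simulation in plain Python: partitions are lists, and the helper names narrow_map and wide_group_by_key are made up for illustration (in PySpark the corresponding operations would be rdd.map and rdd.groupByKey). The modulo partitioner is also an assumption chosen to keep the output deterministic.

```python
from collections import defaultdict


def narrow_map(partitions, fn):
    """Narrow: child partition i is computed from parent partition i only."""
    return [[fn(record) for record in part] for part in partitions]


def wide_group_by_key(partitions, num_children):
    """Wide: records are routed by key, so a child may read from any parent."""
    children = [defaultdict(list) for _ in range(num_children)]
    parents_read = [set() for _ in range(num_children)]
    for parent_id, part in enumerate(partitions):
        for key, value in part:
            child_id = key % num_children  # toy partitioner for integer keys
            children[child_id][key].append(value)
            parents_read[child_id].add(parent_id)
    return [dict(c) for c in children], [sorted(p) for p in parents_read]


# Three parent partitions of (key, value) records (hypothetical data).
parents = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")], [(2, "e"), (3, "f")]]

mapped = narrow_map(parents, lambda kv: (kv[0], kv[1].upper()))
grouped, parents_read = wide_group_by_key(parents, 2)

print(mapped)        # same partition layout as the parents
print(grouped)       # records regrouped by key across partitions
print(parents_read)  # which parent partitions each child had to read
```

In the narrow case every child partition lines up with exactly one parent, so each can be computed on its own. In the wide case the second child partition ends up reading from all three parents, which is the data movement a shuffle has to pay for.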

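4, the cost of recovering from a failure is much higher for wide dependencies than for narrow dependencies. A lost partition under a wide dependency may depend on every partition of the parent, so in the worst case we have to redo the computation for all parent partitions. For narrow transformations, it's just redoing the failed operations for that particular partition instead of all partitions.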
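The recovery argument can be made concrete with a small counting exercise. This is a simplified sketch in plain Python, not Spark's scheduler: the dependency maps and the function name partitions_to_recompute are hypothetical, and it models the worst case only (Spark can often reuse shuffle files that are still available instead of recomputing everything).

```python
def partitions_to_recompute(dependencies, lost_child):
    """Return the parent partitions needed to rebuild one lost child partition."""
    return sorted(dependencies[lost_child])


num_partitions = 4

# Narrow: child i depends only on parent i.
narrow_deps = {i: {i} for i in range(num_partitions)}

# Wide: every child depends on every parent (worst case after a shuffle).
wide_deps = {i: set(range(num_partitions)) for i in range(num_partitions)}

narrow_cost = partitions_to_recompute(narrow_deps, lost_child=2)
wide_cost = partitions_to_recompute(wide_deps, lost_child=2)

print(narrow_cost)  # only the matching parent partition
print(wide_cost)    # all parent partitions
```

Losing child partition 2 costs one parent recomputation under the narrow map and four under the wide one, and the gap grows with the number of partitions.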

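5, coalesce() (with its default shuffle = false) decreases the number of partitions by merging whole parent partitions, so each parent partition corresponds to only one child partition. coalesce() can therefore be described as a narrow dependency.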

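6, repartition() always performs a full shuffle, so it is a wide dependency. It is what we use to increase the number of partitions, since coalesce() without a shuffle can only decrease it.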
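The difference between the two calls can be sketched with another plain-Python simulation. The functions below only borrow the names coalesce and repartition; they are not Spark's implementations. In particular, the contiguous grouping and the simple round-robin distribution are assumptions made to keep the example short and deterministic.

```python
def coalesce(partitions, n):
    """Merge whole parent partitions into at most n children (no shuffle)."""
    n = min(n, len(partitions))  # without a shuffle the count cannot grow
    children = [[] for _ in range(n)]
    for parent_id, part in enumerate(partitions):
        # Each parent partition goes, as a whole, to exactly one child.
        children[parent_id * n // len(partitions)].extend(part)
    return children


def repartition(partitions, n):
    """Redistribute individual records round-robin across n children (shuffle)."""
    children = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for record in part:
            # Records from one parent are scattered over many children.
            children[position % n].append(record)
            position += 1
    return children


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]

merged = coalesce(parts, 2)
not_grown = coalesce(parts, 8)
spread = repartition(parts, 3)

print(merged)          # parents merged pairwise
print(len(not_grown))  # still 4: coalesce alone cannot add partitions
print(spread)          # every child holds records from several parents
```

In the coalesce case each parent lands in a single child, which is why no data has to be exchanged between tasks. In the repartition case each parent's records are split across children, so every child depends on several parents.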

Reposted from blog.csdn.net/qq_25527791/article/details/89326139