11. Spark Source Code Analysis (yarn cluster mode) - A Look at Stage and Task

2021-11-18 14:03:03

From the analysis in the previous parts, we know that Spark has only two kinds of Stage:

ResultStage
ShuffleMapStage

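Stages are split by walking the dependency lists of the current RDD and its parent RDDs: whenever a ShuffleDependency is encountered, a Stage boundary is created. The last Stage of a job is always a ResultStage, and all the others are ShuffleMapStages.
The Tasks of a ShuffleMapStage are of type ShuffleMapTask,
and the Tasks of a ResultStage are of type ResultTask.
We often say that one Stage corresponds to multiple Tasks. At the source-code level, the Master node (that is, the driver) creates Tasks according to the number of partitions of the RDD to be computed, normally one Task per partition. Each Task carries information about the node it should run on, and the Master node then sends the Tasks to the corresponding Executor nodes for execution.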
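
The "one Task per partition" rule can be sketched with a small pure-Python model. This is only an illustration under stated assumptions, not Spark's actual code: the `Stage` and `Task` classes, the `make_tasks` function, and the host names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    stage_id: int
    kind: str                   # "ShuffleMapStage" or "ResultStage"
    num_partitions: int
    preferred_hosts: List[str]  # hypothetical: one preferred host per partition


@dataclass
class Task:
    stage_id: int
    partition: int
    kind: str                   # "ShuffleMapTask" or "ResultTask"
    preferred_host: str


def make_tasks(stage: Stage) -> List[Task]:
    """Create one task per partition; the task type follows the stage type."""
    if stage.kind == "ShuffleMapStage":
        task_kind = "ShuffleMapTask"
    else:
        task_kind = "ResultTask"
    return [
        Task(stage.stage_id, p, task_kind, stage.preferred_hosts[p])
        for p in range(stage.num_partitions)
    ]


# A map-side stage over 3 partitions and a final stage over 2 partitions.
map_stage = Stage(0, "ShuffleMapStage", 3, ["host-a", "host-b", "host-a"])
result_stage = Stage(1, "ResultStage", 2, ["host-b", "host-c"])

map_tasks = make_tasks(map_stage)
result_tasks = make_tasks(result_stage)

print(len(map_tasks), map_tasks[0].kind)        # 3 ShuffleMapTask
print(len(result_tasks), result_tasks[0].kind)  # 2 ResultTask
```

In this model the number of Tasks in a Stage is fixed by the partition count, and the per-partition host stands in for the node information a Task carries.
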
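Next, a word about dependencies in Spark. At a high level, Spark has only two kinds of dependency:

NarrowDependency: what we usually call a narrow dependency
ShuffleDependency: what we usually call a wide dependency

The others:

OneToOneDependency extends NarrowDependency
PruneDependency extends NarrowDependency
RangeDependency extends NarrowDependency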

Only a ShuffleDependency triggers a Stage split, and this is also where what we usually call a MapReduce-style operation (the shuffle) takes place.
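
The splitting rule can be illustrated with a toy lineage graph. The sketch below is a simplified model, not the logic of Spark's scheduler: the classes reuse Spark's names only for readability, and `split_stages` and the RDD names are hypothetical. Narrow dependencies keep the parent RDD in the same Stage, while a shuffle dependency starts a new parent Stage.

```python
class RDD:
    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)


class NarrowDependency:
    def __init__(self, parent):
        self.parent = parent


class ShuffleDependency:
    def __init__(self, parent):
        self.parent = parent


def split_stages(final_rdd):
    """Return stages as (kind, [rdd names]); parent stages first, ResultStage last."""
    stages = []
    seen_roots = set()

    def build(root, kind):
        members, parent_roots, stack, visited = [], [], [root], set()
        while stack:
            rdd = stack.pop()
            if rdd.name in visited:
                continue
            visited.add(rdd.name)
            members.append(rdd.name)
            for dep in rdd.deps:
                if isinstance(dep, ShuffleDependency):
                    parent_roots.append(dep.parent)  # stage boundary
                else:
                    stack.append(dep.parent)         # stays in this stage
        # Parent stages are emitted before the stage that depends on them.
        for parent in parent_roots:
            if parent.name not in seen_roots:
                seen_roots.add(parent.name)
                build(parent, "ShuffleMapStage")
        stages.append((kind, members))

    build(final_rdd, "ResultStage")
    return stages


# lines -> pairs (narrow) -> counts (shuffle) -> formatted (narrow)
lines = RDD("lines")
pairs = RDD("pairs", [NarrowDependency(lines)])
counts = RDD("counts", [ShuffleDependency(pairs)])
formatted = RDD("formatted", [NarrowDependency(counts)])

stages = split_stages(formatted)
for kind, members in stages:
    print(kind, members)
```

With one shuffle in the lineage, the model yields two Stages: a ShuffleMapStage containing `pairs` and `lines`, followed by the ResultStage containing `formatted` and `counts`.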

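Here:

OneToOneDependency: the partitions of the parent RDD and the child RDD correspond one-to-one
RangeDependency: a range of partitions in each of several parent RDDs corresponds one-to-one to a range of partitions in the child RDD (as in a union)
PruneDependency: a subset of the parent RDD's partitions corresponds one-to-one to the partitions of the child RDD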
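
The three partition mappings above can be sketched as follows. This is a pure-Python model written for illustration; the class names mirror Spark's, but the constructors, the `get_parents` method, and the example numbers are assumptions of this sketch rather than Spark's API. Each `get_parents` call answers the question "which parent partitions does this child partition read from?".

```python
class OneToOneDependency:
    def get_parents(self, partition_id):
        # Child partition i reads parent partition i.
        return [partition_id]


class RangeDependency:
    def __init__(self, in_start, out_start, length):
        self.in_start = in_start    # first partition of the range in the parent
        self.out_start = out_start  # first partition of the range in the child
        self.length = length        # number of partitions in the range

    def get_parents(self, partition_id):
        # Only child partitions inside the range map back to this parent.
        if self.out_start <= partition_id < self.out_start + self.length:
            return [partition_id - self.out_start + self.in_start]
        return []


class PruneDependency:
    def __init__(self, num_parent_partitions, keep):
        # keep: predicate over parent partition indices (hypothetical parameter)
        self.kept = [i for i in range(num_parent_partitions) if keep(i)]

    def get_parents(self, partition_id):
        # Child partition i reads the i-th parent partition that survived pruning.
        return [self.kept[partition_id]]


# Union of A (2 partitions) and B (3 partitions): the child has 5 partitions.
dep_a = RangeDependency(in_start=0, out_start=0, length=2)
dep_b = RangeDependency(in_start=0, out_start=2, length=3)

# Pruning a 4-partition parent down to its even-numbered partitions.
prune = PruneDependency(4, keep=lambda i: i % 2 == 0)

print(OneToOneDependency().get_parents(3))            # [3]
print(dep_a.get_parents(3), dep_b.get_parents(3))     # [] [1]
print(prune.get_parents(1))                           # [2]
```

In every case a child partition depends on at most one parent partition per dependency, which is why no shuffle is needed for narrow dependencies.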

Origin blog.csdn.net/LeoHan163/article/details/121100686