Detailed explanation of the difference between Application, Job, Stage and Task in Spark


In Spark, Application, Job, Stage, and Task are key concepts that describe the different layers of work in a Spark program. Each concept is described in detail below, with examples illustrating the differences between them.

1. Application

In Spark, an application (Application) is an independent, complete computing task made up of the user code that runs on the driver (Driver). A Spark application typically performs data processing, transformation, and analysis through calls to the Spark APIs. An application can contain one or more jobs (Jobs), and each job represents a specific computation triggered within the application, as sketched below.
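A minimal sketch of a single application whose driver code triggers two jobs (the input path, output path, and object name are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession

object WordCountApp {
  def main(args: Array[String]): Unit = {
    // One SparkSession / SparkContext pair corresponds to one Application on the cluster
    val spark = SparkSession.builder()
      .appName("WordCountApp")
      .getOrCreate()
    val sc = spark.sparkContext

    // Driver-side user code: build a lineage of transformations (no job yet)
    val lines  = sc.textFile("hdfs:///data/input.txt")        // hypothetical path
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // Each action submits one job inside this application
    counts.count()                                   // action #1 -> Job 0
    counts.saveAsTextFile("hdfs:///data/output")     // action #2 -> Job 1

    spark.stop()
  }
}
```

Transformations such as flatMap and reduceByKey only describe the computation; it is the actions (count, saveAsTextFile) that actually submit jobs to the cluster.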

A Spark application usually consists of the following key components:

  1. SparkSession (or SparkContext): the entry point for communicating with the Spark cluster. It provides access to cluster resources and the ability to create distributed data structures such as RDDs, DataFrames, and Datasets (see the sketch after this list).
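A minimal sketch of using SparkSession as the entry point (the app name, master setting, and sample data are assumptions; the case class is defined for illustration and runs as-is in spark-shell):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("EntryPointDemo")
  .master("local[*]")        // assumption: local mode for a quick test
  .getOrCreate()

import spark.implicits._

// RDD via the underlying SparkContext
val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4))

// DataFrame from a local collection
val df = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")

// Dataset with a typed case class
case class Person(name: String, age: Int)
val ds = Seq(Person("Alice", 30), Person("Bob", 25)).toDS()

df.show()
ds.filter(_.age > 26).show()
```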


Origin blog.csdn.net/m0_47256162/article/details/132363267