In Spark, the key concepts of Application, Job, Stage, and Task describe the different layers of a Spark workload. Below, each concept is described in detail, with examples illustrating the differences between them.
1. Application
In Spark, an Application is an independent, complete computing task driven by user code running on the Driver. A Spark application typically includes data processing, transformation, and analysis logic expressed through calls to Spark APIs. An application can contain one or more Jobs, each of which represents a specific computation within the application.
A Spark application usually consists of the following key components:
- SparkSession (or SparkContext): the entry point for communicating with the Spark cluster. It provides access to cluster resources and the ability to create distributed data structures such as RDDs, DataFrames, and Datasets.
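As a minimal sketch, a typical Spark application starts by creating a SparkSession and then submits work through it; the application name and the `local[*]` master below are illustrative choices, not requirements:

```scala
import org.apache.spark.sql.SparkSession

object ExampleApp {
  def main(args: Array[String]): Unit = {
    // The SparkSession is the entry point to the cluster.
    // "local[*]" runs Spark locally using all available cores (illustrative).
    val spark = SparkSession.builder()
      .appName("ExampleApp")
      .master("local[*]")
      .getOrCreate()

    // Create a distributed Dataset from a local collection.
    import spark.implicits._
    val ds = Seq(1, 2, 3, 4).toDS()

    // Each action (count here) triggers a Job within this Application.
    println(ds.count())  // 4

    // Shut down the application and release cluster resources.
    spark.stop()
  }
}
```

Everything executed between `getOrCreate()` and `stop()` belongs to this one Application; each action on a Dataset launches a separate Job inside it.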