1. Briefly describe how a Spark application runs:
1. Build the running environment of the Spark application and start the SparkContext.
2. The SparkContext requests Executor resources from the cluster manager (Standalone, Mesos, or YARN), which starts the StandaloneExecutorBackend processes.
3. The Executors register with the SparkContext and request Tasks.
4. The SparkContext distributes the application code to the Executors.
5. The SparkContext builds a DAG graph, decomposes it into Stages, and sends each Stage's TaskSet to the Task Scheduler, which finally dispatches the Tasks to the Executors to run.
6. The Tasks run on the Executors, and all resources are released when they finish.
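The steps above can be sketched as a toy driver-side model. This is not Spark's real scheduler code; the `Stage`, `Task`, and `Executor` names below only loosely mirror the Spark concepts, under the simplifying assumption that each shuffle boundary cuts the DAG into one more stage and that tasks are dispatched round-robin.

```python
# Toy model of the driver-side flow: DAG -> Stages -> TaskSets -> Executors.
# NOT Spark's actual implementation; a sketch of the described steps only.
from dataclasses import dataclass

@dataclass
class Stage:
    stage_id: int
    num_partitions: int

@dataclass
class Task:
    stage_id: int
    partition: int

def plan_stages(num_shuffles, num_partitions):
    """A shuffle boundary splits the DAG: n shuffles -> n + 1 stages."""
    return [Stage(i, num_partitions) for i in range(num_shuffles + 1)]

def task_set(stage):
    """One task per partition of the stage."""
    return [Task(stage.stage_id, p) for p in range(stage.num_partitions)]

class Executor:
    def __init__(self, executor_id):
        self.executor_id = executor_id
        self.completed = []

    def run(self, task):
        self.completed.append(task)

def submit(stages, executors):
    """Dispatch each stage's task set to the executors, stage by stage."""
    for stage in stages:
        for i, task in enumerate(task_set(stage)):
            executors[i % len(executors)].run(task)

stages = plan_stages(num_shuffles=1, num_partitions=4)  # 1 shuffle -> 2 stages
executors = [Executor(e) for e in range(2)]
submit(stages, executors)
# 2 stages x 4 partitions = 8 tasks spread over the 2 executors
```

The point of the sketch is the division of labor: the driver (SparkContext) plans stages and task sets, while executors only run the tasks handed to them.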
2. Briefly describe Spark partitioning:
Partitions in Spark arise at two points. When reading from a distributed file system, files are stored as blocks, and each block becomes an input partition handled by one task. At a shuffle, records are regrouped (for example by key) into new partitions, which are sent to the tasks of the next stage for computation. By default there is one task per partition, each task processes one partition at a time, and each core runs one task at a time.
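The shuffle-side regrouping can be sketched with hash partitioning, where each record goes to partition `hash(key) mod numPartitions`. Spark's real `HashPartitioner` uses the JVM `hashCode`; Python's built-in `hash` stands in for it here, and the helper names are illustrative only.

```python
# Sketch of hash partitioning during a shuffle: records with the same key
# always land in the same output partition, which one downstream task reads.
def partition_for(key, num_partitions):
    # Python's % already yields a non-negative result for a positive modulus
    return hash(key) % num_partitions

def shuffle_group(records, num_partitions):
    """Group (key, value) records into the partitions the next stage reads."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

records = [("a", 1), ("b", 2), ("a", 3)]
parts = shuffle_group(records, num_partitions=2)
# both ("a", 1) and ("a", 3) land in the same partition
```

The guarantee that matters is co-location: all records sharing a key end up in one partition, so a single downstream task sees the whole group.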
3. Briefly describe the SparkContext:
Each Spark application has exactly one SparkContext instance; its lifetime can be understood as the lifetime of the application. Once the SparkContext is created, it can be used to create RDDs, accumulators, and broadcast variables, to access Spark's services, and to run jobs. The SparkContext sets up Spark's internal services and establishes a connection to the Spark execution environment.