1. Accumulator
In Spark, an accumulator is a special shared variable used to aggregate values contributed by all tasks into a single result on the driver. Accumulators are particularly well suited for "adding up" operations such as counting and summing.
The main characteristics and uses of accumulators are as follows:
- Global: the accumulator is a variable shared by all tasks. Each task can add data to the accumulator, but only the driver can read its value.
- Concurrency and efficiency: each task updates its own local copy, and Spark merges the partial results at the driver, so tasks never need to communicate with each other to update the accumulator.
- Fault tolerance: if a task fails and is re-executed, Spark still ensures that its accumulator update (made inside an action) is applied only once, so the final value stays correct.
- Debugging and monitoring: accumulators are an easy way to monitor and debug the state of a Spark application, for example by counting malformed input records as a job runs.
Note that accumulator updates are only guaranteed when they are performed inside action operations (e.g. collect(), foreach()): Spark applies each task's update exactly once there. Inside transformations such as map(), an update may be applied more than once if a task or stage is re-executed.