1. Broadcast variables
In Spark, every task receives its own copy of each variable referenced by the function it runs. If a large data set is used by many tasks, a separate copy is shipped with every one of them, so the same data can be sent to a node many times, which may result in heavy network transfer.
To solve this problem, Spark provides broadcast variables. A broadcast variable efficiently distributes a large read-only value to all worker nodes: the value is sent to each node once and cached there, rather than being shipped with every task, which reduces data transmission overhead.
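As a rough illustration of the idea, the PySpark sketch below broadcasts a small lookup table and reads it inside a `map`. The function name `resolve_country_names`, the sample table, and the `local[2]` master setting are assumptions made for this example; only `SparkContext.broadcast`, the `.value` attribute, `parallelize`, `map`, and `collect` come from the Spark API.

```python
def resolve_country_names(sc, codes, country_names):
    """Map country codes to names using a broadcast lookup table.

    `sc` is expected to be a SparkContext; the function name and arguments
    are hypothetical and exist only for this sketch.
    """
    # Send the lookup table to each node once instead of with every task.
    names_bc = sc.broadcast(country_names)

    # Tasks read the shared table through .value and treat it as read-only.
    return (
        sc.parallelize(codes)
        .map(lambda code: names_bc.value.get(code, "unknown"))
        .collect()
    )


def main():
    # Assumes PySpark is installed; runs Spark locally with two worker threads.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "broadcast-sketch")
    try:
        # Stand-in for a table large enough to be worth broadcasting.
        table = {"US": "United States", "FR": "France"}
        print(resolve_country_names(sc, ["US", "FR", "XX"], table))
    finally:
        sc.stop()
```

Calling `main()` in an environment with PySpark installed runs the sketch locally. Without the broadcast, the dictionary would be captured by the lambda and serialized along with each task.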
The following are the basic characteristics and usage of broadcast variables:
- Read-only features:
- A broadcast variable is a