1. Create RDD from memory data
Since Spark is written in Scala, our projects often contain Scala collection types such as lists, tuples, and arrays. These collections can be converted into Spark's RDD type. There are two commonly used methods for this: makeRDD() and parallelize().
1. parallelize
In Spark, parallelize is a method of SparkContext used to create a distributed RDD (Resilient Distributed Dataset) from an existing collection, such as an array or list. The method divides the elements of the collection into multiple partitions and distributes those partitions across the nodes of the cluster for parallel processing.
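A minimal sketch of both creation methods, assuming a local-mode SparkContext (the application name and partition count are illustrative choices, not required values):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MemoryRDDExample {
  def main(args: Array[String]): Unit = {
    // Local mode for illustration; "local[*]" uses all available cores.
    val conf = new SparkConf().setMaster("local[*]").setAppName("MemoryRDDExample")
    val sc = new SparkContext(conf)

    val data = List(1, 2, 3, 4, 5)

    // parallelize distributes the collection across partitions.
    // The second argument (numSlices) sets the partition count; when omitted,
    // it falls back to spark.default.parallelism.
    val rdd1 = sc.parallelize(data, numSlices = 2)

    // makeRDD is a thin wrapper that delegates to parallelize,
    // so the two calls below produce equivalent RDDs.
    val rdd2 = sc.makeRDD(data, 2)

    println(rdd1.getNumPartitions)           // 2
    println(rdd2.collect().mkString(","))    // 1,2,3,4,5

    sc.stop()
  }
}
```

Because makeRDD simply forwards to parallelize, the choice between them is a matter of naming preference.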