Spark: creating an RDD from in-memory data



1. Create RDD from memory data

Since Spark is written in Scala, our projects often contain Scala collection types such as lists, tuples, and arrays. We can convert these collections into Spark's core data type, the RDD. There are two commonly used methods for this: makeRDD() and parallelize().
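A minimal sketch of both methods, assuming a local SparkContext (the app name and `local[*]` master are illustrative choices). In the Spark source, one overload of makeRDD simply delegates to parallelize:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative local setup; app name and master are arbitrary here
val conf = new SparkConf().setAppName("memory-to-rdd").setMaster("local[*]")
val sc = new SparkContext(conf)

val data = Seq(1, 2, 3, 4, 5)
val rdd1 = sc.parallelize(data) // create an RDD from a Scala collection
val rdd2 = sc.makeRDD(data)     // makeRDD delegates to parallelize internally

val sum1 = rdd1.reduce(_ + _)
val sum2 = rdd2.reduce(_ + _)
sc.stop()
```

Both calls produce equivalent RDDs, so either can be used interchangeably for in-memory collections.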

1. parallelize

In Spark, parallelize is a method of SparkContext, which is used to create a distributed RDD (Resilient Distributed Dataset) from an existing collection (such as an array or list). This method divides the elements in the collection into multiple partitions, and distributes the partitions on different nodes of the cluster for parallel processing.



Origin blog.csdn.net/m0_47256162/article/details/132282472