[SparkAPI, Java Edition] JavaPairRDD: collect (Part 7)

Copyright notice: This is an original article by the author; reproduction without permission is prohibited. https://blog.csdn.net/sdut406/article/details/88663308

An explanation of the collect method of JavaPairRDD.

Official documentation
/**
   * Return an array that contains all of the elements in this RDD.
   *
   * @note this method should only be used if the resulting array is expected to be small, as
   * all the data is loaded into the driver's memory.
   */
In plain terms

Returns an array containing all of the elements in this RDD.
Note: this method should only be used when the resulting array is expected to be small, because all of the data is loaded into the driver's memory (not merely a worker node's memory).

Method signature
//scala
/**
 * Return an array that contains all of the elements in this RDD.
 */
def collect(): java.util.List[(K, V)]
//java
public java.util.List<Tuple2<K, V>> collect()
Note that collect is an instance method, not a static one.
Example
import com.google.common.collect.Lists;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.List;

public class Collect {

    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.1");
        SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("Spark_DEMO");

        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        JavaPairRDD<String, String> javaPairRDD1 = sc.parallelizePairs(Lists.newArrayList(
                new Tuple2<String, String>("1", "abc11"),
                new Tuple2<String, String>("2", "abc22"),
                new Tuple2<String, String>("3", "33333")));
        // collect() brings every element back to the driver as a local List
        List<Tuple2<String, String>> list = javaPairRDD1.collect();
        // iterate over the local list and print each (key, value) pair
        for (Tuple2<String, String> tuple : list) {
            System.out.println(tuple);
        }

        sc.close();
    }
}
Output
19/03/19 15:19:38 INFO DAGScheduler: Job 0 finished: collect at Collect.java:22, took 0.834232 s
19/03/19 15:19:38 INFO SparkContext: Invoking stop() from shutdown hook
(1,abc11)
(2,abc22)
(3,33333)
19/03/19 15:19:38 INFO SparkUI: Stopped Spark web UI at http://10.124.209.6:4040
19/03/19 15:19:38 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.124.209.6:57881 in memory (size: 897.0 B, free: 357.6 MB)
19/03/19 15:19:38 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/03/19 15:19:38 INFO MemoryStore: MemoryStore cleared
19/03/19 15:19:38 INFO BlockManager: BlockManager stopped
19/03/19 15:19:38 INFO BlockManagerMaster: BlockManagerMaster stopped
19/03/19 15:19:38 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/03/19 15:19:38 INFO SparkContext: Successfully stopped SparkContext
19/03/19 15:19:38 INFO ShutdownHookManager: Shutdown hook called
19/03/19 15:19:38 INFO ShutdownHookManager: Deleting directory C:\Users\Administrator\AppData\Local\Temp\spark-5762ad13-6044-421b-96f4-08fa3685b17f
Note

When the dataset is large, do not use collect: pulling every element back to the driver can cause an out-of-memory error.
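When you only need a sample of the data on the driver, take(n) is a safer alternative to collect, since it ships at most n elements back regardless of the RDD's size. Below is a minimal sketch using the same local-mode setup as the example above; the class name TakeInsteadOfCollect is just an illustrative choice, not part of the original post.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;

public class TakeInsteadOfCollect {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("Take_DEMO");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<String, String> pairs = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>("1", "abc11"),
                new Tuple2<>("2", "abc22"),
                new Tuple2<>("3", "33333")));

        // take(n) only brings the first n elements to the driver,
        // so driver memory use stays bounded even for a huge RDD
        List<Tuple2<String, String>> firstTwo = pairs.take(2);
        for (Tuple2<String, String> t : firstTwo) {
            System.out.println(t);
        }

        sc.close();
    }
}
```

For full traversal without collecting, foreach (or foreachPartition) runs the work on the executors instead of the driver.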
