Printing the Contents of an RDD to the Logs [One Article Is Enough] - 代码天地

Other 2020-10-28 00:36:14 Views: 0

Printing elements of an RDD

Another common idiom is attempting to print out the elements of an RDD using rdd.foreach(println) or rdd.map(println). On a single machine, this will generate the expected output and print all the RDD’s elements. However, in cluster mode, the output to stdout being called by the executors is now writing to the executor’s stdout instead, not the one on the driver, so stdout on the driver won’t show these! To print all elements on the driver, one can use the collect() method to first bring the RDD to the driver node thus: rdd.collect().foreach(println). This can cause the driver to run out of memory, though, because collect() fetches the entire RDD to a single machine; if you only need to print a few elements of the RDD, a safer approach is to use the take(): rdd.take(100).foreach(println).

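The passage above on RDD output is quoted from the official Spark documentation.
In short: on a single machine, rdd.foreach(println) or rdd.map(println) produces the expected output and prints every element of the RDD.

In cluster mode, however, println is invoked by the executors and writes to each executor's stdout rather than the driver's, so nothing shows up on the driver. To print on the driver, first bring the RDD to the driver node with collect(): rdd.collect().foreach(println). Because collect() fetches the entire RDD to a single machine, this can cause the driver to run out of memory.

If you only need to print a limited number of elements, a safer approach is take(): rdd.take(100).foreach(println)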
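The difference between take() and collect() can be sketched as a small standalone program. This is only an illustration under stated assumptions: the object name `PrintRddSample`, the helper `firstElements`, the app name, and the `local[2]` master are hypothetical choices made so the example runs on one machine; they do not come from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PrintRddSample {
  // Returns at most n elements to the driver; only those n are transferred.
  // (Hypothetical helper name, used for illustration only.)
  def firstElements(rdd: RDD[Int], n: Int): Array[Int] = rdd.take(n)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("print-rdd-sample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000)

      // Safer: prints a bounded number of elements on the driver.
      firstElements(rdd, 5).foreach(println)

      // Riskier on large data: collect() copies the whole RDD to the driver.
      val all = rdd.collect()
      println("collected " + all.length + " elements")
    } finally {
      sc.stop()
    }
  }
}
```

With a real cluster master instead of `local[2]`, both println calls above would still appear in the driver's output, because take() and collect() return plain arrays to the driver before anything is printed.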

Example: the following prints the RDD contents on a single machine (the driver) as expected

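// Assumption: each element is an org.apache.spark.sql.Row (the code relies on row.length and row.get).
// collect() brings every row to the driver, so the println calls run there.
rdd.collect().foreach(row => {
    println("row.length===:" + row.length)
    for (i <- 0 until row.length)
        println("===<" + i + ">===:" + row.get(i))
})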

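With the following, the output may not be visible on the driver, because in cluster mode it is written to the executors' stdout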

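// foreach runs on the executors, so in cluster mode these println calls
// write to each executor's stdout, not the driver's.
rdd.foreach(row => {
    println("row.length===:" + row.length)
    for (i <- 0 until row.length)
        println("===<" + i + ">===:" + row.get(i))
})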
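One way to keep the per-row formatting while still seeing the output on the driver is to build the strings on the executors and pull back only a bounded number of them with take(). The sketch below assumes the elements are `org.apache.spark.sql.Row` values, as the use of `row.length` and `row.get(i)` above suggests; `RowPreview`, `describe`, and `preview` are hypothetical names, and the `local[2]` master is chosen only so the example runs on one machine.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

object RowPreview {
  // Runs on the executors: turns one Row into a single printable line.
  def describe(row: Row): String =
    (0 until row.length)
      .map(i => "<" + i + ">=" + row.get(i))
      .mkString("length=" + row.length + " ", " ", "")

  // Formats rows on the executors, then returns at most n lines to the driver.
  def preview(rdd: RDD[Row], n: Int): Array[String] =
    rdd.map(describe).take(n)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master so the sketch runs without a cluster.
    val spark = SparkSession.builder().appName("row-preview").master("local[2]").getOrCreate()
    try {
      val rdd = spark.sparkContext.parallelize(Seq(Row("a", 1), Row("b", 2), Row("c", 3)))
      // println runs on the driver here, so the lines land in the driver's stdout.
      preview(rdd, 2).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Compared with rdd.collect(), this transfers only n short strings to the driver, at the cost of showing a sample rather than the full RDD.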

Reposted from blog.csdn.net/sjmz30071360/article/details/88787971
