Spark RDD partition 2 GB limit (Size exceeds Integer.MAX_VALUE)

Posted 2018-07-31 06:36:17 in Other

While processing a fairly large data file with Spark recently, I ran into the 2 GB partition limit. The Spark log reported the following:
WARN scheduler.TaskSetManager: Lost task 19.0 in stage 6.0 (TID 120, 10.111.32.47): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:123)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:132)
at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:517)
at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:432)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:618)
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:146)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)

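Solution:
Set the number of RDD partitions manually. The stack trace shows DiskStore.getBytes reading a cached block through FileChannel.map, which cannot map more than Integer.MAX_VALUE bytes (about 2 GB), so any single partition larger than that fails. In my job the RDD had 18 partitions by default; after I manually set the count to 500, each partition became small enough and the problem went away. After loading the RDD, call RDD.repartition(numPartitions: Int) to reset the number of partitions.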
val data_new = data.repartition(500)
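
The value 500 above was picked by hand. As a rough sketch of how a partition count could instead be derived from the input size, the helper below computes the smallest count that keeps each partition at or below a target size, assuming the data is spread evenly. The names PartitionSizing, suggestPartitions, and the 128 MiB default target are hypothetical choices for illustration; they are not part of Spark or of the original post.

```scala
object PartitionSizing {
  // Hypothetical target size per partition: 128 MiB, far below the 2 GB block limit.
  val DefaultTargetBytes: Long = 128L * 1024 * 1024

  /** Smallest partition count for which totalBytes / count <= targetBytes,
    * assuming the data is distributed evenly across partitions.
    */
  def suggestPartitions(totalBytes: Long, targetBytes: Long = DefaultTargetBytes): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetBytes > 0, "targetBytes must be positive")
    // Ceiling division written without addition, so large inputs cannot overflow.
    val full = totalBytes / targetBytes
    val count = if (totalBytes % targetBytes == 0) full else full + 1
    // Always return at least one partition and stay within the Int range.
    math.max(1L, math.min(count, Int.MaxValue.toLong)).toInt
  }
}
```

With an input of 64 GiB and the default target, suggestPartitions returns 512, which could then be passed as data.repartition(512). The estimate only holds if records are spread evenly; a heavily skewed partition can still exceed the limit, so leaving generous headroom in the target size is the safer choice.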


Reposted from blog.csdn.net/levy_cui/article/details/77574448
