I got this error when running a map operation on an RDD. The map simply adds a field indicating whether two of the existing fields are equal.
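For reference, the kind of map step described here can be sketched as below. This is only an illustration, not the original job: the function name `add_equal_flag`, the field positions `i` and `j`, and the sample rows are all assumptions, and a plain list stands in for the RDD so the snippet runs without Spark.

```python
def add_equal_flag(row, i=0, j=1):
    """Return a copy of `row` with one extra field: whether row[i] == row[j].

    `i` and `j` are hypothetical field positions chosen for illustration.
    """
    return tuple(row) + (row[i] == row[j],)


# Hypothetical sample records; the real RDD contents are not known.
rows = [("a", "a", 1), ("a", "b", 2)]

# Plain-Python stand-in for the Spark call, which would be:
#   flagged_rdd = rdd.map(add_equal_flag)
flagged = [add_equal_flag(r) for r in rows]
print(flagged)
```

Nothing in a function like this touches Java serialization directly, which is consistent with the guess that the failure happens on the JVM side rather than in the Python logic.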
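Googling did not turn up a definitive solution. I program in Python and am not familiar with Java, so my guess is that the problem occurs when Java objects are read or written.
The answer I found via Google: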
I can tell you that this usually means somewhere something wrote
objects to the same OutputStream with multiple ObjectOutputStreams. AC
is a header value.
I don't obviously see where/how that could happen, but maybe it rings
a bell for someone. This could happen if an OutputStream is reused
across object serializations but new ObjectOutputStreams are opened,
for example.
Given that, I restarted pyspark to see whether it would help. It did: after the restart the error no longer appeared.
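Later, however, the error came back, and this time restarting did not fix it, so I searched Baidu. The explanation I found: a serialized file consists of a header and a body. If several ObjectOutputStream objects, each built on a FileOutputStream, are used to write to the same file, each new ObjectOutputStream first writes its own header at the end of the file and then the object data. When the file is read back, the reader hits a header in the middle of the body and fails with a StreamCorruptedException.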
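The mechanism can be illustrated with a small Python model. This is a deliberately simplified toy, not Java's real implementation and not Spark's code path: the classes `ToyObjectWriter` and `StreamCorrupted` and the function `read_all` are made up for illustration. The only details borrowed from Java are the four header bytes `AC ED 00 05` (stream magic and version) and the `0x74` type code for strings.

```python
import io
import struct

# Header bytes that Java's ObjectOutputStream writes once, when it is created.
STREAM_HEADER = bytes([0xAC, 0xED, 0x00, 0x05])
TC_STRING = 0x74  # type code that precedes a string record


class StreamCorrupted(Exception):
    """Toy stand-in for java.io.StreamCorruptedException."""


class ToyObjectWriter:
    """Simplified model: writes the header on construction, then records."""

    def __init__(self, raw):
        self.raw = raw
        raw.write(STREAM_HEADER)  # every new writer emits a header

    def write_string(self, text):
        data = text.encode("utf-8")
        self.raw.write(bytes([TC_STRING]) + struct.pack(">H", len(data)) + data)


def read_all(raw_bytes):
    """Read one header, then string records until the end of the data."""
    buf = io.BytesIO(raw_bytes)
    if buf.read(4) != STREAM_HEADER:
        raise StreamCorrupted("invalid stream header")
    records = []
    while True:
        type_code = buf.read(1)
        if not type_code:
            return records
        if type_code[0] != TC_STRING:
            # A second header in the body shows up here as type code AC.
            raise StreamCorrupted("invalid type code: %02X" % type_code[0])
        (length,) = struct.unpack(">H", buf.read(2))
        records.append(buf.read(length).decode("utf-8"))


# One writer for the whole stream: reads back cleanly.
good = io.BytesIO()
writer = ToyObjectWriter(good)
writer.write_string("first")
writer.write_string("second")
print(read_all(good.getvalue()))

# Two writers on the same underlying stream: a header lands mid-body.
bad = io.BytesIO()
ToyObjectWriter(bad).write_string("first")
ToyObjectWriter(bad).write_string("second")
try:
    read_all(bad.getvalue())
except StreamCorrupted as exc:
    print(exc)
```

In this model the reader consumes exactly one header, so the second header's first byte, `AC`, is read where a record type code is expected. That matches the quoted remark that AC is a header value.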
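That made me think that, when the two fields of each RDD element are compared, ObjectOutputStream objects built on a FileOutputStream may be getting created more than once for the same stream.
If any expert reading this knows what actually causes the error, please leave your answer!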