assertion failed: Concurrent update to the log. Multiple streaming jobs detected for 4


1. Background

A Spark Structured Streaming job reads from Kafka, applies transformations, and writes the results back to Kafka. While the job was running, the error below appeared; after clearing the checkpoint directory, the job ran normally again.

Error message:

assertion failed: Concurrent update to the log. Multiple streaming jobs detected for 4
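
For context, here is a minimal sketch of the kind of job described above, using the Spark 2.3 Structured Streaming API. The broker address, topic names, and checkpoint path are placeholders, not details from the original job:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-to-kafka").getOrCreate()

    // Read from Kafka, transform, write back to Kafka.
    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")    // placeholder
      .option("subscribe", "input-topic")                  // placeholder
      .load()
      .selectExpr("CAST(value AS STRING) AS value")        // stand-in transformation
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")    // placeholder
      .option("topic", "output-topic")                     // placeholder
      .option("checkpointLocation", "/path/to/checkpoint") // the directory that had to be cleared
      .start()

    query.awaitTermination()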

2. Locating the source

Spark 2.3

org.apache.spark.sql.execution.streaming.MicroBatchExecution#constructNextBatch

    updateStatusMessage("Writing offsets to log")
    reportTimeTaken("walCommit") {
      assert(offsetLog.add(
        currentBatchId,
        availableOffsets.toOffsetSeq(sources, offsetSeqMetadata)),
        s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId")
      logInfo(s"Committed offsets for batch $currentBatchId. " +
        s"Metadata ${offsetSeqMetadata.toString}")

      // NOTE: The following code is correct because runStream() processes exactly one
      // batch at a time. If we add pipeline parallelism (multiple batches in flight at
      // the same time), this cleanup logic will need to change.

      // Now that we've updated the scheduler's persistent checkpoint, it is safe for the
      // sources to discard data from the previous batch.
      if (currentBatchId != 0) {
        val prevBatchOff = offsetLog.get(currentBatchId - 1)
        if (prevBatchOff.isDefined) {
          prevBatchOff.get.toStreamProgress(sources).foreach {
            case (src: Source, off) => src.commit(off)
            case (reader: MicroBatchReader, off) =>
              reader.commit(reader.deserializeOffset(off.json))
          }
        } else {
          throw new IllegalStateException(s"batch $currentBatchId doesn't exist")
        }
      }
      // ... (remainder of the walCommit block elided)
    }
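
The assertion fires when offsetLog.add(...) returns false. In Spark 2.3, offsetLog is an OffsetSeqLog (an HDFSMetadataLog) backed by the offsets/ subdirectory of the checkpoint, and add returns false when a file for currentBatchId already exists. As the message itself hints, the usual way that happens is two streaming queries writing into the same checkpoint directory. A sketch of that failure mode, with placeholder names (df stands for any streaming DataFrame with a string value column):

    // Two queries sharing one checkpointLocation: a way to trigger the assertion.
    val q1 = df.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "out-1")
      .option("checkpointLocation", "/tmp/ckpt/shared")  // same directory...
      .start()

    val q2 = df.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "out-2")
      .option("checkpointLocation", "/tmp/ckpt/shared")  // ...reused here
      .start()

    // Whichever query tries to write offsets/<batchId> second finds the file
    // already there, offsetLog.add returns false, and the walCommit assert fails.

This would also explain why clearing the checkpoint directory gets the job running again, and it suggests giving each query its own checkpointLocation to avoid the contention in the first place.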

3. Resolution

None yet, beyond the workaround already mentioned: clearing the checkpoint directory lets the job run normally again.
