DStream Checkpoint's true colors

DStream Checkpoint usage

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Durations, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object DstreamCheckpoint {
  def main(args: Array[String]): Unit = {
    val ssc = StreamingContext.getOrCreate("checkpoint_dir", functionToCreateContext)
    ssc.sparkContext.setLogLevel("ERROR")
    ssc.start()
    ssc.awaitTermination()
  }

  def functionToCreateContext(): StreamingContext = {
    println("functionToCreateContext invoked")
    val sparkConf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("DstreamCheckpoint")
    val ssc = new StreamingContext(sparkConf, Durations.seconds(2))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "s1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "group_test",
      "auto.offset.reset" -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val topics = Array("test_mxb")
    val dstream = KafkaUtils.createDirectStream(ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))
    dstream.map(record => (record.key, record.value, record.partition(), record.offset()))
      .foreachRDD(rdd => {
        // ... process each batch RDD here ...
      })
    ssc.checkpoint("checkpoint_dir")
    ssc
  }
}

With the code above, Kafka offsets can be recovered from the checkpoint after a failure and restart. However, once you modify the application code, the old checkpoint can no longer be restored, because the serialized DStreamGraph saved in the checkpoint no longer matches the new classes.

StreamingContext.getOrCreate("checkpoint_dir", functionToCreateContext) is a method on the companion object of StreamingContext.
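As a quick reminder of what a companion object is, here is a minimal sketch of the factory-method pattern that StreamingContext.getOrCreate uses. The names Service and create are hypothetical illustration names, not Spark API:

```scala
// A companion object lets callers invoke factory-style methods without an
// instance, which is exactly how StreamingContext.getOrCreate is exposed.
// "Service" and "create" are made-up names for illustration only.
class Service private (val name: String)

object Service {
  // Factory method on the companion object, callable as Service.create(...)
  def create(name: String): Service = new Service(name)
}
```

Because the class constructor is private, the companion object's factory method is the only way to obtain an instance, which lets it decide how the instance is built.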

Spark source code:

  1. getOrCreate first tries to read a Checkpoint object from checkpoint_dir. If one is found, a new StreamingContext is constructed from it; otherwise the creatingFunc we passed in is invoked to create a fresh StreamingContext. When a StreamingContext is built from a Checkpoint object, its SparkContext and DStreamGraph are restored from that checkpoint.
def getOrCreate(
    checkpointPath: String,
    creatingFunc: () => StreamingContext,
    hadoopConf: Configuration = SparkHadoopUtil.get.conf,
    createOnError: Boolean = false
  ): StreamingContext = {
  val checkpointOption = CheckpointReader.read(
    checkpointPath, new SparkConf(), hadoopConf, createOnError)
  checkpointOption.map(new StreamingContext(null, _, null)).getOrElse(creatingFunc())
}
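The last line of getOrCreate is the standard Option map/getOrElse idiom: transform the value if it is present, otherwise fall back to the creating function. A generic sketch of that idiom (restoreOrCreate is a made-up name, not Spark code):

```scala
// Mirrors checkpointOption.map(new StreamingContext(null, _, null)).getOrElse(creatingFunc()):
// if a checkpoint was found, restore from it; otherwise build from scratch.
def restoreOrCreate[A, B](found: Option[A], restore: A => B, create: () => B): B =
  found.map(restore).getOrElse(create())
```

Note that create() is only evaluated when found is None, so the expensive "build a fresh context" path is skipped entirely when a checkpoint exists.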
  2. Restoring the SparkContext and DStreamGraph from the Checkpoint object:
private[streaming] val sc: SparkContext = {
  if (_sc != null) {
    _sc
  } else if (isCheckpointPresent) {
    SparkContext.getOrCreate(_cp.createSparkConf())
  } else {
    throw new SparkException("Cannot create StreamingContext without a SparkContext")
  }
}
private[streaming] val graph: DStreamGraph = {
  if (isCheckpointPresent) {
    _cp.graph.setContext(this)
    _cp.graph.restoreCheckpointData()
    _cp.graph
  } else {
    require(_batchDur != null, "Batch duration for StreamingContext cannot be null")
    val newGraph = new DStreamGraph()
    newGraph.setBatchDuration(_batchDur)
    newGraph
  }
}
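The branching above can be condensed into a small sketch. Checkpoint, Graph, and buildGraph below are stand-ins for illustration, not Spark's real classes:

```scala
// Stand-ins for Spark's Checkpoint and DStreamGraph, illustration only.
case class Graph(batchDuration: Option[Long])
case class Checkpoint(graph: Graph)

// Same decision as the StreamingContext constructor: a present checkpoint
// supplies the serialized graph as-is; without one, a fresh graph needs an
// explicit batch duration.
def buildGraph(cp: Option[Checkpoint], batchDur: Option[Long]): Graph =
  cp match {
    case Some(c) => c.graph
    case None =>
      require(batchDur.isDefined, "Batch duration for StreamingContext cannot be null")
      Graph(batchDur)
  }
```

This is why a restarted job ignores code changes inside the DStream pipeline: when a checkpoint is present, the graph comes back from the serialized checkpoint, and functionToCreateContext is never called.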

Origin www.cnblogs.com/chouc/p/12341944.html