[Spark100] A Pitfall with Spark Streaming Checkpoints

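The Spark Streaming UI notes below are an aside, unrelated to the main topic; they are recorded here for reference.

What the summary statistics on the Spark Streaming UI mean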

Streaming

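  • Started at: 1433563238275 (the time at which Spark Streaming started running)
  • Time since start: 3 minutes 51 seconds (how long Spark Streaming has been running)
  • Network receivers: 2 (the number of Receivers)
  • Batch interval: 1 second (the interval between Batches, i.e. how much time's worth of received data goes into one Batch, or in other words one RDD)
  • Processed batches: 231 (the number of Batches already processed; a Batch is counted whether or not it contains any data)
  • Waiting batches: 0 (the number of Batches waiting to be processed; a large value means Spark is processing data more slowly than it is being received, so either add computing capacity or lower the receive rate)
  • Received records: 66 (the data received so far; everything returned by a single read counts as one record)
  • Processed records: 66 (the number of records already processed)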

(Processed batches + Waiting batches) * Batch Interval = Time Since Start
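As a quick check, the numbers in the snapshot above can be plugged into this relation. The sketch below is plain Scala with no Spark dependency; the object and method names are made up for illustration, and the values are simply copied from the list above.

```scala
object BatchArithmetic {
  // Values copied from the UI snapshot above.
  val processedBatches = 231
  val waitingBatches = 0
  val batchIntervalSeconds = 1

  // (Processed batches + Waiting batches) * Batch interval, in seconds.
  def elapsedSeconds: Int =
    (processedBatches + waitingBatches) * batchIntervalSeconds

  // Renders a number of seconds in the same style as the UI.
  def format(totalSeconds: Int): String =
    s"${totalSeconds / 60} minutes ${totalSeconds % 60} seconds"

  def main(args: Array[String]): Unit =
    println(format(elapsedSeconds)) // prints "3 minutes 51 seconds"
}
```

231 batches at one second each give 231 seconds, which matches the "Time since start" value of 3 minutes 51 seconds.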

 

 

A pitfall with Spark Streaming checkpoints

 Source code:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}


object SparkStreamingCheckpointEnabledTest {
  def main(args: Array[String]) {

    val checkpointDirectory = "file:///d:/data/chk_streaming"
    def funcToCreateSSC(): StreamingContext = {
      val conf = new SparkConf().setAppName("NetCatWordCount")
      conf.setMaster("local[3]")
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint(checkpointDirectory)
      ssc
    }
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, funcToCreateSSC)
    val numStreams = 2
    val streams = (1 to numStreams).map(i => ssc.socketTextStream("localhost", 9999))
    val lines = ssc.union(streams)
    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

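 The code above is wrong: once the Driver has been stopped, the application cannot be started again. On restart, StreamingContext.getOrCreate rebuilds the context from the checkpoint and does not call funcToCreateSSC, yet the stream definitions written outside that function still run and create new DStreams that are not part of the recovered graph. The fix is to move the stream operations into funcToCreateSSC, before ssc is returned: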

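import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object SparkStreamingCheckpointEnabledTest {
  // Defines the whole pipeline; it is called only from the factory function below.
  def process(streams: Seq[DStream[String]], ssc: StreamingContext): Unit = {
    val lines = ssc.union(streams)
    lines.print()
  }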

  def main(args: Array[String]) {
    val checkpointDirectory = "file:///d:/data/chk_streaming"
    def funcToCreateSSC(): StreamingContext = {
      val conf = new SparkConf().setAppName("NetCatWordCount")
      conf.setMaster("local[3]")
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint(checkpointDirectory)
      val numStreams = 2
      val streams = (1 to numStreams).map(i => ssc.socketTextStream("localhost", 9999))
      process(streams, ssc)
      ssc
    }
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, funcToCreateSSC)
    ssc.start()
    ssc.awaitTermination()
  }
}
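Why the placement matters can be seen from the control flow of getOrCreate alone. The sketch below is a toy model in plain Scala, not Spark's implementation: ToyContext, factory and factoryCalls are hypothetical names, and the checkpoint is modelled as an Option. It only illustrates that the factory runs when no checkpoint exists and is skipped otherwise.

```scala
object GetOrCreateToy {
  // Hypothetical stand-in for a StreamingContext; the flag records how it was obtained.
  final case class ToyContext(fromCheckpoint: Boolean)

  // Counts how often the factory actually ran.
  var factoryCalls = 0

  // Plays the role of funcToCreateSSC: builds a fresh context.
  def factory(): ToyContext = {
    factoryCalls += 1
    ToyContext(fromCheckpoint = false)
  }

  // Models only the control flow: reuse the checkpointed context if there is one,
  // otherwise call the factory. Option.getOrElse evaluates its argument lazily.
  def getOrCreate(checkpoint: Option[ToyContext], create: () => ToyContext): ToyContext =
    checkpoint.getOrElse(create())

  def main(args: Array[String]): Unit = {
    val firstRun = getOrCreate(None, () => factory())
    val restart = getOrCreate(Some(ToyContext(fromCheckpoint = true)), () => factory())
    println(s"firstRun=$firstRun restart=$restart factoryCalls=$factoryCalls")
  }
}
```

In this model the factory runs once, on the first start. Anything defined inside it is therefore set up exactly once and comes back with the checkpoint, whereas code placed after getOrCreate runs on every start, including against a context that was restored.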


Reposted from bit1129.iteye.com/blog/2217505