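The Spark Streaming UI part below is extra content for this post. It is unrelated to the main topic; I am just recording it here.
What the group of statistics on the Spark Streaming UI means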
Streaming
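- Started at: 1433563238275 (the time at which Spark Streaming started running, shown as a Unix timestamp in milliseconds)
- Time since start: 3 minutes 51 seconds (how long Spark Streaming has been running)
- Network receivers: 2 (number of receivers)
- Batch interval: 1 second (the time span of each batch: the data received during one interval forms one batch, that is, one RDD)
- Processed batches: 231 (number of batches processed so far; a batch is counted whether or not it contains data)
- Waiting batches: 0 (number of batches waiting to be processed; a large value means Spark is processing data more slowly than it is being received, so either add compute capacity or lower the receive rate)
- Received records: 66 (data received so far; all the data obtained in a single read counts as one record)
- Processed records: 66 (records processed so far)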
(Processed batches + Waiting batches) * Batch Interval = Time Since Start
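The relation above can be checked against the numbers in the list. The following is a minimal sketch in plain Scala with no Spark dependency; the object and method names are made up for illustration, and in a live application the equality is only approximate because a batch may be in flight.

```scala
// Hypothetical helper, not part of Spark: checks the UI counters against the elapsed time.
object StreamingUiCheck {
  // Elapsed time implied by the batch counters, in seconds.
  def impliedElapsedSeconds(processed: Long, waiting: Long, batchIntervalSeconds: Long): Long =
    (processed + waiting) * batchIntervalSeconds

  // Renders seconds in the "M minutes S seconds" form used by the UI.
  def format(totalSeconds: Long): String =
    s"${totalSeconds / 60} minutes ${totalSeconds % 60} seconds"

  def main(args: Array[String]): Unit = {
    // Values taken from the statistics listed above.
    val elapsed = impliedElapsedSeconds(processed = 231, waiting = 0, batchIntervalSeconds = 1)
    println(format(elapsed)) // prints "3 minutes 51 seconds"
  }
}
```

With 231 processed batches, 0 waiting batches, and a 1 second interval, the implied elapsed time is 231 seconds, which matches the "Time since start" value of 3 minutes 51 seconds.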
A pitfall with Spark Streaming checkpoints
Source code:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkStreamingCheckpointEnabledTest {
  def main(args: Array[String]): Unit = {
    val checkpointDirectory = "file:///d:/data/chk_streaming"

    // Factory that builds a fresh context when no checkpoint exists.
    def funcToCreateSSC(): StreamingContext = {
      val conf = new SparkConf().setAppName("NetCatWordCount")
      conf.setMaster("local[3]")
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint(checkpointDirectory)
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDirectory, funcToCreateSSC)

    // Problem: the streams are defined outside the factory function.
    val numStreams = 2
    val streams = (1 to numStreams).map(i => ssc.socketTextStream("localhost", 9999))
    val lines = ssc.union(streams)
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
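The code above is wrong: once the driver has been stopped, restarting it fails. On restart, StreamingContext.getOrCreate finds the checkpoint and restores the context from it without calling funcToCreateSSC, yet the stream definitions that follow getOrCreate still run and try to add new DStreams to the restored context. The fix is to move the stream operations into funcToCreateSSC, before it returns ssc: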
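import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object SparkStreamingCheckpointEnabledTest {
  // All DStream operations live here and are invoked from the factory function.
  def process(streams: Seq[DStream[String]], ssc: StreamingContext): Unit = {
    val lines = ssc.union(streams)
    lines.print()
  }

  def main(args: Array[String]): Unit = {
    val checkpointDirectory = "file:///d:/data/chk_streaming"

    // Factory that builds the context and the complete stream graph.
    def funcToCreateSSC(): StreamingContext = {
      val conf = new SparkConf().setAppName("NetCatWordCount")
      conf.setMaster("local[3]")
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint(checkpointDirectory)
      val numStreams = 2
      val streams = (1 to numStreams).map(i => ssc.socketTextStream("localhost", 9999))
      process(streams, ssc)
      ssc
    }

    // After getOrCreate, only start and wait; no streams are defined here.
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, funcToCreateSSC)
    ssc.start()
    ssc.awaitTermination()
  }
}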
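To make the getOrCreate behavior easier to see, here is a toy model in plain Scala. It is only a sketch: ToyStore, ToyContext, and factoryCalls are hypothetical names invented for illustration, not Spark classes, and the model saves its snapshot immediately, which is a simplification of how Spark writes checkpoints. The one point it is meant to show is that the factory runs only when no checkpoint exists, so anything defined inside it is built once and restored afterwards.

```scala
// Toy model: ToyStore and ToyContext are hypothetical stand-ins, not Spark classes.
object GetOrCreateSketch {
  // Stands in for the checkpoint directory: the saved operation names, if any.
  final class ToyStore { var saved: Option[List[String]] = None }

  // Stands in for StreamingContext: just an ordered list of registered operations.
  final class ToyContext(initial: List[String], val restored: Boolean) {
    private var ops: List[String] = initial
    def register(op: String): Unit = { ops = ops :+ op }
    def operations: List[String] = ops
  }

  // Counts how often the factory actually ran.
  var factoryCalls: Int = 0

  // Restores from the store when possible; otherwise builds a context with the factory.
  def getOrCreate(store: ToyStore, create: () => ToyContext): ToyContext =
    store.saved match {
      case Some(ops) => new ToyContext(ops, restored = true) // factory is skipped
      case None =>
        val ctx = create()
        store.saved = Some(ctx.operations) // simplification: snapshot taken right away
        ctx
    }

  // Mirrors the corrected layout: every stream definition lives inside the factory.
  def run(store: ToyStore): ToyContext =
    getOrCreate(store, () => {
      factoryCalls += 1
      val ctx = new ToyContext(Nil, restored = false)
      ctx.register("socketTextStream-1")
      ctx.register("socketTextStream-2")
      ctx.register("union+print")
      ctx
    })

  def main(args: Array[String]): Unit = {
    val store = new ToyStore
    val first = run(store)  // no checkpoint yet: the factory runs
    val second = run(store) // checkpoint present: the factory is skipped
    println(s"factory calls: $factoryCalls, second run restored: ${second.restored}")
    println(s"same operations on both runs: ${first.operations == second.operations}")
  }
}
```

In this model the second run gets the same three operations back without the factory being called again, which is why code that defines streams belongs inside the factory and not after the getOrCreate call.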