Accumulators and Broadcast Variables
Accumulators and broadcast variables cannot be recovered from a checkpoint in Spark Streaming.
Consider a word-count job that keeps intermediate counts in an accumulator and runs 24/7 without interruption: after the driver goes down, the accumulator cannot restore its data from the checkpoint, so the accumulated value would have to be recomputed from scratch. The solution is to create these objects as lazily instantiated singletons.
In other words, if you enable checkpointing and also use accumulators or broadcast variables,
then you must create lazily instantiated singleton instances of them so that they can be re-instantiated after the driver restarts on failure.
Example:

```scala
package com.bw.streaming.day03

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.util.LongAccumulator

// Use a broadcast variable and an accumulator to count occurrences of sensitive words
object WordBlackList {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName(s"${this.getClass.getSimpleName}")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    // Get a socket stream
    val stream = ssc.socketTextStream("linux04", 9999)
    // Business logic
    stream.foreachRDD(r => {
      // Get the sensitive words from the broadcast variable
      val words: Seq[String] = WordBlackListBC.getInstance(r.sparkContext).value
      // Accumulator that counts how often sensitive words occur
      val accum: LongAccumulator = WordBlackListAccum.getInstance(r.sparkContext)
      r.foreach(t => {
        if (words.contains(t)) {
          accum.add(1)
        }
      })
      println("Sensitive words: " + accum.value)
    })
    ssc.start()
    ssc.awaitTermination()
  }
}

// A Scala object is a singleton; lazily create the broadcast variable
object WordBlackListBC {
  @volatile private var instance: Broadcast[Seq[String]] = null
  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.broadcast(Seq("a", "c"))
        }
      }
    }
    instance
  }
}

// Lazily created accumulator singleton
object WordBlackListAccum {
  @volatile private var instance: LongAccumulator = null
  def getInstance(sc: SparkContext): LongAccumulator = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.longAccumulator("WordsInBlacklistCounter")
        }
      }
    }
    instance
  }
}
```
2. DataFrame and SQL Operations
You can easily use DataFrames and SQL operations on streaming data. To do so, you have to create a SQLContext using the SparkContext that the StreamingContext is using. Furthermore, this has to be done in such a way that it can be restarted after a driver failure. We achieve this by creating a lazily instantiated singleton instance of SQLContext. In the following example, we modify the earlier word count to generate word counts using DataFrames and SQL: each RDD is converted to a DataFrame, registered as a temporary table, and then queried with SQL.
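The example code itself is not included above, so here is a minimal sketch of the pattern, modeled on Spark's classic streaming SQL word count and the pre-2.0 SQLContext API used in this text. The object names SqlNetworkWordCount and SQLContextSingleton, the Record case class, and the socket host/port are illustrative assumptions, not from the original.

```scala
package com.bw.streaming.day03

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical case class used to give each word a column name
case class Record(word: String)

object SqlNetworkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SqlNetworkWordCount")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    val words = ssc.socketTextStream("linux04", 9999).flatMap(_.split(" "))

    words.foreachRDD { rdd =>
      // Get the singleton SQLContext so it can be re-created after a driver restart
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._

      // Convert the RDD[String] to a DataFrame with a single "word" column
      val wordsDataFrame = rdd.map(w => Record(w)).toDF()

      // Register as a temporary table and query it with SQL
      wordsDataFrame.registerTempTable("words")
      val wordCounts =
        sqlContext.sql("select word, count(*) as total from words group by word")
      wordCounts.show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

// Lazily instantiated singleton, same pattern as the accumulator example above
object SQLContextSingleton {
  @transient private var instance: SQLContext = null
  def getInstance(sc: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sc)
    }
    instance
  }
}
```

Because getInstance is only called inside foreachRDD, the SQLContext is rebuilt from the restored SparkContext after recovery from a checkpoint, rather than being (unsuccessfully) restored from the checkpoint itself.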