Spark Streaming: broadcast variables, accumulators, and SQL

1. Accumulators and Broadcast Variables

Accumulators and broadcast variables cannot be recovered from a checkpoint in Spark Streaming.
Consider a word-count job that must run around the clock (7x24): if the driver goes down and the accumulator's value cannot be read back from the checkpoint, the accumulated count can only be recomputed from scratch.
The way around this is to create these objects lazily, as singletons.
In other words, if you enable checkpointing and also use accumulators or broadcast variables,
then you must create lazily instantiated singleton instances of them, so that they can be re-instantiated after the driver restarts from a failure.
Example:
package com.bw.streaming.day03

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.util.LongAccumulator

// use a broadcast variable and an accumulator to count occurrences of sensitive words
object WordBlakList {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName(s"${this.getClass.getSimpleName}").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    // get a socket text stream
    val stream = ssc.socketTextStream("linux04", 9999)
    // business logic
    stream.foreachRDD(r => {
      // read the sensitive words from the broadcast variable
      val words: Seq[String] = WordBlackListBC.getInstance(r.sparkContext).value
      // accumulator that counts occurrences of the sensitive words
      val accum: LongAccumulator = WordBlackListAccum.getInstance(r.sparkContext)
      r.foreach(t => {
        if (words.contains(t)) {
          accum.add(1)
        }
      })
      println("sensitive words: " + accum.value)

    })
    ssc.start()
    ssc.awaitTermination()
  }
}

// a Scala object is itself a singleton
object WordBlackListBC{
  @volatile  private var instance:Broadcast[Seq[String]]=null
  def getInstance(sc:SparkContext):Broadcast[Seq[String]]={
    if(instance==null){
      synchronized{
        if(instance==null){
          instance= sc.broadcast(Seq("a","c"))
        }
      }
    }
    instance
  }
}

// accumulator
object WordBlackListAccum{
  @volatile private var instance: LongAccumulator = null
  def getInstance(sc: SparkContext): LongAccumulator = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.longAccumulator("WordsInBlacklistCounter")
        }
      }
    }
    instance
  }
}
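
The example above defines the lazy singletons but never actually enables checkpointing, which is the scenario the pattern exists for. The sketch below shows one way to wire the same job up for driver recovery with StreamingContext.getOrCreate; it is only an illustration, and the object name CheckpointedWordBlackList and the checkpoint directory /tmp/ck are hypothetical.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedWordBlackList {
  // builds a fresh context; only invoked when no checkpoint exists yet
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedWordBlackList").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    ssc.checkpoint("/tmp/ck") // hypothetical checkpoint directory
    val stream = ssc.socketTextStream("linux04", 9999)
    stream.foreachRDD { r =>
      // getInstance re-creates the broadcast variable and the accumulator
      // after a driver restart, since neither is part of the checkpoint
      val words = WordBlackListBC.getInstance(r.sparkContext).value
      val accum = WordBlackListAccum.getInstance(r.sparkContext)
      r.foreach(t => if (words.contains(t)) accum.add(1))
      println("sensitive words: " + accum.value)
    }
    ssc
  }

  def main(args: Array[String]): Unit = {
    // recover from the checkpoint if it exists, otherwise build a new context
    val ssc = StreamingContext.getOrCreate("/tmp/ck", createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}

To feed either version locally, a netcat server can stand in for the data source, e.g. nc -lk 9999 on the linux04 host.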

2. DataFrame and SQL Operations

You can easily use DataFrames and SQL operations on streaming data. You have to create a SQLContext from the SparkContext that the StreamingContext is using, and this has to be done in such a way that it can be restarted after a driver failure. As with the accumulator and broadcast variable above, this is achieved by creating a lazily instantiated singleton instance of SQLContext. In the following example, the earlier word count is modified to generate word counts using DataFrames and SQL: each RDD is converted to a DataFrame, registered as a temporary table, and then queried with SQL.
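A minimal sketch of what this paragraph describes, assuming the Spark 1.x-style API the text implies (SQLContext and registerTempTable); the object names SqlWordCount and SQLContextSingleton are illustrative, and the socket source reuses linux04:9999 from the earlier example.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// lazily instantiated singleton SQLContext, same double-checked pattern as above
object SQLContextSingleton {
  @volatile private var instance: SQLContext = null
  def getInstance(sc: SparkContext): SQLContext = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = new SQLContext(sc)
        }
      }
    }
    instance
  }
}

object SqlWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SqlWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    val words = ssc.socketTextStream("linux04", 9999).flatMap(_.split(" "))

    words.foreachRDD { rdd =>
      // get the singleton SQLContext and import its implicits for toDF
      val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
      import sqlContext.implicits._
      // convert the RDD[String] into a DataFrame with a single "word" column
      val wordsDataFrame = rdd.toDF("word")
      // register it as a temporary table and query it with SQL
      wordsDataFrame.registerTempTable("words")
      val wordCounts = sqlContext.sql("select word, count(*) as total from words group by word")
      wordCounts.show()
    }
    ssc.start()
    ssc.awaitTermination()
  }
}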
 

 

Origin: www.cnblogs.com/JBLi/p/11367424.html