How to stop SparkStreaming gracefully

Spark has matured to the point where it has real advantages for stream processing. This post shows how to stop a Spark Streaming job gracefully: the application decides for itself when to shut down and finishes its in-flight work first, rather than being killed forcibly from outside.

Below is a streaming word-count job that writes its results to MySQL, together with an example of stopping it gracefully. I hope it helps. One thing to keep in mind as you read the code: I have deliberately written it in a verbose style so that it is easy to follow. Once you understand it, you can simplify it in your own version.

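import java.sql.{DriverManager, ResultSet}

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}

object StreamGranceStop2 {
  def main(args: Array[String]): Unit = {
    // The computation below is no different from an ordinary streaming job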
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    ssc.sparkContext.setLogLevel("WARN")

    val dataDS: ReceiverInputDStream[String] = ssc.socketTextStream("192.168.182.147",9999)

    val wordDS: DStream[String] = dataDS.flatMap(_.split(" "))

    val tupleDS: DStream[(String, Int)] = wordDS.map((_,1))
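    tupleDS.reduceByKey(_+_).foreachRDD(rdd=>{
      rdd.foreach(word=>{
        Class.forName("com.mysql.jdbc.Driver")
        // Open a MySQL connection (one per record keeps the example simple, but it is slow)
        val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "")
        // Write the data to MySQL
        try {
          var totalcount = word._2
          // Bind parameters instead of concatenating strings, so a quote inside a word cannot break the SQL
          val query = conn.prepareStatement("select count from wordcount where word=?")
          query.setString(1, word._1)
          val queryresult: ResultSet = query.executeQuery()
          if(queryresult.next()){
            // The word already exists: add this batch's count to the stored total
            totalcount = queryresult.getString("count").toInt+word._2
            val update = conn.prepareStatement("update wordcount set count=? where word=?")
            update.setInt(1, totalcount)
            update.setString(2, word._1)
            update.executeUpdate()
          }else{
            // First time this word is seen: insert a new row
            val insert = conn.prepareStatement("insert into wordcount (word,count) values (?,?)")
            insert.setString(1, word._1)
            insert.setInt(2, totalcount)
            insert.executeUpdate()
          }
          println("Save finished--------------------------------------------------------------")
        } finally {
          conn.close()
        }
      })
    })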


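    // Graceful-stop logic starts here
    val checkIntervalMillis = 10000 // how long each wait lasts, in milliseconds
    var stopFlag:Boolean = false // set to true once the stop condition has been met
    var isStopped = false // whether the StreamingContext has terminated; false means it is still running

    ssc.start() // call start() just as in any streaming job

    // Note that after start() we do not call awaitTermination directly as we normally would.

    // The graceful-stop decision is made on the driver, so this while loop does not affect the computation on the executors.
    // The loop ends when isStopped becomes true.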
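    while (! isStopped) {
      // awaitTerminationOrTimeout blocks the driver for up to the given number of milliseconds.
      // It returns true if the StreamingContext has stopped, and false if the timeout
      // elapsed while the context was still running. It does not report whether the
      // executors processed any data during the wait.
      isStopped = ssc.awaitTerminationOrTimeout(checkIntervalMillis)

      // Print the current state
      if (isStopped)
        println("ssc has terminated, the program can exit!")
      else
        println("ssc is still running, cannot exit yet...")

      // Call our own method to check whether the stop condition has really been met
      checkShutdownMarker

      // Act on the result: only request a stop while the context is still running
      println("!isStopped && stopFlag"+(!isStopped && stopFlag))
      if ( !isStopped && stopFlag ) {
        println("Stop condition confirmed, stopping ssc now:")
        // First argument: also stop the SparkContext. Second argument: stop gracefully,
        // that is, wait until all received data has been processed.
        ssc.stop(true, true)
        println("ssc stopped!!!!!!!")
      }
    }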
	
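    // This example writes word counts to MySQL, so we query MySQL and stop once the data reaches the expected total
    def checkShutdownMarker:Unit={
      println("Checking whether the stop condition is met.....")
      Class.forName("com.mysql.jdbc.Driver")
      // Open a MySQL connection
      val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "")
      try {
        val querysql="select sum(count) sumcnt from wordcount"
        val queryresult: ResultSet = conn.prepareStatement(querysql).executeQuery()
        // Once the total count reaches 10, flag the job for stopping
        if(queryresult.next()&&queryresult.getInt("sumcnt")>=10){
          stopFlag=true
        }
      } finally {
        // Close the connection so that repeated checks do not leak connections
        conn.close()
      }
    }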

  }
}
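
The control flow of the polling loop can be tried without a cluster or a database. The sketch below is a minimal model, not Spark itself: `StoppableContext`, `FakeContext` and `runUntilStopped` are hypothetical names invented for illustration, and the fake context returns immediately instead of really waiting. It assumes the same contract as `awaitTerminationOrTimeout`, namely that the call returns true once the context has stopped and false when the wait timed out.

```scala
object GracefulStopSketch {

  // Hypothetical stand-in for the two StreamingContext methods the loop relies on.
  trait StoppableContext {
    // Assumed contract: true if the context has stopped, false if the wait timed out.
    def awaitTerminationOrTimeout(timeoutMillis: Long): Boolean
    def stop(stopSparkContext: Boolean, stopGracefully: Boolean): Unit
  }

  // Fake context for experiments: never stops on its own and does not sleep.
  final class FakeContext extends StoppableContext {
    private var stopped = false
    var waits = 0 // how many times the driver loop has polled

    def awaitTerminationOrTimeout(timeoutMillis: Long): Boolean = {
      waits += 1
      stopped
    }

    def stop(stopSparkContext: Boolean, stopGracefully: Boolean): Unit = {
      stopped = true
    }
  }

  // Poll until the context reports that it has stopped. While it is still
  // running, ask shouldStop() whether the stop condition is met and, if so,
  // request a graceful stop. Returns the number of polls performed.
  def runUntilStopped(ctx: StoppableContext, checkIntervalMillis: Long)(shouldStop: () => Boolean): Int = {
    var polls = 0
    var isStopped = false
    while (!isStopped) {
      isStopped = ctx.awaitTerminationOrTimeout(checkIntervalMillis)
      polls += 1
      if (!isStopped && shouldStop()) {
        ctx.stop(stopSparkContext = true, stopGracefully = true)
      }
    }
    polls
  }
}
```

In this sketch the stop condition is passed in as a function, so the MySQL query could be swapped for any other marker (a file, a flag in another store) without touching the loop. The condition is also evaluated only while the context is still running, which avoids one extra check after the stop has completed.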
