Fault Tolerance

Spark Streaming provides fault tolerance in three places:
1. Executor failure fault tolerance: when an Executor fails, Spark starts a new Executor; this is a built-in feature of Spark. If the Executor hosting the Receiver fails, Spark Streaming restarts the Receiver on another Executor (possibly one that already holds a backup of the data received so far).
2. Driver failure fault tolerance: if the Driver fails, the entire Spark Streaming application goes down, so Driver-side fault tolerance is very important. First, configure checkpointing on the Driver side so that the Driver's state is saved periodically; then configure an automatic restart mechanism for the Driver (the configuration differs for each cluster manager); finally, enable the WAL mechanism on the Executor side.
3. Task failure fault tolerance: a failed Spark Task is simply re-run; if the Stage the Task belongs to fails, the parent Stages are re-run according to the RDD dependencies and then the failed Stage is re-run. In real-time computing we also cannot tolerate a Task whose running time is too long: Spark Streaming kills such a Task and re-runs it on another Executor with more adequate resources. This relies on the speculative execution mechanism of Spark's Task scheduling (a retry-limit sketch follows this list; the speculation settings are shown in the last section below).
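
For reference, the number of times Spark retries a failed task before failing the whole job is controlled by an ordinary configuration property. A minimal sketch, assuming the standard property name (the value shown is Spark's usual default, not something prescribed by this article):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // how many times a single task may fail (on any executor)
  // before Spark gives up and fails the job
  .set("spark.task.maxFailures", "4")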

Executor Failure Fault Tolerance


Driver Failure Fault Tolerance


Checkpoint mechanism: Driver-side information is periodically written to HDFS, including:
1. Configuration information
2. The DStream operations that have been defined
3. Information about unfinished batches
 
1. Automatically restart the Driver program
(standalone, yarn and mesos all support this)
2. Set an HDFS checkpoint directory
streamingContext.checkpoint(hdfsDirectory)
3. Use the correct API on the driver side to make the Driver fault tolerant; this requires writing code, for example:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * WordCount program: a Spark Streaming example that consumes real-time data sent by a TCP server.
  *
  * 1. Start a Netcat server on the master server:
  * `$ nc -lk 9998` (if the nc command is not available, install it with yum install -y nc)
  *
  * 2. Run the Spark Streaming application in the cluster with the following command:
  * spark-submit --class com.twq.wordcount.JavaNetworkWordCount \
  *   --master spark://master:7077 \
  *   --deploy-mode cluster \
  *   --driver-memory 512m \
  *   --executor-memory 512m \
  *   --total-executor-cores 4 \
  *   --executor-cores 2 \
  *   /home/hadoop-twq/spark-course/streaming/spark-streaming-basic-1.0-SNAPSHOT.jar
  */
object NetworkWordCount {
  def main(args: Array[String]): Unit = {

    val checkpointDirectory = "hdfs://master:9999/user/hadoop-twq/spark-course/streaming/chechpoint"

    def functionToCreateContext(): StreamingContext = {
      val sparkConf = new SparkConf().setAppName("NetworkWordCount")
      val sc = new SparkContext(sparkConf)

      // Create a StreamingContext with a batch interval of 1 second
      val ssc = new StreamingContext(sc, Seconds(1))

      // Create a ReceiverInputDStream: the receiver listens on a socket of the given
      // machine and port, receives the data sent over it and processes it
      val lines = ssc.socketTextStream("master", 9998, StorageLevel.MEMORY_AND_DISK_SER_2) // two replicas of each received data block for high availability, at the cost of some extra memory

      // Processing logic: a simple word count
      val words = lines.flatMap(_.split(" "))
      val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

      // Output the result to the console
      wordCounts.print()
      ssc.checkpoint(checkpointDirectory)
      ssc
    }

    // Recover the StreamingContext from the checkpoint directory,
    // or create a new one with functionToCreateContext if no checkpoint exists yet
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext _)

    // Start the streaming processing flow
    ssc.start()

    // Wait for the streaming computation to terminate
    ssc.awaitTermination()
  }
}

  

Setting up automatic restart of the Driver program
standalone:
add the following two parameters to spark-submit:
--deploy-mode cluster
--supervise

yarn:
add the following parameter to spark-submit:
--deploy-mode cluster
and set the yarn.resourcemanager.am.max-attempts configuration in YARN

mesos:
Marathon can restart Mesos applications
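
For illustration, the submit command from the WordCount example above might look like this in standalone cluster mode with supervision enabled (a sketch; adjust the class name, jar path and resources to your own application):

spark-submit --class com.twq.wordcount.JavaNetworkWordCount \
  --master spark://master:7077 \
  --deploy-mode cluster \
  --supervise \
  --driver-memory 512m \
  --executor-memory 512m \
  /home/hadoop-twq/spark-course/streaming/spark-streaming-basic-1.0-SNAPSHOT.jar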

 

Fault Tolerance for Loss of Received Data


Checkpoint mechanism: the Driver-side DStream DAG information is periodically written to HDFS (written to memory and to disk at the same time).


Configuration for using the WAL (write-ahead log) to recover data (see the sketch after this list):
1. Set an HDFS checkpoint directory
streamingContext.checkpoint(hdfsDirectory)
2. Enable the WAL configuration
sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
3. The Receiver should be reliable
Only after the data has been written to the WAL is the data source told that it has been consumed;
data for which the source was not notified can be re-consumed from the data source
4. Cancel the in-memory replication of the received data
Use StorageLevel.MEMORY_AND_DISK_SER to store the received data; since it is already written to disk, there is no need to back it up in another executor's memory, which saves space
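
A minimal sketch that puts these points together, reusing the host, port and checkpoint path from the WordCount example above (the application name here is made up for illustration):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf()
  .setAppName("NetworkWordCountWithWAL")                         // illustrative name
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")  // point 2: enable the WAL

val ssc = new StreamingContext(sparkConf, Seconds(1))
// point 1: checkpoint directory on HDFS (the receiver WAL is stored under it)
ssc.checkpoint("hdfs://master:9999/user/hadoop-twq/spark-course/streaming/chechpoint")

// point 4: the received data is already persisted in the WAL on HDFS, so a single
// serialized memory-and-disk copy is enough -- no "_2" replication to another executor
val lines = ssc.socketTextStream("master", 9998, StorageLevel.MEMORY_AND_DISK_SER)

Point 3, using a reliable Receiver, is illustrated in the next section.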


Once the received data has been backed up to another Executor or saved to HDFS, the Receiver sends a receipt (acknowledgment) back to the data source. Data for which no receipt was sent can be re-consumed from the data source, which guarantees that no data is lost; Kafka, for example, supports this.
Reliable Receiver:
after receiving the data and storing it with backup, it sends the receipt to the data source
Unreliable Receiver:
it does not send a receipt to the data source
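
The reliable-receiver pattern can be sketched with Spark's custom Receiver API: store(...) with a batch of records blocks until the data has been stored inside Spark, and only then is the source acknowledged. Everything source-specific below (the pull loop and the sendAckToSource helper) is hypothetical:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class ReliableSketchReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

  override def onStart(): Unit = {
    // start a thread that pulls batches of records from the data source
    // and hands each batch to storeAndAck (details omitted)
  }

  override def onStop(): Unit = {
    // close the connection to the data source
  }

  private def storeAndAck(batch: ArrayBuffer[String]): Unit = {
    // store(multiple records) blocks until the whole batch has been
    // saved inside Spark (replicated and/or written to the WAL) ...
    store(batch)
    // ... only then is the receipt sent back to the data source, so that
    // anything not yet acknowledged can be re-consumed after a failure
    sendAckToSource()
  }

  private def sendAckToSource(): Unit = {
    // hypothetical: source-specific acknowledgment, e.g. committing an offset
  }
}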

Fault Tolerance When a Task Runs Very Slowly

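The slow-task handling described in point 3 at the top of this article relies on Spark's speculative task execution: when a task runs much slower than the other tasks of its stage, a speculative copy is launched on another executor with spare resources. A minimal configuration sketch, assuming the standard Spark property names (the values shown are the usual defaults, for illustration only):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // re-launch tasks that run much slower than the other tasks in their stage
  .set("spark.speculation", "true")
  // only start speculating once 75% of the stage's tasks have finished ...
  .set("spark.speculation.quantile", "0.75")
  // ... and treat a task as slow if it runs 1.5x longer than the median task time
  .set("spark.speculation.multiplier", "1.5")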
