Performance: Receiver level

Create multiple receivers

 

Start multiple receivers on different executors, each listening on its own port. Receiving data from several ports in parallel improves throughput. Code:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

/**
  * WordCount program: a Spark Streaming example that consumes real-time data sent by a TCP server.
  *
  * 1. Start a Netcat server on the master server:
  *    `$ nc -lk 9998` (if the nc command is not available, install it with yum install -y nc)
  *
  * 2. Run the Spark Streaming application on the cluster with the following command:
  *    spark-submit --class com.twq.wordcount.JavaNetworkWordCount \
  *    --master spark://master:7077 \
  *    --deploy-mode client \
  *    --driver-memory 512m \
  *    --executor-memory 512m \
  *    --total-executor-cores 4 \
  *    --executor-cores 2 \
  *    /home/hadoop-twq/spark-course/streaming/spark-streaming-basic-1.0-SNAPSHOT.jar
  *
  * Or interactively: spark-shell --master spark://master:7077 --total-executor-cores 4 --executor-cores 2
  */
object MultiReceiverNetworkWordCount {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    val sc = new SparkContext(sparkConf)

    // Create the context with a 5-second batch interval
    val ssc = new StreamingContext(sc, Seconds(5))

    // Create multiple receivers (ReceiverInputDStream), each receiving and processing
    // data sent over a socket on its own port of the same machine
    val lines1 = ssc.socketTextStream("master", 9998, StorageLevel.MEMORY_AND_DISK_SER)
    val lines2 = ssc.socketTextStream("master", 9997, StorageLevel.MEMORY_AND_DISK_SER)

    // Union the streams into a single DStream
    val lines = lines1.union(lines2)
    // val lines = lines1.union(lines2).union(lines3)

    // The processing logic is simply a word count
    val words = lines.repartition(100).flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1))
      .reduceByKey((a: Int, b: Int) => a + b, new HashPartitioner(10))

    // Output the result to the console
    wordCounts.print()

    // Start the streaming pipeline
    ssc.start()

    // Wait for the streaming program to terminate
    ssc.awaitTermination()

    ssc.stop(false)
  }
}
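Since the application reads from two ports, start two Netcat servers on the master, one per port:

$ nc -lk 9998
$ nc -lk 9997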

  

Number of receiver data blocks

A receiver stores the data it receives in memory as blocks; the following settings determine how many blocks make up a batch:
batchInterval: the time interval that triggers a batch
blockInterval: the interval at which received data is grouped into a block, set by spark.streaming.blockInterval (default 200ms)
Number of partitions of a BlockRDD = batchInterval / blockInterval, i.e. each block becomes one partition of the RDD and is processed by one task
For example, if batchInterval is 2 seconds and blockInterval is 200ms, then the number of tasks is 10
If the number of tasks is too small, even smaller than the number of executor cores, you can reduce blockInterval (see the sketch after this list)
blockInterval should not go below 50ms; if it is too small, the number of tasks becomes too large and task launch overhead grows
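A minimal sketch of this tuning when building the streaming context (the 100ms value and app name are illustrative assumptions, not from the original):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative values: batchInterval = 2s, blockInterval = 100ms
// => 2000ms / 100ms = 20 partitions (tasks) per BlockRDD
val conf = new SparkConf()
  .setAppName("BlockIntervalTuning")  // hypothetical app name
  .set("spark.streaming.blockInterval", "100ms")  // default is 200ms
val ssc = new StreamingContext(conf, Seconds(2))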

Receiver data receiving rate

QPS -> queries per second
PPS -> permits per second, the amount of data a receiver is allowed to receive per second
Spark Streaming places no limit on PPS by default
It can be controlled with the parameter spark.streaming.receiver.maxRate, whose default is Long.MaxValue
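A minimal sketch of capping the receiver rate when creating the context (the 10,000 records-per-second limit and app name are illustrative assumptions):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Cap each receiver at 10,000 records per second;
// by default the rate is unlimited (Long.MaxValue)
val conf = new SparkConf()
  .setAppName("ReceiverRateLimit")  // hypothetical app name
  .set("spark.streaming.receiver.maxRate", "10000")
val ssc = new StreamingContext(conf, Seconds(5))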
