NetworkWordCount Examples

  LocalNetworkWordCount

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * WordCount program: an example of Spark Streaming consuming real-time data sent by a TCP server.
  *
  * 1. Start a Netcat server on the master server:
  *    `$ nc -lk 9998` (if the nc command is not available, install it with `yum install -y nc`)
  */
object LocalNetworkWordCount {
  def main(args: Array[String]) {

    // StreamingContext is the programming entry point of Spark Streaming.
    // local[2] enables two cores: one thread for receiving data, one thread for processing it.
    // Seconds(1): process the accumulated data once every second.
    val ssc = new StreamingContext("local[2]", "LocalNetworkWordCount", Seconds(1),
      System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass).toSeq)

    // Data receiving (receive)
    // Create a receiver (ReceiverInputDStream) that receives the data sent over a socket
    // to a port on this machine, for further processing.
    val lines = ssc.socketTextStream("localhost", 9998, StorageLevel.MEMORY_AND_DISK_SER)

    // Data processing (process)
    // The processing logic: a simple word count.
    val words = lines.flatMap(_.split(" "))
    val wordPairs = words.map(x => (x, 1))
    val wordCounts = wordPairs.reduceByKey(_ + _)

    // Result output (output)
    // Print the results to the console.
    wordCounts.print()

    // Start the streaming processing flow.
    ssc.start()

    // Wait for the streaming program to terminate.
    // The program runs 7 x 24: it keeps waiting and never stops on its own.
    // If this line is commented out, the program terminates after one run, so it must stay on.
    ssc.awaitTermination()
  }
}
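To try this example locally, start the Netcat server first, run the program, and type some words into the nc session. With input such as `hello world hello`, each one-second batch should print output roughly like the following (the timestamp is illustrative; only the format of print() is fixed):

-------------------------------------------
Time: 1569000000000 ms
-------------------------------------------
(hello,2)
(world,1)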

  NetworkWordCount

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * WordCount program: an example of Spark Streaming consuming real-time data sent by a TCP server.
  *
  * 1. Start a Netcat server on the master server:
  *    `$ nc -lk 9998` (if the nc command is not available, install it with `yum install -y nc`)
  *
  * 2. Submit the Spark Streaming application to the cluster with the following command:
  *    spark-submit --class com.twq.streaming.NetworkWordCount \
  *      --master spark://master:7077 \
  *      --deploy-mode client \
  *      --driver-memory 512m \
  *      --executor-memory 512m \
  *      --total-executor-cores 4 \
  *      --executor-cores 2 \
  *      /home/hadoop-twq/spark-course/streaming/spark-streaming-basic-1.0-SNAPSHOT.jar
  */
object NetworkWordCount {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    val sc = new SparkContext(sparkConf)

    // StreamingContext is the programming entry point of Spark Streaming.
    val ssc = new StreamingContext(sc, Seconds(1))

    // Data receiving (receive)
    // Create a receiver (ReceiverInputDStream) that receives the data sent over a socket
    // to a port on the machine, for further processing.
    // StorageLevel.MEMORY_AND_DISK_SER_2: store in memory first and spill to disk when
    // memory runs out; the data is stored serialized (as bytes), with two replicas.
    val lines = ssc.socketTextStream("master", 9998, StorageLevel.MEMORY_AND_DISK_SER_2)

    // Data processing (process)
    // The processing logic: a simple word count.
    val words = lines.flatMap(_.split(" "))
    val wordPairs = words.map(x => (x, 1))
    val wordCounts = wordPairs.reduceByKey(_ + _)

    // Result output (output)
    // Print the results to the console.
    wordCounts.print()

    // Start the streaming processing flow.
    ssc.start()

    // Wait for the streaming program to terminate.
    ssc.awaitTermination()
  }
}
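For reference, StorageLevel.MEMORY_AND_DISK_SER_2 is also the default storage level of socketTextStream, so the explicit argument above could be omitted. If the second replica is not needed, a single-replica level can be passed instead. A minimal sketch, assuming the ssc from the example above:

// Equivalent to the call above: MEMORY_AND_DISK_SER_2 is socketTextStream's default.
val linesDefault = ssc.socketTextStream("master", 9998)

// Single replica: still serialized and spilling to disk, cheaper to store,
// but data buffered on a failed receiver executor cannot be recovered from a copy.
val linesSingle = ssc.socketTextStream("master", 9998, StorageLevel.MEMORY_AND_DISK_SER)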

  

  NetworkWordCountDetail

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * WordCount program: an example of Spark Streaming consuming real-time data sent by a TCP server.
  *
  * 1. Start a Netcat server on the master server:
  *    `$ nc -lk 9998` (if the nc command is not available, install it with `yum install -y nc`)
  *
  * 2. Submit the Spark Streaming application to the cluster with the following command:
  *    spark-submit --class com.twq.streaming.NetworkWordCountDetail \
  *      --master spark://master:7077 \
  *      --deploy-mode client \
  *      --driver-memory 512m \
  *      --executor-memory 512m \
  *      --total-executor-cores 4 \
  *      --executor-cores 2 \
  *      /home/hadoop-twq/spark-course/streaming/spark-streaming-basic-1.0-SNAPSHOT.jar
  */
object NetworkWordCountDetail {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    val sc = new SparkContext(sparkConf)

    // Create the context with a 1 second batch size.

    // 1. StreamingContext is the entry point of a Spark Streaming program.
    //    So what is the relationship between StreamingContext and SparkContext?
    // 1.1 A StreamingContext holds a reference to a SparkContext:
    val ssc = new StreamingContext(sc, Seconds(1))

    // 1.2 If no SparkContext has been started yet, a StreamingContext can be created with:
    //       val ssc2 = new StreamingContext(sparkConf, Seconds(1)) // starts a SparkContext internally
    //       ssc2.sparkContext // the SparkContext can be obtained from the StreamingContext
    // 1.3 Calling stop() on a StreamingContext also stops the underlying SparkContext.
    //     To stop the streaming context without stopping the SparkContext, call:
    //       ssc.stop(false)
    //     and later, when the SparkContext itself is no longer needed:
    //       sc.stop()

    // 2. Notes on StreamingContext:
    // 2.1 Only one StreamingContext can be active in a JVM at the same time.
    // 2.2 Once a StreamingContext has been started, no new streaming
    //     computations can be added to it.
    // 2.3 Once a StreamingContext has been stopped, it cannot be started again.
    // 2.4 One SparkContext can start multiple StreamingContexts, as long as the
    //     previous one is stopped first without stopping the SparkContext.

    // Create a receiver (ReceiverInputDStream) that receives the data sent over a socket
    // to a port on the machine, for further processing.
    val lines = ssc.socketTextStream("master", 9998, StorageLevel.MEMORY_AND_DISK_SER)

    // The processing logic: a simple word count.
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

    // Print the results to the console.
    wordCounts.print()

    // Start the streaming processing flow.
    ssc.start()

    // Wait for the streaming program to terminate.
    ssc.awaitTermination()
  }
}
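A minimal sketch of note 2.4 above: one SparkContext can back several StreamingContexts in sequence, as long as each context is stopped (without stopping the SparkContext) before the next one is created. The ports, batch intervals, and the 10-second demo timeout here are arbitrary placeholder values:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SequentialStreamingContexts {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("SequentialStreamingContexts"))

    // First StreamingContext: run briefly, then stop it but keep the SparkContext.
    val ssc1 = new StreamingContext(sc, Seconds(1))
    ssc1.socketTextStream("localhost", 9998).print()
    ssc1.start()
    ssc1.awaitTerminationOrTimeout(10000)  // run for at most 10 seconds (demo only)
    ssc1.stop(stopSparkContext = false)    // the SparkContext stays alive

    // Second StreamingContext on the same SparkContext, with a different batch interval.
    val ssc2 = new StreamingContext(sc, Seconds(5))
    ssc2.socketTextStream("localhost", 9999).print()
    ssc2.start()
    ssc2.awaitTermination()
  }
}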

☛ DStream (Discretized Stream) features:
A list of parent DStreams it depends on (the dependency list is what supports fault tolerance)
A time interval at which RDDs are generated (the batch interval)
A function that generates the RDDs (converting the DStream into an RDD for each batch)
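These properties can be observed directly with queueStream, which builds a DStream from RDDs we supply ourselves and so makes the "one RDD per batch interval" behaviour visible. A minimal local sketch; the queue contents are arbitrary test data:

import scala.collection.mutable.Queue

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamAsRddSequence {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[2]", "DStreamAsRddSequence")
    val ssc = new StreamingContext(sc, Seconds(1))

    // Each queued RDD becomes the data of one batch: with oneAtATime = true,
    // the DStream dequeues exactly one RDD per batch interval.
    val rddQueue = new Queue[RDD[String]]()
    rddQueue += sc.parallelize(Seq("hello world", "hello spark"))
    rddQueue += sc.parallelize(Seq("spark streaming"))

    val lines = ssc.queueStream(rddQueue, oneAtATime = true)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)  // a few batches are enough for the demo
    ssc.stop()
  }
}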
1. Spark Streaming cuts the input data stream into batches, which are then stored in Spark's memory.
2. For each batch, Spark generates Spark jobs (RDD transformations and actions) to process it.
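This batch-to-job mapping can be made explicit with foreachRDD: the function below is invoked once per batch interval, and the RDD actions inside it are what trigger the Spark jobs. A fragment assuming the lines DStream from the examples above:

// Called once per batch: `rdd` holds that batch's data, and the
// count() action is what launches a Spark job for the batch.
lines.foreachRDD { (rdd, time) =>
  val n = rdd.count()
  println(s"Batch at $time contains $n lines")
}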

Origin: www.cnblogs.com/tesla-turing/p/11488249.html