Spark Streaming simple example (old API)


◆ Building a Streaming program (wordCount)

  ◆ Spark Streaming programs are best built as standalone applications with Maven or sbt and then compiled and run.

  ◆ Preparation:
  1. Add the Spark Streaming JAR dependency to the project (a typical sbt declaration is sketched below)
  2. Import the Scala stream-computing declarations
  import org.apache.spark.streaming.StreamingContext
  import org.apache.spark.streaming.StreamingContext._
  import org.apache.spark.streaming.dstream.DStream
  import org.apache.spark.streaming.Duration
  import org.apache.spark.streaming.Seconds
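
For reference, a minimal sbt dependency declaration might look like the following sketch; the version number is an assumption, so match it to the Spark version on your cluster:

  // build.sbt (sketch): the receiver-based streaming API lives in the spark-streaming artifact
  libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.3" % "provided"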

 

1. Initialize the StreamingContext object

   // Create a local StreamingContext with two worker threads and a batch interval of 1 second.
   import org.apache.spark.SparkConf
   val conf = new SparkConf()
   conf.setMaster("local[2]")
   conf.setAppName("NetworkWordCount")
   val ssc = new StreamingContext(conf, Seconds(1))
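
If a SparkContext already exists in the application, the StreamingContext can also be built directly on top of it; a minimal sketch, assuming an existing SparkContext named sc:

   // Reuse an existing SparkContext instead of building one from a SparkConf
   val ssc = new StreamingContext(sc, Seconds(1))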

2. Get a DStream object

  // Create a DStream that connects to a hostname:port source, such as localhost:9999

   val lines = ssc.socketTextStream("localhost", 9999)
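
socketTextStream also accepts an explicit storage level for the received blocks; a sketch (the default, when omitted, is MEMORY_AND_DISK_SER_2):

   import org.apache.spark.storage.StorageLevel
   // Keep received blocks in memory only rather than the serialized, replicated default
   val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_ONLY)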

3. Operate on the DStream

  // Split each line of received data into words on spaces

  val words = lines.flatMap(_.split(" "))
  // Import the implicit conversions from StreamingContext
  import org.apache.spark.streaming.StreamingContext._

   // Count each word in every batch

  val pairs = words.map(word => (word, 1))
  val wordCounts = pairs.reduceByKey(_ + _)
  // By default, print the first ten elements of each batch to the console
  wordCounts.print()
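
The same per-batch count can also be expressed with DStream's countByValue; a sketch that replaces the map/reduceByKey pair above:

  // countByValue yields a DStream[(String, Long)] of word frequencies per batch
  val wordCounts = words.countByValue()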

4. Start the streaming application

  ssc.start() // start the computation

  ssc.awaitTermination() // wait for the computation to terminate

  ssc.stop() // stop the application
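
stop() also has an overload that controls whether the underlying SparkContext is stopped as well and whether received data is drained first; a sketch:

  // Stop gracefully: finish processing the data already received before shutting down
  ssc.stop(stopSparkContext = true, stopGracefully = true)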

 

Open a network port and simulate sending data

  1. Use the nc command to type data in manually

    Linux/Mac: nc

    Windows: cat

      nc -lk 9999

  2. Write a small program that simulates a data generator

package com.briup.streaming

import java.io.PrintWriter
import java.net.ServerSocket

import scala.io.Source

object MassageServer {

  // Return a random integer in [0, length)
  def index(length: Int) = {
    import java.util.Random
    val rdm = new Random
    rdm.nextInt(length)
  }

  def main(args: Array[String]): Unit = {
    println("Simulated data generator starting!!!")
    // Get the total number of lines in the specified file
    val filename = "Spark/ihaveadream.txt"
    val lines = Source.fromFile(filename).getLines.toList
    val filerow = lines.length

    // Listen on the specified port; a connection is established when an external program requests one
    val serverSocket = new ServerSocket(9999)

    while (true) {
      // Listen on port 9999 and obtain a socket object
      val socket = serverSocket.accept()
      // println(socket)
      new Thread() {
        override def run(): Unit = {
          println("Got client connected from: " + socket.getInetAddress)

          val out = new PrintWriter(socket.getOutputStream(), true)

          while (true) {
            Thread.sleep(1000)
            // Once the connection is accepted, send the peer one random line per second
            val content = lines(index(filerow))

            println(content)

            out.write(content + '\n')

            out.flush()
          }
          socket.close()
        }
      }.start()
    }
  }
}
The program above simulates sending data.
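
To sanity-check the generator on its own (assuming nc is available), connect from another terminal and watch a random line arrive every second:

  nc localhost 9999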

 

 

Precautions:

◆ 1. All of the steps before start() only define the execution flow: the program has not actually connected to the data source and has not performed any operation on data; it has merely set up the execution plan.
◆ 2. Only when ssc.start() is called does the program actually begin all of the planned operations.
◆ 3. Execution happens in another thread, so awaitTermination must be called to wait for the stream computation to finish.
◆ 4. A streaming context can only be started once.
◆ 5. In local mode, be sure to set local[n] with n >= 2: one thread for receiving and one for processing.
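
One note on the batch interval before the full example: Duration is measured in milliseconds, so the Duration(1000) used below is the same one-second interval written as Seconds(1) earlier. A sketch:

  import org.apache.spark.streaming.{Duration, Seconds}
  // Seconds(1) and Duration(1000) describe the same one-second batch interval
  assert(Seconds(1) == Duration(1000))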

 
package com.briup.streaming

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Duration, StreamingContext}

object MyTestOldAPI {
  def main(args: Array[String]): Unit = {
    // Set the log level to reduce console noise
    Logger.getLogger("org").setLevel(Level.WARN)

    // 1. Get the DStream
    val conf = new SparkConf().setAppName("MyTestOldAPI").setMaster("local[*]")
    val dss = new StreamingContext(conf, Duration(1000))
    val ds = dss.socketTextStream("localhost", 9999)

    // 2. Logic: count the words
    val res = ds.filter(_ != "").flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    res.print()

    // 3. Start the real-time processing task
    dss.start()
    dss.awaitTermination()
    dss.stop()
  }
}
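
To run the example end to end: start MassageServer (or nc -lk 9999) first so that port 9999 is listening, then start MyTestOldAPI; the word counts for each one-second batch are printed to the console.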

 
