Spark Streaming simple example
◆ Building a Spark Streaming program: (wordCount)
◆ Spark Streaming programs are best built as standalone applications with Maven or sbt, then compiled and run.
◆ Preparation:
1. Add the Spark Streaming JAR to the project
2. Import the Scala stream-computing declarations
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.Seconds
1. Initialize the StreamingContext object
// Create a local StreamingContext with two worker threads and a batch interval of 1 second
val conf = new SparkConf()
conf.setMaster("local[2]")
conf.setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
2. Get a DStream object
// create a DStream connected to a hostname:port, such as localhost:9999
val lines = ssc.socketTextStream("localhost", 9999)
3. Operate on the DStream object
// split each line of received data into words on spaces
val words = lines.flatMap(_.split(" "))
// import the implicit conversions from StreamingContext
import org.apache.spark.streaming.StreamingContext._
// for each batch, map every word to a pair and sum the counts
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
// by default, print the first ten elements of each batch to the console
wordCounts.print()
4. Start the stream handler
ssc.start()            // start the computation
ssc.awaitTermination() // wait for the computation to terminate
ssc.stop()             // stop the application
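As a rough, Spark-free sketch (names here are illustrative, not part of any API), the same flatMap → map → reduceByKey chain used above can be mirrored on an ordinary Scala List, with groupBy plus a sum standing in for reduceByKey:

```scala
object WordCountSketch {
  // Same shape as the DStream pipeline, applied to one in-memory batch of lines
  def wordCount(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words
      .filter(_.nonEmpty)      // drop empty tokens
      .map(word => (word, 1))  // pair each word with a count of 1
      .groupBy(_._1)           // groupBy + sum stands in for reduceByKey(_ + _)
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    println(wordCount(List("hello spark", "hello streaming")))
  }
}
```

The difference in Spark Streaming is only that this computation is re-run on every batch interval rather than once over a fixed list.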
Start a network port and send simulated data
1. Use the nc command to enter data manually
Linux/Mac: nc
Windows: an nc equivalent such as ncat
nc -lk 9999
2. Write a data generator in code to simulate a source
package com.briup.streaming

import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.Source

object MassageServer {
  // returns a random integer in [0, length)
  def index(length: Int) = {
    import java.util.Random
    val rdm = new Random
    rdm.nextInt(length)
  }

  def main(args: Array[String]) {
    println("Simulated data server starting!!!")
    // read all lines of the specified file
    val filename = "Spark/ihaveadream.txt"
    val lines = Source.fromFile(filename).getLines.toList
    val filerow = lines.length
    // listen on the given port; a connection is established when an external program requests one
    val serverSocket = new ServerSocket(9999)
    while (true) {
      // block on port 9999 until a client connects, then get its socket object
      val socket = serverSocket.accept()
      new Thread() {
        override def run(): Unit = {
          println("Got client connected from: " + socket.getInetAddress)
          val out = new PrintWriter(socket.getOutputStream(), true)
          while (true) {
            Thread.sleep(1000)
            // once the connection is accepted, send one random line of the file per second
            val content = lines(index(filerow))
            println(content)
            out.write(content + '\n')
            out.flush()
          }
          socket.close()
        }
      }.start()
    }
  }
}
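On the receiving side, Spark's socket source simply reads newline-delimited text from the generator above. As a hedged sketch of that consumption pattern (readLines is a hypothetical helper written for illustration, not part of Spark's API), a plain socket client looks like:

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket

object SocketLineClient {
  // Read up to `max` newline-delimited lines from host:port,
  // the way a socket text receiver consumes the generator above.
  def readLines(host: String, port: Int, max: Int): List[String] = {
    val socket = new Socket(host, port)
    try {
      val in = new BufferedReader(new InputStreamReader(socket.getInputStream))
      Iterator.continually(in.readLine()) // null signals end of stream
        .takeWhile(_ != null)
        .take(max)
        .toList
    } finally socket.close()
  }
}
```

This is why the generator must write a trailing '\n' after each message: the receiver frames records by line.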
Precautions:
◆ 1. All steps before start() only build the execution flow: the program has not actually connected to the data source or performed any data operation; it has merely set up the execution plan
◆ 2. Only after ssc.start() is called does the program actually perform all the planned operations
◆ 3. Execution happens on another thread, so awaitTermination must be called to wait for the stream computation to complete
◆ 4. A streaming context can only be started once
◆ 5. In local mode, be sure to set local[n] with n >= 2: one thread for receiving and one for processing
package com.briup.streaming

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Duration, StreamingContext}

object MyTestOldAPI {
  def main(args: Array[String]): Unit = {
    // set the log level
    Logger.getLogger("org").setLevel(Level.WARN)
    // 1. get the DStream
    val conf = new SparkConf().setAppName("MyTestOldAPI").setMaster("local[*]")
    val dss = new StreamingContext(conf, Duration(1000))
    val ds = dss.socketTextStream("localhost", 9999)
    // 2. logic: word count
    val res = ds.filter(_ != "").flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    res.print()
    // 3. start the real-time processing task
    dss.start()
    dss.awaitTermination()
    dss.stop()
  }
}