2. Run Spark Streaming

2.1 Programming in IDEA

      Add the following dependency to pom.xml:

<dependency>
    <groupId>org.apache.spark</groupId> 
    <artifactId>spark-streaming_2.11</artifactId>
    <version>${spark.version}</version> 
    <scope>provided</scope>
</dependency>
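
      The snippet assumes a ${spark.version} property is defined elsewhere in the POM; a minimal sketch, with the version number chosen only for illustration:

<properties>
    <!-- Illustrative value; match the Spark version installed on your cluster -->
    <spark.version>2.1.1</spark.version>
</properties>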

      The example code is as follows:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by huicheng on 25/07/2019.
  */

object WorldCount {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("master01", 9999)

    // Split each line into words
    val words = lines.flatMap(_.split(" "))

    //import org.apache.spark.streaming.StreamingContext._ // not necessary since Spark 1.3
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()

    ssc.start() // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }

}

      Package the program into a JAR with its dependencies, the same way as for a Spark Core application, upload it to the Spark machine, and run:

bin/spark-submit --class com.c.streaming.WorldCount ~/wordcount-jar-with-dependencies.jar
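
      The jar-with-dependencies name suggests the JAR was built with the maven-assembly-plugin; a minimal sketch of that configuration (the plugin version is an assumption, not taken from the original project):

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.1.1</version>
            <configuration>
                <archive>
                    <manifest>
                        <!-- Assumed main class, matching the spark-submit command above -->
                        <mainClass>com.c.streaming.WorldCount</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

      Running mvn clean package then produces a *-jar-with-dependencies.jar under target/, which is the file uploaded above.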

      Send data with Netcat:

# TERMINAL 1:
# Running Netcat

$ nc -lk 9999

hello world
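
      Once a batch containing the input has been processed, the terminal running spark-submit should print counts roughly like this (the timestamp is illustrative):

# TERMINAL 2: output of the running WorldCount job
-------------------------------------------
Time: 1564000000000 ms
-------------------------------------------
(hello,1)
(world,1)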

      If the program produces too much log output while it runs, you can change the log level to WARN in the log4j configuration file under Spark's conf directory.
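
      A minimal sketch of that change, assuming the stock template that ships with Spark:

# conf/log4j.properties (copied from conf/log4j.properties.template)
# Lower the root log level from INFO to WARN
log4j.rootCategory=WARN, console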

 
