2.1 Programming in IDEA
Add the following dependency to pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
The example code is as follows:
package com.c.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Created by huicheng on 25/07/2019.
 */
object WorldCount {
  def main(args: Array[String]): Unit = {
    // local[2]: at least two threads, one for the socket receiver and one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    // Batch interval of 1 second
    val ssc = new StreamingContext(conf, Seconds(1))

    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("master01", 9999)

    // Split each line into words
    val words = lines.flatMap(_.split(" "))

    // import org.apache.spark.streaming.StreamingContext._ // not necessary since Spark 1.3
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()

    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
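To see what each batch computes, the same transformation chain can be mimicked on plain Scala collections, without Spark. This is only a sketch: `BatchWordCountSketch` and `countWords` are hypothetical names, and `groupBy` followed by `sum` stands in for Spark's `reduceByKey`. Like the streaming job, it counts words within a single batch; the DStream version does not carry counts over from one one-second batch to the next. It also keeps the original `split(" ")`, so consecutive spaces produce empty-string "words".

```scala
// Hypothetical helper, not part of Spark: reproduces the per-batch
// flatMap -> map -> reduceByKey logic on an ordinary Seq of lines.
object BatchWordCountSketch {
  def countWords(batch: Seq[String]): Map[String, Int] =
    batch
      .flatMap(_.split(" "))          // split each line into words
      .map(word => (word, 1))         // pair each word with a count of 1
      .groupBy(_._1)                  // group pairs by word (stand-in for reduceByKey)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s per word

  def main(args: Array[String]): Unit = {
    // One "batch" containing two lines, as if typed into nc within one second
    println(countWords(Seq("hello world", "hello spark")))
  }
}
```

For the input above, `hello` is counted twice and `world` and `spark` once each; the order in which the map is printed is not guaranteed.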
Package the program the same way as for Spark Core, upload the jar to a machine where Spark is installed, and run:
bin/spark-submit --class com.c.streaming.WorldCount ~/wordcount-jar-with-dependencies.jar
Send data with Netcat, running it on master01, the host the program connects to:
# TERMINAL 1:
# Running Netcat
$ nc -lk 9999
hello world
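If `nc` is not available (for example on Windows), a small Scala program can play the same role. `socketTextStream` connects as a TCP client and reads newline-terminated UTF-8 text, so any server that writes lines to the connection will do. The sketch below is a hypothetical helper, not part of Spark; unlike `nc -lk`, it serves a single client and then closes the connection, and it must run on the host the job connects to (`master01` in the example).

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.net.ServerSocket
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for `nc -lk 9999`: accepts one client and sends fixed lines.
object LineServerSketch {
  // Waits for one connection on `server` and writes each line followed by '\n'.
  def serveOnce(server: ServerSocket, lines: Seq[String]): Unit = {
    val client = server.accept() // blocks until the socketTextStream receiver connects
    val out = new BufferedWriter(
      new OutputStreamWriter(client.getOutputStream, StandardCharsets.UTF_8))
    try {
      lines.foreach { line => out.write(line); out.write("\n") } // one record per line
      out.flush()
    } finally client.close() // closing the socket also closes the writer's stream
  }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999) // same port the streaming job connects to
    try serveOnce(server, Seq("hello world"))
    finally server.close()
  }
}
```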
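If the program prints too many log messages while running, lower the log level to WARN in the log4j configuration file under Spark's conf directory (conf/log4j.properties, which can be created from conf/log4j.properties.template), for example by setting log4j.rootCategory=WARN, console.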