Tool import
Here we need a tool: netcat-win32 will be used to open a port and send data.
Baidu Netdisk download
link: https://pan.baidu.com/s/1iiet53Rki78GaEIkpp9bnA
Extraction code: hzug
After downloading and unzipping, you should see the following files:
Note: many people find that the two files nc.exe and nc64.exe are missing after unzipping. If you run into this, delete the entire folder, turn off your antivirus software, and unzip again; that solves the problem.
Configure environment variables
The netcat-win32 tool's directory needs to be added to the system environment variables.
- Open the system variable Path editor
- Create a new entry and paste in the path of the folder we just unzipped
- To verify, press Win+R, open a command prompt, and run nc -lp 9999
- If the cursor sits waiting on a blank line, the setup succeeded
Writing the code
package com.niit.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * @Author YanTianCheng
 * @Date 2023/3/27 10:52
 * @Title: Spark_Streaming_WordCont
 * @Package com.niit.streaming
 */
object Spark_Streaming_WordCont {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("spark")
    // Initialize the StreamingContext with a 3-second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(3))
    // Read data from the socket port
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into individual words
    val words = lines.flatMap(_.split(" "))
    // Map each word to a (word, 1) tuple
    val wordOne = words.map((_, 1))
    // Sum the counts for each word
    val wordCount = wordOne.reduceByKey(_ + _)
    // Print the results
    wordCount.print()
    // Start the streaming computation
    ssc.start()
    // Wait for the computation to terminate
    ssc.awaitTermination()
  }
}
In this application, we first create a Spark StreamingContext with a 3-second batch interval. Then we create a socket stream connected to localhost:9999, split each line of text into words, and map each word to a count of 1. Finally, we sum the counts within each 3-second batch and print them.
Once the app is running, type words into the nc terminal (started earlier with nc -lp 9999) and you will see a live count of the words.
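Since the Spark program above needs a live socket to run, the core flatMap → map → reduce-by-key pipeline can be illustrated on a plain Scala collection instead. This is a minimal sketch: the object name `WordCountSketch` and the sample lines are assumptions for illustration, and `groupBy` plus `sum` stand in for Spark's `reduceByKey`:

```scala
// A no-Spark sketch of the same word-count transformations.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("hello spark", "hello streaming")  // sample "batch" of lines
    val words = lines.flatMap(_.split(" "))            // split lines into words
    val wordOne = words.map((_, 1))                    // map each word to (word, 1)
    val wordCount = wordOne                            // sum the counts per word
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
    wordCount.toSeq.sortBy(_._1).foreach(println)      // (hello,2) (spark,1) (streaming,1)
  }
}
```

The transformation names mirror the streaming code on purpose, so each line maps one-to-one onto the DStream version.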
Running the program
- First open the tool and listen on port 9999
- Start the Spark_Streaming_WordCont program
- Type input data into the port window
- The program outputs the word-count results
Common problems
- If the output box prints too much, we can lengthen the collection cycle, for example to 10 seconds per cycle; note that in actual development the collection cycle usually cannot be very long.
- Be sure to start the tool first, and only then start running the program.
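Changing the collection cycle only touches the StreamingContext construction; everything else in the program stays the same. A sketch of the change (assuming the same sparkConf as in the original program):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setMaster("local[*]").setAppName("spark")
// 10-second batch interval instead of the original Seconds(3)
val ssc = new StreamingContext(sparkConf, Seconds(10))
```

With a 10-second cycle, each print() covers more input, so the console scrolls far less often.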