Spark Streaming real-time word count (WordCount) using the netcat-win32 tool

Getting the tool

We need one tool here: netcat-win32, which listens on a port and sends whatever we type to the connected program, so it can simulate a data source.
Download from Baidu Netdisk:
link: https://pan.baidu.com/s/1iiet53Rki78GaEIkpp9bnA
Extraction code: hzug
After downloading and unzipping, the folder should contain nc.exe and nc64.exe, among other files.
Note that many people find nc.exe and nc64.exe missing after unzipping. If this happens, delete the entire folder, turn off the anti-virus software, and unzip the archive again.

Configure environment variables

The folder containing netcat-win32 must be added to the system Path environment variable:

  1. Open the editor for the system Path variable.
  2. Create a new entry and paste the path of the folder we just unzipped.
  3. To check, press Win+R, open a command prompt, and enter nc -lp 9999.
    If the cursor sits waiting on an empty line, the setup succeeded.
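
If nc is still not found after editing Path, a quick check from Scala can show whether the folder is really visible to the JVM. The following is only a minimal sketch: the object name FindOnPath and the helper find are made up for illustration, and it assumes the executable is named nc.exe as in the unzipped folder.

```scala
import java.io.File

object FindOnPath {
  // Look for `exe` in each directory listed in a PATH-style string.
  def find(exe: String, pathValue: String): Option[File] =
    pathValue
      .split(File.pathSeparator)
      .toSeq
      .filter(_.nonEmpty)
      .map(dir => new File(dir, exe))
      .find(_.isFile)

  def main(args: Array[String]): Unit = {
    // System.getenv is used because variable names are case-insensitive on Windows.
    val pathValue = Option(System.getenv("PATH")).getOrElse("")
    find("nc.exe", pathValue) match {
      case Some(file) => println(s"Found: ${file.getAbsolutePath}")
      case None       => println("nc.exe is not on PATH")
    }
  }
}
```

A terminal or IDE that was already open keeps the old Path value, so restart it before running the check.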

Writing the code

package com.niit.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * @Author YanTianCheng
 * @Date 2023/3/27 10:52
 * @Title: Spark_Streaming_WordCont
 * @Package com.niit.streaming
 */
object Spark_Streaming_WordCont {

  def main(args: Array[String]): Unit = {
    // Create the Spark configuration (local mode, using all available cores)
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("spark")
    // Initialize the StreamingContext with a batch interval of 3 seconds
    val ssc = new StreamingContext(sparkConf, Seconds(3))

    // Read lines of text from the socket at localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into individual words
    val words = lines.flatMap(_.split(" "))
    // Map each word to a (word, 1) tuple
    val wordOne = words.map((_, 1))
    // Sum the counts for each word within the batch
    val wordCount = wordOne.reduceByKey(_ + _)
    // Print the result of each batch
    wordCount.print()

    // Start the StreamingContext (begins receiving data)
    ssc.start()
    // Block until the StreamingContext is stopped
    ssc.awaitTermination()
  }

}

In this application, we first create a StreamingContext with a batch interval of 3 seconds. Then we create a socket stream connected to localhost:9999. Next, we split each line of text into words and map each word to a count of 1. Finally, we sum the counts for each word within each 3-second batch and print the result.
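
To see what happens to a single batch without starting Spark, the same steps can be imitated with ordinary Scala collections. This is only a sketch of the logic, not the Spark API: the object BatchWordCountSketch and the function countBatch are names invented here, and groupBy plus a sum stands in for reduceByKey.

```scala
object BatchWordCountSketch {
  // Apply the same steps as the streaming job to the lines of one batch.
  def countBatch(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))          // split each line into words
      .map(word => (word, 1))         // map each word to (word, 1)
      .groupBy(_._1)                  // group the tuples by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word

  def main(args: Array[String]): Unit = {
    // Two batches are counted independently of each other.
    println(countBatch(Seq("hello spark", "hello scala")))
    println(countBatch(Seq("hello")))
  }
}
```

Each batch starts from zero: a word typed in one 3-second window does not add to the count shown for the next window.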

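Before starting the app, start a TCP server in another terminal with the nc command (nc -lp 9999 with netcat-win32; on Linux the equivalent is typically nc -lk 9999). Once the app is running, type words into that terminal and watch the counts appear batch by batch.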
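
If nc cannot be used at all (for example because the anti-virus software keeps deleting it), a few lines of Scala can play the same role. The sketch below is an assumption-laden stand-in, not part of the original setup: LineServerSketch and forward are hypothetical names, it serves a single client only, and it simply forwards each line typed on standard input to that client.

```scala
import java.io.{OutputStreamWriter, PrintWriter}
import java.net.ServerSocket
import java.nio.charset.StandardCharsets

object LineServerSketch {
  // Wait for one client on an already-bound server socket and send it every line.
  def forward(server: ServerSocket, lines: Iterator[String]): Unit = {
    val client = server.accept()
    try {
      val out = new PrintWriter(
        new OutputStreamWriter(client.getOutputStream, StandardCharsets.UTF_8),
        true // autoflush, so each line is sent as soon as it is written
      )
      lines.foreach(line => out.println(line))
      out.flush()
    } finally client.close()
  }

  def main(args: Array[String]): Unit = {
    // Listen on the same port the streaming job connects to.
    val server = new ServerSocket(9999)
    try forward(server, scala.io.Source.stdin.getLines())
    finally server.close()
  }
}
```

Run it in its own terminal before the streaming job, then type words and press Enter, just as with nc.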

Running the program

  1. First start the tool so that it listens on port 9999.
  2. Start the Spark_Streaming_WordCont program.
  3. Type some data into the tool's window.
  4. The word counts appear in the program's output.

Troubleshooting

  1. If the output window fills up too quickly, we can lengthen the batch interval, for example to 10 seconds per batch with Seconds(10). In real development, however, the batch interval should not be very long.
  2. Start the tool first, and only then run the program. Otherwise there is nothing listening on port 9999 when the program tries to connect.
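
The effect of the batch interval on the amount of output can be illustrated with a simplified model in plain Scala. This is not how Spark is implemented internally; it only assumes that each line falls into the batch whose time window contains its arrival time. The names BatchIntervalSketch and batches, and the sample timestamps, are invented for the example.

```scala
object BatchIntervalSketch {
  // Group (arrivalMillis, line) events into batches of `intervalMs` milliseconds.
  // The map key is the index of the batch window the line arrived in.
  def batches(events: Seq[(Long, String)], intervalMs: Long): Map[Long, Seq[String]] =
    events
      .groupBy { case (arrival, _) => arrival / intervalMs }
      .map { case (index, group) => index -> group.map(_._2) }

  def main(args: Array[String]): Unit = {
    val events = Seq((1000L, "hello spark"), (4000L, "hello scala"), (8000L, "hello"))
    println(batches(events, 3000L).size)  // lines spread over several short batches
    println(batches(events, 10000L).size) // all lines fall into one long batch
  }
}
```

A longer interval means fewer, larger printouts, at the cost of waiting longer for each result.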


Origin blog.csdn.net/m0_58353740/article/details/130101573