12 Integrating Kafka with Spark Streaming

Earlier we used Spark Streaming to monitor data arriving on a network port; this time we will use Spark Streaming as a Kafka consumer.
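For comparison, the earlier port-monitoring approach looked roughly like the sketch below (a minimal sketch only; the host localhost and port 9999 are illustrative assumptions, not values from this tutorial):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
object SocketWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SocketWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Read lines of text from a TCP socket instead of from Kafka
    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}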

1 System, software, and prerequisite constraints

  • A CentOS 7 64-bit machine with IP 192.168.100.200 and hostname danji; readers should adjust these to their actual environment
  • Spark access to HBase has already been completed
    https://www.jianshu.com/p/6f7c89a62173
  • Kafka has already been installed
    https://www.jianshu.com/p/1a7b9970d073
  • IDEA 2018.1
  • To avoid the influence of permissions, all operations are performed as root; Spark and Hadoop have been started.

2 Procedure

  • 1 On Windows 10, create an sbt project in IDEA
  • 2 Modify build.sbt in the project as follows
name := "sbt-spark"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-8_2.11" % "2.1.0"
  • 3 Create a new SparkStreamingAsKafkaConsumer.scala under src/main/scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka.KafkaUtils
object SparkStreamingAsKafkaConsumer{
  def main(args:Array[String]){
    val sc = new SparkConf().setAppName("KafkaWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(sc,Seconds(10))
    ssc.checkpoint("file:///root/hadoop/checkpoint")
    val zkQuorum = "localhost:2181" // ZooKeeper server address
    val group = "1"  // consumer group that the topic belongs to
    val topics = "spark"  // topic name(s), comma-separated
    val numThreads = 1  // number of consumer threads per topic
    val topicMap =topics.split(",").map((_,numThreads.toInt)).toMap
    val lineMap = KafkaUtils.createStream(ssc,zkQuorum,group,topicMap)
    val lines = lineMap.map(_._2)
    val words = lines.flatMap(_.split(" "))
    val pair = words.map(x => (x,1))
    val wordCounts = pair.reduceByKeyAndWindow(_ + _,_ - _,Minutes(2),Seconds(10),2) // the meaning of this line is explained in the window transformation operations of the next section
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
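The reduceByKeyAndWindow call above passes an inverse function (_ - _) so that each new window is computed incrementally from the previous one, which is why a checkpoint directory is set. As a minimal alternative sketch (not part of the original tutorial), that line could instead use the simpler form, which recomputes every window from scratch and needs no inverse function:

// Count words over a 2-minute window, sliding every 10 seconds,
// recomputing each window instead of updating it incrementally
val wordCounts = pair.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Minutes(2), Seconds(10))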
  • 4 Run sbt package to generate sbt-spark_2.11-0.1.jar, then upload it to the /root directory on the Linux machine
  • 5 Log in to Linux with Xshell and copy the Kafka jar packages
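# Copy the Kafka library jars into a dedicated kafka directory under Spark's jars directory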
cd /root/spark-2.2.1-bin-hadoop2.7/jars
mkdir kafka
cd kafka
cp /root/kafka_2.11-2.2.1/libs/* .

Download the jar package:
http://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8_2.11/2.1.0
and upload this jar package to /root/spark-2.2.1-bin-hadoop2.7/jars/kafka

  • 6 Start the Kafka service and the message producer
cd /root/kafka_2.11-2.2.1/bin
# Start the ZooKeeper service
./zookeeper-server-start.sh ../config/zookeeper.properties &
# Start the Kafka service
./kafka-server-start.sh ../config/server.properties
# Open a new Xshell window and connect to Linux again
cd /root/kafka_2.11-2.2.1/bin
# Create the topic
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic spark &
# Start the producer
./kafka-console-producer.sh --broker-list localhost:9092 --topic spark 
  • 7 Submit the job to Spark
    Open a new Xshell window and connect to Linux again
cd /root/spark-2.2.1-bin-hadoop2.7/bin
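# Submit the job with both Spark's jars and the copied Kafka jars on the driver classpath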
./spark-submit --driver-class-path /root/spark-2.2.1-bin-hadoop2.7/jars/*:/root/spark-2.2.1-bin-hadoop2.7/jars/kafka/* --class SparkStreamingAsKafkaConsumer  /root/sbt-spark_2.11-0.1.jar
  • 8 Test
    Keep typing strings in the Kafka producer window; in the window where the Spark job was submitted, the word-frequency statistics for the entered strings are printed every ten seconds.
    This completes the process of using Spark Streaming as a Kafka message consumer.
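As an optional extension (a minimal sketch, not part of the original tutorial), the counts could also be written to files instead of only printed, by adding a line such as the following before ssc.start() in the program above; the output path is an illustrative assumption:

// Save each batch of windowed counts under the given path prefix (path chosen only for illustration)
wordCounts.saveAsTextFiles("file:///root/hadoop/wordcounts")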
