Using Spark Streaming as a Kafka producer

Hello everyone:

  This post uses Spark Streaming as a Kafka producer: the data in a given file is read and sent to Kafka.

package Traffic

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
import org.apache.spark.{SparkConf, SparkContext}
import org.codehaus.jettison.json.JSONObject


/**
  * Created by Administrator on 2017/10/14.
  * Purpose: Spark Streaming acts as the Kafka producer and pushes the data of a specified file into Kafka.
  *
  */
object KafkaEventProducer {
  def main(args: Array[String]): Unit = {


    // name of the topic to produce to
    val topic="car_event"
    val brokers="192.168.17.108:9092"
    val props=new Properties()
    // set the broker list
    props.put("metadata.broker.list",brokers)
    // set the message serializer (String encoder)
    props.put("serializer.class","kafka.serializer.StringEncoder")
    // build the Kafka producer config
    val kafkaconfig=new ProducerConfig(props)
    val producer=new Producer[String,String](kafkaconfig)
    // configure Spark
    val conf=new SparkConf().setAppName("KafkaEventProducer").setMaster("local[2]")
    val sc=new SparkContext(conf)
    // load data from the file path
//    val filePath="data/shuju.txt"
    val filePath="c://test//shuju.txt"
    // load the data, skip comment lines starting with ";", and split each line on ","
    val records=sc.textFile(filePath)
      .filter(!_.startsWith(";"))
      .map(_.split(",")).collect()
    // preprocess each record into a JSON object
    for(temp <- records)
      {
        val event=new JSONObject()
        // chain the puts so the many fields stay readable
        event
          .put("camer_id",temp(0))    // camera (checkpoint) ID
          .put("car_id",temp(2))      // license plate number
          .put("event_time",temp(4))  // passing time
          .put("car_speed",temp(6))   // vehicle speed
          .put("lane_id",temp(13))    // lane number
        // send the event: topic is the target topic, event.toString is the actual message payload
        producer.send(new KeyedMessage[String,String](topic,event.toString))
        println("Message Sent:  "+event)
        Thread.sleep(200) // pause 200 ms between messages
      }
      }


    // release the producer and the Spark context
    producer.close()
    sc.stop()
  }
}

Description: three additional jars are needed: commons-pool2-2.2.jar, jedis-2.6.1.jar and json-lib-2.3-jdk15.jar.
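For reference, the dependencies could also be declared through sbt. This is only a minimal sketch; the artifact versions below are assumptions chosen to match the old kafka.producer API used above, not values from the original post:

name := "Traffic"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.3",      // SparkConf / SparkContext
  "org.apache.kafka" %% "kafka" % "0.8.2.2",         // kafka.producer.{Producer, KeyedMessage, ProducerConfig}
  "org.codehaus.jettison" % "jettison" % "1.3.8"     // org.codehaus.jettison.json.JSONObject
)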

Sample data is shown below (only 5 rows):

'310999003001', '3109990030010220140820141230292','00000000','','2017-08-20 14:09:35','0',255,'SN',  0.00,'4','','310999','310999003001','02','','','2','','','2017-08-20 14:12:30','2017-08-20 14:16:13',0,0,'2017-08-21 18:50:05','','',' '
'310999003102', '3109990031020220140820141230266','粤BT96V3','','2017-08-20 14:09:35','0',21,'NS',  0.00,'2','','310999','310999003102','02','','','2','','','2017-08-20 14:12:30','2017-08-20 14:16:13',0,0,'2017-08-21 18:50:05','','',' '
'310999000106', '3109990001060120140820141230316','沪F35253','','2017-08-20 14:09:35','0',57,'OR',  0.00,'2','','310999','310999000106','01','','','2','','','2017-08-20 14:12:30','2017-08-20 14:16:13',0,0,'2017-08-21 18:50:05','','',' '
'310999000205', '3109990002050220140820141230954','沪FN0708','','2017-08-20 14:09:35','0',33,'IR',  0.00,'2','','310999','310999000205','02','','','2','','','2017-08-20 14:12:30','2017-08-20 14:16:13',0,0,'2017-08-21 18:50:05','','',' '
'310999000205', '3109990002050120140820141230975','皖N94028','','2017-08-20 14:09:35','0',40,'IR',  0.00,'2','','310999','310999000205','01','','','2','','','2017-08-20 14:12:30','2017-08-20 14:16:13',0,0,'2017-08-21 18:50:05','','',' '

The definition of each field in the sample data is as follows:

sjkk_gcjl field descriptions:

    kkbh   checkpoint (camera gate) ID
    jlbh   record ID
    hphm   license plate number
    hpzl   license plate type
    jgsj   passing time
    xszt   driving status
    clsd   vehicle speed
    cdfx   lane direction
    cwkc   vehicle body length
    hpys   license plate color
    cllx   vehicle type
    xzqh   administrative district code
    sbbh   device ID
    cdbh   lane number
    csys   vehicle body color
    clpp   vehicle brand
    tplx   image type
    tztp   feature (close-up) image
    qjtp   panoramic image
    rksj   insertion time
    yzsj   // this field is currently unused
    sjcz   time difference
    ylzd1  reserved field 1
    ylzd2  reserved field 2
    ylzd3  reserved field 3
    ylzd4  reserved field 4
    ylzd5  reserved field 5
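To make the column-to-field mapping concrete, here is a small illustrative snippet (not from the original post) that splits the first sample row exactly as the producer does and prints the columns it actually reads. Note that split(",") keeps the surrounding single quotes from the file:

object FieldMappingDemo {
  def main(args: Array[String]): Unit = {
    // first sample row, copied verbatim (broken up only by string concatenation)
    val line = "'310999003001', '3109990030010220140820141230292','00000000','','2017-08-20 14:09:35','0',255,'SN'," +
      "  0.00,'4','','310999','310999003001','02','','','2','','','2017-08-20 14:12:30','2017-08-20 14:16:13'," +
      "0,0,'2017-08-21 18:50:05','','',' '"
    val temp = line.split(",")
    println("kkbh  -> temp(0)  = " + temp(0))   // checkpoint/camera ID -> camer_id
    println("hphm  -> temp(2)  = " + temp(2))   // license plate number -> car_id
    println("jgsj  -> temp(4)  = " + temp(4))   // passing time         -> event_time
    println("clsd  -> temp(6)  = " + temp(6))   // vehicle speed        -> car_speed
    println("cdbh  -> temp(13) = " + temp(13))  // lane number          -> lane_id
  }
}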

Start Kafka and create the car_event topic:

[root@hadoop ~]# start-kafka.sh 
[root@hadoop ~]# kafka-topics.sh --create --zookeeper hadoop:2181 --topic car_event --partitions 1 --replication-factor 1
Created topic "car_event".
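To double-check that the topic was created, a describe command like the following can be run (same ZooKeeper address as above):

[root@hadoop ~]# kafka-topics.sh --describe --zookeeper hadoop:2181 --topic car_event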

Start a consumer on the car_event topic. This step is only to verify the data; the screenshot is shown below:

Note: the cursor just sits there waiting. This is because the Spark program has not yet sent any data to Kafka, so the topic is empty and the consumer naturally shows nothing.
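The consumer used here is a plain console consumer; a command along these lines (ZooKeeper address assumed to match the topic-creation step) reproduces it:

[root@hadoop ~]# kafka-console-consumer.sh --zookeeper hadoop:2181 --topic car_event --from-beginning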

Start the Spark Streaming program in IDEA; the console output is shown in the screenshot below:

Explanation: as you can see, the data is printed. The run indicator is green (finished) because the file contains only 5 records, all of which have already been written to Kafka. With a large amount of data the run indicator would stay red (still running).
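If the program is packaged and run outside IDEA instead, a spark-submit invocation roughly like the following would do the same (the jar name here is only a placeholder):

[root@hadoop ~]# spark-submit --class Traffic.KafkaEventProducer --master local[2] traffic-producer.jar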

View the Kafka consumer again; the screenshot is as follows:

Note: as you can see, the Kafka consumer now displays the data that was produced into the topic.


Origin blog.csdn.net/zhaoxiangchong/article/details/78379927