Get data from Kafka and write it to Redis

Hello everyone:

  To get data from Kafka and write it to Redis, you need the Redis client (connection pool) configuration in Spark; please refer to the previous blog ( https://blog.csdn.net/zhaoxiangchong/article/details/78379883 ). A minimal sketch of that client follows.
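The code in step 2 relies on a RedisClient object that wraps a Jedis connection pool. The following is only a minimal sketch: the host is assumed to be the same machine as the Kafka broker, and port 12002 is taken from the redis-cli call in step 6; see the earlier blog for the full configuration.

import org.apache.commons.pool2.impl.GenericObjectPoolConfig
import redis.clients.jedis.JedisPool

// Minimal sketch of the pooled Redis client used by the streaming job.
// Host and port are assumptions; adjust them to your environment.
object RedisClient extends Serializable {
  val redisHost = "192.168.17.108"
  val redisPort = 12002
  val redisTimeout = 30000

  // Lazy so the pool is created on the executor that first uses it
  lazy val pool = new JedisPool(new GenericObjectPoolConfig, redisHost, redisPort, redisTimeout)
}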

Step 1: Produce the test data into Kafka; please refer to my previous blog https://blog.csdn.net/zhaoxiangchong/article/details/78379927 (a minimal producer sketch is given after the note below).

Note: Pay special attention to the Kafka topic name: the topic the producer writes to and the topic this program consumes must be the same.
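For reference, a minimal producer sketch along the lines of that blog could look like the following. This is an assumption-heavy sketch, not that blog's exact code: the JSON field names match the consumer code in step 2, and the single hard-coded event stands in for the loop over the real test data file.

import java.util.Properties
import kafka.javaapi.producer.Producer
import kafka.producer.{KeyedMessage, ProducerConfig}
import net.sf.json.JSONObject

// Sketch of a producer pushing JSON events into the car_event topic
object CarEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("metadata.broker.list", "192.168.17.108:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    val producer = new Producer[String, String](new ProducerConfig(props))

    // One hypothetical test event; the real test loops over a data file
    val event = new JSONObject()
    event.put("camer_id", "310999015305") // field names match the consumer below
    event.put("car_speed", 20)
    producer.send(new KeyedMessage[String, String]("car_event", event.toString))
    producer.close()
  }
}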

Step 2: Write the Kafka-to-Redis code in IDEA, as shown below:

package Traffic

import java.text.SimpleDateFormat
import java.util.Calendar

import kafka.serializer.StringDecoder
import net.sf.json.JSONObject
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by Administrator on 2017/10/14.
  * Purpose: read data from Kafka and write it into Redis.
  */
object CarEventAnalysis {
  def main(args: Array[String]): Unit = {
    // Configure Spark Streaming with a 5-second batch interval
    val conf = new SparkConf().setAppName("CarEventAnalysis").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(5))
    val dbIndex = 1 // which Redis database to write to

    // Read from Kafka using the direct (receiver-less) approach
    val topics = Set("car_event")
    // Everything related to the brokers must be written out in full
    val brokers = "192.168.17.108:9092"
    // Kafka parameters
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> brokers,
      "serializer.class" -> "kafka.serializer.StringEncoder"
    )
    // Create the stream. This is template code: the two String type parameters
    // are the types of the key and the value of each Kafka record.
    val kafkaStream = KafkaUtils.createDirectStream[String, String,
      StringDecoder, StringDecoder](ssc, kafkaParams, topics)

    // Pull the data out of Kafka
    val events = kafkaStream.flatMap(line => {
      // line._2 is the actual payload; convert it to a JSON object
      val data = JSONObject.fromObject(line._2)
      // Wrap the data in Some. Option has two subclasses: None (no value)
      // and Some (a value). Some guarantees a value is present for the
      // getString/getInt calls that follow.
      Some(data)
    })

    // Extract the checkpoint (camera) id and the speed from each event
    val carSpeed = events.map(x => (x.getString("camer_id"), x.getInt("car_speed")))
      // Turn each record into (camer_id, (car_speed, 1))
      .mapValues((x: Int) => (x, 1))
      // Every 10 seconds, aggregate the previous 20 seconds (4 RDDs):
      // per camera, sum the speeds and sum the counts
      .reduceByKeyAndWindow((a: (Int, Int), b: (Int, Int)) =>
        (a._1 + b._1, a._2 + b._2), Seconds(20), Seconds(10))
    // carSpeed now holds (camer_id, (total speed, count))
    // carSpeed.map { case (key, value) => (key, value._1 / value._2.toFloat) }

    carSpeed.foreachRDD(rdd => {
      rdd.foreachPartition(partitionOfRecords => {
        // Borrow one connection from the pool
        val jedis = RedisClient.pool.getResource
        partitionOfRecords.foreach(pair => {
          val camerId = pair._1       // checkpoint (camera) id
          val totalSpeed = pair._2._1 // sum of speeds in the window
          val count = pair._2._2      // number of cars in the window
          val now = Calendar.getInstance().getTime // current time
          val minuteFormat = new SimpleDateFormat("HHmm")  // minute format
          val dayFormat = new SimpleDateFormat("yyyyMMdd") // day format
          val time = minuteFormat.format(now)
          val day = dayFormat.format(now)

          // Insert the data into Redis
          if (count != 0) {
            jedis.select(dbIndex) // switch to the chosen database
            // hset stores a field/value pair in the hash for this day and camera
            jedis.hset(day + "_" + camerId, time, totalSpeed + "_" + count)
            // Read it back from Redis as a check
            val storedData = jedis.hget(day + "_" + camerId, time)
            println(storedData)
          }
        })
        RedisClient.pool.returnResource(jedis)
      })
    })
    println("---------- computation started ----------")

    ssc.start()
    ssc.awaitTermination()
  }
}
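To make the windowed reduce concrete: if two cars pass checkpoint 310999015305 with speeds 15 and 25 inside the same 20-second window, the reduce produces ("310999015305", (40, 2)), that is, total speed 40 from 2 cars, and the job stores it in Redis as the field value "40_2". The commented-out map above would turn that pair into the average speed 40 / 2 = 20.0.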

Step 3: Run the Kafka-to-Redis code from step 2 in IDEA. A screenshot of the running program is as follows:

Explanation: The output shows the streaming jobs running, but no records arriving from the Kafka topic yet. This is normal: at this point there is no data in Kafka, and this step only tests that the code is correct.

If the dependency jars are not added, an error is reported at this step. The screenshot is as follows:

Note: You need to add ezmorph-1.0.6.jar together with three dependency jars, commons-collections-3.2.jar, commons-lang-2.3.jar, and commons-pool2-2.2.jar, four jars in total; the sbt equivalent is sketched below.
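If you manage dependencies with a build tool instead of adding the jars to IDEA by hand, the sbt coordinates would be roughly as follows. This is an assumption based on the jar names above; the Spark, Kafka, json-lib, and Jedis artifacts are taken to be on the classpath already.

// Equivalent sbt coordinates for the four jars above (versions from the jar names)
libraryDependencies ++= Seq(
  "net.sf.ezmorph" % "ezmorph" % "1.0.6",
  "commons-collections" % "commons-collections" % "3.2",
  "commons-lang" % "commons-lang" % "2.3",
  "org.apache.commons" % "commons-pool2" % "2.2"
)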

Step 4: Run the producer from step 1 to start writing data into Kafka. The screenshot is shown below:

Explanation: 1. The screenshot shows that data has started flowing into Kafka.

         2. Prepare a reasonably large amount of test data, so that it is not all pushed into Kafka within a moment.

Step 5: Now that there is data in Kafka, check the running output of the program from step 3. The screenshot is as follows:

Explanation: This means the streaming job has started writing the Kafka data into Redis.

Step 6: Log in to the Redis client and verify that the data has been stored in Redis:

[root@hadoop ~]# redis-cli -p 12002
127.0.0.1:12002> select 1
OK
127.0.0.1:12002[1]> hgetall 20180824_310999015305
1) "2038"
2) "40_2"

Explanation: The result means that at 20:38 on 20180824, a total of 2 cars passed checkpoint 310999015305, and the combined speed of those 2 cars was 40.

Explanation: 1. Because the Redis key is built from the current date in the program, 20180824 is shown rather than the date inside the test data. In real production this is not an issue: the computation is real-time, so the date in the data is very close to the current date.
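To turn a stored value back into an average speed, split it on the underscore. A small sketch, reusing the RedisClient pool from above and the key and field taken from the step 6 output:

// Sketch: read one stored "total_count" value back and compute the average speed
val jedis = RedisClient.pool.getResource
jedis.select(1)                                         // same db index as the job
val value = jedis.hget("20180824_310999015305", "2038") // "40_2"
val Array(total, count) = value.split("_")
println(total.toFloat / count.toInt)                    // 40 / 2 = 20.0 average speed
RedisClient.pool.returnResource(jedis)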

   


Origin: blog.csdn.net/zhaoxiangchong/article/details/78379977