Designing a Flink program for real-time counting and window-based timing

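While developing a recent program, I needed real-time counting as well as duration accounting over predefined windows. Flink was the required engine, and the rough architecture of the program is as follows:
The design idea: read data from Kafka to produce an intermediate DataStream[Message], then split that stream into two branches. One branch does real-time counting. The other assigns messages to tumbling windows by EventTime, takes all the data in a window together with the historical (cached) data to form the complete data for that window, sorts it, and then iterates over the messages to compute the time spent per window.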
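To make the windowing step concrete, here is a minimal, library-free sketch of how event-time tumbling windows bucket messages. It assumes windows aligned to the epoch with no offset and non-negative millisecond timestamps. `Event`, `windowStart`, and `assignToWindows` are hypothetical names used only for illustration; they are not part of the job below or of the Flink API.

```scala
// Hypothetical event type for illustration: a key plus an event-time timestamp in ms.
final case class Event(key: String, timestampMs: Long)

object TumblingWindowSketch {
  // Start of the tumbling window that contains `timestampMs`
  // (assumes timestampMs >= 0 and windows aligned to the epoch).
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Group events by (key, window start) and sort each group by event time.
  def assignToWindows(events: Seq[Event], sizeMs: Long): Map[(String, Long), Seq[Event]] =
    events
      .groupBy(e => (e.key, windowStart(e.timestampMs, sizeMs)))
      .map { case (k, es) => k -> es.sortBy(_.timestampMs) }
}
```

With 5-second windows, an event stamped 12345 ms falls in the window that starts at 10000 ms. In the real job this bucketing is done by Flink's window assigner, and watermarks decide when a window is complete.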


/** *****************************************************************************
  * Copyright: 北京中通天鸿武汉分公司
  *
  * @author xuchang
  *         Copyright: Copyright (c) 2007 北京中通天鸿武汉分公司, Inc. All Rights Reserved.
  *         Description:
  ******************************************************************************/
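import java.util.Date

import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.apache.flink.util.Collector

// Message, MessageSchema, CustomWatermarkExtractor, ProxyImpl, EventMap, DateUtil,
// BeamSink and SimpleEventBeamFactory are project classes that are not shown here.
object CtiReportRealTime {
  def main(args: Array[String]): Unit = {
    val parameterTool = ParameterTool.fromArgs(args)
    // Get the Flink execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Enable printing of job progress to System.out
    env.getConfig.enableSysoutLogging()
    // Restart strategy: at most 4 attempts, 1000 ms apart
    env.getConfig.setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 1000))
    env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"))
    // Take a checkpoint every 5 s
    env.enableCheckpointing(5000)
    // Make the parameters available globally
    env.getConfig.setGlobalJobParameters(parameterTool)
    // Use event time as the time characteristic
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    val flinkKafkaConsumer = new FlinkKafkaConsumer010[Message[Object]](
      parameterTool.getRequired("input-topic"),
      new MessageSchema,
      parameterTool.getProperties
    ).assignTimestampsAndWatermarks(new CustomWatermarkExtractor)
    // Read the input stream from Kafka
    val inputStream: DataStream[Message[Object]] = env.addSource(flinkKafkaConsumer)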

    // Counting stream: every message is counted as soon as it arrives
    val counterMap: DataStream[EventMap] =
      inputStream.keyBy("mainType", "extType")
        .flatMap((message: Message[Object], collector: Collector[EventMap]) => {
          val proxy: ProxyImpl = new ProxyImpl
          val eventMap: EventMap = proxy.count(message)
          // Emit only non-empty results
          if (eventMap != null && !eventMap.isEmpty) {
            collector.collect(eventMap)
          }
        }
        )

    // Timing stream: 5-second tumbling event-time windows per key
    val timerMap: DataStream[EventMap] =
      inputStream.keyBy("mainType", "extType", "vccId", "modeParm")
        .window(TumblingEventTimeWindows.of(Time.seconds(5)))
        .apply(function = (tuple, timeWindow,
                           iterable: Iterable[Message[Object]], collect: Collector[EventMap]) => {
          // Needed to iterate over the java.util.ArrayList below
          import scala.collection.JavaConverters._
          // Window start time
          val startWindow = timeWindow.getStart
          println(DateUtil.getFormatHour(new Date(startWindow)))
          // Window end time
          val endWindow = timeWindow.getEnd
          // Initialize the proxy helper
          val proxy = new ProxyImpl
          val messages: java.util.ArrayList[Message[_]] = new java.util.ArrayList[Message[_]]()
          iterable.foreach(message => messages.add(message))
          // Load the historical (cached) messages that belong to this key and window
          val cacheMessages = proxy.read(messages.get(0), startWindow)
          if (cacheMessages != null && cacheMessages.size() > 0) {
            messages.addAll(cacheMessages)
          }
          // Note: the design sorts the combined messages by event time before iterating;
          // that step is not shown in this excerpt.
          var lastMessage: Message[_] = null
          for (msg <- messages.asScala) {
            val eventMap = proxy.timer(startWindow, endWindow, msg, lastMessage)
            if (eventMap != null && !eventMap.isEmpty) {
              collect.collect(eventMap)
            }
            lastMessage = msg
          }
          if (lastMessage != null) {
            // Account for the time from the last message up to the end of the window
            val eventMap = proxy.timer(startWindow, endWindow, proxy.createMessage(lastMessage), lastMessage)
            if (eventMap != null && !eventMap.isEmpty) {
              collect.collect(eventMap)
            }
            // Save the last message so the next window can pick it up
            proxy.save(lastMessage, endWindow)
          }
        }

        )
    counterMap.print()
    timerMap.print()
    counterMap.addSink(new BeamSink[EventMap](new SimpleEventBeamFactory))
    timerMap.addSink(new BeamSink[EventMap](new SimpleEventBeamFactory))
    env.execute("cti report real time stream")
  }
}

Because the code involves a lot of business logic, this post only records how I applied Flink and the key parts of the code; it cannot be run as-is.
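Since `ProxyImpl` is not shown, the per-window timing loop can only be illustrated under assumptions. The sketch below is one plausible reading of the `read` / `timer` / `save` sequence: a state carried over from the previous window is placed first, this window's messages are sorted by event time, each state lasts until the next message, and the final state lasts until the window end. `StateChange` and `timeInWindow` are hypothetical names, and the real `proxy.timer` may compute something different.

```scala
// Hypothetical message type: a state label and the event time (ms) at which that state began.
final case class StateChange(state: String, timestampMs: Long)

object WindowTimerSketch {
  // Returns (milliseconds spent in each state within [startMs, endMs),
  //          the message to carry into the next window).
  def timeInWindow(
      startMs: Long,
      endMs: Long,
      inWindow: Seq[StateChange],
      carried: Option[StateChange]
  ): (Map[String, Long], Option[StateChange]) = {
    // Carried-over state first, then this window's messages in event-time order.
    val ordered = carried.toSeq ++ inWindow.sortBy(_.timestampMs)
    // Each state lasts until the next message; the final one lasts until the window end.
    val boundaries = ordered.drop(1).map(_.timestampMs) :+ endMs
    val durations = ordered.zip(boundaries).map { case (msg, until) =>
      // Clip the interval start to the window so time before the window is not counted again.
      msg.state -> (until - math.max(msg.timestampMs, startMs))
    }
    val totals = durations.groupBy(_._1).map { case (s, ds) => s -> ds.map(_._2).sum }
    (totals, ordered.lastOption)
  }
}
```

For a window from 5000 ms to 10000 ms with a carried-over state that began at 3000 ms and two in-window changes at 7000 ms and 9000 ms, the three durations are 2000, 2000 and 1000 ms, which add up to the full window length. The second element of the result plays the role of the message passed to `proxy.save` in the listing.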


Reposted from blog.csdn.net/u012164361/article/details/82747026