Flink DataStream operators and examples

Table of Contents

1. Classification

DataStream

KeyedStream

WindowedStream

Important examples

DataStream

ProcessFunction

WindowAll: DataStream → AllWindowedStream

KeyedStream

WindowedStream

(1) Keyed and non-keyed windows

(2) Predefined window assigners

Tumbling windows

Sliding windows

Example: tumbling processing-time window

Window Apply (converts the elements of each window to another type)

Window Reduce (if you need window metadata, use Window Reduce + ProcessWindowFunction)

Window Fold (if you need window metadata, use Window Fold + ProcessWindowFunction)

Window Reduce + ProcessWindowFunction

Window Fold + ProcessWindowFunction

Window Aggregate + ProcessWindowFunction


1. Classification

DataStream

Map
DataStream → DataStream

FlatMap
DataStream → DataStream

Filter
DataStream → DataStream

KeyBy
DataStream → KeyedStream

WindowAll
DataStream → AllWindowedStream

Process (takes a ProcessFunction instance and processes each element)
DataStream → DataStream

KeyedStream [the rolling aggregations below are stateful: each keeps a running value per key]

Reduce (compare: Window Reduce)
KeyedStream → DataStream

Fold (compare: Window Fold)
KeyedStream → DataStream

Aggregations, including sum / min / max (compare: Aggregations on windows)
KeyedStream → DataStream

Window
KeyedStream → WindowedStream

Process (takes a KeyedProcessFunction)
KeyedStream → DataStream

WindowedStream [aggregation is scoped to each window; no running value is carried over from one window to the next]

Window Apply (converts the elements of each window to another type)
WindowedStream → DataStream
AllWindowedStream → DataStream

Window Reduce (if you need window metadata, use Window Reduce + ProcessWindowFunction)
WindowedStream → DataStream

Window Fold (if you need window metadata, use Window Fold + ProcessWindowFunction)
WindowedStream → DataStream

Aggregations on windows (if you need window metadata, use Window Reduce + ProcessWindowFunction)
WindowedStream → DataStream

Window ProcessWindowFunction (processes all elements of each window)
WindowedStream → DataStream


Important examples

DataStream

ProcessFunction

DataStream → DataStream

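import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// UserBehavior is the author's input type (not shown); result is the case class defined further down
val processStream: DataStream[result] = dataStream
      .process(new getAllFunction)

// Converts each UserBehavior element into a result element
class getAllFunction extends ProcessFunction[UserBehavior, result] {
  override def processElement(value: UserBehavior,
                              ctx: ProcessFunction[UserBehavior, result]#Context,
                              out: Collector[result]): Unit = {
    // Called once for every element; emits one converted element
    value match {
      case behavior: UserBehavior =>
        out.collect(result(behavior.itemId, behavior.count))
      case _ => print("no way") // only reached for a null element
    }
  }
}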

WindowAll
DataStream → AllWindowedStream

    val resultDataStream: DataStream[String] = processStream
      .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .apply((_: TimeWindow, input: Iterable[result], out: Collector[String]) => {
        out.collect(input.mkString(","))
      })
    resultDataStream.print()
    // Output: result(1715,1),result(1715,1),result(1715,1),result(1716,1),result(1716,1)

 

 

KeyedStream

For keyBy, see the keyed window examples in the WindowedStream section below.

WindowedStream

(1) Keyed and non-keyed windows

With keyBy (keyed windows), the stream is split into logical keyed streams, so the window computation can run in parallel across multiple tasks: each logical keyed stream is processed independently of the others. With windowAll (non-keyed windows), the original stream is not split into multiple streams, and all window logic runs in a single task, i.e. with a parallelism of 1.
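To make the difference concrete, here is a plain-Scala simulation (not Flink code; the Event type, the sum aggregation and the object name are hypothetical) of what each variant groups together. With keyed windows every (key, window) pair is its own independent group, which is what lets Flink spread the work over parallel tasks; with windowAll there is one group per window, so a single task has to see every element.

```scala
// Plain-Scala simulation of keyed vs non-keyed tumbling windows (not the Flink API).
object KeyedVsNonKeyed {
  // Hypothetical stream element: key, timestamp in ms, value
  final case class Event(key: String, ts: Long, value: Int)

  // Start of the tumbling window that contains ts (zero offset, non-negative ts assumed)
  def windowStart(ts: Long, size: Long): Long = ts - ts % size

  // keyBy(...).window(...): one independent group per (key, window)
  def keyedSums(events: Seq[Event], size: Long): Map[(String, Long), Int] =
    events.groupBy(e => (e.key, windowStart(e.ts, size)))
      .map { case (k, es) => k -> es.map(_.value).sum }

  // windowAll(...): one group per window, all keys mixed together
  def nonKeyedSums(events: Seq[Event], size: Long): Map[Long, Int] =
    events.groupBy(e => windowStart(e.ts, size))
      .map { case (w, es) => w -> es.map(_.value).sum }

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("a", 1000L, 1), Event("b", 2000L, 2), Event("a", 4000L, 3), Event("b", 6000L, 4))
    println(keyedSums(events, 5000L))    // three groups: (a,0), (b,0), (b,5000)
    println(nonKeyedSums(events, 5000L)) // two groups: window 0 and window 5000
  }
}
```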

(2) Predefined window assigners

Tumbling windows

Tumbling event-time window:
input
    .keyBy(<key selector>)
    .window(TumblingEventTimeWindows.of(Time.seconds(5)))
    .<windowed transformation>(<window function>); 

 

Tumbling processing-time window:
input
    .keyBy(<key selector>)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .<windowed transformation>(<window function>);

Sliding windows

Sliding event-time window (window size 10 s, slide 5 s):
input
    .keyBy(<key selector>)
    .window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
    .<windowed transformation>(<window function>);

Sliding processing-time window (window size 10 s, slide 5 s):
input
    .keyBy(<key selector>)
    .window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
    .<windowed transformation>(<window function>);
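How a timestamp ends up in a window can be sketched in plain Scala. This is a simplified model, assuming a zero window offset, non-negative millisecond timestamps and half-open windows [start, end); the object and method names are made up for illustration and are not Flink API. A tumbling window puts each element into exactly one window, while a sliding window with size 10 s and slide 5 s puts each element into size / slide = 2 windows.

```scala
// Plain-Scala sketch of tumbling and sliding window assignment (not the Flink API).
// Assumptions: zero offset, non-negative timestamps, windows are [start, end).
object WindowAssignment {
  // Tumbling: exactly one window per element
  def tumbling(ts: Long, size: Long): (Long, Long) = {
    val start = ts - ts % size
    (start, start + size)
  }

  // Sliding: every window whose start is a multiple of `slide`
  // and which still covers ts, i.e. start lies in (ts - size, ts]
  def sliding(ts: Long, size: Long, slide: Long): Seq[(Long, Long)] = {
    val lastStart = ts - ts % slide
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(start => start > ts - size)
      .map(start => (start, start + size))
      .toList
  }

  def main(args: Array[String]): Unit = {
    println(tumbling(12000L, 5000L))        // (10000,15000)
    println(sliding(12000L, 10000L, 5000L)) // List((10000,20000), (5000,15000))
  }
}
```

Because every element lands in size / slide sliding windows, a small slide relative to the window size roughly multiplies the window state that has to be kept.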

Example: tumbling processing-time window

    // windowStream: 10-second tumbling processing-time windows over the keyed stream textKeyStream
    val windowStream: WindowedStream[(String, Long, Int), Tuple, TimeWindow] = textKeyStream
      .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))

    //    textKeyStream.print("windowStream:")
    //windowStream:> (000002,1461756879000,1)
    //windowStream:> (000002,1461756879001,1)
    //windowStream:> (000002,1461756879002,1)

Window Apply (converts the elements of each window to another type)


WindowedStream → DataStream
AllWindowedStream → DataStream

    val resultDataStream: DataStream[String] = processStream
      .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .apply((_: TimeWindow, input: Iterable[result], out: Collector[String]) => {
        out.collect(input.mkString(","))
      })
    resultDataStream.print()
    // Output: result(1715,1),result(1715,1),result(1715,1),result(1716,1),result(1716,1)

Window Reduce (if you need window metadata, use Window Reduce + ProcessWindowFunction)


WindowedStream → DataStream

    val reduceValue: DataStream[result] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .reduce { (v1, v2) => result(v1.itemId, v1.count + v2.count) }
    reduceValue.print()
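The semantics of the windowed reduce above can be simulated for a single window in plain Scala. This is only a model under stated assumptions (one window, elements keyed by itemId; Result is a hypothetical stand-in for the article's result class): Flink applies the reduce function incrementally as elements arrive, keeps one reduced value per key, and emits that value when the window fires.

```scala
// Plain-Scala simulation of WindowedStream.reduce for ONE window (not the Flink API).
object WindowReduceSim {
  // Hypothetical stand-in for the article's result case class
  final case class Result(itemId: Long, count: Long)

  // Same reduce function as in the Flink job above
  def reduceFn(v1: Result, v2: Result): Result = Result(v1.itemId, v1.count + v2.count)

  // Keep one reduced value per itemId and update it as each element arrives;
  // the final map is what gets emitted when the window fires
  def fireWindow(elements: Seq[Result]): Map[Long, Result] =
    elements.foldLeft(Map.empty[Long, Result]) { (state, e) =>
      val updated = state.get(e.itemId).map(reduceFn(_, e)).getOrElse(e)
      state.updated(e.itemId, updated)
    }

  def main(args: Array[String]): Unit = {
    val window = Seq(Result(1715L, 1L), Result(1716L, 1L), Result(1715L, 1L))
    fireWindow(window).values.foreach(println) // Result(1715,2) and Result(1716,1)
  }
}
```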

Window Fold (if you need window metadata, use Window Fold + ProcessWindowFunction)


WindowedStream → DataStream

    val foldValue: DataStream[result] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      // result(111, 333) is the starting value for every key and window
      .fold(result(111, 333)) { (original: result, ele: result) =>
        result(ele.itemId, original.count + ele.count)
      }
    foldValue.print()
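A plain-Scala sketch of what the fold above computes for one key in one window; Result is a hypothetical stand-in for the article's result class, and the three input elements are made up. The initial value result(111, 333) is used afresh for every key and window, so its count (333) ends up in every output, while its itemId (111) is overwritten by the first element. Note that fold is marked deprecated in later Flink versions, with aggregate as the suggested replacement.

```scala
// Plain-Scala simulation of WindowedStream.fold for one key and one window (not the Flink API).
object WindowFoldSim {
  // Hypothetical stand-in for the article's result case class
  final case class Result(itemId: Long, count: Long)

  // Same fold function as in the Flink job; `initial` plays the role of result(111, 333)
  def foldWindow(initial: Result, elements: Seq[Result]): Result =
    elements.foldLeft(initial) { (original, ele) =>
      Result(ele.itemId, original.count + ele.count)
    }

  def main(args: Array[String]): Unit = {
    // Made-up window content: three elements for itemId 1715, each with count 1
    val oneKeyOneWindow = Seq.fill(3)(Result(1715L, 1L))
    println(foldWindow(Result(111L, 333L), oneKeyOneWindow)) // Result(1715,336)
  }
}
```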

Window Reduce + ProcessWindowFunction


WindowedStream → DataStream: gives the function access to window metadata and lets it convert the output to a different DataStream type

    val reduceWindowFunctionData: DataStream[String] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .reduce((v1, v2) => result(v1.itemId, v1.count + v2.count),
        // Window function: its Iterable holds only the single pre-reduced element
        (key: Tuple, window: TimeWindow, input: Iterable[result], out: Collector[String]) => {
          val ele = input.iterator.next()
          out.collect(s"${window.getStart}, $ele")
        }
      )
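The key point of Reduce + window function is that the window function does not see the raw elements: it receives the window plus an Iterable holding only the pre-reduced value. A plain-Scala model of that flow for one key and one window (Window and Result are hypothetical stand-ins for TimeWindow and the article's result class):

```scala
// Plain-Scala model of reduce + window function for one key and one window (not the Flink API).
object ReduceWithWindowFnSim {
  // Hypothetical stand-ins for the article's result class and Flink's TimeWindow
  final case class Result(itemId: Long, count: Long)
  final case class Window(start: Long, end: Long)

  // Same reduce function as in the Flink job
  def reduceFn(v1: Result, v2: Result): Result = Result(v1.itemId, v1.count + v2.count)

  // Same logic as the window-function lambda: read the single element, prefix the window start
  def windowFn(window: Window, input: Iterable[Result]): String = {
    val ele = input.iterator.next()
    s"${window.start}, $ele"
  }

  // Reduce incrementally first, then call the window function
  // with a one-element Iterable holding the reduced value
  def fire(window: Window, elements: Seq[Result]): String =
    windowFn(window, Seq(elements.reduce[Result](reduceFn)))

  def main(args: Array[String]): Unit =
    println(fire(Window(0L, 15000L), Seq(Result(1715L, 1L), Result(1715L, 2L)))) // 0, Result(1715,3)
}
```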

 

Window Fold + ProcessWindowFunction


WindowedStream → DataStream

    val foldWindowFunctionData: DataStream[String] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .fold(result(111, 333), (original: result, ele: result) => {
        result(ele.itemId, original.count + ele.count)
      }, (key: Tuple, window: TimeWindow, input: Iterable[result], out: Collector[String]) => {
        // input holds only the folded value for this key and window
        val ele = input.iterator.next()
        out.collect(s"${window.getEnd}, $ele")
      })

Window Aggregate + ProcessWindowFunction


// Inside main(), after dataStream has been created:
    val aggregateData: DataStream[String] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .aggregate(new CountAggregate, new MyProcessWindowFunction)

    aggregateData.print()

case class result(itemId: Long, count: Long)

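// AggregateFunction[IN, ACC, OUT] callbacks:
// ACC createAccumulator();            initial value of the accumulator
// ACC add(IN value, ACC accumulator);  folds one input element into the accumulator
// ACC merge(ACC a, ACC b);             combines two accumulators (e.g. when windows are merged)
// OUT getResult(ACC accumulator);      turns the final accumulator into the output value
class CountAggregate extends AggregateFunction[result, Long, String] {
  // Starts at 6 rather than 0, so each window's total includes this offset (see the output below)
  override def createAccumulator(): Long = 6L

  override def add(value: result, accumulator: Long): Long =
    value.count + accumulator

  override def getResult(accumulator: Long): String = "windows count is:" + accumulator.toString

  override def merge(a: Long, b: Long): Long =
    a + b
}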

class MyProcessWindowFunction extends ProcessWindowFunction[String, String, Tuple, TimeWindow] {

  // input holds exactly one element: the String produced by CountAggregate.getResult
  override def process(key: Tuple, context: Context, input: Iterable[String], out: Collector[String]): Unit = {
    val count = input.iterator.next()
    out.collect("window end is :" + context.window.getEnd + "key is :" + key + count)
  }
}

Output:

window end is :1575213345000key is :(1715)windows count is:7
window end is :1575213345000key is :(1713)windows count is:7
window end is :1575213345000key is :(1716)windows count is:8
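To see where these numbers can come from, the CountAggregate lifecycle can be replayed in plain Scala for a single key and window. This is a simplified model with hypothetical names that takes the counts directly instead of result objects. Starting from 6L, a window with a single element of count 1 yields "windows count is:7", which is consistent with the first two output lines; the last line (8) is consistent with two such elements. It also shows a pitfall of a non-zero initial value: if two accumulators are merged (as happens when windows are merged, e.g. session windows), the 6 is counted twice.

```scala
// Plain-Scala replay of the CountAggregate lifecycle (not the Flink API).
object AggregateSim {
  // Mirrors CountAggregate, but takes the Long count directly instead of a result object
  def createAccumulator(): Long = 6L // the article's initial value
  def add(count: Long, acc: Long): Long = count + acc
  def getResult(acc: Long): String = "windows count is:" + acc.toString
  def merge(a: Long, b: Long): Long = a + b

  // One key, one window: start from createAccumulator(), add each element, then getResult
  def fire(counts: Seq[Long]): String =
    getResult(counts.foldLeft(createAccumulator())((acc, c) => add(c, acc)))

  def main(args: Array[String]): Unit = {
    println(fire(Seq(1L)))     // windows count is:7
    println(fire(Seq(1L, 1L))) // windows count is:8
    // Merging two single-element accumulators counts the initial 6 twice: 7 + 7 = 14
    println(merge(add(1L, createAccumulator()), add(1L, createAccumulator()))) // 14
  }
}
```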


3. Window function operator: fold. The initial value of original is "a"; each call appends value._2 (e.g. 79000) to it, so after the first element original becomes "a79000", and the iteration continues with the next element.

    //foldStream
    // Input:
    // 000002 79000
    //000002 79001
    //000002 79002
    //000003 79003
    //000004 79004
    val groupDstream: DataStream[String] = windowStream.
      fold("a") { case (original, value) =>
        original + value._2
      }
    //    groupDstream.print("foldDstream::::").setParallelism(1)
    //window::::> a790007900179002
    //window::::> a79004
    //window::::> a79003
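The fold above can be replayed in plain Scala to check the printed output. This assumes, as the output suggests, that all five input elements fall into the same window; each key then starts again from "a" and appends value._2 in arrival order. The StringFoldSim name and the (key, value) pair layout are illustrative only.

```scala
// Plain-Scala replay of the string fold above (not the Flink API).
object StringFoldSim {
  // The five input lines from the comment above, as (key, value) pairs
  val input: Seq[(String, Long)] = Seq(
    ("000002", 79000L), ("000002", 79001L), ("000002", 79002L),
    ("000003", 79003L), ("000004", 79004L))

  // Group by key (like keyBy); each group starts from "a" and appends value._2 in order
  def foldPerKey(elements: Seq[(String, Long)]): Map[String, String] =
    elements.groupBy(_._1).map { case (key, vs) =>
      key -> vs.foldLeft("a")((original, value) => original + value._2)
    }

  def main(args: Array[String]): Unit =
    foldPerKey(input).values.foreach(println) // a790007900179002, a79003, a79004 in some order
}
```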

fold with a WindowFunction: use this variant if you need window properties such as the window end time.

    windowStream.fold(("", 0L, 100), (original: (String, Long, Int), element: (String, Long, Int)) => {
      (element._1, original._2 + element._2, original._3 + element._3)
    }, new MyWindowFunction).print()

class MyWindowFunction extends WindowFunction[(String, Long, Int), String, Tuple, TimeWindow] {
  // input holds only the folded accumulator for this key and window
  override def apply(key: Tuple, window: TimeWindow, input: Iterable[(String, Long, Int)], out: Collector[String]): Unit = {
    val allnumber = input.iterator.next()
    out.collect(s"Window ${window.getEnd} count: $allnumber")
    //    input.foreach{
    //    case (a, b, c) => {
    //      out.collect(s"${window.getEnd}  $a,$b,$c")
    //    }
    //  }
  }
}

 

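4. reduce

    //reducedDstream
    // Input: 000002 0
    //000002 1
    //000002 2
    //000003 3
    //000004 4
    // Note: this example assumes a windowStream whose elements are (String, Long) pairs,
    // unlike the (String, Long, Int) windowStream declared earlier; reduce must return its input type.
    val reducedDstream: DataStream[(String, Long)] = windowStream.reduce((t1, t2) => {
      (t1._1, t1._2 + t2._2)
    })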
    //    reducedDstream.print("reducedDstream::::").setParallelism(1)
    //reducedDstream::::> (000002,3)
    //reducedDstream::::> (000004,4)
    //reducedDstream::::> (000003,3)
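The same check for reduce, assuming again that all five elements share one window: grouping by key and reducing each group reproduces (000002,3), (000003,3) and (000004,4), since 0 + 1 + 2 = 3. Plain Scala with illustrative names, not Flink code.

```scala
// Plain-Scala replay of the per-key reduce above (not the Flink API).
object TupleReduceSim {
  // The five input lines from the comment above, as (key, value) pairs
  val input: Seq[(String, Long)] = Seq(
    ("000002", 0L), ("000002", 1L), ("000002", 2L), ("000003", 3L), ("000004", 4L))

  // Group by key (like keyBy), then apply the same reduce function to each group
  def reducePerKey(elements: Seq[(String, Long)]): Map[String, (String, Long)] =
    elements.groupBy(_._1).map { case (key, vs) =>
      key -> vs.reduce((t1: (String, Long), t2: (String, Long)) => (t1._1, t1._2 + t2._2))
    }

  def main(args: Array[String]): Unit =
    reducePerKey(input).values.foreach(println) // (000002,3), (000003,3), (000004,4) in some order
}
```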

reduce with windowFunction

    windowStream.reduce((t1, t2) => {
      (t1._1, t1._2 + t2._2, t1._3 + t2._3)
    }, (key: Tuple, window: TimeWindow, input: Iterable[(String, Long, Int)], out: Collector[String]) => {
      // input holds only the reduced element for this key and window
      val ele = input.iterator.next()
      out.collect(s"${window.getStart}, $ele")
    }).print()

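5. timeWindowAll and apply

      // Requires: import scala.collection.JavaConverters._ (for .asJava)
      .timeWindowAll(Time.seconds(5))
      // The window function receives all elements of the window as one Iterable
      .apply { (_: TimeWindow, input: Iterable[(ClickLog, ClickMetrics)], out: Collector[java.util.List[(ClickLog, ClickMetrics)]]) => out.collect(input.toList.asJava) }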
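The .asJava call in the apply function comes from Scala's Java collection converters, so the snippet needs an import such as scala.collection.JavaConverters._ (in Scala 2.13 the newer scala.jdk.CollectionConverters._ also works). A minimal stand-alone sketch of just that conversion step, with a hypothetical helper name and made-up strings instead of ClickLog/ClickMetrics:

```scala
import scala.collection.JavaConverters._

object AsJavaSketch {
  // Hypothetical helper: the conversion done inside the apply function above,
  // turning the window's Scala Iterable into a java.util.List emitted as one record
  def toJavaList[T](input: Iterable[T]): java.util.List[T] = input.toList.asJava

  def main(args: Array[String]): Unit = {
    val javaList = toJavaList(Seq("click-1", "click-2"))
    println(javaList.size()) // 2
    println(javaList.get(0)) // click-1
  }
}
```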

 


Original article: blog.csdn.net/xuehuagongzi000/article/details/103178180