A window join joins the elements of two streams that share a common key and lie in the same window. The windows are defined with a window assigner and are evaluated on elements from both streams. The elements from both sides are then passed to a user-defined JoinFunction or FlatJoinFunction, which emits the results that meet the join criteria.
Basic syntax:
stream.join(otherStream)
.where(<KeySelector>)
.equalTo(<KeySelector>)
.window(<WindowAssigner>)
.apply(<JoinFunction>)
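The difference between the two function types can be illustrated without Flink. The following is only a plain-Scala sketch, not the Flink API: a JoinFunction-style callback returns exactly one result per matched pair, while a FlatJoinFunction-style callback may emit zero or more. The object name `JoinFunctionSketch`, the sample pairs, and the "drop free items" rule are all hypothetical.

```scala
object JoinFunctionSketch {
  // Pairs that are assumed to already share a key and a window (hypothetical sample data).
  val pairs: Seq[(String, Double)] = Seq(("apple", 4.5), ("pear", 0.0), ("plum", 1.0))

  // JoinFunction-style: exactly one result for every pair.
  def joinStyle(ps: Seq[(String, Double)]): Seq[String] =
    ps.map { case (name, price) => s"$name:$price" }

  // FlatJoinFunction-style: zero or more results per pair; here pairs with price 0 emit nothing.
  def flatJoinStyle(ps: Seq[(String, Double)]): Seq[String] =
    ps.flatMap { case (name, price) =>
      if (price > 0) Seq(s"$name:$price") else Seq.empty[String]
    }
}
```

With the sample data, `joinStyle` yields three results and `flatJoinStyle` yields two, because the filtering happens inside the function rather than in a later operator.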
1、Tumbling Window Join
When performing a tumbling window join, all elements with a common key and a common tumbling window are joined as pairwise combinations and passed on to the JoinFunction or FlatJoinFunction. Because this behaves like an inner join, elements of one stream that have no matching elements from the other stream in their tumbling window are not emitted.
Example:
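import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// User(id, name, ts), OrderItem(uid, name, price, ts) and the two watermark
// assigners are assumed to be defined elsewhere.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
fsEnv.getConfig.setAutoWatermarkInterval(1000)
fsEnv.setParallelism(1)

// Sample input: 001 zhangsan 1571627570000
val userStream = fsEnv.socketTextStream("CentOS", 7788)
  .map(line => line.split("\\s+"))
  .map(ts => User(ts(0), ts(1), ts(2).toLong))
  .assignTimestampsAndWatermarks(new UserAssignerWithPeriodicWatermarks)
  .setParallelism(1)

// Sample input: 001 apple 4.5 1571627570000
val orderStream = fsEnv.socketTextStream("CentOS", 8899)
  .map(line => line.split("\\s+"))
  .map(ts => OrderItem(ts(0), ts(1), ts(2).toDouble, ts(3).toLong))
  .assignTimestampsAndWatermarks(new OrderItemWithPeriodicWatermarks)
  .setParallelism(1)

// Join users and orders that share an id within the same 4-second tumbling window.
userStream.join(orderStream)
  .where(user => user.id)
  .equalTo(orderItem => orderItem.uid)
  .window(TumblingEventTimeWindows.of(Time.seconds(4)))
  .apply((u, o) => (u.id, u.name, o.name, o.price, o.ts))
  .print()

fsEnv.execute("FlinkStreamTumblingWindowJoin")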
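To make the tumbling window semantics concrete, here is a minimal plain-Scala sketch that does not use Flink. It assumes a zero window offset and non-negative timestamps, and it ignores watermarks and late data entirely. The object name `TumblingJoinSketch` and its helpers are hypothetical; the case classes only mirror the fields used in the example.

```scala
object TumblingJoinSketch {
  // Simplified stand-ins for the records used in the example.
  final case class User(id: String, name: String, ts: Long)
  final case class OrderItem(uid: String, name: String, price: Double, ts: Long)

  // Start of the tumbling window containing `ts` (assumes offset 0 and ts >= 0).
  def windowStart(ts: Long, sizeMs: Long): Long = ts - (ts % sizeMs)

  // Inner join: pair every user with every order that has the same key
  // and falls into the same tumbling window.
  def join(users: Seq[User], orders: Seq[OrderItem], sizeMs: Long)
      : Seq[(String, String, String, Double, Long)] =
    for {
      u <- users
      o <- orders
      if u.id == o.uid
      if windowStart(u.ts, sizeMs) == windowStart(o.ts, sizeMs)
    } yield (u.id, u.name, o.name, o.price, o.ts)
}
```

With a 4-second window, a user at 1571627570000 and an order at 1571627573000 land in different windows, so no pair is produced even though the two timestamps are only 3 seconds apart.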
2、Sliding Window Join
When performing a sliding window join, all elements with a common key and a common sliding window are joined as pairwise combinations and passed on to the JoinFunction or FlatJoinFunction. Elements of one stream that have no elements from the other stream in the current sliding window are not emitted.
Note: Some elements may be joined in one sliding window but not in another.
Example:
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// User(id, name, ts), OrderItem(uid, name, price, ts) and the two watermark
// assigners are assumed to be defined elsewhere.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
fsEnv.getConfig.setAutoWatermarkInterval(1000)
fsEnv.setParallelism(1)

// Sample input: 001 zhangsan 1571627570000
val userStream = fsEnv.socketTextStream("CentOS", 7788)
  .map(line => line.split("\\s+"))
  .map(ts => User(ts(0), ts(1), ts(2).toLong))
  .assignTimestampsAndWatermarks(new UserAssignerWithPeriodicWatermarks)
  .setParallelism(1)

// Sample input: 001 apple 4.5 1571627570000
val orderStream = fsEnv.socketTextStream("CentOS", 8899)
  .map(line => line.split("\\s+"))
  .map(ts => OrderItem(ts(0), ts(1), ts(2).toDouble, ts(3).toLong))
  .assignTimestampsAndWatermarks(new OrderItemWithPeriodicWatermarks)
  .setParallelism(1)

// Join within 4-second windows that slide every 2 seconds.
userStream.join(orderStream)
  .where(user => user.id)
  .equalTo(orderItem => orderItem.uid)
  .window(SlidingEventTimeWindows.of(Time.seconds(4), Time.seconds(2)))
  .apply((u, o) => (u.id, u.name, o.name, o.price, o.ts))
  .print()

fsEnv.execute("FlinkStreamSlidingWindowJoin")
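The note that a pair may be joined in one sliding window but not in another follows from each element belonging to several overlapping windows. The plain-Scala sketch below (no Flink; zero offset and non-negative timestamps assumed) lists the windows a timestamp falls into and the windows two timestamps share. `SlidingJoinSketch` and its methods are hypothetical names.

```scala
object SlidingJoinSketch {
  // Start times of all sliding windows [start, start + sizeMs) that contain `ts`.
  // Assumes offset 0 and ts >= 0; the newest window comes first.
  def windowStarts(ts: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    val lastStart = ts - (ts % slideMs)
    Iterator.iterate(lastStart)(_ - slideMs).takeWhile(_ > ts - sizeMs).toList
  }

  // Windows in which two elements with the same key would be joined.
  def sharedWindows(tsA: Long, tsB: Long, sizeMs: Long, slideMs: Long): List[Long] =
    windowStarts(tsA, sizeMs, slideMs).intersect(windowStarts(tsB, sizeMs, slideMs))
}
```

For size 4000 and slide 2000, timestamps 5000 and 7000 each belong to two windows but share only the one starting at 4000, so the pair is emitted once; timestamps 5000 and 9000 share no window and are never joined.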
3、Session Window Join
When performing a session window join, all elements with the same key that, when combined, fulfill the session criteria are joined in pairwise combinations and passed on to the JoinFunction or FlatJoinFunction. Again, this performs an inner join, so a session window that contains elements from only one stream emits no output.
Example:
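import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time

// User(id, name, ts), OrderItem(uid, name, price, ts) and the two watermark
// assigners are assumed to be defined elsewhere.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
fsEnv.getConfig.setAutoWatermarkInterval(1000)
fsEnv.setParallelism(1)

// Sample input: 001 zhangsan 1571627570000
val userStream = fsEnv.socketTextStream("CentOS", 7788)
  .map(line => line.split("\\s+"))
  .map(ts => User(ts(0), ts(1), ts(2).toLong))
  .assignTimestampsAndWatermarks(new UserAssignerWithPeriodicWatermarks)
  .setParallelism(1)

// Sample input: 001 apple 4.5 1571627570000
val orderStream = fsEnv.socketTextStream("CentOS", 8899)
  .map(line => line.split("\\s+"))
  .map(ts => OrderItem(ts(0), ts(1), ts(2).toDouble, ts(3).toLong))
  .assignTimestampsAndWatermarks(new OrderItemWithPeriodicWatermarks)
  .setParallelism(1)

// Join within session windows that close after 5 seconds of inactivity.
userStream.join(orderStream)
  .where(user => user.id)
  .equalTo(orderItem => orderItem.uid)
  .window(EventTimeSessionWindows.withGap(Time.seconds(5)))
  .apply((u, o) => (u.id, u.name, o.name, o.price, o.ts))
  .print()

fsEnv.execute("FlinkStreamSessionWindowJoin")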
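The phrase "meet the session conditions when combined" means that the elements of both streams are merged into shared sessions before pairing. A rough plain-Scala sketch for a single key follows. It does not use Flink, ignores watermarks, and treats a difference of exactly `gapMs` as belonging to the same session, which is an assumption of this sketch. `SessionJoinSketch`, `Event`, and the stream labels "A" and "B" are hypothetical.

```scala
object SessionJoinSketch {
  // An element of one key, tagged with the stream it came from ("A" or "B").
  final case class Event(stream: String, value: String, ts: Long)

  // Group events into sessions: a new session starts when the distance to the
  // previous event exceeds gapMs. Sessions and their events are in time order.
  def sessions(events: Seq[Event], gapMs: Long): List[List[Event]] =
    events.sortBy(_.ts).foldLeft(List.empty[List[Event]]) {
      case (current :: done, e) if e.ts - current.head.ts <= gapMs =>
        (e :: current) :: done
      case (acc, e) =>
        List(e) :: acc
    }.map(_.reverse).reverse

  // Inner join: pair A and B elements that ended up in the same session.
  def join(events: Seq[Event], gapMs: Long): List[(String, String)] =
    for {
      session <- sessions(events, gapMs)
      a <- session
      if a.stream == "A"
      b <- session
      if b.stream == "B"
    } yield (a.value, b.value)
}
```

A session that holds only "A" elements or only "B" elements contributes nothing to the result, which is the inner-join behavior described above.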
4、Interval Join
The interval join joins elements of two streams (call them A and B) that share a common key, where the timestamp of an element of stream B lies in an interval relative to the timestamp of an element of stream A: a.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound. When a pair of elements is passed to the ProcessJoinFunction, the pair is assigned the larger of the two elements' timestamps, which can be accessed via ProcessJoinFunction.Context.
Note: The interval join currently supports only event time.
Example:
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// User(id, name, ts), OrderItem(uid, name, price, ts) and the two watermark
// assigners are assumed to be defined elsewhere.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
fsEnv.getConfig.setAutoWatermarkInterval(1000)
fsEnv.setParallelism(1)

// Sample input: 001 zhangsan 1571627570000
val userStream = fsEnv.socketTextStream("CentOS", 7788)
  .map(line => line.split("\\s+"))
  .map(ts => User(ts(0), ts(1), ts(2).toLong))
  .assignTimestampsAndWatermarks(new UserAssignerWithPeriodicWatermarks)
  .setParallelism(1)
  .keyBy(_.id)

// Sample input: 001 apple 4.5 1571627570000
val orderStream = fsEnv.socketTextStream("CentOS", 8899)
  .map(line => line.split("\\s+"))
  .map(ts => OrderItem(ts(0), ts(1), ts(2).toDouble, ts(3).toLong))
  .assignTimestampsAndWatermarks(new OrderItemWithPeriodicWatermarks)
  .setParallelism(1)
  .keyBy(_.uid)

// Join each user with orders whose timestamp is within 1 second before or after it.
userStream.intervalJoin(orderStream)
  .between(Time.seconds(-1), Time.seconds(1))
  .process(new ProcessJoinFunction[User, OrderItem, String] {
    override def processElement(left: User, right: OrderItem,
        ctx: ProcessJoinFunction[User, OrderItem, String]#Context,
        out: Collector[String]): Unit = {
      println(left + " \t" + right) // debug output of each matched pair
      out.collect(left.id + " " + left.name + " " + right.name + " " + right.price + " " + right.ts)
    }
  })
  .print()

fsEnv.execute("FlinkStreamIntervalJoin")
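The interval condition and the timestamp assigned to each joined pair can be sketched in plain Scala without Flink. The sketch assumes both bounds are inclusive, matching `between(Time.seconds(-1), Time.seconds(1))` as used in the example, and it ignores watermarks and state cleanup. `IntervalJoinSketch` and `Event` are hypothetical names.

```scala
object IntervalJoinSketch {
  final case class Event(key: String, value: String, ts: Long)

  // Pair a with b when the keys match and
  // a.ts + lowerMs <= b.ts <= a.ts + upperMs (both bounds inclusive).
  // Each result carries the larger of the two timestamps.
  def join(as: Seq[Event], bs: Seq[Event], lowerMs: Long, upperMs: Long)
      : Seq[(String, String, Long)] =
    for {
      a <- as
      b <- bs
      if a.key == b.key
      if b.ts >= a.ts + lowerMs && b.ts <= a.ts + upperMs
    } yield (a.value, b.value, math.max(a.ts, b.ts))
}
```

With bounds of -1000 and 1000 milliseconds, an order exactly one second before or after the user event still matches, while one that is 1001 milliseconds later does not.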