Accessing timestamps and watermarks
The transformation operators covered so far cannot access an event's timestamp or the current watermark, and in some scenarios this information is essential. For example, a map operator implemented as a MapFunction cannot access the timestamp of the current event or the current event time. For this reason, the DataStream API provides a family of low-level transformation operators that can access timestamps and watermarks and can register timers. They can also emit special events, such as timeout events. Process functions are used to build event-driven applications and to implement custom business logic that window functions and the ordinary transformation operators cannot express. For example, Flink SQL is implemented with process functions.
Flink provides eight process functions:
- ProcessFunction
- KeyedProcessFunction
- CoProcessFunction
- ProcessJoinFunction
- BroadcastProcessFunction
- KeyedBroadcastProcessFunction
- ProcessWindowFunction
- ProcessAllWindowFunction
KeyedProcessFunction
KeyedProcessFunction operates on a KeyedStream. It processes each element of the stream and emits zero, one, or more elements. All process functions implement the RichFunction interface, so they all have the open(), close(), and getRuntimeContext() methods. KeyedProcessFunction[KEY, IN, OUT] additionally provides two methods:
processElement(v: IN, ctx: Context, out: Collector[OUT]): called for every element in the stream. Results are emitted through the Collector. The Context gives access to the element's timestamp, the element's key, and the TimerService. The Context can also emit elements to other streams (side outputs).
onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[OUT]): a callback that is invoked when a previously registered timer fires. The timestamp parameter is the time for which the timer was set, and the Collector is used to emit results. Like the Context parameter of processElement, OnTimerContext provides contextual information, such as the time domain of the timer that fired (event time or processing time).
TimerService and timers
The TimerService object held by Context and OnTimerContext has the following methods:
- currentProcessingTime(): Long returns the current processing time.
- currentWatermark(): Long returns the timestamp of the current watermark.
- registerProcessingTimeTimer(timestamp: Long): Unit registers a processing-time timer for the current key. The timer fires when processing time reaches the timer's timestamp.
- registerEventTimeTimer(timestamp: Long): Unit registers an event-time timer for the current key. The timer callback fires when the watermark is greater than or equal to the timer's timestamp.
- deleteProcessingTimeTimer(timestamp: Long): Unit deletes a previously registered processing-time timer. If no timer with this timestamp exists, nothing happens.
- deleteEventTimeTimer(timestamp: Long): Unit deletes a previously registered event-time timer. If no timer with this timestamp exists, nothing happens.
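The firing rule for event-time timers can be sketched in plain Scala. This is only a toy model under stated assumptions, not Flink's TimerService: the class ToyTimerService and its advanceWatermark method are hypothetical names, and the key is passed explicitly instead of being taken from the current element. It illustrates two points: a timer is identified by its key and timestamp (registering the same pair twice yields one timer), and a timer fires once the watermark is greater than or equal to its timestamp.

```scala
import scala.collection.mutable

// Hypothetical plain-Scala model of keyed event-time timers (not the Flink API).
final class ToyTimerService {
  // One entry per (timestamp, key); the sorted set deduplicates and orders timers.
  private val timers = mutable.TreeSet.empty[(Long, String)]
  private var watermark = Long.MinValue

  def currentWatermark: Long = watermark

  def registerEventTimeTimer(key: String, timestamp: Long): Unit =
    timers += ((timestamp, key))

  // Deleting a timer that does not exist is a no-op.
  def deleteEventTimeTimer(key: String, timestamp: Long): Unit =
    timers -= ((timestamp, key))

  // Advance the watermark (it never moves backwards) and return the timers
  // whose timestamp is <= the watermark, in timestamp order.
  def advanceWatermark(newWatermark: Long): List[(Long, String)] = {
    watermark = math.max(watermark, newWatermark)
    val fired = timers.takeWhile(_._1 <= watermark).toList
    timers --= fired
    fired
  }
}

object TimerDemo {
  // Returns what fires at watermark 999 and then at watermark 2000.
  def run(): (List[(Long, String)], List[(Long, String)]) = {
    val ts = new ToyTimerService
    ts.registerEventTimeTimer("sensor_1", 1000L)
    ts.registerEventTimeTimer("sensor_1", 1000L) // duplicate, still one timer
    ts.registerEventTimeTimer("sensor_2", 2000L)
    ts.registerEventTimeTimer("sensor_1", 3000L)
    ts.deleteEventTimeTimer("sensor_1", 3000L)   // cancelled before it can fire
    (ts.advanceWatermark(999L), ts.advanceWatermark(2000L))
  }
}
```

In this model nothing fires at watermark 999, and at watermark 2000 the timers for sensor_1 (1000) and sensor_2 (2000) fire in timestamp order.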
The following example shows how a KeyedProcessFunction operates on a KeyedStream.
Requirement: monitor the temperature readings of a sensor and raise an alert if the temperature keeps rising for one second (processing time).
    package com.xxx.fink.api.windowapi

    import com.xxx.fink.api.sourceapi.SensorReading
    import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
    import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
    import org.apache.flink.streaming.api.windowing.time.Time
    import org.apache.flink.util.Collector
    import org.apache.flink.api.scala._
    import org.apache.flink.streaming.api.TimeCharacteristic

    /**
     * Raise an alert when the temperature rises continuously for 1 s
     */
    object ProcessFunctionTest {
      def main(args: Array[String]): Unit = {
        val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
        env.setParallelism(1)
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

        val stream: DataStream[String] = env.socketTextStream("hadoop101", 7777)

        // Parse "id,timestamp,temperature" lines into SensorReading records
        val dataStream: DataStream[SensorReading] = stream
          .map(data => {
            val dataArray: Array[String] = data.split(",")
            SensorReading(dataArray(0).trim, dataArray(1).trim.toLong, dataArray(2).trim.toDouble)
          })
          .assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor[SensorReading](Time.seconds(1)) {
              // Timestamps arrive in seconds; Flink expects milliseconds
              override def extractTimestamp(element: SensorReading): Long = element.timestamp * 1000
            })

        val processedStream: DataStream[String] = dataStream
          .keyBy(_.id)
          .process(new TempIncreAlert())

        dataStream.print("Input data:")
        processedStream.print("process data:")

        env.execute("Window test")
      }
    }

    class TempIncreAlert() extends KeyedProcessFunction[String, SensorReading, String] {

      // To detect a continuous rise, keep the previous temperature in state for comparison
      lazy val lastTemp: ValueState[Double] =
        getRuntimeContext.getState(new ValueStateDescriptor[Double]("lastTemp", classOf[Double]))

      // State that holds the timestamp of the currently registered timer
      lazy val currentTimer: ValueState[Long] =
        getRuntimeContext.getState(new ValueStateDescriptor[Long]("currentTimer", classOf[Long]))

      override def processElement(value: SensorReading,
                                  ctx: KeyedProcessFunction[String, SensorReading, String]#Context,
                                  out: Collector[String]): Unit = {
        // Read the previous temperature (0.0 if this is the first reading for the key)
        val preTemp = lastTemp.value()
        // Store the current temperature
        lastTemp.update(value.temperature)

        val curTimerTs = currentTimer.value()

        if (preTemp == 0.0 || value.temperature < preTemp) {
          // First reading or the temperature dropped: delete the timer and clear the state.
          // This check comes first so that the first reading does not register a timer.
          ctx.timerService().deleteProcessingTimeTimer(curTimerTs)
          currentTimer.clear()
        } else if (value.temperature > preTemp && curTimerTs == 0) {
          // The temperature rose and no timer is set yet: register one 1 s from now
          val timerTs = ctx.timerService().currentProcessingTime() + 1000L
          ctx.timerService().registerProcessingTimeTimer(timerTs)
          currentTimer.update(timerTs)
        }
      }

      override def onTimer(timestamp: Long,
                           ctx: KeyedProcessFunction[String, SensorReading, String]#OnTimerContext,
                           out: Collector[String]): Unit = {
        // Emit the alert and clear the timer state
        out.collect(ctx.getCurrentKey + " temperature has risen continuously")
        currentTimer.clear()
      }
    }

Test output:

    Input data:> SensorReading(sensor_1,1547718199,35.0)
    Input data:> SensorReading(sensor_1,1547718199,36.0)
    process data:> sensor_1 temperature has risen continuously
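The intended behavior of the alert (start a timer on the first rise, cancel it on a drop, alert when it expires) can be traced without a cluster. The following is a minimal plain-Scala sketch under assumptions of my own: AlertModel, alertTimes, and the endTimeMs parameter are hypothetical, readings for a single key are given as (processing time in ms, temperature) pairs, and processing time only advances when a reading arrives or the replay ends. It models the logic only and does not use any Flink API.

```scala
object AlertModel {
  // Replays readings for one key and returns the times at which the alert timer fires.
  def alertTimes(readings: Seq[(Long, Double)], delayMs: Long, endTimeMs: Long): List[Long] = {
    var lastTemp: Option[Double] = None // previous temperature, if any
    var timer: Option[Long] = None      // timestamp of the pending timer, if any
    val fired = List.newBuilder[Long]

    // Fire the pending timer once processing time has reached it.
    def fireIfDue(now: Long): Unit =
      if (timer.exists(_ <= now)) { fired += timer.get; timer = None }

    for ((now, temp) <- readings) {
      fireIfDue(now)
      val prev = lastTemp
      lastTemp = Some(temp)
      if (prev.forall(temp < _)) timer = None // first reading or a drop: cancel
      else if (prev.exists(temp > _) && timer.isEmpty) timer = Some(now + delayMs) // rise: start timer
    }
    fireIfDue(endTimeMs) // let processing time run on until the end of the replay
    fired.result()
  }
}
```

For example, AlertModel.alertTimes(Seq((0L, 35.0), (100L, 36.0)), 1000L, 2000L) yields List(1100): the rise at 100 ms starts a timer that expires one second later. Adding a third reading (500L, 34.0) cancels the timer, and no alert is produced.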
Side outputs (SideOutput)
Most operators in the DataStream API produce a single output, that is, one stream of a single data type.
The split operator can divide one stream into several streams, but those streams all have the same data type.
The side output feature of process functions can produce multiple streams, and the data types of these streams may differ.
A side output is defined as an OutputTag[X] object, where X is the data type of the output stream. A process function can emit an event to one or more side outputs through the Context object.
    import com.xxx.fink.api.sourceapi.SensorReading
    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
    import org.apache.flink.streaming.api.scala.{DataStream, OutputTag, StreamExecutionEnvironment}
    import org.apache.flink.streaming.api.windowing.time.Time
    import org.apache.flink.util.Collector
    import org.apache.flink.api.scala._

    /**
     * Side outputs as a replacement for split
     */
    object SideOutputTest {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.setParallelism(1)

        val stream = env.socketTextStream("hadoop101", 7777)

        // Parse "id,timestamp,temperature" lines into SensorReading records
        val dataStream = stream
          .map(data => {
            val dataArray = data.split(",")
            SensorReading(dataArray(0).trim, dataArray(1).trim.toLong, dataArray(2).trim.toDouble)
          })
          .assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor[SensorReading](Time.seconds(1)) {
              override def extractTimestamp(element: SensorReading): Long = element.timestamp * 1000
            })

        val processedStream: DataStream[SensorReading] = dataStream.process(new FreezingAlert())

        // Main output
        processedStream.print("processed data:")
        // Side output: the tag id and type must match the tag used in FreezingAlert
        processedStream.getSideOutput(new OutputTag[String]("freezing alert")).print("alert data")

        env.execute("Window test")
      }
    }

    // Freezing alert: readings below 32 F are reported on the side output
    class FreezingAlert() extends ProcessFunction[SensorReading, SensorReading] {

      lazy val alertOutPut: OutputTag[String] = new OutputTag[String]("freezing alert")

      override def processElement(value: SensorReading,
                                  ctx: ProcessFunction[SensorReading, SensorReading]#Context,
                                  out: Collector[SensorReading]): Unit = {
        if (value.temperature < 32.0) {
          // Emit the alert message to the side output
          ctx.output(alertOutPut, "freezing alert for " + value.id)
        } else {
          // Emit the reading to the main output
          out.collect(value)
        }
      }
    }

Test output:

    processed data:> SensorReading(sensor_1,1547718199,35.8)
    alert data> freezing alert for sensor_6
    alert data> freezing alert for sensor_7
    processed data:> SensorReading(sensor_10,1547718205,38.1)
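As a rough plain-Scala analogy for why side outputs are more flexible than split, the sketch below routes readings in a single pass into a main output of SensorReading values and a side output of String alerts, so the two outputs have different element types. It is an illustration only and not the Flink API: SideOutputModel and route are hypothetical names, and the SensorReading case class is redefined locally with the fields the examples use (id, timestamp, temperature).

```scala
// Local stand-in for the SensorReading type used in the examples (assumed fields).
final case class SensorReading(id: String, timestamp: Long, temperature: Double)

object SideOutputModel {
  // Returns (main output, side output); the two lists have different element types.
  def route(readings: Seq[SensorReading]): (List[SensorReading], List[String]) = {
    val main = List.newBuilder[SensorReading]
    val alerts = List.newBuilder[String]
    readings.foreach { r =>
      if (r.temperature < 32.0) alerts += s"freezing alert for ${r.id}" // side output
      else main += r                                                    // main output
    }
    (main.result(), alerts.result())
  }
}
```

Calling SideOutputModel.route on a reading of 35.8 for sensor_1 and a reading of 15.4 for sensor_6 returns the sensor_1 reading in the first list and the string "freezing alert for sensor_6" in the second.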