Flink Table API & SQL: Time Attributes

This article is mainly a translation of the relevant content from the official Flink documentation. Original address: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/streaming/time_attributes.html

Flink can process data streams according to different notions of time.

  • Processing time is the system time of the machine that executes the respective operation (also referred to as "wall-clock time").
  • Event time refers to processing stream data based on a timestamp that is attached to each row. The timestamp can encode when the event occurred.
  • Ingestion time is the time at which an event enters Flink; internally, it is treated similarly to event time.

This article describes how to define time attributes for time-based operations in Flink's Table API and SQL.

Introduction to Time Attributes

Time-based operations (e.g., windows) in both the Table API and SQL require information about the notion of time and its source. Therefore, tables can provide logical time attributes that indicate time and give table programs access to the corresponding timestamps.

Time attributes can be part of every table schema. They are defined when a table is created from a DataStream, or are predefined when a TableSource is used. Once a time attribute has been defined at the beginning, it can be referenced as a field and used in time-based operations.

As long as a time attribute is not modified and is simply forwarded from one part of the query to another, it remains a valid time attribute. Time attributes behave like regular timestamps and can be accessed in calculations. If a time attribute is used in a calculation, it is materialized and becomes a regular timestamp. Regular timestamps do not cooperate with Flink's time and watermark system and therefore can no longer be used in time-based operations.
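
The rule in the paragraph above can be pictured with a toy model in plain Scala. This is only a sketch of the rule, not how Flink represents columns internally; TimeAttributeSketch and every name inside it are hypothetical and exist only for illustration.

```scala
object TimeAttributeSketch {
  // A column is either still a time attribute or an ordinary (materialized) timestamp.
  sealed trait Column
  final case class TimeAttribute(name: String) extends Column
  final case class RegularTimestamp(expression: String) extends Column

  // Forwarding a column unchanged keeps whatever it was.
  def forward(column: Column): Column = column

  // Using a column in a calculation always yields a regular timestamp.
  def plusMillis(column: Column, millis: Long): Column = column match {
    case TimeAttribute(name)          => RegularTimestamp(s"$name + $millis")
    case RegularTimestamp(expression) => RegularTimestamp(s"($expression) + $millis")
  }

  // In this model, only time attributes are accepted by time-based operations such as windows.
  def usableInWindow(column: Column): Boolean = column match {
    case _: TimeAttribute    => true
    case _: RegularTimestamp => false
  }
}
```

In this model, forwarding TimeAttribute("UserActionTime") keeps it usable in a window, while adding an offset to it produces a RegularTimestamp that is not.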

Table programs require that the corresponding time characteristic be specified for the streaming environment:

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) // default

// alternatively:
// env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)
// env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

Processing time

Processing time allows a table program to produce results based on the time of the local machine. It is the simplest notion of time, but it does not provide determinism. It requires neither timestamp extraction nor watermark generation.

There are two ways to define a processing time attribute.

During DataStream-to-Table Conversion

The processing time attribute is defined with the .proctime property during schema definition. The time attribute may only extend the physical schema by an additional logical field. Therefore, it can only be defined at the end of the schema definition.

// the stream carries three physical fields: timestamp, user name, data
val stream: DataStream[(Long, String, String)] = ...

// declare an additional logical field as a processing time attribute
val table = tEnv.fromDataStream(stream, 'UserActionTimestamp, 'Username, 'Data, 'UserActionTime.proctime)

val windowedTable = table.window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)
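
The 10-minute tumbling window above assigns every row to one fixed, non-overlapping interval based on the value of its time attribute. A minimal sketch of that assignment in plain Scala follows. It has no Flink dependency, TumblingSketch and its members are hypothetical names, and it assumes non-negative epoch-millisecond timestamps with windows aligned to the epoch.

```scala
object TumblingSketch {
  val TenMinutesMs: Long = 10L * 60L * 1000L

  // Start (inclusive) of the tumbling window that contains timestampMs.
  // Assumes timestampMs >= 0 and sizeMs > 0.
  def windowStart(timestampMs: Long, sizeMs: Long = TenMinutesMs): Long =
    timestampMs - (timestampMs % sizeMs)

  // End (exclusive) of that same window.
  def windowEnd(timestampMs: Long, sizeMs: Long = TenMinutesMs): Long =
    windowStart(timestampMs, sizeMs) + sizeMs
}
```

With a processing time attribute, the timestamp fed into such a calculation is the wall-clock time at which the row happens to be processed, which is why replaying the same input can place rows in different windows.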

Using a TableSource

The processing time attribute is defined by a TableSource that implements the DefinedProctimeAttribute interface. The logical time attribute is appended to the physical schema defined by the return type of the TableSource.

// define a table source with a processing attribute
class UserActionSource extends StreamTableSource[Row] with DefinedProctimeAttribute {

	override def getReturnType = {
		val names = Array[String]("Username" , "Data")
		val types = Array[TypeInformation[_]](Types.STRING, Types.STRING)
		Types.ROW(names, types)
	}

	override def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[Row] = {
		// create stream
		val stream = ...
		stream
	}

	override def getProctimeAttribute = {
		// field with this name will be appended as a third field
		"UserActionTime"
	}
}

// register table source
tEnv.registerTableSource("UserActions", new UserActionSource)

val windowedTable = tEnv
	.scan("UserActions")
	.window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)

Event time

Event time allows a table program to produce results based on the time contained in each record. This yields consistent results even when events arrive out of order or late. It also ensures that the results of the table program are reproducible when records are read from persistent storage.

In addition, event time allows a unified syntax for table programs in both batch and streaming environments. A time attribute in a streaming environment can be a regular field of a record in a batch environment.

To handle out-of-order events and to distinguish between on-time and late events in streaming, Flink needs to extract timestamps from the events and make some kind of progress in time (so-called watermarks).
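
One common way to make progress in time, assumed here purely for illustration, is to let the watermark trail the largest timestamp seen so far by a fixed bound. The plain-Scala sketch below models that idea and classifies each event as on time or late. WatermarkSketch and its members are hypothetical names, not Flink classes, and the per-event bookkeeping is a simplification of what a streaming engine does.

```scala
object WatermarkSketch {
  // Tracks the largest event timestamp seen and derives a watermark
  // that trails it by maxDelayMs (assumed to be >= 0).
  final class BoundedOutOfOrderness(maxDelayMs: Long) {
    // Initialised so that the first watermark is Long.MinValue without underflow.
    private var maxSeenMs: Long = Long.MinValue + maxDelayMs

    def currentWatermark: Long = maxSeenMs - maxDelayMs

    // Returns true if the event is on time, i.e. its timestamp is still
    // ahead of the watermark at the moment it arrives; false if it is late.
    def observe(eventTimeMs: Long): Boolean = {
      val onTime = eventTimeMs > currentWatermark
      if (eventTimeMs > maxSeenMs) maxSeenMs = eventTimeMs
      onTime
    }
  }
}
```

With a bound of 5000 ms, an event with timestamp 7000 that arrives after one with timestamp 10000 is out of order but still on time, whereas an event with timestamp 12000 that arrives after one with timestamp 20000 is late.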

An event time attribute can be defined either during DataStream-to-Table conversion or by using a TableSource.

During DataStream-to-Table Conversion

The event time attribute is defined with the .rowtime property during schema definition. Timestamps and watermarks must already have been assigned in the DataStream that is converted.

There are two ways to define the time attribute when converting a DataStream into a Table. Depending on whether the specified .rowtime field name already exists in the schema of the DataStream, the timestamp field either

  • is appended as a new field to the schema, or
  • replaces an existing field.

In either case, the event time timestamp field holds the value of the DataStream event time timestamp.

// Option 1:

// extract timestamp and assign watermarks based on knowledge of the stream
val stream: DataStream[(String, String)] = inputStream.assignTimestampsAndWatermarks(...)

// declare an additional logical field as an event time attribute
val table = tEnv.fromDataStream(stream, 'Username, 'Data, 'UserActionTime.rowtime)


// Option 2:

// extract timestamp from first field, and assign watermarks based on knowledge of the stream
val stream: DataStream[(Long, String, String)] = inputStream.assignTimestampsAndWatermarks(...)

// the first field has been used for timestamp extraction, and is no longer necessary
// replace first field with a logical event time attribute
val table = tEnv.fromDataStream(stream, 'UserActionTime.rowtime, 'Username, 'Data)

// Usage:

val windowedTable = table.window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)

Using a TableSource

The event time attribute is defined by a TableSource that implements the DefinedRowtimeAttributes interface. The getRowtimeAttributeDescriptors() method returns a list of RowtimeAttributeDescriptor, each of which describes the final name of a time attribute, a timestamp extractor used to derive the values of the attribute, and the watermark strategy associated with the attribute.

Make sure that the DataStream returned by the getDataStream() method is aligned with the defined time attribute. The timestamps of the DataStream (those assigned by a TimestampAssigner) are considered only if a StreamRecordTimestamp timestamp extractor is defined. The watermarks of the DataStream are preserved only if a PreserveWatermarks watermark strategy is defined. Otherwise, only the values of the TableSource's rowtime attribute are relevant.

import java.util
import java.util.Collections

// define a table source with a rowtime attribute
class UserActionSource extends StreamTableSource[Row] with DefinedRowtimeAttributes {

	override def getReturnType = {
		val names = Array[String]("Username" , "Data", "UserActionTime")
		val types = Array[TypeInformation[_]](Types.STRING, Types.STRING, Types.LONG)
		Types.ROW(names, types)
	}

	override def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[Row] = {
		// create stream
		// ...
		// assign watermarks based on the "UserActionTime" attribute
		val stream = inputStream.assignTimestampsAndWatermarks(...)
		stream
	}

	override def getRowtimeAttributeDescriptors: util.List[RowtimeAttributeDescriptor] = {
		// Mark the "UserActionTime" attribute as event-time attribute.
		// We create one attribute descriptor of "UserActionTime".
		val rowtimeAttrDescr = new RowtimeAttributeDescriptor(
			"UserActionTime",
			new ExistingField("UserActionTime"),
			new AscendingTimestamps)
		val listRowtimeAttrDescr = Collections.singletonList(rowtimeAttrDescr)
		listRowtimeAttrDescr
	}
}

// register the table source
tEnv.registerTableSource("UserActions", new UserActionSource)

val windowedTable = tEnv
	.scan("UserActions")
	.window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)
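
The AscendingTimestamps strategy used in the descriptor above is meant for sources whose timestamps never decrease. As a rough plain-Scala model of that idea (not Flink's class; AscendingSketch and its members are hypothetical names), the watermark can trail the largest timestamp seen by one millisecond, so that further rows carrying that same timestamp are not treated as late.

```scala
object AscendingSketch {
  final class AscendingWatermarks {
    // Start one above Long.MinValue so that the subtraction below cannot underflow.
    private var maxSeenMs: Long = Long.MinValue + 1

    // Remember the largest timestamp observed so far.
    def observe(timestampMs: Long): Unit =
      if (timestampMs > maxSeenMs) maxSeenMs = timestampMs

    // The watermark stays one millisecond behind the largest timestamp.
    def currentWatermark: Long = maxSeenMs - 1
  }
}
```

After observing the timestamps 1000, 1000 and 2000, the watermark in this model is 1999.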

Origin blog.csdn.net/lp284558195/article/details/104384371