13. Basic concepts of Flink's Table API and SQL: general API introduction and getting-started examples

Flink series of articles

1. Links to a series of comprehensive articles such as Flink deployment, concept introduction, source, transformation, sink usage examples, introduction and examples of the four cornerstones

13. Basic concepts of Flink's Table API and SQL, introduction to the general API and getting-started examples
14. Data types of Flink's Table API and SQL: built-in data types and their attributes
15. Streaming concepts of Flink's Table API and SQL: a detailed introduction to dynamic tables, time attribute configuration (how update results are handled), temporal tables, joins on streams, determinism on streams, and query configuration

22. Flink's table api and sql create table DDL



This article introduces the concepts of the Table API and SQL together with usage examples; subsequent articles continue with the remaining concepts of this part and close with comprehensive examples.
It is the first article in this series on the Table API and SQL; the topic is covered across nine articles in total.
The article is divided into three parts: an overview of the Table API and SQL, concepts and the general API, and simple getting-started examples.

1. Introduction to Table API & SQL

Chinese official website link: https://nightlies.apache.org/flink/flink-docs-release-1.12/zh/dev/table/

1. Introduction to Table API & SQL

Apache Flink has two relational APIs for unified streaming and batch processing: Table API and SQL.
Table API is a query API for Scala and Java languages. It can combine selection, filtering, join and other relational operators in a very intuitive way.
Flink SQL is standard SQL implemented based on Apache Calcite. Regardless of whether the input is continuous (streaming) or bounded (batch), queries specified in both interfaces have the same semantics and specify the same results.

The Table API and SQL are tightly integrated with each other and with the DataStream API. You can easily switch between these APIs and the libraries built on top of them. For example, you can first use CEP to do pattern matching on a DataStream and then analyze the matching results with the Table API, or you can scan, filter, and aggregate a batch table with SQL and then run a Gelly graph algorithm on the prepared data.

Flink's Table module includes Table API and SQL:
Table API is a SQL-like API. Through Table API, users can operate data like a table, which is very intuitive and convenient.
SQL, as a declarative language, has a standard syntax and specification; users can process data without caring about the underlying implementation, which makes it very easy to get started.
About 80% of the code behind the Flink Table API and SQL implementation is shared. As a unified stream-batch computing engine, Flink also has a unified runtime layer.

2. maven dependency

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-java-bridge_2.11</artifactId>
  <version>1.12.7</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner-blink_2.11</artifactId>
  <version>1.12.7</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_2.11</artifactId>
  <version>1.12.7</version>
  <scope>provided</scope>
</dependency>
<!-- additional dependency -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-common</artifactId>
  <version>1.12.7</version>
  <scope>provided</scope>
</dependency>

3. Introduction to table api and sql chapters

1. Public concepts and APIs: Table API and SQL public concepts and APIs
2. Data types: built-in data types and their properties
3. Streaming concepts: documentation on streaming-specific topics in the Table API and SQL, such as configuring time attributes and handling update results
4. Connecting to external systems: Connectors and formats for reading and writing external systems
5. Table API: Operations supported by Table API
6. SQL: Operations and syntax supported by SQL
7. Built-in functions: Built-in functions in Table API and SQL
8. SQL Client: You can try Flink SQL without writing code, and you can directly submit SQL tasks to the cluster.
9. Table api and sql examples

2. Concepts and general API

1. The main differences between the two planners (Planner)

The two planners are the original flink-table-planner (the only planner before Flink 1.9) and flink-table-planner-blink (the default since version 1.11). They are generally called the old planner and the Blink planner.

  1. Blink treats batch jobs as a special case of stream processing. Strictly speaking, conversion between Table and DataSet is not supported: batch jobs are translated into DataStream programs rather than DataSet programs, and the same holds for streaming jobs.
  2. The Blink planner does not support BatchTableSource, but uses a bounded StreamTableSource instead.
  3. The implementation of FilterableTableSource in the old planner and the Blink planner is incompatible. The old planner would push down the PlannerExpression to the FilterableTableSource, while the Blink planner pushes down the Expression.
  4. String-based key-value configuration options are only used in the Blink planner.
  5. The implementation of PlannerConfig (CalciteConfig) in the two planners is different.
  6. The Blink planner will optimize multiple sinks (multiple-sinks) into a directed acyclic graph (DAG).
    Both TableEnvironment and StreamTableEnvironment support this feature. The old planner always optimizes each sink into a new directed acyclic graph, and all graphs are independent of each other.
  7. The old planner currently does not support catalog statistics, while Blink does.

2. Structure of Table API and SQL program

Sample code

// create a TableEnvironment for specific planner batch or streaming
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

// create an input Table
tableEnv.executeSql("CREATE TEMPORARY TABLE tablename_test... WITH ( 'connector' = ... )");
// register an output Table
tableEnv.executeSql("CREATE TEMPORARY TABLE outputTable_test ... WITH ( 'connector' = ... )");

// create a Table object from a Table API query
Table table2 = tableEnv.from("tablename_test").select(...);
// create a Table object from a SQL query
Table table3 = tableEnv.sqlQuery("SELECT ... FROM tablename_test... ");

// emit a Table API result Table to a TableSink, same for SQL result
TableResult tableResult = table2.executeInsert("outputTable_test");
tableResult...

Table API and SQL queries can be easily integrated and embedded into DataStream or DataSet programs. For information about conversion, please refer to the later chapters of this article.

3. Create TableEnvironment

TableEnvironment is a core concept of Table API and SQL. It is responsible for:

  • Register Table in the internal catalog
  • Register an external catalog
  • Load pluggable modules
  • Execute SQL query
  • Register a custom function (scalar, table or aggregation)
  • Convert DataStream or DataSet to Table
  • Holds a reference to ExecutionEnvironment or StreamExecutionEnvironment

Table is always bound to a specific TableEnvironment. Tables in different TableEnvironments cannot be used in the same query, for example, by joining or union operations on them.

A TableEnvironment can be created from a StreamExecutionEnvironment or an ExecutionEnvironment through the static methods StreamTableEnvironment.create() and BatchTableEnvironment.create(), optionally with a TableConfig. The TableConfig can be used to configure the TableEnvironment or to customize the query optimization and translation process (see the query optimization section of this article).

Make sure to select the specific planner BatchTableEnvironment/StreamTableEnvironment that matches your programming language.

If both planner jars are on the classpath (the default behavior), you should explicitly set the planner to be used in the current program.

  • Flink (old planner) query
// **********************
// FLINK STREAMING QUERY
// **********************
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
// or TableEnvironment fsTableEnv = TableEnvironment.create(fsSettings);

// ******************
// FLINK BATCH QUERY
// ******************
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.BatchTableEnvironment;

ExecutionEnvironment fbEnv = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment fbTableEnv = BatchTableEnvironment.create(fbEnv);


  • Blink planner query
// **********************
// BLINK STREAMING QUERY
// **********************
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(bsEnv, bsSettings);
// or TableEnvironment bsTableEnv = TableEnvironment.create(bsSettings);

// ******************
// BLINK BATCH QUERY
// ******************
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

EnvironmentSettings bbSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
TableEnvironment bbTableEnv = TableEnvironment.create(bbSettings);

Note: If there is only one planner jar package in the /lib directory, you can use useAnyPlanner to create EnvironmentSettings.
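For completeness, a minimal sketch of that variant, assuming Flink 1.12 with exactly one planner jar in /lib:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// useAnyPlanner() resolves to whichever planner is found on the classpath
EnvironmentSettings anySettings = EnvironmentSettings.newInstance().useAnyPlanner().inStreamingMode().build();
StreamTableEnvironment anyTableEnv = StreamTableEnvironment.create(env, anySettings);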

4. Create a table in Catalog

A TableEnvironment maintains a map of tables registered by identifier. An identifier consists of three parts: catalog name, database name, and object name. If the catalog or database is not specified, the current default values are used (see the extended table identifier section for examples).

Tables can be virtual (VIEWS) or regular (TABLES). A VIEW can be created from an existing Table object, usually the result of a Table API or SQL query. A TABLE describes external data, such as a file, a database table, or a message queue.

1), Temporary Table and Permanent Table

Tables can be temporary and associated with the lifetime of a single Flink session, or they can be permanent and visible across multiple Flink sessions and clusters.

Permanent tables require a catalog (such as Hive Metastore) to maintain the table's metadata. Once a permanent table is created, it will be visible to any Flink session connected to the catalog and will persist until explicitly deleted.

Temporary tables, on the other hand, are usually kept in memory and exist only for the duration of the Flink session in which they are created. These tables are not visible to other sessions. They are not tied to any catalog or database but can be created in a namespace. Even if their corresponding database is deleted, temporary tables will not be deleted.

A temporary table can be registered with the same identifier as an existing permanent table. The temporary table then shadows the permanent one: as long as the temporary table exists, the permanent table is inaccessible, and all queries using that identifier operate on the temporary table.
This can be useful for experimentation. It allows running exactly the same query first against a temporary table, for example on a subset of the data or on data that is not yet final. Once the query has been verified to be correct, it can be run against the real production table.
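A minimal sketch of this shadowing behavior (the table name orders and the connector options are placeholders):

// a permanent table registered in the current catalog and database
tableEnv.executeSql("CREATE TABLE orders (id BIGINT, amount DOUBLE) WITH ( 'connector' = ... )");

// a temporary table with the same identifier shadows the permanent table
tableEnv.executeSql("CREATE TEMPORARY TABLE orders (id BIGINT, amount DOUBLE) WITH ( 'connector' = ... )");

// all queries on 'orders' now operate on the temporary table
Table t = tableEnv.sqlQuery("SELECT * FROM orders");

// dropping the temporary table makes the permanent table visible again
tableEnv.executeSql("DROP TEMPORARY TABLE orders");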

2), create table

  • Virtual Tables - Views
    In SQL terms, a Table API object corresponds to a view (virtual table). It encapsulates a logical query plan and can be created in a catalog as follows:
// get a TableEnvironment
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

// tableName is the result of a simple projection query
Table tableName = tableEnv.from("X").select(...);

// register the Table tableName as view "projectedTable"
tableEnv.createTemporaryView("projectedTable", tableName);

Note: From the perspective of a traditional database system, Table objects are very similar to VIEW views. That is, the query that defines the Table is not optimized and will be embedded in another query that references the registered Table. If multiple queries refer to the same registered Table, it will be embedded in each query and executed multiple times, which means that the results of the registered Table will not be shared (Note: Blink planner's TableEnvironment will be optimized to only be executed once).

  • Connector Tables
    Another way to create a TABLE is through a connector declaration. Connectors describe external systems that store table data. Storage systems such as Apache Kafka or regular file systems can be declared in this way.
tableEnvironment
  .connect(...)
  .withFormat(...)
  .withSchema(...)
  .inAppendMode()
  .createTemporaryTable("tableName")

3), extended table identifier

Tables are always registered with a ternary identifier, consisting of catalog name, database name, and table name.

Users can specify a catalog and database as "current catalog" and "current database". With these, the first two parts of the ternary identifier just mentioned can be omitted. If the first two identifiers are not specified, then the current catalog and current database will be used. Users can also switch the current catalog and current database through Table API or SQL.

Identifiers follow SQL requirements, which means they can be escaped with backticks (`).

TableEnvironment tEnv = ...;
tEnv.useCatalog("custom_catalog");
tEnv.useDatabase("custom_database");

Table table = ...;

// register the view named 'view_Name' in the catalog named 'custom_catalog'
// in the database named 'custom_database'
tEnv.createTemporaryView("view_Name", table);

// register the view named 'view_Name' in the catalog named 'custom_catalog'
// in the database named 'other_database'
tEnv.createTemporaryView("other_database.view_Name", table);

// register the view named 'example.View' in the catalog named 'custom_catalog'
// in the database named 'custom_database'
tEnv.createTemporaryView("`example.View`", table);

// register the view named 'viewName' in the catalog named 'other_catalog'
// in the database named 'other_database'
tEnv.createTemporaryView("other_catalog.other_database.viewName", table);

5. Query table

1)、Table API

The Table API is a language-integrated query API for Scala and Java. In contrast to SQL, queries are not specified as strings but are built up step by step in the host language.

The Table API is based on the Table class, which represents a table (stream or batch) and provides methods for using relational operations. These methods return a new Table object that represents the result of relational operations on the input Table. Some relational operations consist of multiple method calls, such as table.groupBy(…).select(), where groupBy(…) specifies the grouping of the table, and select(…) projects onto the table grouping.

This link lists all supported operator operations: 17. Flink’s table api and sql’s Table API: Table API supported operations
The following example shows a simple Table API aggregation query:

// get a TableEnvironment
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

// register the student table (see "Create a table in Catalog")

// scan the registered student table
Table students = tableEnv.from("student");
// sum the chinese score of the student named "alanchan", grouped by id and name
Table revenue = students
  .filter($("name").isEqual("alanchan"))
  .groupBy($("id"), $("name"))
  .select($("id"), $("name"), $("chinese").sum().as("sum_c"));

// emit or convert Table
// execute query

2)、SQL

Flink SQL is based on Apache Calcite which implements the SQL standard. SQL queries are specified by regular strings.

The following articles describe Flink's SQL support for streaming and batch tables:
26. Overview and introduction of Flink's SQL
27. Flink's SQL SELECT (Queries)
28. Flink's SQL DROP statement, ALTER statement, INSERT statement, ANALYZE statement
29. Flink SQL's DESCRIBE, EXPLAIN, USE, SHOW, LOAD, UNLOAD, SET, RESET, JAR, JOB Statements, UPDATE, DELETE
30. Flink SQL's SQL client

The following example demonstrates how to specify a query and return the results as a Table object.

// get a TableEnvironment
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

// register Orders table

// sum the chinese score of the student named "alanchan", grouped by id and name
Table revenue = tableEnv.sqlQuery(
    "SELECT id, name, SUM(chinese) AS sum_c " +
    "FROM student " +
    "WHERE name = 'alanchan' " +
    "GROUP BY id, name"
  );

// emit or convert Table
// execute query

The following example shows how to specify an update query to insert the results of the query into the registered table

// get a TableEnvironment
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

// register "Orders" table
// register "RevenueFrance" output table

// compute revenue for all customers from France and emit to "RevenueFrance"
tableEnv.executeSql(
    "INSERT INTO student_sumscore " +
    "SELECT id, name, SUM(chinese) AS sum_c" +
    "FROM student" +
    "WHERE name= 'alanchan' " +
    "GROUP BY id, name"
  );

3), mix Table API and SQL

Mixing the Table API with SQL queries is very easy because they both return Table objects:

  • Table API queries can be defined on Table objects returned by SQL queries.
  • The result table registered in the TableEnvironment can be referenced in the FROM clause of the SQL query. In this way, the SQL query can be defined on the results of the Table API query.
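A brief sketch of both directions, assuming the student table used elsewhere in this article is registered (filtered_student is a hypothetical view name):

// 1) Table API on top of a SQL query result
Table sqlResult = tableEnv.sqlQuery("SELECT id, name, chinese FROM student");
Table filtered = sqlResult.filter($("chinese").isGreater(60));

// 2) SQL on top of a Table API result: register the Table as a temporary view first
tableEnv.createTemporaryView("filtered_student", filtered);
Table sqlOnTableApi = tableEnv.sqlQuery("SELECT id, name FROM filtered_student");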

6. Output table

A Table is emitted by writing it to a TableSink. A TableSink is a generic interface that supports multiple file formats (such as CSV, Apache Parquet, Apache Avro), storage systems (such as JDBC, Apache HBase, Apache Cassandra, Elasticsearch), and message queue systems (such as Apache Kafka, RabbitMQ).

A batch Table can only be written to a BatchTableSink, while a streaming Table must be written to an AppendStreamTableSink, a RetractStreamTableSink, or an UpsertStreamTableSink.

This link can get more information about the available sinks and how to customize the TableSink: https://nightlies.apache.org/flink/flink-docs-release-1.12/zh/dev/table/sourceSinks.html

The method Table.executeInsert(String tableName) sends the Table to the registered TableSink. This method searches the TableSink in the catalog by name and confirms that the Table schema and TableSink schema are consistent.

The following example shows how to output a Table:

// get a TableEnvironment
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

// create an output Table
final Schema schema = new Schema()
    .field("a", DataTypes.INT())
    .field("b", DataTypes.STRING())
    .field("c", DataTypes.BIGINT());

tableEnv.connect(new FileSystem().path("/usr/local/bigdata/testdata"))
    .withFormat(new Csv().fieldDelimiter('|').deriveSchema())
    .withSchema(schema)
    .createTemporaryTable("CsvSinkTable");

// compute a result Table using Table API operators and/or SQL queries
Table result = ...
// emit the result Table to the registered TableSink
result.executeInsert("CsvSinkTable");

7. Translate and execute queries

The way the two planners translate and execute queries is different.

1)、Blink planner(flink-table-planner-blink)

Regardless of whether the input is streaming or batch, Table API and SQL queries are translated into DataStream programs. A query is represented internally as a logical query plan and is translated in two phases:

  • Optimize logical execution plan
  • Translated into DataStream program

Table API or SQL queries will be translated under the following circumstances:

  • When TableEnvironment.executeSql() is called. This method is used to execute a SQL statement. Once this method is called, the SQL statement is immediately translated.
  • When Table.executeInsert() is called. This method is used to insert the contents of a table into the target table. Once this method is called, the TABLE API program is immediately translated.
  • When Table.execute() is called. This method is used to collect the contents of a table locally. Once this method is called, the TABLE API program is immediately translated.
  • When StatementSet.execute() is called. Tables (emitted to a sink through StatementSet.addInsert()) and INSERT statements (added via StatementSet.addInsertSql()) are first buffered in the StatementSet. They are translated when StatementSet.execute() is called, and all sinks are optimized into one directed acyclic graph.
  • When the Table is converted to a DataStream (combined with the DataStream and DataSet APIs below in this article). After the conversion is complete, it becomes a normal DataStream program and will be executed when StreamExecutionEnvironment.execute() is called.

Starting from version 1.11, the sqlUpdate method and the insertInto method are deprecated. A Table program built from these two methods must be executed through the StreamTableEnvironment.execute() method rather than the StreamExecutionEnvironment.execute() method.
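As a hedged sketch of the replacement pattern (anotherOutputTable is a placeholder sink table; the other names come from the program-structure example above), a multi-sink program is now expressed with a StatementSet and triggered by its execute() call:

StatementSet stmtSet = tableEnv.createStatementSet();
// buffer an INSERT written in SQL
stmtSet.addInsertSql("INSERT INTO outputTable_test SELECT * FROM tablename_test");
// buffer a Table API result aimed at another sink table
stmtSet.addInsert("anotherOutputTable", tableEnv.from("tablename_test").select($("id")));
// all buffered inserts are optimized into one DAG and submitted here
TableResult stmtResult = stmtSet.execute();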

2), Flink old planner (flink-table-planner)

Table API and SQL queries are translated into DataStream or DataSet programs, depending on whether the input is streaming or batch. A query is represented internally as a logical query plan and is translated in two phases:

  • Optimize logical execution plan
  • Translated into DataStream or DataSet program

Table API or SQL queries will be translated under the following circumstances:

  • When TableEnvironment.executeSql() is called. This method is used to execute a SQL statement. Once this method is called, the SQL statement is immediately translated.
  • When Table.executeInsert() is called. This method is used to insert the contents of a table into the target table. Once this method is called, the TABLE API program is immediately translated.
  • When Table.execute() is called. This method is used to collect the contents of a table locally. Once this method is called, the TABLE API program is immediately translated.
  • When StatementSet.execute() is called. Tables (emitted to a sink through StatementSet.addInsert()) and INSERT statements (added via StatementSet.addInsertSql()) are first buffered in the StatementSet. They are translated when StatementSet.execute() is called, and all sinks are optimized into one directed acyclic graph.
  • For Streaming, translation is triggered when a Table is converted to a DataStream (see Combining with DataStream and DataSet API). After conversion, it becomes a normal DataStream program and will be executed when StreamExecutionEnvironment.execute() is called. For Batch, translation is triggered when Table is converted into DataSet (see Combining with DataStream and DataSet API). After conversion, it becomes an ordinary DataSet program and will be executed when ExecutionEnvironment.execute() is called.

Starting from version 1.11, the sqlUpdate method and the insertInto method are deprecated. For streaming, a Table program built from these two methods must be executed through the StreamTableEnvironment.execute() method rather than the StreamExecutionEnvironment.execute() method; for batch, it must be executed through the BatchTableEnvironment.execute() method rather than the ExecutionEnvironment.execute() method.

8. Integrate with DataStream and DataSet API

Both planners can be integrated with the DataStream API, but only the old planner can be combined with the DataSet API; the Blink planner in batch mode cannot be combined with either of them.
Note: The DataSet API discussed below is only relevant for the old planner.
The Table API and SQL can be easily integrated with and embedded into DataStream and DataSet programs. For example, you can query an external table (e.g. from an RDBMS), do some preprocessing such as filtering, projecting, aggregating, or joining with metadata, and then continue with the DataStream or DataSet API (and any library built on top of these APIs, such as CEP or Gelly). Conversely, Table API or SQL queries can also be applied to the results of a DataStream or DataSet program.

This interaction can be realized through the mutual transformation of DataStream or DataSet and Table. This section describes how these transformations are achieved.

1) Create a view through DataSet or DataStream

A DataStream or DataSet can be registered as a view in a TableEnvironment. The schema of the resulting view depends on the data type of the registered DataStream or DataSet. See the section on mapping data types to table schema for details.

Note: Views created through DataStream or DataSet can only be registered as temporary views.

// get StreamTableEnvironment
// registration of a DataSet in a BatchTableEnvironment is equivalent
StreamTableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

DataStream<Tuple2<Long, String>> stream = ...

// register the DataStream as View "myTable" with fields "f0", "f1"
tableEnv.createTemporaryView("table_name", stream);

// register the DataStream as View "myTable2" with fields "myLong", "myString"
tableEnv.createTemporaryView("table_name2", stream, $("id"), $("name"));

2) Convert DataStream or DataSet into a table

Unlike registering DataStream or DataSet in TableEnvironment, DataStream and DataSet can also be directly converted to Table.

// get StreamTableEnvironment
// registration of a DataSet in a BatchTableEnvironment is equivalent
StreamTableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

DataStream<Tuple2<Long, String>> stream = ...

// Convert the DataStream into a Table with default fields "f0", "f1"
Table table1 = tableEnv.fromDataStream(stream);

// Convert the DataStream into a Table with fields "myLong", "myString"
Table table2 = tableEnv.fromDataStream(stream, $("id"), $("name"));

3) Convert the table into DataStream or DataSet

Table can be converted to DataStream or DataSet. In this way, custom DataSet or DataStream programs can be run on the results of Table API or SQL queries.

When converting a Table to a DataStream or DataSet, you need to specify the data type of the generated DataStream or DataSet, that is, the data type to which each row of Table data is to be converted. Usually the most convenient option is to convert to Row. The following list outlines what the different options do:

  • Row: fields are mapped by position; any number of fields; null values are supported; no type-safe access.
  • POJO: fields are mapped by name (the POJO fields must be named like the Table fields); any number of fields; null values are supported; type-safe access.
  • Case Class: fields are mapped by position; null values are not supported; type-safe access.
  • Tuple: fields are mapped by position; at most 22 (Scala) or 25 (Java) fields; null values are not supported; type-safe access.
  • Atomic Type: the Table must have a single field; null values are not supported; type-safe access.

1. Convert the table into DataStream

The result table of a streaming query is updated dynamically, that is, it changes as new records arrive on the query's input streams. Therefore, the DataStream into which such a dynamic query result is converted needs to encode the table's updates.

There are two modes for converting Table to DataStream:

  • Append Mode: This mode can be used only when the dynamic Table is modified only by INSERT changes, i.e. it is an append operation only and the previously output results are never updated.
  • Retract Mode: This mode can be used in any situation. It uses boolean values to tag data for INSERT and DELETE operations.
// get StreamTableEnvironment. 
StreamTableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

// Table with two fields (String name, Integer age)
Table table = ...

// convert the Table into an append DataStream of Row by specifying the class
DataStream<Row> dsRow = tableEnv.toAppendStream(table, Row.class);

// convert the Table into an append DataStream of Tuple2<String, Integer> 
//   via a TypeInformation
TupleTypeInfo<Tuple2<String, Integer>> tupleType = new TupleTypeInfo<>(
  Types.STRING(),
  Types.INT());
DataStream<Tuple2<String, Integer>> dsTuple = 
  tableEnv.toAppendStream(table, tupleType);

// convert the Table into a retract DataStream of Row.
//   A retract stream of type X is a DataStream<Tuple2<Boolean, X>>. 
//   The boolean field indicates the type of the change. 
//   True is INSERT, false is DELETE.
DataStream<Tuple2<Boolean, Row>> retractStream = 
  tableEnv.toRetractStream(table, Row.class);
//		DataStream<Tuple2<Boolean, Student>> result = tenv.toRetractStream(resultTable, Student.class);
//		DataStream<Tuple2<Boolean, Row>> result = tenv.toRetractStream(resultTable, Row.class);
//		DataStream<Tuple2<Boolean, Result>> result = tenv.toRetractStream(resultTable, Result.class);

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Student {
	private Long id;
	private String name;
	private double chinese;
	private double english;
	private double math;
}

	@Data
	public static class Result {
		private Long id;
		private Double sum_c;
		private Double sum_m;
		private Double sum_e;
	}
	

Regarding dynamic tables, please refer to: 15. Flink's Table API and SQL streaming concepts - a detailed introduction to dynamic tables, time attribute configuration (how update results are handled), temporal tables, joins on streams, determinism on streams, and query configuration (in particular the dynamic tables part).

Once the Table is converted to a DataStream, the DataStream job must be executed using the execute method of the StreamExecutionEnvironment.

2. Convert the table into a DataSet

The process of converting Table into DataSet is as follows:

// get BatchTableEnvironment
BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);

// Table with two fields (String name, Integer age)
Table table = ...

// convert the Table into a DataSet of Row by specifying a class
DataSet<Row> dsRow = tableEnv.toDataSet(table, Row.class);

// convert the Table into a DataSet of Tuple2<String, Integer> via a TypeInformation
TupleTypeInfo<Tuple2<String, Integer>> tupleType = new TupleTypeInfo<>(
  Types.STRING(),
  Types.INT());
DataSet<Tuple2<String, Integer>> dsTuple = 
  tableEnv.toDataSet(table, tupleType);

Once the Table is converted to a DataSet, the DataSet job must be executed using the execute method of ExecutionEnvironment.
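Note that DataSet sinks behave slightly differently; a small sketch continuing the snippet above (the output path is a placeholder):

// print() on a DataSet is eager and triggers execution on its own
dsRow.print();

// file sinks, by contrast, only run when ExecutionEnvironment.execute() is called
dsTuple.writeAsCsv("/tmp/table_to_dataset_output");
env.execute("table-to-dataset-example");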

4) Mapping of data types to Table Schema

Flink's DataStream and DataSet APIs support a variety of data types. For example, Tuple (Scala built-in and Flink Java tuple), POJO type, Scala case class type, and Flink's Row type allow for nesting and have multiple fields that can be accessed in table expressions. Other types are considered atomic types. Below, we discuss how the Table API converts these data types into internal row representations and provides an example of converting a DataStream into a Table.

There are two ways to map data types to table schema: based on field position or based on field name.

  • Based on field position
    Position-based mapping can be used to give fields more meaningful names while keeping their order. It can be used for composite data types with a defined field order as well as for atomic types. Composite data types such as tuple, row, and case class have such a field order; fields of a POJO type, however, must be mapped by name (see the next section). Fields can be projected out, but they cannot be renamed with an alias (as).

When defining a position-based mapping, the specified name must not exist in the input data type, otherwise the API will assume that the mapping should be based on the field name. If no field names are specified, the default field names and field order of the composite data type are used, or f0 is used for atomic types.

// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
		StreamTableEnvironment tenv = StreamTableEnvironment.create(env, settings);

		DataStream<Tuple2<Long, String>> stream = env.fromCollection(
				Arrays.asList(new Tuple2(1L, "alan"), 
						new Tuple2(2L, "alanchan"), 
						new Tuple2(3L, "alanchanchn"), 
						new Tuple2(4L, "alanalan_chn"), 
						new Tuple2(5L, "alan_chan_chn")));

// convert DataStream into Table with default field names "f0" and "f1"
Table t_name_row = tenv.fromDataStream(stream);
DataStream<Tuple2<Boolean, Row>> result = tenv.toRetractStream(t_name_row, Row.class);

// convert DataStream into Table with field "id" only
Table table_f0 = tenv.fromDataStream(stream, $("f0"));
DataStream<Tuple2<Boolean, Row>> result2 = tenv.toRetractStream(table_f0, Row.class);

// convert DataStream into Table with field names "id" and "name"
Table table_f = tenv.fromDataStream(stream, $("f0"), $("f1"));
DataStream<Tuple2<Boolean, Row>> result3 = tenv.toRetractStream(table_f, Row.class);

result.print();
result2.print();
result3.print();

env.execute();

  • Based on field names
    Name-based mapping applies to any data type including POJO types. This is the most flexible way to define table schema mapping. All fields in the map are referenced by name and can be renamed via as. Fields can be reordered and mapped.

If no field names are specified, the default field names and field order of the composite data type are used, or f0 is used to represent an atomic type.

// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment tenv = StreamTableEnvironment.create(env, settings);

DataStream<Tuple2<Long, String>> stream = env.fromCollection(
		Arrays.asList(new Tuple2(1L, "alan"), 
					new Tuple2(2L, "alanchan"), 
					new Tuple2(3L, "alanchanchn"), 
					new Tuple2(4L, "alanalan_chn"), 
					new Tuple2(5L, "alan_chan_chn")));

// convert DataStream into Table with default field names "f0" and "f1"
Table t_name_row = tenv.fromDataStream(stream);
DataStream<Tuple2<Boolean, Row>> result = tenv.toRetractStream(t_name_row, Row.class);

// convert DataStream into Table with field "f0" only
Table table_f0 = tenv.fromDataStream(stream, $("f0"));
DataStream<Tuple2<Boolean, Row>> result2 = tenv.toRetractStream(table_f0, Row.class);
		
		
// convert DataStream into Table with field names "id" and "name"
Table table_f = tenv.fromDataStream(stream, $("f0").as("id"), $("f1").as("name"));
DataStream<Tuple2<Boolean, Row>> result3 = tenv.toRetractStream(table_f, Row.class);

result.print();
result2.print();
result3.print();

env.execute();

1. Atomic type

Flink treats basic data types (Integer, Double, String) and generic data types (types that cannot be decomposed further) as atomic types. A DataStream or DataSet of an atomic type is converted into a Table with a single column. The type of the column is inferred from the atomic type, and the column can be renamed.

// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
StreamTableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section

DataStream<Long> stream = ...

// convert DataStream into Table with default field name "f0"
Table table = tableEnv.fromDataStream(stream);

// convert DataStream into Table with field name "fo"
Table table = tableEnv.fromDataStream(stream, $("f"));
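A runnable variant of the same idea, as a sketch reusing the Blink planner setup from the earlier examples:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment tenv = StreamTableEnvironment.create(env, settings);

// an atomic DataStream<Long> becomes a table with a single column
DataStream<Long> stream = env.fromElements(1L, 2L, 3L);

// default field name "f0"
Table table1 = tenv.fromDataStream(stream);

// rename the single field to "id"
Table table2 = tenv.fromDataStream(stream, $("id"));

tenv.toAppendStream(table2, Row.class).print();
env.execute();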

2. Tuple type (Scala and Java) and Case Class type (Scala only)

Flink supports Scala's built-in tuples and provides its own tuple type for Java. DataStreams and DataSets of both tuple kinds can be converted into tables. Fields can be renamed by providing names for all fields (position-based mapping). If no field names are specified, the default field names are used. If the original field names are referenced (f0, f1, ... for Flink tuples, _1, _2, ... for Scala tuples), the API assumes the mapping is name-based rather than position-based. Name-based mapping allows fields to be reordered and projected with an alias (as).

// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
// env
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
		StreamTableEnvironment tenv = StreamTableEnvironment.create(env, settings);

		// source
		DataStream<Tuple2<Long, String>> stream = env.fromCollection(
				Arrays.asList(new Tuple2(1L, "alan"), new Tuple2(2L, "alanchan"), new Tuple2(3L, "alanchanchn"), new Tuple2(4L, "alanalan_chn"), new Tuple2(5L, "alan_chan_chn")));

		// convert DataStream into Table with default field names "f0", "f1"
		Table t_name_row = tenv.fromDataStream(stream);
		DataStream<Tuple2<Boolean, Row>> result = tenv.toRetractStream(t_name_row, Row.class);
		result.print();

		// convert DataStream into Table with renamed field names "myLong", "myString" (position-based)
		Table table_f0 = tenv.fromDataStream(stream, $("id"), $("name"));
		DataStream<Tuple2<Boolean, Row>> result2 = tenv.toRetractStream(table_f0, Row.class);
		result2.print();

		// convert DataStream into Table with reordered fields "f1", "f0" (name-based)
		Table table_f = tenv.fromDataStream(stream, $("f1"), $("f0"));
		DataStream<Tuple2<Boolean, Row>> result4 = tenv.toRetractStream(table_f, Row.class);
		result4.print();

		// convert DataStream into Table with reordered and aliased fields "id", "name" (name-based)
		Table table_f_as= tenv.fromDataStream(stream, $("f0").as("id"), $("f1").as("name"));
		DataStream<Tuple2<Boolean, Row>> result3 = tenv.toRetractStream(table_f_as, Row.class);
		result3.print();

		// execute
		env.execute();

3. POJO type (Java and Scala)

Flink supports POJOs as composite types. The rules that determine whether a class is treated as a POJO are documented here: https://nightlies.apache.org/flink/flink-docs-release-1.12/zh/dev/types_serialization.html#pojos

When converting a POJO-type DataStream or DataSet to a Table without specifying a field name, the name of the original POJO-type field will be used. Name mapping requires the original name and cannot be done by position. Fields can be renamed, reordered and projected using aliases (with the as keyword).

// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
// env
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
		StreamTableEnvironment tenv = StreamTableEnvironment.create(env, settings);

		// source
		DataStream<Student> stream = env.fromCollection(Arrays.asList(
			new Student(1L, "alan", 10, 20, 30), 
			new Student(2L, "alanchan", 60, 70, 80),
			new Student(3L, "alanchanchn", 70, 80, 90), 
			new Student(4L, "alanchn", 100, 100, 100)));

		// convert DataStream into Table with default field names "age", "name" (fields are ordered by name!)
		Table table1 = tenv.fromDataStream(stream);
		DataStream<Tuple2<Boolean, Student>> result = tenv.toRetractStream(table1, Student.class);
		result.print();

		// convert DataStream into Table with renamed fields "myAge", "myName"	(name-based)
		Table table2 = tenv.fromDataStream(stream, $("id").as("r_id"), $("name").as("r_name"), $("chinese").as("r_chinese"), $("english").as("r_english"), $("math").as("r_math"));

		DataStream<Tuple2<Boolean, Row>> result2 = tenv.toRetractStream(table2, Row.class);
		result2.print();

		// convert DataStream into Table with projected field "name" (name-based)
		Table table3 = tenv.fromDataStream(stream, $("name"));
		DataStream<Tuple2<Boolean, Row>> result3 = tenv.toRetractStream(table3, Row.class);
		result3.print();

		// convert DataStream into Table with projected and renamed field "myName"	 (name-based)
		Table table4 = tenv.fromDataStream(stream, $("name").as("NAME"),$("chinese").as("CHINESE"));
		DataStream<Tuple2<Boolean, Row>> result4 = tenv.toRetractStream(table4, Row.class);
		result4.print();

		// execute
		env.execute();

4. Row type

The Row type supports any number of fields as well as fields with null values. The field name can be specified by RowTypeInfo, and can also be specified when converting Row's DataStream or DataSet to Table.
Field mapping of the Row type supports both name-based and position-based methods. Fields can be renamed by providing the names of all fields (based on position mapping) or selected individually for projection/sorting/renaming (based on name mapping).

// env
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
		StreamTableEnvironment tenv = StreamTableEnvironment.create(env, settings);

		// source
		DataStream<Row> stream = env.fromCollection(Arrays.asList(
				 Row.of(1L, "alan", 10, 20, 30), 
				 Row.of(2L, "alanchan", 60, 70, 80),
				 Row.of(3L, "alanchanchn", 70, 80, 90), 
				 Row.of(4L, "alanchn", 100, 100, 100)));

		// Convert DataStream into Table with renamed field names "myName", "myAge" (position-based)
		Table table1 = tenv.fromDataStream(stream,$("id"),$("name"),$("chinese"),$("english"),$("math"));
		DataStream<Tuple2<Boolean, Row>> result = tenv.toRetractStream(table1, Row.class);
		result.print();

		// Convert DataStream into Table with renamed fields "myName", "myAge" (name-based)
		//Table table2 = tenv.fromDataStream(stream, $("id").as("r_id"), $("name").as("r_name"), $("chinese").as("r_chinese"), $("english").as("r_english"), $("math").as("r_math"));
		RowTypeInfo rowTypeInfo = new RowTypeInfo(new TypeInformation<?>[] {
    
     Types.LONG, Types.STRING, Types.INT, Types.INT, Types.INT },
				new String[] {
    
     "id_", "name_", "chinese_", "english_", "math_" });
		
		DataStream<Row> processedStream = stream.process(new ProcessFunction<Row, Row>() {
			@Override
			public void processElement(Row input, Context context, Collector<Row> output) {
				output.collect(input);
			}
		}).returns(rowTypeInfo);

		 Table resultTable = tenv.fromDataStream(processedStream);
		 resultTable.printSchema();
		 
		DataStream<Tuple2<Boolean, Row>> result2 = tenv.toRetractStream(resultTable,Row.class);
		result2.print();

		// convert DataStream into Table with projected field "name" (name-based)
		Table table3 = tenv.fromDataStream(stream, $("name"));
		DataStream<Tuple2<Boolean, Row>> result3 = tenv.toRetractStream(table3, Row.class);
		result3.print();

		// execute
		env.execute();

9. Query optimization

1)、Blink planner(flink-table-planner-blink)

Apache Flink uses and extends Apache Calcite to perform complex query optimization. This includes a range of rule- and cost-based optimizations such as:

  • Subquery decorrelation based on Apache Calcite
  • Projection pruning
  • Partition pruning
  • Filter push down
  • Subplan deduplication to avoid redundant computation
  • Special subquery rewriting, including two parts:
    1. Convert IN and EXISTS to left semi-joins
    2. Convert NOT IN and NOT EXISTS to left anti-join
  • Optional join reordering
    enabled via table.optimizer.join-reorder-enabled

Note: IN/EXISTS/NOT IN/NOT EXISTS are currently only supported in conjunction with subquery rewriting.

The optimizer makes informed decisions based not only on the plan, but also on the rich statistics available from data sources and the fine-grained cost of each operator (such as io, cpu, network and memory).

Advanced users can provide custom optimizations through the CalciteConfig object, which can be provided to the TableEnvironment by calling TableEnvironment#getConfig#setPlannerConfig.
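As a small sketch of the string-based configuration mentioned above, the optional join reordering can be switched on through the TableConfig (tableEnv stands for any Blink-planner TableEnvironment; the option defaults to false):

// enable the optional join reordering of the Blink planner
tableEnv.getConfig().getConfiguration()
    .setString("table.optimizer.join-reorder-enabled", "true");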

2), Flink planner (flink-table-planner)

Apache Flink leverages Apache Calcite to optimize and translate queries. Optimizations currently performed include projection and filter pushdown, subquery elimination, and other kinds of query rewriting. The old planner does not optimize the join order; joins are executed in the order defined in the query (the order of tables in the FROM clause and/or the order of join predicates in the WHERE clause).

By providing a CalciteConfig object, you can adjust the set of optimization rules applied at different stages. This object can be created by calling the constructor CalciteConfig.createBuilder() and provided to the TableEnvironment by calling tableEnv.getConfig.setPlannerConfig(calciteConfig).

10. Explaining a table

The Table API provides a mechanism to explain the logical and optimized query plans used to compute a Table. This is done through the Table.explain() method or the StatementSet.explain() method. Table.explain() returns the plan of a single Table, while StatementSet.explain() returns the plan of a multi-sink program. Both return a string describing three plans:

  • The Abstract Syntax Tree of relational query (the Abstract Syntax Tree), that is, the unoptimized logical query plan,
  • optimized logical query plans, and
  • Physical execution plan.

The logical and optimized query plans of a SQL statement can also be obtained with the TableEnvironment.explainSql() method, or by executing an EXPLAIN statement through TableEnvironment.executeSql(). Please refer to 29. Flink SQL's DESCRIBE, EXPLAIN, USE, SHOW, LOAD, UNLOAD, SET, RESET, JAR, JOB Statements, UPDATE, DELETE.
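A minimal sketch of the SQL route, assuming the student table used elsewhere in this article is registered:

// print the abstract syntax tree and the optimized plans of a SQL query
System.out.println(tenv.explainSql("SELECT id, name FROM student WHERE english > 20"));

// the same information through an EXPLAIN statement
tenv.executeSql("EXPLAIN PLAN FOR SELECT id, name FROM student WHERE english > 20").print();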

The following code shows an example and the corresponding output using the Table.explain() method for a given Table:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
		StreamTableEnvironment tenv = StreamTableEnvironment.create(env, settings);

		// source
		DataStream<Tuple2<Long, String>> stream1 = env.fromCollection(
				Arrays.asList(
						new Tuple2(1L, "alan"), 
						new Tuple2(2L, "alanchan"), 
						new Tuple2(3L, "alanchanchn")));

		DataStream<Tuple2<Long, String>> stream2 = env.fromCollection(
				Arrays.asList(
						new Tuple2(4L, "alanalan_chn"), 
						new Tuple2(5L, "alan_chan_chn")));

		// explain Table API
		Table table1 = tenv.fromDataStream(stream1, $("id"), $("name"));
		Table table2 = tenv.fromDataStream(stream2, $("id"), $("name"));
		
		Table table = table1
				.where($("name").like("alan%"))
				.unionAll(table2);
		System.out.println(table.explain());
		
		DataStream<Row> result2 = tenv.toChangelogStream(table);
		result2.print();
		
		// execute
		env.execute();

The result of the above example is:

== Abstract Syntax Tree ==
LogicalUnion(all=[true])
:- LogicalFilter(condition=[LIKE($1, _UTF-16LE'alan%')])
:  +- LogicalTableScan(table=[[Unregistered_DataStream_1]])
+- LogicalTableScan(table=[[Unregistered_DataStream_2]])

== Optimized Physical Plan ==
Union(all=[true], union=[id, name])
:- Calc(select=[id, name], where=[LIKE(name, _UTF-16LE'alan%')])
:  +- DataStreamScan(table=[[Unregistered_DataStream_1]], fields=[id, name])
+- DataStreamScan(table=[[Unregistered_DataStream_2]], fields=[id, name])

== Optimized Execution Plan ==
Union(all=[true], union=[id, name])
:- Calc(select=[id, name], where=[LIKE(name, _UTF-16LE'alan%')])
:  +- DataStreamScan(table=[[Unregistered_DataStream_1]], fields=[id, name])
+- DataStreamScan(table=[[Unregistered_DataStream_2]], fields=[id, name])

5> +I[3, alanchanchn]
3> +I[1, alan]
1> +I[4, alanalan_chn]
2> +I[5, alan_chan_chn]
4> +I[2, alanchan]

The following code shows an example and the corresponding output for a multi-sink plan using StatementSet.explain():

EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
TableEnvironment tEnv = TableEnvironment.create(settings);

final Schema schema = new Schema()
    .field("count", DataTypes.INT())
    .field("word", DataTypes.STRING());

tEnv.connect(new FileSystem().path("/source/path1"))
    .withFormat(new Csv().deriveSchema())
    .withSchema(schema)
    .createTemporaryTable("MySource1");
tEnv.connect(new FileSystem().path("/source/path2"))
    .withFormat(new Csv().deriveSchema())
    .withSchema(schema)
    .createTemporaryTable("MySource2");
tEnv.connect(new FileSystem().path("/sink/path1"))
    .withFormat(new Csv().deriveSchema())
    .withSchema(schema)
    .createTemporaryTable("MySink1");
tEnv.connect(new FileSystem().path("/sink/path2"))
    .withFormat(new Csv().deriveSchema())
    .withSchema(schema)
    .createTemporaryTable("MySink2");

StatementSet stmtSet = tEnv.createStatementSet();

Table table1 = tEnv.from("MySource1").where($("word").like("F%"));
stmtSet.addInsert("MySink1", table1);

Table table2 = table1.unionAll(tEnv.from("MySource2"));
stmtSet.addInsert("MySink2", table2);

String explanation = stmtSet.explain();
System.out.println(explanation);

The result of the multi-sink plan is:

== Abstract Syntax Tree ==
LogicalLegacySink(name=[MySink1], fields=[count, word])
+- LogicalFilter(condition=[LIKE($1, _UTF-16LE'F%')])
   +- LogicalTableScan(table=[[default_catalog, default_database, MySource1, source: [CsvTableSource(read fields: count, word)]]])

LogicalLegacySink(name=[MySink2], fields=[count, word])
+- LogicalUnion(all=[true])
   :- LogicalFilter(condition=[LIKE($1, _UTF-16LE'F%')])
   :  +- LogicalTableScan(table=[[default_catalog, default_database, MySource1, source: [CsvTableSource(read fields: count, word)]]])
   +- LogicalTableScan(table=[[default_catalog, default_database, MySource2, source: [CsvTableSource(read fields: count, word)]]])

== Optimized Logical Plan ==
Calc(select=[count, word], where=[LIKE(word, _UTF-16LE'F%')], reuse_id=[1])
+- TableSourceScan(table=[[default_catalog, default_database, MySource1, source: [CsvTableSource(read fields: count, word)]]], fields=[count, word])

LegacySink(name=[MySink1], fields=[count, word])
+- Reused(reference_id=[1])

LegacySink(name=[MySink2], fields=[count, word])
+- Union(all=[true], union=[count, word])
   :- Reused(reference_id=[1])
   +- TableSourceScan(table=[[default_catalog, default_database, MySource2, source: [CsvTableSource(read fields: count, word)]]], fields=[count, word])

== Physical Execution Plan ==
Stage 1 : Data Source
	content : collect elements with CollectionInputFormat

	Stage 2 : Operator
		content : CsvTableSource(read fields: count, word)
		ship_strategy : REBALANCE

		Stage 3 : Operator
			content : SourceConversion(table:Buffer(default_catalog, default_database, MySource1, source: [CsvTableSource(read fields: count, word)]), fields:(count, word))
			ship_strategy : FORWARD

			Stage 4 : Operator
				content : Calc(where: (word LIKE _UTF-16LE'F%'), select: (count, word))
				ship_strategy : FORWARD

				Stage 5 : Operator
					content : SinkConversionToRow
					ship_strategy : FORWARD

					Stage 6 : Operator
						content : Map
						ship_strategy : FORWARD

Stage 8 : Data Source
	content : collect elements with CollectionInputFormat

	Stage 9 : Operator
		content : CsvTableSource(read fields: count, word)
		ship_strategy : REBALANCE

		Stage 10 : Operator
			content : SourceConversion(table:Buffer(default_catalog, default_database, MySource2, source: [CsvTableSource(read fields: count, word)]), fields:(count, word))
			ship_strategy : FORWARD

			Stage 12 : Operator
				content : SinkConversionToRow
				ship_strategy : FORWARD

				Stage 13 : Operator
					content : Map
					ship_strategy : FORWARD

					Stage 7 : Data Sink
						content : Sink: CsvTableSink(count, word)
						ship_strategy : FORWARD

						Stage 14 : Data Sink
							content : Sink: CsvTableSink(count, word)
							ship_strategy : FORWARD

3. Example 1: Converting a DataStream to a Table and querying it with SQL

1. maven dependency

<properties>
    <encoding>UTF-8</encoding>
	<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	<maven.compiler.source>1.8</maven.compiler.source>
	<maven.compiler.target>1.8</maven.compiler.target>
	<java.version>1.8</java.version>
	<scala.version>2.12</scala.version>
	<flink.version>1.12.0</flink.version>
</properties>
	
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-scala-bridge_2.12</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_2.12</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<!-- Flink execution planner, used before version 1.9 -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.12</artifactId>
    <version>${flink.version}</version>
</dependency>
<!-- Blink execution planner, the default since 1.11 -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_2.12</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>

2. Implementation

  • java bean
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * @author alanchan
 *
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Student {
	private Long id;
	private String name;
	private double chinese;
	private double english;
	private double math;
}
  • implementation
import static org.apache.flink.table.api.Expressions.$;

import java.util.Arrays;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

/**
 * @author alanchan
 *
 */
public class DataStream2Table {

	/**
	 * @param args
	 * @throws Exception
	 */
	public static void main(String[] args) throws Exception {
		// env
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
		StreamTableEnvironment tenv = StreamTableEnvironment.create(env, settings);

		// source
		DataStream<Student> studentDS = env.fromCollection(Arrays.asList(
		new Student(1L, "alan", 10, 20, 30), 
		new Student(2L, "alanchan", 60, 70, 80),
		new Student(3L, "alanchanchn", 70, 80, 90), 
		new Student(4L, "alanchn", 100, 100, 100)));

		// transformation
		// convert the DataStream into a Table, then query it with SQL
		Table tableStudent = tenv.fromDataStream(studentDS, $("id"), $("name"), $("chinese"), $("english"), $("math"));

		String sql = "select *  from " + tableStudent + " where english > 20";

		Table resultTable = tenv.sqlQuery(sql);
		DataStream<Tuple2<Boolean, Student>> result = tenv.toRetractStream(resultTable, Student.class);

		// sink
		result.print();

		// execute
		env.execute();
	}

}

3. Verification results

15> (true,Student(id=2, name=alanchan, chinese=60.0, english=70.0, math=80.0))
1> (true,Student(id=4, name=alanchn, chinese=100.0, english=100.0, math=100.0))
16> (true,Student(id=3, name=alanchanchn, chinese=70.0, english=80.0, math=90.0))

This article introduced the concepts and usage examples of the Table API and SQL. The following articles in the series continue with the remaining concepts of this part and conclude with comprehensive usage examples.

Origin blog.csdn.net/chenwewi520feng/article/details/131941092