How Does Flink SQL Implement Joins on Data Streams?

Whether in OLAP or OLTP, Join is one of the SQL operations that business queries most often involve and whose optimization rules are among the most complex. For offline computing, after many years of accumulation in the database field, the semantics and implementation of Join are already very mature; for Streaming SQL, however, which has only begun to emerge in recent years, Join is still in a fledgling state.

The most critical issue is that Join implementations rely on caching the entire data set, whereas the objects of a Streaming SQL Join are unbounded data streams, so memory pressure and computational efficiency inevitably become problems for long-running jobs. The following sections analyze, against the background of SQL's evolution, how Flink SQL solves these problems and implements the Join of two data streams.

Join in Offline Batch SQL

Traditional offline Batch SQL (SQL over bounded data sets) has three basic Join implementations: Nested-loop Join, Sort-Merge Join and Hash Join.

  • Nested-loop Join is the simplest and most direct: both data sets are loaded into memory, and the elements of the two data sets are compared against the Join condition one by one in nested loops. Although Nested-loop Join has the lowest time and space efficiency, it is the most flexible and broadly applicable, so it and its variant BNL (Block Nested-loop) are often used by traditional databases as the default Join option.
  • Sort-Merge Join, as the name suggests, consists of two phases, Sort and Merge: the two data sets are first sorted, and then the two ordered data sets are traversed and matched against each other, similar to the merge step of merge sort. Notably, Sort-Merge Join only applies to Equi-Joins (Joins whose conditions use equality as the comparison operator). Because sorting both data sets is expensive, Sort-Merge Join is usually used as an optimization when the input data sets are already ordered.
  • Hash Join likewise consists of two phases: first one data set is converted into a hash table, then the other data set is traversed and its elements are matched against the hash table. The first phase and its data set are called the build phase and build table, and the second phase and its data set are called the probe phase and probe table. Hash Join is efficient but requires a relatively large amount of memory, so it is usually used as an optimization when one of the two tables is small enough to fit in memory. Like Sort-Merge Join, Hash Join only applies to Equi-Joins. (A minimal code sketch of the build and probe phases follows this list.)
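
To make the build table / probe table distinction concrete, here is a minimal in-memory sketch of an equi Hash Join. It is written against a hypothetical Row type and plain Java collections rather than any real database engine, purely to illustrate the two phases described above.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {

    // Hypothetical row: a join key plus a payload.
    record Row(int key, String value) {}

    // Equi Hash Join: the build phase loads the (smaller) build table into a hash table,
    // the probe phase scans the probe table and emits matching pairs.
    static List<String> hashJoin(List<Row> buildTable, List<Row> probeTable) {
        Map<Integer, List<Row>> hashTable = new HashMap<>();
        for (Row b : buildTable) {
            hashTable.computeIfAbsent(b.key(), k -> new ArrayList<>()).add(b);
        }
        List<String> results = new ArrayList<>();
        for (Row p : probeTable) {
            for (Row b : hashTable.getOrDefault(p.key(), List.of())) {
                results.add(b.value() + " joined with " + p.value());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Row> build = List.of(new Row(42, "A-42"), new Row(1, "A-1"));
        List<Row> probe = List.of(new Row(42, "B-42"), new Row(7, "B-7"));
        System.out.println(hashJoin(build, probe)); // [A-42 joined with B-42]
    }
}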

Join in Real-Time Streaming SQL

Compared with offline Joins, real-time Streaming SQL (SQL over unbounded data sets) cannot cache all of the data, so Sort-Merge Join, which requires the data sets to be sorted, is essentially infeasible, while Nested-loop Join and Hash Join can meet the requirements of real-time SQL after certain improvements.
Let's look at the basic implementation of Nested-loop Join in real-time Streaming SQL through an example (the figures in this example come from Piotr Nowojski's talk at Flink Forward San Francisco [2]).

img1.join-in-continuous-query-1.png

Figure 1. Join-in-continuous-query-1

Table A has two elements, 1 and 42, and Table B has one element, 42, so at this point the Join outputs 42 to the result.

img2.join-in-continuous-query-2.png

Figure 2. Join-in-continuous-query-2

Table B then receives three new elements in turn: 7, 3 and 1. Because 1 matches an element in Table A, a new element 1 is output to the result.

img3.join-in-continuous-query-3.png

Figure 3. Join-in-continuous-query-3

Table A then receives new inputs 2, 3 and 6; 3 matches an element in Table B, so a new element 3 is output to the result.

As we can see, in a Nested-loop Join we need to keep the contents of both input tables, and as time goes on the historical data that Table A and Table B have to retain grows without bound, leading to unreasonable memory and disk usage, while the matching efficiency for a single element keeps degrading. A similar problem exists in Hash Join as well.
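
To see where this state comes from, here is a minimal Java sketch of the streaming Nested-loop Join walked through in Figures 1-3: every input element is kept in per-table state forever, and each new element is matched against the full history of the other table. All names and the equality join condition are illustrative only.

import java.util.ArrayList;
import java.util.List;

public class StreamingNestedLoopJoinSketch {

    private final List<Integer> stateA = new ArrayList<>();
    private final List<Integer> stateB = new ArrayList<>();

    // Called for every new element of Table A.
    void onTableA(int a) {
        stateA.add(a);               // state grows without bound
        for (int b : stateB) {
            if (a == b) {            // join condition: equality
                emit(a);
            }
        }
    }

    // Called for every new element of Table B.
    void onTableB(int b) {
        stateB.add(b);
        for (int a : stateA) {
            if (a == b) {
                emit(b);
            }
        }
    }

    private void emit(int value) {
        System.out.println("join result: " + value);
    }

    public static void main(String[] args) {
        StreamingNestedLoopJoinSketch join = new StreamingNestedLoopJoinSketch();
        join.onTableA(1);
        join.onTableA(42);
        join.onTableB(42);  // Figure 1: outputs 42
        join.onTableB(7);
        join.onTableB(3);
        join.onTableB(1);   // Figure 2: outputs 1
        join.onTableA(2);
        join.onTableA(3);   // Figure 3: outputs 3
        join.onTableA(6);
    }
}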

So, is it possible to set up a cache-eviction policy that cleans up unnecessary historical data in time? The answer is yes, and the key lies in how that eviction policy is implemented. This is also the main difference between the three kinds of Join that Flink SQL provides.

Joins in Flink SQL

  • Regular Join

Regular Join is the most basic type of Join and has no cache-eviction policy. In a Regular Join, inputs and updates of either table are globally visible and affect all subsequent Join results. For example, in the following Join query, a new record in the Orders table will be matched against all past and future records of the Product table.

SELECT * FROM Orders
INNER JOIN Product
ON Orders.productId = Product.id

Because historical data is never cleaned up, a Regular Join supports arbitrary kinds of updates (insert, update, delete) on its input tables. However, because of the resource problem, a Regular Join is usually not sustainable for long-running jobs and is generally only used for Joins over bounded data streams.
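
One practical mitigation, which is not part of the Join semantics above, is Flink's idle state retention time: join state that has not been accessed for a configured period is evicted, trading result completeness for bounded state size, since evicted records can no longer be matched. A minimal sketch, assuming a Flink 1.10-era Table API (import paths and the exact configuration method differ across versions):

import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class RegularJoinStateRetention {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Evict join state that has been idle for between 12 and 24 hours.
        // Records whose state was evicted can no longer produce matches.
        tEnv.getConfig().setIdleStateRetentionTime(Time.hours(12), Time.hours(24));

        // The Regular Join query above would then be submitted via tEnv.sqlQuery(...).
    }
}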

  • Time-Windowed Join

Time-Windowed Join uses a window to set a time bound on the Join between the two input tables; data outside that time range is invisible to the Join and can be cleaned up. One issue worth noting here is the semantics of time: time can mean the system time at which the computation happens (i.e. Processing Time), or an Event Time extracted from a time field of the data itself. For Processing Time, Flink divides the Join time windows according to the system time and periodically cleans up data; for Event Time, Flink assigns Event Time windows and cleans up data according to the Watermark.

Taking the more commonly used Event Time Windowed Join as an example, a query that joins an Orders table and a Shipments table based on the order time and the ship time looks like this:

SELECT *
FROM 
  Orders o, 
  Shipments s
WHERE 
  o.id = s.orderId AND
  s.shiptime BETWEEN o.ordertime AND o.ordertime + INTERVAL '4' HOUR

This query sets a time lower bound of o.ordertime > s.shiptime - INTERVAL '4' HOUR for the Orders table (Figure 4),

img4.time-window-orders-lower-bound.png

Figure 4. Time lower bound of the Time-Windowed Join - Orders table

and a time lower bound of s.shiptime >= o.ordertime for the Shipments table (Figure 5).

img5.time-window-shipment-lower-bound.png

Figure 5. Time lower bound of the Time-Windowed Join - Shipments table

As a result, both input tables only need to cache data above their time lower bound, which keeps the space usage within a reasonable range.

However, although the underlying implementation is sound, how to define time through SQL syntax remains a difficult point. In the real-time computing field, concepts such as Event Time, Processing Time and Watermark have become industry consensus, but support for time data types in the SQL world is still relatively weak [4]. As a result, defining Watermarks and time semantics has to be done through the programming API, for example by converting a DataStream into a Table; it cannot be done purely in SQL. The Flink community plans to provide this support by extending the SQL dialect; interested readers can track the progress through FLIP-66 [7].
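
For readers who want a feel for that programming-API route, the following is a minimal sketch of converting a DataStream into a Table with an event-time attribute and a Watermark, using the Flink 1.10-era Java API (import paths and the string-based field syntax differ in later versions). The Orders schema and the 5-second out-of-orderness bound are assumptions for illustration only:

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class OrdersEventTimeTable {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical Orders records: (id, productId, ordertime in epoch milliseconds).
        DataStream<Tuple3<Long, Long, Long>> orders = env.fromElements(
                Tuple3.of(1L, 42L, 1000L),
                Tuple3.of(2L, 7L, 5000L));

        // Extract event timestamps and emit Watermarks allowing 5 seconds of out-of-orderness.
        DataStream<Tuple3<Long, Long, Long>> withWatermarks = orders.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Tuple3<Long, Long, Long>>(Time.seconds(5)) {
                    @Override
                    public long extractTimestamp(Tuple3<Long, Long, Long> order) {
                        return order.f2;
                    }
                });

        // ".rowtime" marks ordertime as the event-time attribute usable in windowed Joins.
        Table ordersTable = tEnv.fromDataStream(withWatermarks, "id, productId, ordertime.rowtime");
        tEnv.registerTable("Orders", ordersTable);
    }
}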

  • Temporal Table Join

Although Time-Windowed Join solves the resource problem, it also limits the usage scenarios: both input streams of the Join must have a time lower bound, beyond which the data is no longer accessible. This does not suit many use cases that join against dimension tables, because in many situations a dimension table has no time bound at all. To address this, Flink provides the Temporal Table Join.

Temporal Table Join resembles a Hash Join in that it divides its input into a Build Table and a Probe Table. The former is usually the changelog of a dimension table and the latter is usually the business data stream; typically the data volume of the latter is far larger than that of the former. In a Temporal Table Join, the Build Table is a time-versioned view over an append-only data stream, hence the name Temporal Table. A Temporal Table requires a primary key and a field used for versioning (usually the Event Time field) in order to reflect the content of a record at different points in time.

A typical example is currency conversion of order amounts. Suppose there is an Orders stream that records order amounts and needs to be joined with a RatesHistory exchange-rate stream. RatesHistory holds the exchange rates of different currencies into Japanese Yen, and a new record is appended whenever a rate changes. The contents of the two tables at a certain point in time are as follows:

img6.temporal-table-join-example.png

Figure 6. Temporal Table Join Example

We register RatesHistory as a Temporal Table named Rates, with currency as the primary key and time as the version field.

img7.temporal-table-registration.png

Figure 7. Temporal Table Registration
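
As a rough idea of what the registration in Figure 7 could look like in code, here is a sketch using the Flink 1.10-era Java Table API (TemporalTableFunction with the string-based field syntax; later versions differ). The RatesHistory schema, the sample values and the field name rate_time are assumptions for illustration, with rate_time playing the role of the version field called time above:

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.functions.TemporalTableFunction;

public class RatesTemporalTable {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical RatesHistory changelog: (currency, rate to Yen, event time in millis).
        DataStream<Tuple3<String, Long, Long>> ratesStream = env
                .fromElements(
                        Tuple3.of("US Dollar", 102L, 1000L),
                        Tuple3.of("Euro", 114L, 1000L),
                        Tuple3.of("Euro", 116L, 5000L))
                .assignTimestampsAndWatermarks(
                        new AscendingTimestampExtractor<Tuple3<String, Long, Long>>() {
                            @Override
                            public long extractAscendingTimestamp(Tuple3<String, Long, Long> rate) {
                                return rate.f2;
                            }
                        });

        // Expose the changelog as an append-only table with an event-time attribute.
        Table ratesHistory = tEnv.fromDataStream(ratesStream, "currency, rate, rate_time.rowtime");

        // Version the table by rate_time, key it by currency, and register it as "Rates"
        // so SQL can reference it via LATERAL TABLE(Rates(...)).
        TemporalTableFunction rates = ratesHistory.createTemporalTableFunction("rate_time", "currency");
        tEnv.registerFunction("Rates", rates);
    }
}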

After that, when Rates is given a time version, it computes, based on RatesHistory, the exchange rates that are valid for that version.

img8.temporal-table-content.png

Figure 8. Temporal Table Content

With the help of Rates, we can express the business logic with the following query:

SELECT 
  o.amount * r.rate
FROM
  Orders o,
  LATERAL TABLE(Rates(o.time)) r
WHERE
  o.currency = r.currency

It is worth noting that, unlike Regular Join and Time-Windowed Join, where the two tables are symmetric and a new record of either table can be matched against the other table's history, in a Temporal Table Join an update of the Temporal Table is invisible to records of the other table whose time precedes that update. This means we only need to keep the Build Side records until the Watermark exceeds the record's version field: since the Probe Side input should in theory no longer contain records earlier than the Watermark, those versions of the data can be cleaned up safely.

Summary

The biggest difference between Join in real-time Streaming SQL and Join in offline Batch SQL is that the complete data set cannot be cached; instead, the cache has to be given time-based cleanup conditions that limit the range of data involved in the Join. Depending on the cleanup strategy, Flink SQL provides Regular Join, Time-Windowed Join and Temporal Table Join to cover different business scenarios.

In addition, although in the real-time computing field Joins can be implemented flexibly with the underlying programming APIs, Join in Streaming SQL is still at a fairly early stage of development. The key question is how to integrate time attributes properly into SQL, something for which the SQL standard defined by the ISO SQL committee does not yet give a complete answer. Put another way, as one of the earliest pioneers of Streaming SQL, the Flink community is well placed to explore a reasonable set of SQL syntax and contribute it back to ISO.

References

Author:
Lin Xiaobo (林小铂), senior development engineer at NetEase Games, responsible for the development, operation and maintenance of the real-time game data platform, currently focused on the development and application of Apache Flink. Exploring problems has always been a pleasure for him.

Source: yq.aliyun.com/articles/739792