Flink Principles (f): Asynchronous I/O

1 Introduction

  This article introduces Asynchronous I/O based on the Flink official documentation, combined with my own understanding. If anything is incorrect, corrections and discussion are welcome. Thank you!

2 Asynchronous I/O Overview

  When a Flink streaming job has to interact with an external system, for example reading data from a database, we need to consider the latency that this I/O interaction introduces.

  To see how this latency can be reduced, first consider how Flink interacts with an external system synchronously, taking a MapFunction that queries a database as an example. As shown by the dashed lines on the left side of Figure 1, request a is sent to the database, and the MapFunction waits for the reply before sending request b; during that wait the I/O channel is idle. Request b then repeats the same procedure, so in the time of two round trips (send request, receive result) only two requests are processed. On the right side of Figure 1, within the same two round trips, the interaction is asynchronous: after request a is sent, requests b, c and d are issued one after another while the reply to a is still pending, so four requests are handled in the same amount of time.

 Figure 1: Synchronous vs. asynchronous database access (Ref [1])
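The difference in Figure 1 can be reproduced outside Flink with plain `CompletableFuture`s. This is a minimal sketch under stated assumptions: `lookup` is a hypothetical asynchronous client whose reply arrives a fixed `LATENCY_MS` later, and the class and method names are illustrative only. Waiting for each reply before sending the next request costs one full round trip per key, while issuing all requests first lets the round trips overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class SyncVsAsyncLookup {
    static final long LATENCY_MS = 100; // assumed round-trip time of one request

    // Hypothetical asynchronous client: the reply arrives LATENCY_MS later
    // without blocking the calling thread.
    static CompletableFuture<String> lookup(String key) {
        return CompletableFuture.supplyAsync(
                () -> "value-" + key,
                CompletableFuture.delayedExecutor(LATENCY_MS, TimeUnit.MILLISECONDS));
    }

    // Synchronous style (left side of Figure 1): wait for each reply before the next request.
    static List<String> lookupSequentially(List<String> keys) {
        List<String> results = new ArrayList<>();
        for (String key : keys) {
            results.add(lookup(key).join()); // the channel is idle while we wait
        }
        return results;
    }

    // Asynchronous style (right side of Figure 1): send all requests, then collect replies.
    static List<String> lookupConcurrently(List<String> keys) {
        List<CompletableFuture<String>> inFlight = new ArrayList<>();
        for (String key : keys) {
            inFlight.add(lookup(key));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : inFlight) {
            results.add(f.join());
        }
        return results;
    }

    static long elapsedMs(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d");
        System.out.println("sequential: ~" + elapsedMs(() -> lookupSequentially(keys)) + " ms");
        System.out.println("concurrent: ~" + elapsedMs(() -> lookupConcurrently(keys)) + " ms");
    }
}
```

With four keys, the sequential version needs at least four round trips, whereas the concurrent one is bounded by roughly one.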

  In some scenarios, throughput can also be improved simply by increasing the parallelism of the MapFunction, but this usually comes at a high resource cost.

  [Important]

  1) To access a database or K/V store with asynchronous I/O, the database client must support asynchronous requests. If it does not, a synchronous client can be turned into a limited concurrent client by creating multiple clients and handling the synchronous calls with a thread pool, but this approach is usually less efficient than a proper asynchronous client.

  2) AsyncFunction is not called in a multithreaded manner: there is only one AsyncFunction instance, and it is called sequentially for every message in its partition of the stream;

  3) Currently (Flink 1.9), using AsyncWaitOperator breaks the operator chain (it is not chained by default); see FLINK-13063 for the reason.
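Point 1) above mentions the thread-pool workaround for clients without an asynchronous API. A minimal sketch in plain Java, assuming a hypothetical blocking `SyncClient` interface (not a real library type): each call is handed to a fixed-size pool, so at most `threads` requests are in flight, and every in-flight request still occupies one thread for its whole round trip. That per-request thread cost is why this is less efficient than a truly asynchronous client.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class ThreadPoolAsyncAdapter {
    // Hypothetical blocking client, e.g. a JDBC-style call with no async API.
    interface SyncClient {
        String get(String key);
    }

    private final SyncClient client;
    private final ExecutorService pool;

    ThreadPoolAsyncAdapter(SyncClient client, int threads) {
        this.client = client;
        // The pool size caps how many blocking calls can be in flight at once.
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Looks asynchronous to the caller, but blocks one pool thread per request.
    CompletableFuture<String> getAsync(String key) {
        return CompletableFuture.supplyAsync(() -> client.get(key), pool);
    }

    void close() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        SyncClient slowClient = key -> {
            try {
                Thread.sleep(50); // simulated blocking I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return key.toUpperCase();
        };
        ThreadPoolAsyncAdapter adapter = new ThreadPoolAsyncAdapter(slowClient, 4);
        List<CompletableFuture<String>> futures = List.of("x", "y", "z").stream()
                .map(adapter::getAsync)
                .collect(Collectors.toList());
        System.out.println(futures.stream().map(CompletableFuture::join).collect(Collectors.toList()));
        adapter.close();
    }
}
```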

3 Order of Results

  Because requests do not all complete at the same speed, the "concurrent" requests issued by AsyncFunction may produce results out of order. On the right side of Figure 1, for example, if request b is issued after request a but its result returns first, the order of messages after the async I/O operator differs from the order before it. To control the order in which results are returned, Flink offers two modes:

  1) Unordered: as soon as an asynchronous request completes, its result is emitted immediately, regardless of the original message order. When processing time is used as the time characteristic, this mode gives the lowest latency and the lowest overhead. Usage: AsyncDataStream.unorderedWait(...);

  2) Ordered: in this mode, results are emitted in the same order in which the messages entered the async I/O operator, i.e. the first request is returned first. To achieve this, the operator buffers completed results until the results of all preceding requests have been returned or have timed out. This mode usually introduces extra latency, as well as extra checkpoint overhead, because compared with the unordered mode, messages and their pending results are kept in checkpointed state for longer. Usage: AsyncDataStream.orderedWait(...);
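The two modes can be compared with a small deterministic simulation. This is an illustrative sketch, not Flink code: requests are numbered by input position, `completionOrder` is an assumed permutation of those numbers giving the order in which replies arrive, and the methods return the order in which results would leave the operator. The ordered variant holds finished results until everything before them has finished, which is exactly where its extra latency comes from.

```java
import java.util.ArrayList;
import java.util.List;

class EmitOrderSimulation {
    // Unordered: a result is emitted as soon as its request completes.
    static List<Integer> unordered(List<Integer> completionOrder) {
        return new ArrayList<>(completionOrder);
    }

    // Ordered: completed results are buffered until every earlier request has completed.
    // Assumes completionOrder is a permutation of 0..n-1.
    static List<Integer> ordered(List<Integer> completionOrder) {
        int n = completionOrder.size();
        boolean[] done = new boolean[n];
        List<Integer> emitted = new ArrayList<>();
        int head = 0; // oldest request whose result has not been emitted yet
        for (int finished : completionOrder) {
            done[finished] = true;
            // Flush from the head for as long as the head has completed.
            while (head < n && done[head]) {
                emitted.add(head);
                head++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Request 1 (b) returns before request 0 (a), and 3 (d) before 2 (c).
        List<Integer> completion = List.of(1, 0, 3, 2);
        System.out.println("unordered: " + unordered(completion));
        System.out.println("ordered:   " + ordered(completion));
    }
}
```

Here the unordered mode emits `[1, 0, 3, 2]`, while the ordered mode emits `[0, 1, 2, 3]`, keeping result 1 buffered until result 0 arrives.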

  Here we need to add a note on streaming jobs that use event time. Why? Because the position of watermarks relative to messages never changes: a message that occurs after a watermark can only be emitted after that watermark has been emitted, and the same holds for the result of its request. In other words, the messages between two watermarks stay, as a group, between those two watermarks. Whether the messages inside one such interval are ordered depends on the mode used.

  1) Ordered mode: just as the order among messages is preserved, the order between watermarks and messages is preserved too; compared with processing time, there is no significant additional overhead;

  2) Unordered mode: results are emitted in the order in which the responses return, but when combined with event time, a message or its result can only be emitted after the watermark preceding it has been emitted. In this case some latency and overhead are introduced, similar to the ordered mode, and the amount depends on the watermark frequency. See the principles section below for the reason.
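The watermark rule for the unordered mode can be pictured as segments: watermarks split the stream, records are free to overtake each other only inside their own segment, and a watermark leaves only after its segment is drained. The sketch below models that with a hypothetical `Element` type whose `completesAt` field is an assumed simulated completion time; it is a simplified model of the behaviour described above, not Flink's queue implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class UnorderedWithWatermarks {
    // Hypothetical stream element: a record with a simulated completion time, or a watermark.
    static final class Element {
        final String name;
        final boolean isWatermark;
        final int completesAt; // simulated completion time of the async request (records only)

        private Element(String name, boolean isWatermark, int completesAt) {
            this.name = name;
            this.isWatermark = isWatermark;
            this.completesAt = completesAt;
        }

        static Element rec(String name, int completesAt) { return new Element(name, false, completesAt); }
        static Element wm(String name) { return new Element(name, true, 0); }
    }

    // Unordered mode with watermarks: completion order inside a segment, segments in input order.
    static List<String> emit(List<Element> input) {
        List<String> out = new ArrayList<>();
        List<Element> segment = new ArrayList<>();
        for (Element e : input) {
            if (e.isWatermark) {
                flush(segment, out);
                out.add(e.name); // the watermark leaves only after its segment is drained
            } else {
                segment.add(e);
            }
        }
        flush(segment, out); // records after the last watermark
        return out;
    }

    private static void flush(List<Element> segment, List<String> out) {
        segment.sort(Comparator.comparingInt((Element el) -> el.completesAt));
        for (Element el : segment) {
            out.add(el.name);
        }
        segment.clear();
    }

    public static void main(String[] args) {
        List<Element> input = List.of(
                Element.rec("a", 30), Element.rec("b", 10), Element.wm("W1"),
                Element.rec("c", 5), Element.rec("d", 20), Element.wm("W2"));
        System.out.println(emit(input));
    }
}
```

In the example, `c` completes earliest of all, yet it is held back until `a`, `b` and `W1` have been emitted; that waiting is the extra latency whose size depends on how often watermarks arrive.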

4 Principles

   4.1 Terms

  Before describing the implementation of asynchronous I/O in more detail, we first explain a few terms, which also touch on its basic usage; refer back to them while reading the analysis of the principles.

  1) AsyncFunction: the interface that triggers asynchronous I/O.

    AsyncWaitOperator uses AsyncFunction as its user function, similarly to flatMap, and has the methods open(), processElement(StreamRecord<IN> record) and processWatermark(Watermark mark).
 A user's own AsyncFunction implementation must override asyncInvoke(IN input, AsyncCollector collector) to provide the code that calls the asynchronous operation. (This is the signature in the design document, Ref [2]; in the Flink 1.9 API the second parameter is a ResultFuture<OUT>.)

  2) AsyncWaitOperator: the stream operator that calls AsyncFunction. It is an abstract notion here; the concrete operator is created by unorderedWait(...) or orderedWait(...).

  3) AsyncCollector:

    AsyncCollector is created by AsyncWaitOperator and passed to AsyncFunction, which should attach it to the user's callback. It collects results or errors from the user code and notifies the AsyncCollectorBuffer to emit the results.

  4) AsyncCollectorBuffer: holds all AsyncCollectors and sends the results to the next operator.

  A diagram of how these concepts work together can be found in Ref [2].
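To make the roles of AsyncFunction and AsyncCollector concrete, here is a minimal plain-Java sketch. `AsyncFn`, `Collector`, `FutureCollector` and `LengthLookup` are hypothetical stand-ins that only mirror the terms above; they are not Flink classes (in Flink 1.9 the real interface is `AsyncFunction<IN, OUT>` with `asyncInvoke(IN, ResultFuture<OUT>)`). The point shown is the contract: `asyncInvoke` returns immediately, and the result or error is handed to the collector later from the callback.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;

class AsyncTermsSketch {
    // Hypothetical stand-in for the AsyncCollector handed to the user function.
    interface Collector<OUT> {
        void collect(Collection<OUT> results);
        void collectError(Throwable error);
    }

    // Hypothetical stand-in for the user-implemented AsyncFunction.
    interface AsyncFn<IN, OUT> {
        void asyncInvoke(IN input, Collector<OUT> collector);
    }

    // Collector backed by a CompletableFuture so the "operator" side can wait on it.
    static final class FutureCollector<OUT> implements Collector<OUT> {
        final CompletableFuture<Collection<OUT>> future = new CompletableFuture<>();

        public void collect(Collection<OUT> results) { future.complete(results); }
        public void collectError(Throwable error) { future.completeExceptionally(error); }
    }

    // User code: start a non-blocking request and complete the collector in its callback.
    static final class LengthLookup implements AsyncFn<String, Integer> {
        public void asyncInvoke(String input, Collector<Integer> collector) {
            CompletableFuture.supplyAsync(() -> input.length())
                    .whenComplete((len, err) -> {
                        if (err != null) {
                            collector.collectError(err);
                        } else {
                            collector.collect(Collections.singletonList(len));
                        }
                    });
        }
    }

    public static void main(String[] args) {
        FutureCollector<Integer> collector = new FutureCollector<>();
        new LengthLookup().asyncInvoke("flink", collector); // returns without waiting
        System.out.println(collector.future.join());
    }
}
```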

  4.2 Architecture

  In stream processing, the overall flow of asynchronous I/O is as follows:

 

Figure 2: Asynchronous I/O architecture (Ref [2])

  1) Normal processing after a message reaches AsyncWaitOperator:

  AsyncWaitOperator calls AsyncFunction and passes it an AsyncCollector that it has created. Once the AsyncCollector obtains the result (or an exception), it stores it in the AsyncCollectorBuffer, marks its entry in the AsyncCollectorBuffer as finished, and sends a signal to the Emitter thread. When signaled, the Emitter thread emits the finished results and notifies the task thread that the buffer has room for new messages. Which results are emitted depends on whether the ordered or unordered mode is configured: in ordered mode, the Emitter emits the head of the buffer once it has finished and then removes the head. The process is shown in more detail in the following figure:

 

Figure 3: Normal message processing in asynchronous I/O (Ref [2])
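The normal processing flow of the ordered mode (bounded buffer, completion signal to the Emitter, head-only emission, "room available" signal back to the task thread) can be sketched with a lock and two conditions. This is a simplified model under stated assumptions: `CompletableFuture<String>` stands in for an AsyncCollector, and `OrderedBufferWithEmitter`, `emitN` and `run` are hypothetical names; Flink's actual operator is structured differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class OrderedBufferWithEmitter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition headMayBeDone = lock.newCondition(); // wakes the Emitter thread
    private final Condition notFull = lock.newCondition();       // wakes the task thread
    private final ArrayDeque<CompletableFuture<String>> buffer = new ArrayDeque<>();
    private final List<String> output = Collections.synchronizedList(new ArrayList<>());
    private final int capacity;

    OrderedBufferWithEmitter(int capacity) { this.capacity = capacity; }

    // Task thread: wait until the buffer has room, add the entry, signal the Emitter on completion.
    void add(CompletableFuture<String> entry) throws InterruptedException {
        lock.lock();
        try {
            while (buffer.size() >= capacity) notFull.await();
            buffer.addLast(entry);
        } finally {
            lock.unlock();
        }
        entry.whenComplete((result, error) -> signalEmitter());
    }

    private void signalEmitter() {
        lock.lock();
        try { headMayBeDone.signalAll(); } finally { lock.unlock(); }
    }

    // Emitter thread (ordered mode): emit only the head, and only once it has finished.
    void emitN(int n) throws InterruptedException {
        for (int emitted = 0; emitted < n; emitted++) {
            CompletableFuture<String> head;
            lock.lock();
            try {
                while (buffer.isEmpty() || !buffer.peekFirst().isDone()) headMayBeDone.await();
                head = buffer.pollFirst();  // remove the finished head
                notFull.signalAll();        // tell the task thread there is room again
            } finally {
                lock.unlock();
            }
            output.add(head.join());        // already done, so this does not block
        }
    }

    static List<String> run(int count, int capacity) throws InterruptedException {
        OrderedBufferWithEmitter op = new OrderedBufferWithEmitter(capacity);
        Thread emitter = new Thread(() -> {
            try { op.emitN(count); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, "emitter");
        emitter.start();
        for (int i = 0; i < count; i++) {
            String value = "r" + i;
            long delay = ThreadLocalRandom.current().nextLong(1, 50); // simulated, random I/O latency
            op.add(CompletableFuture.supplyAsync(() -> value,
                    CompletableFuture.delayedExecutor(delay, TimeUnit.MILLISECONDS)));
        }
        emitter.join();
        return new ArrayList<>(op.output);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(6, 2));
    }
}
```

Even though the simulated requests finish in random order, the output follows input order because the Emitter never looks past an unfinished head; the capacity bound is what makes the task thread wait when too many requests are in flight.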

  2) Checkpoint process

  During a checkpoint, AsyncWaitOperator scans all input stream elements held in the AsyncCollectorBuffer, deletes the old data in its state, and then stores the elements from the AsyncCollectorBuffer into state, rather than storing each input element into state as it is processed. The process is shown in Figure 2 and Figure 4.
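The snapshot step above amounts to "clear the old state, then copy the inputs of everything still in the buffer". A minimal sketch, assuming a hypothetical `Entry` type and using a plain `List` as a stand-in for operator state (in Flink this would be managed state, not a field). Following the description, it stores the input of every buffered entry, whether or not its result has already arrived.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class CheckpointSketch {
    // Hypothetical buffer entry: the original input plus its result, if it has arrived.
    static final class Entry {
        final String input;
        String result; // null while the async request is still in flight

        Entry(String input) { this.input = input; }
    }

    final ArrayDeque<Entry> buffer = new ArrayDeque<>();
    // Stand-in for operator state; only the inputs are stored, not the pending requests.
    final List<String> state = new ArrayList<>();

    // On checkpoint: delete the previous snapshot, then store the input of every buffered entry.
    void snapshotState() {
        state.clear();
        for (Entry e : buffer) {
            state.add(e.input);
        }
    }

    public static void main(String[] args) {
        CheckpointSketch op = new CheckpointSketch();
        op.buffer.add(new Entry("a"));
        op.buffer.add(new Entry("b"));
        op.snapshotState();
        System.out.println(op.state);
    }
}
```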

  3) Recovery

  When restoring the state of AsyncWaitOperator, AsyncWaitOperator scans all elements in the state, obtains AsyncCollectors for them, calls AsyncFunction.asyncInvoke(), and inserts them into the AsyncCollectorBuffer, as shown below:

 

Figure 4: Failure recovery and checkpoint flow (Ref [2])
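Recovery is the mirror image of the snapshot: each restored input is passed to asyncInvoke again, and the new pending entries are put back into the buffer in their original order. The sketch below models this in plain Java; `restore` is a hypothetical helper and the `asyncInvoke` parameter is an assumed `Function` standing in for the user's AsyncFunction. One consequence worth keeping in mind is that restored requests are issued a second time, so the external call should tolerate being repeated.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class RecoverySketch {
    // On restore: re-issue the async request for every element found in state and
    // put the new pending entries back into the buffer in their original order.
    static ArrayDeque<CompletableFuture<String>> restore(
            List<String> restoredState,
            Function<String, CompletableFuture<String>> asyncInvoke) {
        ArrayDeque<CompletableFuture<String>> buffer = new ArrayDeque<>();
        for (String input : restoredState) {
            buffer.addLast(asyncInvoke.apply(input)); // the request runs again after failover
        }
        return buffer;
    }

    public static void main(String[] args) {
        ArrayDeque<CompletableFuture<String>> buffer = restore(
                List.of("a", "b"),
                in -> CompletableFuture.supplyAsync(() -> in + "-reloaded"));
        buffer.forEach(f -> System.out.println(f.join()));
    }
}
```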

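 Summary:

  The specific usage will be covered in a later blog post. I recommend reading the original documents as well; after all, a thousand readers have a thousand Hamlets!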

Ref

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/asyncio.html

[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673

[3] https://blog.icocoro.me/2019/05/26/1905-apache-flinkv2-asyncio/


Source: www.cnblogs.com/love-yh/p/11681435.html