Flink 1.8.0 Major Update: Automatically Clearing Expired State in Flink, Explained

Copyright notice: reprints must credit the source: https://blog.csdn.net/u013411339/article/details/90625604

OVERVIEW:

Controlling the size of application state based on time, and restricting how long that state can be accessed, are common problems and challenges in the field of stateful stream processing.


Flink 1.8.0 significantly improves the state TTL feature by adding support for continuous background cleanup of expired state objects. The new mechanism frees you from the trouble of implementing manual state cleanup.


State TTL allows you to control the size of the application state, so that developers can focus on the core application logic.

When developing Flink applications, many stateful streaming applications need to automatically clean up application state based on access time, in order to manage the size of the state effectively or to control how long the state remains accessible. The TTL (Time To Live) feature was introduced in Flink 1.6.0 and enables automatic state cleanup and efficient management of state size in Apache Flink.

In this article, we will discuss state TTL and its use cases. In addition, we will show how to use and configure state TTL.

Temporary state

There are two main reasons why state should only be kept for a limited period of time. For example, suppose a Flink application ingests a stream of user login events and stores, for each user, the time of the last login, so that the user does not need to log in again on the next visit and the user experience is improved.

Controlling the size of state

Controlling the size of state, in order to efficiently manage an ever-growing volume of state, is the main scenario for applying TTL. In general, data only needs to be retained temporarily, for example while a user is in a session. Once the user's last access event has been processed, we no longer need to keep that user's state, yet it still occupies storage space. Flink 1.8.0 introduces TTL-based cleanup of expired state, so that this stale data can be removed. Before this, developers had to take extra measures to delete unneeded state and free up storage space, and such manual cleanup code is both error-prone and inefficient. With TTL, in our user login example, we no longer need to clean up manually.

Data privacy requirements

Suppose we have requirements on how long data may be kept, for example that user data must no longer be accessible after a certain period of time. This can all be achieved with the TTL feature.

Continuous cleanup of application state (Continuous Cleanup)

Apache Flink 1.6.0 introduced the state TTL feature. It lets developers of stream processing applications configure an expiration time (Time to Live) for state, so that the state is cleaned up after the defined time has passed. In Flink 1.8.0, this feature was extended to continuously clean up historical data for both the heap state backends (FSStateBackend and MemoryStateBackend) and the RocksDB state backend, so that old entries are removed continuously (according to the TTL setting).

In Flink's DataStream API, application state is defined by a state descriptor (State Descriptor). State TTL is configured by passing a StateTtlConfig object to a state descriptor. The following Java example demonstrates how to create a state TTL configuration and supply it to a state descriptor; the descriptor holds the user's last login time as a Long value, as in the example above:

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.state.ValueStateDescriptor;

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))                                              // state expires 7 days after the last update
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)             // refresh the TTL on creation and writes
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired) // never return an expired value
    .build();

ValueStateDescriptor<Long> lastUserLogin =
    new ValueStateDescriptor<>("lastUserLogin", Long.class);

lastUserLogin.enableTimeToLive(ttlConfig);
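
To show how such a descriptor is typically wired into a job, here is a minimal usage sketch. The function class and the encoding of login events as Tuple2<String, Long> (user id, login timestamp) are illustrative assumptions, not part of the original article; it assumes the imports shown above plus those listed below:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Applied to a stream keyed by user id; f1 of the tuple is the login timestamp in milliseconds.
public class LastLoginFunction extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

    private transient ValueState<Long> lastUserLogin;

    @Override
    public void open(Configuration parameters) {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.days(7))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .build();

        ValueStateDescriptor<Long> descriptor =
            new ValueStateDescriptor<>("lastUserLogin", Long.class);
        descriptor.enableTimeToLive(ttlConfig);

        lastUserLogin = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void flatMap(Tuple2<String, Long> login, Collector<Tuple2<String, Long>> out) throws Exception {
        // Updating the value also refreshes its TTL (OnCreateAndWrite).
        lastUserLogin.update(login.f1);
        out.collect(login);
    }
}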

Flink provides several options to configure the behavior of the TTL feature.

When is the TTL reset?

By default, the TTL of a state entry is refreshed whenever its value is modified. We can also refresh it on read access, but the cost of doing so is an additional write operation to update the timestamp.

Can expired data still be accessed?

State TTL uses a lazy strategy to clean up expired state. This means our application may try to read state that has expired but has not yet been removed. We can configure whether such a read request returns the expired value. In either case, the expired state is cleared immediately after it has been accessed.

Which time semantics are used for the TTL?

With Flink 1.8.0, users can only define state TTL in terms of processing time (Processing Time). Future versions of Apache Flink plan to support event time (Event Time).
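
As a small sketch of the first two options above (the seven-day TTL is just an illustrative value), the refresh behavior and the visibility of expired-but-not-yet-removed values are both selected on the StateTtlConfig builder:

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    // refresh the TTL on read access as well as on writes
    .setUpdateType(StateTtlConfig.UpdateType.OnReadAndWrite)
    // return an expired value if it has not been cleaned up yet
    .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
    .build();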

Internally, the state TTL feature is implemented by storing the timestamp of the last relevant state access alongside the actual state value. Although this approach adds some storage overhead, it allows Flink programs to check whether state has expired when it is queried, checkpointed, or restored.
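
Conceptually, each TTL-enabled state value is stored together with its last-access timestamp. The following is only a simplified illustration of that idea, not Flink's actual internal class:

// Conceptual sketch only; Flink's real internal representation differs in detail.
public class TimestampedValue<T> {
    private final T userValue;              // the actual state value
    private final long lastAccessTimestamp; // processing time of the last relevant access

    public TimestampedValue(T userValue, long lastAccessTimestamp) {
        this.userValue = userValue;
        this.lastAccessTimestamp = lastAccessTimestamp;
    }

    public boolean isExpired(long ttlMillis, long now) {
        return lastAccessTimestamp + ttlMillis <= now;
    }
}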

How to avoid reading "garbage data"

When a state object is accessed in a read operation, Flink checks its timestamp and clears the state if it has expired (whether the expired value is returned depends on the configured state visibility). Because of this lazy deletion, expired state that is never accessed again will occupy storage space forever unless it is garbage collected.

So how can expired state be removed without the application logic explicitly handling it? In general, we can configure different strategies for background deletion.

Automatic removal of expired state from full snapshots

Flink 1.6.0 already supported automatically removing expired state when taking a full snapshot for a checkpoint or savepoint. Note that this removal of expired state does not apply to incremental checkpoints. Removal on full snapshots must be enabled explicitly, as shown in the following example:

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .cleanupFullSnapshot()
    .build();

The code above leaves the size of the local state store unchanged, but it reduces the size of the full snapshots taken by the Flink job. The local state is only cleared when the state is reloaded locally from a snapshot.

Because of these limitations, applications still needed to actively remove expired state in Flink 1.6.0. To improve the user experience, Flink 1.8.0 introduces two autonomous cleanup strategies, one for each of Flink's two kinds of state backends.

Incremental cleanup for heap state backends

This approach is specific to the heap state backends (FSStateBackend and MemoryStateBackend). It works by having the state backend maintain a lazy global iterator over all state entries. Certain events (for example, state accesses) trigger an incremental cleanup. Each time an incremental cleanup is triggered, the iterator advances and removes the expired entries it traverses. The following code example shows how to enable incremental cleanup:

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    // check 10 keys for every state access
    .cleanupIncrementally(10, false)
    .build();

If enabled, every state access triggers a cleanup step. For each cleanup step, a certain number of entries are checked for expiration.

There are two parameters: the first is the number of state entries to check per cleanup step. The second is a flag that additionally triggers a cleanup step after each processed record, on top of the cleanup that runs on every state access.
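
For example (the values are illustrative), to also run a cleanup step for every processed record, the second flag can be set to true:

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    // check 5 entries per cleanup step, and also run a cleanup step for every processed record
    .cleanupIncrementally(5, true)
    .build();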

There are two points to note about this approach. First, the time spent on incremental cleanup increases record processing latency. Second, and this should be negligible but is still worth mentioning: if no state is accessed and no records are processed, expired state will not be removed.

RocksDB background compaction can filter out expired state

If your application uses the RocksDB state backend for state storage, you can enable another cleanup strategy based on a Flink-specific compaction filter. RocksDB periodically runs asynchronous compactions that merge state updates and reduce storage. Flink's compaction filter checks the TTL expiration timestamp of state entries and discards all expired values.

The first step to activating this feature is to configure the RocksDB state backend by setting the following Flink configuration option:

state.backend.rocksdb.ttl.compaction.filter.enabled
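
For example, in flink-conf.yaml (the value shown is for illustration):

state.backend.rocksdb.ttl.compaction.filter.enabled: true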

Once the RocksDB state backend is configured, the compaction cleanup strategy can be enabled for a state, as in the following example:

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .cleanupInRocksdbCompactFilter()
    .build();

Using timers for removal (Timers)

Another way to manually remove state is based on Flink timers. The community is currently evaluating this idea for future versions. With this approach, a cleanup timer is registered for every state access. This method is more predictable, because state is deleted as soon as it expires. However, it is more expensive, since the timers consume storage space in addition to the original state.
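
This timer-based idea can already be implemented by hand today. The following is a minimal sketch (the class name, key type, and Tuple2<String, Long> event encoding are illustrative assumptions, not part of Flink's TTL API) of clearing keyed state with a processing-time timer in a KeyedProcessFunction:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ManualTtlFunction extends KeyedProcessFunction<String, Tuple2<String, Long>, Tuple2<String, Long>> {

    private static final long TTL_MILLIS = 7 * 24 * 60 * 60 * 1000L; // 7 days

    private transient ValueState<Long> lastUserLogin;

    @Override
    public void open(Configuration parameters) {
        lastUserLogin = getRuntimeContext().getState(
            new ValueStateDescriptor<>("lastUserLogin", Long.class));
    }

    @Override
    public void processElement(Tuple2<String, Long> login, Context ctx,
                               Collector<Tuple2<String, Long>> out) throws Exception {
        lastUserLogin.update(login.f1);
        // Register a processing-time timer to clear this key's state after the TTL.
        // A complete implementation would also delete the previously registered timer,
        // so that a later write is not cleared prematurely by an older timer.
        ctx.timerService().registerProcessingTimeTimer(
            ctx.timerService().currentProcessingTime() + TTL_MILLIS);
        out.collect(login);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
                        Collector<Tuple2<String, Long>> out) {
        // Remove the expired state for this key.
        lastUserLogin.clear();
    }
}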

Future Prospects

In addition to the timer-based cleanup mentioned above, the Flink community also plans to further improve the state TTL feature. Possible improvements include adding support for event time TTL (Event Time); currently only processing time is supported.
