Flink 1.9 Restart Strategies and Failover Strategies

When a Task fails, Flink needs to restart the failed Task and the other Tasks affected by it, so that the job returns to a normal execution state.

Flink controls how Tasks are restarted through restart strategies and failover strategies: the restart strategy determines whether a restart is allowed and how long to wait between restarts; the failover strategy determines which Tasks need to be restarted.
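As an illustration of the "whether and at what interval" decision, here is a minimal sketch of a fixed-delay restart policy. It is a hypothetical model, not Flink's actual classes: the names `FixedDelayRestartSketch`, `notifyFailure`, `canRestart` and `backoff` are assumptions made for this example only.

```java
import java.time.Duration;

/** Hypothetical model of a fixed-delay restart policy; not Flink's own API. */
final class FixedDelayRestartSketch {
    private final int maxAttempts;   // how many restarts are allowed in total
    private final Duration delay;    // fixed wait before each restart
    private int failures = 0;

    FixedDelayRestartSketch(int maxAttempts, Duration delay) {
        this.maxAttempts = maxAttempts;
        this.delay = delay;
    }

    /** Records one Task failure. */
    void notifyFailure() {
        failures++;
    }

    /** Answers "may the job be restarted?": true while failures do not exceed the limit. */
    boolean canRestart() {
        return failures <= maxAttempts;
    }

    /** Answers "how long to wait before restarting?". */
    Duration backoff() {
        return delay;
    }

    public static void main(String[] args) {
        FixedDelayRestartSketch policy =
                new FixedDelayRestartSketch(2, Duration.ofSeconds(10));
        for (int i = 1; i <= 3; i++) {
            policy.notifyFailure();
            System.out.println("failure " + i + ": canRestart=" + policy.canRestart()
                    + ", delayMs=" + policy.backoff().toMillis());
        }
    }
}
```

With a limit of 2 attempts, the first two failures still allow a restart and the third does not; at that point the job would be failed instead of restarted.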

Restart All failover strategy

Under the Restart All failover strategy, when a Task fails, all Tasks in the job are restarted to recover from the failure.

Restart Pipelined Region failover strategy

This strategy divides all Tasks in the job into several disjoint Regions. When a Task fails, it tries to find the smallest set of Regions that must be restarted to recover from the failure.
Compared with the Restart All failover strategy, this strategy needs to restart fewer Tasks in some failure scenarios.

A Region here is a set of Tasks that exchange data in Pipelined form. In other words, Batch data exchanges form the boundaries of a Region.
- All data exchanges in DataStream jobs and streaming Table/SQL jobs are Pipelined.
- All data exchanges in batch Table/SQL jobs are Batch by default.
- The data exchange form in DataSet jobs is determined by the [ExecutionMode]({{site.javadocs_baseurl}}/api/java/org/apache/flink/api/common/ExecutionMode.html) configured in the [ExecutionConfig]({{site.baseurl}}/zh/dev/execution_configuration.html).
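The Region definition can be sketched as a grouping problem: Tasks joined by a Pipelined exchange land in the same Region, while a Batch exchange never merges two Tasks. The following is an illustrative model using union-find, not Flink's internal implementation; `PipelinedRegionSketch`, `computeRegions` and the integer Task indices are assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical sketch: group Tasks into Regions by merging across Pipelined edges. */
final class PipelinedRegionSketch {
    enum Exchange { PIPELINED, BATCH }

    /**
     * Returns one Region id per Task. The id is the smallest Task index in that Region.
     * edges[e] = {producerTask, consumerTask}; types[e] is the exchange form of that edge.
     */
    static int[] computeRegions(int taskCount, int[][] edges, Exchange[] types) {
        int[] parent = new int[taskCount];
        for (int i = 0; i < taskCount; i++) {
            parent[i] = i; // every Task starts in its own Region
        }
        for (int e = 0; e < edges.length; e++) {
            if (types[e] == Exchange.PIPELINED) {
                union(parent, edges[e][0], edges[e][1]); // Batch edges stay as borders
            }
        }
        int[] region = new int[taskCount];
        for (int i = 0; i < taskCount; i++) {
            region[i] = find(parent, i);
        }
        return region;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    private static void union(int[] parent, int a, int b) {
        int ra = find(parent, a);
        int rb = find(parent, b);
        if (ra == rb) {
            return;
        }
        // Keep the smaller index as root so Region ids are deterministic.
        if (ra < rb) {
            parent[rb] = ra;
        } else {
            parent[ra] = rb;
        }
    }

    public static void main(String[] args) {
        // Tasks: 0 -> 1 (Pipelined), 1 -> 2 (Batch), 2 -> 3 (Pipelined)
        int[][] edges = {{0, 1}, {1, 2}, {2, 3}};
        Exchange[] types = {Exchange.PIPELINED, Exchange.BATCH, Exchange.PIPELINED};
        System.out.println(Arrays.toString(computeRegions(4, edges, types))); // [0, 0, 2, 2]
    }
}
```

In the example, the single Batch exchange between Task 1 and Task 2 splits the job into two Regions, {0, 1} and {2, 3}. If every exchange were Pipelined, as in a DataStream job, the whole job would form one Region.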

The Regions to restart are determined as follows:
1. The Region containing the failed Task is restarted.
2. If a Region that is to be restarted needs to consume data that is unavailable (lost or corrupted), the Region that produces that data is restarted as well.
3. All downstream Regions of a Region that is to be restarted are restarted as well. This guarantees data consistency, because non-deterministic computation or partitioning can cause the same Result Partition to contain different data each time it is produced.
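The three rules above can be read as a fixed-point computation over the graph of Regions. The sketch below is a simplified model under stated assumptions, not Flink's actual code: Regions are plain integers, data availability is tracked per producing Region rather than per Result Partition, and the names `RegionRestartSketch`, `regionsToRestart`, `producersOf`, `consumersOf` and `regionsWithLostOutput` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of choosing which Regions to restart after a Task failure. */
final class RegionRestartSketch {
    static Set<Integer> regionsToRestart(
            int failedRegion,
            Map<Integer, List<Integer>> producersOf,  // Region -> upstream Regions
            Map<Integer, List<Integer>> consumersOf,  // Region -> downstream Regions
            Set<Integer> regionsWithLostOutput) {     // Regions whose output is unavailable
        Set<Integer> restart = new TreeSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        restart.add(failedRegion);                    // rule 1
        pending.add(failedRegion);
        while (!pending.isEmpty()) {
            int region = pending.poll();
            // Rule 2: an upstream Region restarts only if the data it produced is unavailable.
            for (int producer : producersOf.getOrDefault(region, Collections.emptyList())) {
                if (regionsWithLostOutput.contains(producer) && restart.add(producer)) {
                    pending.add(producer);
                }
            }
            // Rule 3: every downstream Region of a restarted Region restarts too.
            for (int consumer : consumersOf.getOrDefault(region, Collections.emptyList())) {
                if (restart.add(consumer)) {
                    pending.add(consumer);
                }
            }
        }
        return restart;
    }

    public static void main(String[] args) {
        // Region 0 feeds Regions 1 and 3; Region 1 feeds Region 2.
        Map<Integer, List<Integer>> producersOf = new HashMap<>();
        producersOf.put(1, Arrays.asList(0));
        producersOf.put(2, Arrays.asList(1));
        producersOf.put(3, Arrays.asList(0));
        Map<Integer, List<Integer>> consumersOf = new HashMap<>();
        consumersOf.put(0, Arrays.asList(1, 3));
        consumersOf.put(1, Arrays.asList(2));

        // Region 1 fails, Region 0's output is still available.
        System.out.println(regionsToRestart(1, producersOf, consumersOf,
                new HashSet<Integer>())); // [1, 2]
        // Region 1 fails and Region 0's output is lost.
        System.out.println(regionsToRestart(1, producersOf, consumersOf,
                new HashSet<>(Arrays.asList(0)))); // [0, 1, 2, 3]
    }
}
```

The second case shows how rules 2 and 3 interact: losing Region 0's output forces Region 0 to restart, which in turn pulls in Region 3, even though Region 3 has no direct connection to the failed Task.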

Origin www.cnblogs.com/mrpei/p/flinkfailover.html