The accidental "free" discovery
I have used Flink for more than half a year and regularly use both checkpoints and savepoints. I usually keep an eye on the checkpoint size of a job, but I had never paid attention to how the sizes of the two differ. Only while preparing for failure drills that might be needed during the National Day holiday did I notice the difference.
The Checkpoint monitoring page in the Flink UI shows the size difference between checkpoints and savepoints:
1. The checkpoint state is relatively small
Because I enabled the incremental mode of RocksDB, the Checkpointed Data Size shown in the UI is, according to the official documentation, only the incremental data.
Looking at the checkpoint directory on HDFS:
(1) chk-x: holds the metadata saved for each checkpoint. By default only one checkpoint is retained: state.checkpoints.num-retained = 1.
Official website configuration: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/checkpointing.html#state-checkpoints-num-retained
(2) shared: holds the checkpoint state itself. In the words of the official documentation: The SHARED directory is for state that is possibly part of multiple checkpoints.
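To compare the pieces of this layout, the sizes can be summed per subdirectory. The following is a minimal sketch, assuming the checkpoint directory has been copied to (or mounted on) a local filesystem. The function name dir_sizes is hypothetical and not part of Flink; on a real cluster, running hdfs dfs -du -h against the checkpoint directory is the more direct route.

```python
import os


def dir_sizes(checkpoint_root):
    """Return {subdirectory name: total bytes} for each direct child of checkpoint_root."""
    sizes = {}
    for entry in sorted(os.listdir(checkpoint_root)):
        path = os.path.join(checkpoint_root, entry)
        if not os.path.isdir(path):
            continue  # only subdirectories such as chk-x and shared are of interest
        total = 0
        # Walk the subtree and add up the size of every regular file.
        for dirpath, _dirnames, filenames in os.walk(path):
            for name in filenames:
                total += os.path.getsize(os.path.join(dirpath, name))
        sizes[entry] = total
    return sizes
```

Calling dir_sizes on a local copy of the job's checkpoint directory returns one entry per subdirectory, which makes it easy to see how much of the total sits in shared compared with chk-x.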
2. The savepoint I triggered is 20 GB
So why is the savepoint state so large?
Mainly because the two use different formats. An incremental checkpoint with the RocksDB state backend uses a RocksDB-internal format and only uploads what has changed since the previous checkpoint, whereas a savepoint uses Flink's native savepoint format and is always a full snapshot of the state. That is why the difference is so large.
Official website link: https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/savepoints.html#what-is-a-savepoint-how-is-a-savepoint-different-from-a-checkpoint
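The size gap can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the state is modelled as a set of immutable files, an incremental checkpoint uploads only files that no earlier checkpoint has uploaded, and a full snapshot writes everything. The file names and sizes are made up, and real RocksDB compaction and the actual savepoint format are not modelled.

```python
def incremental_upload_sizes(snapshots):
    """Given successive {file name: size} views of the state,
    return the amount uploaded by each checkpoint in this toy model."""
    uploaded = set()
    sizes = []
    for files in snapshots:
        # Only files that no earlier checkpoint has uploaded are sent.
        new_files = {name: size for name, size in files.items() if name not in uploaded}
        sizes.append(sum(new_files.values()))
        uploaded.update(new_files)
    return sizes


def full_snapshot_size(files):
    """A savepoint-like full snapshot writes all state, whatever was written before."""
    return sum(files.values())


# Hypothetical file sizes in MB; the names merely imitate RocksDB SST files.
snapshots = [
    {"001.sst": 8000, "002.sst": 8000},
    {"001.sst": 8000, "002.sst": 8000, "003.sst": 300},
    {"001.sst": 8000, "002.sst": 8000, "003.sst": 300, "004.sst": 250},
]
print(incremental_upload_sizes(snapshots))  # [16000, 300, 250]
print(full_snapshot_size(snapshots[-1]))    # 16550
```

In this model, after the first checkpoint each incremental checkpoint is only a few hundred MB, while a full snapshot taken at the same moment is the size of the whole state. That matches the pattern of a small Checkpointed Data Size in the UI next to a much larger savepoint.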