Series Article Directory
Practice Data Lake iceberg Lesson 1 Getting Started
Practice Data Lake iceberg Lesson 2 Iceberg is based on hadoop’s underlying data format
Practice data lake iceberg Lesson 3 In sqlclient, use SQL to read data from Kafka to iceberg
Practice data lake iceberg Lesson 4 In sqlclient, use SQL to read data from Kafka to iceberg (upgrade the version to flink1.12.7)
practice data lake iceberg Lesson 5 hive catalog features
practice data lake iceberg Lesson 6 write from kafka to iceberg failure problem solving
practice data lake iceberg Lesson 7 Write to iceberg in real time
practice data lake iceberg Lesson 8 hive and iceberg integration
practice data lake iceberg Lesson 9 merge small files
practice data lake iceberg Lesson 10 snapshot delete
practice data lake iceberg Lesson 11 test the whole process for a partitioned table (generating data, creating tables, merging, and deleting snapshots)
Practice data lake iceberg Lesson 12 What is a catalog
Practice data lake iceberg Lesson 13 Metadata is many times larger than data files
Practice data lake iceberg Lesson 14 Data merging (to solve the problem of metadata expansion over time)
practice data lake iceberg Lesson 15 spark installation and iceberg integration (jersey package conflict)
practice data lake iceberg Lesson 16 opening the door to understanding iceberg through spark3
Practice data lake iceberg Lesson 17 configuration for running iceberg with Hadoop2.7 and spark3 on yarn
Practice data lake iceberg Lesson 18 start commands for multiple clients interacting with iceberg (commonly used commands)
Practice data lake iceberg Lesson 19 flink count on iceberg, no result problem
practice data lake iceberg Lesson 20 flink + iceberg CDC scenario (version problem, test failed)
practice data lake iceberg Lesson 21 flink1.13.5 + iceberg0.131 CDC (test successful INSERT, change operation failed)
Practice data lake iceberg Lesson 22 flink1.13.5 + iceberg0.131 CDC (CRUD test successful)
practice data lake iceberg Lesson 23 flink-sql restart from checkpoint
practice data lake iceberg Lesson 24 iceberg metadata details analysis
Practice data lake iceberg Lesson 25 running flink sql in the background, the effect of insert, update and delete
Practice data lake iceberg Lesson 26 checkpoint setting method
Practice data lake iceberg Lesson 27 Flink cdc test of program failure restart: it can resume from the last checkpoint and continue working
practice data lake iceberg Lesson 28 deploy packages that do not exist in the public repository to the local repository
practice data lake iceberg Lesson 29 how to obtain flink jobId elegantly and efficiently
practice data lake iceberg lesson 30 mysql -> iceberg, different clients sometimes have timezone issues
Practice data lake iceberg Lesson 31 use github's flink-streaming-platform-web tool to manage flink task flow, test cdc restart scenario
practice data lake iceberg more content directory
Foreword
When Flink restarts, it needs to recover from a checkpoint. When it comes to task and job management, developing such a tool ourselves is time-consuming and, for a small company, a thankless effort. So we looked for an open-source solution to test: flink-streaming-platform-web. It supports SQL jobs, jar jobs, and submission in various cluster modes, and it held up in my personal testing.
1. Use the open source component flink-streaming-platform-web to manage flink tasks
Source code address: https://github.com/zhp8341/flink-streaming-platform-web
Usage instructions: the official documentation is very detailed; please check the project's site.
2. Test that flink-streaming-platform-web recovers from checkpoint on restart
Core logic: read from mysql to iceberg
1. Code: read from mysql to iceberg
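The original post shows the job code as screenshots; a minimal Flink SQL sketch of such a pipeline is given below. All connection parameters, the warehouse path, and the sink table name are assumptions for illustration, not the author's exact configuration.

```sql
-- Assumed MySQL CDC source table (flink-connector-mysql-cdc);
-- host, port, and credentials are placeholders.
CREATE TABLE stock_basic_source (
    id STRING,
    ts_code STRING,
    symbol STRING,
    name STRING,
    area STRING,
    industry STRING,
    list_date STRING,
    actural_controller STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = '******',
    'database-name' = 'demo',
    'table-name' = 'stock_basic'
);

-- Assumed iceberg v2 sink on a hadoop catalog; the warehouse path is a placeholder.
CREATE TABLE stock_basic_iceberg (
    id STRING,
    ts_code STRING,
    symbol STRING,
    name STRING,
    area STRING,
    industry STRING,
    list_date STRING,
    actural_controller STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'iceberg',
    'catalog-name' = 'hadoop_catalog',
    'catalog-type' = 'hadoop',
    'warehouse' = 'hdfs://namenode:8020/warehouse/iceberg',
    'format-version' = '2'
);

-- The streaming job itself: one INSERT that keeps syncing changes.
INSERT INTO stock_basic_iceberg SELECT * FROM stock_basic_source;
```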
- MySQL raw data
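The table schema is not shown in the post; reconstructed from the INSERT statements later on, the MySQL source table is roughly the following (column types are assumptions):

```sql
-- Assumed schema for the source table, inferred from the sample INSERTs.
CREATE TABLE `stock_basic` (
    `id`                 VARCHAR(16) PRIMARY KEY,
    `ts_code`            VARCHAR(16),   -- e.g. 000007.SZ
    `symbol`             VARCHAR(16),
    `name`               VARCHAR(64),
    `area`               VARCHAR(32),
    `industry`           VARCHAR(32),
    `list_date`          VARCHAR(8),    -- listing date, e.g. 19920413
    `actural_controller` VARCHAR(128)
);
```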
3. Start the program
4. Check the results of the iceberg table and find that it is synchronized.
5. Test newly added and changed data
Insert a row with id=5 and update one row, as follows:
INSERT INTO `stock_basic` VALUES ('5', '000007.SZ', '000007', '*ST全新', '深圳', '酒店餐饮', '19920413', null);
update stock_basic set actural_controller='me me me' where id='0';
Check iceberg: the change has been captured.
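A simple way to verify the capture (table name as assumed, for illustration) is to query the sink from the Flink SQL client or spark-sql:

```sql
-- The inserted row (id=5) and the updated row (id=0) should both show the changes.
SELECT id, name, actural_controller
FROM stock_basic_iceberg
WHERE id IN ('0', '5');
```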
6. Restart to see whether the job can recover from the checkpoint
Click Restore, and the following dialog pops up:
Click Restore, refresh the main page of flink, and find that the task starts normally:
Go to the sink table to check the data and see if there is any repeated consumption:
No duplicates are found.
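One way to check for repeated consumption is to group the sink table by its primary key; any key appearing more than once would indicate duplicate rows (table name is an assumption):

```sql
-- An empty result set means no duplicates.
SELECT id, COUNT(*) AS cnt
FROM stock_basic_iceberg
GROUP BY id
HAVING COUNT(*) > 1;
```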
Re-test: after stopping the program, write some data, then check whether the changes are captured after recovering from the checkpoint.
Step 1: Record the savepoint
Step 2: Stop the program
Step 3: Insert update delete change
INSERT INTO `stock_basic` VALUES ('6', '000008.SZ', '000008', '神州高铁', '北京', '运输设备', '19920507', '国家开发投资集团有限公司');
update stock_basic set actural_controller='汉武帝' where id='1';
delete from stock_basic where id='0';
Step 4: Restore the program
After restoring, wait for a checkpoint and check in spark-sql: the changes have been captured.
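The spark-sql verification can be sketched as follows (catalog, database, and table names are assumptions):

```sql
-- Expected after recovery: id=6 exists, id=1 shows the updated
-- controller, and id=0 has been deleted.
SELECT id, name, actural_controller
FROM hadoop_catalog.db.stock_basic_iceberg
WHERE id IN ('0', '1', '6');
```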
Summary
flink-streaming-platform-web turns out to be really easy to use; have fun with it!