Practice data lake iceberg Lesson 31: Use github's flink-streaming-platform-web tool to manage flink task flow and test cdc restart scenarios

Series Article Directory

Practice data lake iceberg Lesson 1 Getting started
Practice data lake iceberg Lesson 2 Iceberg is based on hadoop's underlying data format
Practice data lake iceberg In sqlclient, use SQL to read data from Kafka to iceberg (upgrade the version to flink 1.12.7)
Practice data lake iceberg Lesson 5 hive catalog features
Practice data lake iceberg Lesson 6 Solving the failure to write from kafka to iceberg
Practice data lake iceberg Lesson 7 Write to iceberg in real time
Practice data lake iceberg Lesson 8 hive and iceberg integration
Practice data lake iceberg Lesson 9 Merge small files
Practice data lake iceberg Lesson 10 Snapshot deletion
Practice data lake iceberg Lesson 11 Test the full process of a partition table (generating data, building tables, merging, and deleting snapshots)
Practice data lake iceberg Lesson 12 What is a catalog
Practice data lake iceberg Lesson 13 Metadata is many times larger than data files
Practice data lake iceberg Lesson 14 Data merging (to solve the problem of metadata expansion over time)
Practice data lake iceberg Lesson 15 spark installation and integration with iceberg (jersey package conflict)
Practice data lake iceberg Lesson 16 Opening the door to understanding iceberg through spark3
Practice data lake iceberg Lesson 17 Hadoop 2.7, spark3 on yarn: configuration for running iceberg
Practice data lake iceberg Lesson 18 Startup commands for multiple clients interacting with iceberg (commonly used commands)
Practice data lake iceberg Lesson 19 flink count on iceberg, no-result problem
Practice data lake iceberg Lesson 20 flink + iceberg CDC scenario (version problem, test failed)
Practice data lake iceberg Lesson 21 flink 1.13.5 + iceberg 0.13.1 CDC (INSERT tested successfully, change operations failed)
Practice data lake iceberg Lesson 22 flink 1.13.5 + iceberg 0.13.1 CDC (CRUD tested successfully)
Practice data lake iceberg Lesson 23 flink-sql restart from checkpoint
Practice data lake iceberg Lesson 24 iceberg metadata details analysis
Practice data lake iceberg Lesson 25 Running flink sql in the background: the effect of additions, deletions and modifications
Practice data lake iceberg Lesson 26 checkpoint setting method
Practice data lake iceberg Lesson 27 Flink cdc test program failure restart: can continue working from the last checkpoint
Practice data lake iceberg Lesson 28 Deploying packages that do not exist in the public repository to the local repository
Practice data lake iceberg Lesson 29 How to obtain the flink jobId elegantly and efficiently
Practice data lake iceberg Lesson 30 mysql -> iceberg, different clients sometimes have time zone issues
Practice data lake iceberg Lesson 31 Use github's flink-streaming-platform-web tool to manage flink task flow, test cdc restart scenarios
Practice data lake iceberg: more content directory


Foreword

When a Flink job restarts, it needs to recover from a checkpoint. When it comes to task and engineering management, developing such tooling ourselves is time-consuming and, for a small company, a thankless effort. So we looked for an open source solution and found flink-streaming-platform-web, which supports SQL jobs, jar jobs, and submission in various cluster modes. This article records my test of it.
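Recovering from checkpoint only works if checkpointing is enabled for the job in the first place (see Lesson 26). A minimal sketch of the settings, expressed as Flink SQL SET statements; the interval and HDFS path below are example values, not the ones used in this test:

```sql
-- Minimal checkpoint setup sketch for a Flink SQL job
-- (interval and checkpoint directory are placeholder assumptions)
SET 'execution.checkpointing.interval' = '60s';
SET 'execution.checkpointing.mode' = 'EXACTLY_ONCE';
SET 'state.checkpoints.dir' = 'hdfs:///flink/checkpoints';
```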


1. Use the open source component flink-streaming-platform-web to manage flink tasks

Source code address: https://github.com/zhp8341/flink-streaming-platform-web
Instructions for use: the project's documentation is very detailed; please refer to it.

2. Test that flink-streaming-platform-web recovers from checkpoint when restarting

Core logic: read from mysql to iceberg
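The job submitted through the web platform can be sketched as the Flink SQL below. All connection parameters (hostname, credentials, metastore URI, warehouse path) are placeholder assumptions, and the column names other than `i` and `actural_controller` are guesses inferred from the sample data later in this article:

```sql
-- Sketch of the mysql -> iceberg CDC job (all connection values are placeholders)
CREATE TABLE stock_basic_source (
    i STRING,
    ts_code STRING,
    symbol STRING,
    name STRING,
    area STRING,
    industry STRING,
    list_date STRING,
    actural_controller STRING,
    PRIMARY KEY (i) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'mysql-host',
    'port' = '3306',
    'username' = 'user',
    'password' = 'pass',
    'database-name' = 'demo',
    'table-name' = 'stock_basic'
);

-- Iceberg catalog backed by the hive metastore (URI and warehouse are examples)
CREATE CATALOG iceberg_catalog WITH (
    'type' = 'iceberg',
    'catalog-type' = 'hive',
    'uri' = 'thrift://hive-metastore:9083',
    'warehouse' = 'hdfs:///user/hive/warehouse'
);

-- Continuously sync changes into the iceberg table
INSERT INTO iceberg_catalog.demo.stock_basic_sink
SELECT * FROM stock_basic_source;
```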

1. Code: read from mysql to iceberg

  1. Mysql raw data
  2. Start the program
  3. Check the results in the iceberg table: the data is synchronized.
  4. Test newly added change data

Add a row with id=5 and update one row, as follows:

INSERT INTO `stock_basic` VALUES ('5', '000007.SZ', '000007', '*ST全新', '深圳', '酒店餐饮', '19920413', null);

update stock_basic set actural_controller='me me me' where i='0';

Check iceberg: the changes have been captured.

5. Restart to see whether the job can recover from checkpoint

Click Restore, and the following dialog pops up. Click Restore again, then refresh the flink main page: the task starts normally.
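Under the hood, the Restore button corresponds to resubmitting the job with a checkpoint or savepoint path. In plain Flink SQL the same thing can be expressed with the setting below; the path is an example placeholder, not a real one from this test:

```sql
-- Equivalent of the platform's "Restore" button in plain Flink SQL:
-- point the new submission at the last retained checkpoint/savepoint.
-- (the path below is a placeholder assumption)
SET 'execution.savepoint.path' = 'hdfs:///flink/checkpoints/<job-id>/chk-42';
```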

  1. Check the data in the sink table for repeated consumption: no duplicates are found.

  2. Re-test: after stopping the program, write some data, and check whether the changed data is captured after recovering from checkpoint.

Step 1: Record the savepoint
Step 2: Stop the program
Step 3: Apply insert, update, and delete changes

INSERT INTO `stock_basic` VALUES ('6', '000008.SZ', '000008', '神州高铁', '北京', '运输设备', '19920507', '国家开发投资集团有限公司');
update stock_basic set actural_controller='汉武帝' where i='1';
delete from stock_basic where i='0';


Step 4: Restore the program
After restoration, wait for a checkpoint, then check in spark-sql: the changes have been captured.
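The spark-sql check can be a simple query against the iceberg table; the database and table names here are assumptions:

```sql
-- Sketch of verifying the captured changes in spark-sql
-- (database/table names are placeholders)
SELECT i, actural_controller
FROM demo.stock_basic
WHERE i IN ('0', '1', '6');
-- expect: i='0' deleted, i='1' updated, i='6' newly inserted
```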

Summary

flink-streaming-platform-web is really easy to use! Give it a try.


Origin blog.csdn.net/spark_dev/article/details/124469641