Flink CDC best practices (taking MySQL as an example)

1. Preparation

1.1 Confirm MySQL binlog mode

Confirm whether the binlog mode of the MySQL database is ROW. You can execute the following statement in the MySQL command line to confirm:

SHOW GLOBAL VARIABLES LIKE 'binlog_format';

If the Value column of the returned result is ROW, then the binlog mode is ROW.
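The same check can be automated before a job is submitted. The following is a minimal sketch, not part of Flink CDC: the class name BinlogFormatCheck and both method names are hypothetical, and it assumes a MySQL JDBC driver is on the classpath when readBinlogFormat is called.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration only; not part of Flink or Flink CDC.
class BinlogFormatCheck {

    /** Returns true when the reported binlog_format value is ROW (case-insensitive). */
    static boolean isRowFormat(String value) {
        return value != null && "ROW".equalsIgnoreCase(value.trim());
    }

    /**
     * Runs the SHOW statement from this section and returns the second
     * column (Value), or null if the server returned no row.
     */
    static String readBinlogFormat(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW GLOBAL VARIABLES LIKE 'binlog_format'")) {
            return rs.next() ? rs.getString(2) : null;
        }
    }
}
```

Given an open java.sql.Connection named conn, isRowFormat(readBinlogFormat(conn)) tells you whether the job can rely on row-level binlog events.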

1.2 Download and install Flink

Download and install Flink. You can refer to the official documentation for installation.

2. Configure Flink CDC

2.1 Configure MySQL database connection information

Add the MySQL database connection information to Flink's configuration file, flink-conf.yaml. These mysql.* keys are not built-in Flink options, so the job code has to read them itself. For example:

# MySQL connection configuration
mysql.server-id: 12345
mysql.hostname: localhost
mysql.port: 3306
mysql.username: root
mysql.password: 123456
mysql.database-name: test

2.2 Configure CDC Job

Add the following configuration to the CDC job's configuration file, mysql-cdc.properties:

# Flink CDC Job Configuration
name: mysql-cdc-job
flink.parallelism: 1
flink.checkpoint.interval: 60000
flink.checkpoint.mode: EXACTLY_ONCE

# MySQL CDC Source Configuration
debezium.transforms: unwrap
debezium.transforms.unwrap.type: io.debezium.transforms.ExtractNewRecordState
database.hostname: localhost
database.port: 3306
database.user: root
database.password: 123456
database.history.kafka.bootstrap.servers: localhost:9092
database.history.kafka.topic: mysql-cdc-history
database.server.id: 12345
database.server.name: test
table.whitelist: test.user

Among them, name is the name of the CDC job, flink.parallelism is the parallelism of the Flink job, flink.checkpoint.interval is the checkpoint interval, and flink.checkpoint.mode is the checkpoint mode, set here to EXACTLY_ONCE.

debezium.transforms is the alias of the Debezium transform, set here to unwrap. database.hostname, database.port, database.user, and database.password are the connection details of the MySQL database. database.history.kafka.bootstrap.servers is the Kafka broker address, and database.history.kafka.topic is the Kafka topic in which Debezium records the database schema history. database.server.id is the numeric ID the connector uses when it connects to MySQL as a replication client, and it must be unique within the MySQL cluster. database.server.name is the logical name of the CDC source. table.whitelist lists the MySQL tables to synchronize in database.table form (Debezium's database.whitelist matches database names only, so a value such as test.user belongs in table.whitelist).
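To show how a job might consume such a file, here is a small sketch that loads the settings with java.util.Properties, whose file syntax accepts both "key: value" and "key=value". The class CdcJobConfig and its methods are hypothetical names introduced for illustration, and the sample text is a shortened copy of the listing above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Flink or Flink CDC.
class CdcJobConfig {

    /** Parses text in java.util.Properties syntax ("key: value" or "key=value"). */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns a required setting, or fails fast naming the missing key. */
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing required setting: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "# Flink CDC Job Configuration",
                "name: mysql-cdc-job",
                "flink.checkpoint.interval: 60000",
                "database.hostname: localhost",
                "database.port: 3306");
        Properties props = parse(sample);
        long interval = Long.parseLong(require(props, "flink.checkpoint.interval"));
        System.out.println(require(props, "name") + " uses checkpoint interval " + interval);
    }
}
```

Failing fast on a missing key keeps a misconfigured job from starting with silent defaults.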

Step 1: Create a MySQL database

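First, create a MySQL database, locally or in the cloud, and add a user with read and write permissions. The following sample SQL creates a database named test_db and a user named flink_cdc_user. Note that reading the binlog additionally requires the global REPLICATION SLAVE and REPLICATION CLIENT privileges, which a grant on test_db.* does not include.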

CREATE DATABASE test_db;

CREATE USER 'flink_cdc_user'@'%' IDENTIFIED BY 'password';

GRANT ALL PRIVILEGES ON test_db.* TO 'flink_cdc_user'@'%';

Step 2: Start the Flink cluster

Start a Flink cluster to run the CDC application. You can use Flink's own bin/start-cluster.sh script to start the cluster. Make sure the Kafka and MySQL dependencies are available to the cluster at runtime, for example in Flink's lib/ directory or bundled into the job JAR.

Step 3: Create MySQL table and CDC table

In MySQL, first create the table whose changes are to be captured, and then the CDC table. The CDC table is a regular table in the same database that stores the captured change data. The following statements create a table named test_table and the CDC table associated with it:

CREATE TABLE test_db.test_table (
  id INT PRIMARY KEY,
  name VARCHAR(30),
  age INT,
  email VARCHAR(50)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE test_db.test_table_cdc (
  `database` VARCHAR(100),
  `table` VARCHAR(100),
  `type` VARCHAR(10),
  `ts` TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
  `before` JSON,
  `after` JSON
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Step 4: Write Flink CDC application

Next, write a Flink CDC application that pushes MySQL table changes to a Kafka topic. You can use Flink's flink-connector-jdbc and flink-connector-kafka libraries to achieve this.

Here is a code example for a basic Flink CDC application:

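import java.util.Properties;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.connector.jdbc.JdbcInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.types.Row;

public class MyCDCJob {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        // Kafka client settings, intended for a Kafka sink (not shown in this listing).
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "test-group");

        // JdbcInputFormat from flink-connector-jdbc runs the query once (a snapshot read).
        JdbcInputFormat source = JdbcInputFormat.buildJdbcInputFormat()
                .setDrivername("com.mysql.cj.jdbc.Driver") // driver class of MySQL Connector/J 8
                .setDBUrl("jdbc:mysql://localhost:3306/test_db")
                .setUsername("flink_cdc_user")
                .setPassword("password")
                .setQuery("SELECT id, name, age, email FROM test_table")
                .setRowTypeInfo(new RowTypeInfo(Types.INT, Types.STRING, Types.INT, Types.STRING))
                .setFetchSize(1000)
                .finish();

        DataStream<Row> stream = env.createInput(source);

        // Print the rows so the job has a sink, then submit it.
        stream.print();
        env.execute("mysql-cdc-job");
    }
}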

Here is a simple example run and results:

$ bin/flink run -c com.example.MyCDCJob ./my-cdc-job.jar --database.server=mysql.example.com --database.port=3306 --database.name=mydb --database.username=myuser --database.password=mypassword --table.name=mytable --debezium.plugin.name=mysql --debezium.plugin.property.version=1.3.1.Final
[INFO] Starting CDC process for table: mytable.
[INFO] Initializing CDC source...
[INFO] CDC source successfully initialized.
[INFO] Starting CDC source...
[INFO] CDC source successfully started.
[INFO] Adding CDC source to Flink job topology...
[INFO] CDC source successfully added to Flink job topology.
[INFO] Starting Flink job...
[INFO] Flink job started successfully.
[INFO] Change data for table: mytable.
[INFO] Record key: {"id": 1}, record value: {"id": 1, "name": "Alice", "age": 25}.
[INFO] Record key: {"id": 2}, record value: {"id": 2, "name": "Bob", "age": 30}.
[INFO] Record key: {"id": 3}, record value: {"id": 3, "name": "Charlie", "age": 35}.
[INFO] Change data for table: mytable.
[INFO] Record key: {"id": 1}, record value: {"id": 1, "name": "Alice", "age": 27}.

You can see that when data changes, the Flink CDC job outputs the name of the changed table, the primary key of the record, and the changed data. In this example, the age field of the row with id 1 changes from 25 to 27.
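To make the "what changed" step concrete, the sketch below compares two images of the same row and reports the fields that differ. It is an illustration under assumptions rather than Flink CDC's own logic: the class ChangeDiff and the method changedFields are hypothetical, and the rows are modelled as plain maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper for illustration only; not part of Flink or Flink CDC.
class ChangeDiff {

    /**
     * Returns field -> "old -> new" for every field in the after image whose
     * value differs from the before image. Fields that exist only in the
     * before image are ignored.
     */
    static Map<String, String> changedFields(Map<String, Object> before, Map<String, Object> after) {
        Map<String, String> changes = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : after.entrySet()) {
            Object oldValue = before.get(entry.getKey());
            if (!Objects.equals(oldValue, entry.getValue())) {
                changes.put(entry.getKey(), oldValue + " -> " + entry.getValue());
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Object> before = new LinkedHashMap<>();
        before.put("id", 1);
        before.put("name", "Alice");
        before.put("age", 25);

        Map<String, Object> after = new LinkedHashMap<>(before);
        after.put("age", 27);

        System.out.println(changedFields(before, after)); // prints {age=25 -> 27}
    }
}
```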

Origin blog.csdn.net/lhyandlwl/article/details/129998737