Real-time data synchronization from MySQL to Databend based on Flink CDC

Author: Han Shanjie

Databend Cloud R&D Engineer, https://github.com/hantmac

This tutorial shows how to quickly build real-time data synchronization from MySQL to Databend based on Flink CDC. All demonstrations are carried out in the Flink SQL CLI and involve only SQL: no Java/Scala code and no IDE installation required.

Suppose we run an e-commerce business whose product data is stored in MySQL, and we need to synchronize it to Databend in real time.

The following content introduces how to use the Flink MySQL CDC connector and the Flink Databend connector to meet this requirement. The overall architecture of the system is shown in the following figure:

Preparation Phase

Prepare a Linux or macOS machine with Docker and docker-compose installed.

Components required for the tutorial

The components required by this tutorial are all prepared with docker-compose.

MySQL (Debezium example image)

docker-compose.yaml

version: '2.1'
services:
  postgres:
    image: debezium/example-postgres:1.1
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  mysql:
    image: debezium/example-mysql:1.1
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=123456
      - MYSQL_USER=mysqluser
      - MYSQL_PASSWORD=mysqlpw

Databend

docker-compose.yaml

version: '3'
services:
  databend:
    image: datafuselabs/databend
    volumes:
      - /Users/hanshanjie/databend/local-test/databend/databend-query.toml:/etc/databend/query.toml
    environment:
      QUERY_DEFAULT_USER: databend
      QUERY_DEFAULT_PASSWORD: databend
      MINIO_ENABLED: 'true'
    ports:
      - '8000:8000'
      - '9000:9000'
      - '3307:3307'
      - '8124:8124'

Execute the following command in the directory containing docker-compose.yaml to start the components needed for this tutorial:

docker-compose up -d

This command starts all containers defined in the Docker Compose configuration in detached mode. You can run docker ps to check whether the containers started normally.

Download Flink and the required dependencies

  1. Download Flink 1.16.0 and extract it to the directory flink-1.16.0.
  2. Download the required dependencies and place them in flink-1.16.0/lib/: the MySQL CDC connector jar (flink-sql-connector-mysql-cdc) and the Databend connector jar built in the next step.
  3. The download links are only valid for released versions; SNAPSHOT versions need to be compiled locally.

Compile flink-connector-databend

git clone https://github.com/databendcloud/flink-connector-databend
cd flink-connector-databend
mvn clean install -DskipTests

Copy target/flink-connector-databend-1.16.0-SNAPSHOT.jar to flink-1.16.0/lib/.

Prepare data

Prepare data in MySQL database

Enter the MySQL container

docker-compose exec mysql mysql -uroot -p123456

Create the database mydb and the table products, and insert data:

CREATE DATABASE mydb;
USE mydb;

CREATE TABLE products (
  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(512)
);
ALTER TABLE products AUTO_INCREMENT = 10;

INSERT INTO products VALUES (default,"scooter","Small 2-wheel scooter"),
(default,"car battery","12V car battery"),
(default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
(default,"hammer","12oz carpenter's hammer"),
(default,"hammer","14oz carpenter's hammer"),
(default,"hammer","16oz carpenter's hammer"),
(default,"rocks","box of assorted rocks"),
(default,"jacket","water resistent black wind breaker"),
(default,"cloud","test for databend"),
(default,"spare tire","24 inch spare tire");

Create table in Databend

CREATE TABLE bend_products (
  id INT NOT NULL,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(512)
);

Start Flink cluster and Flink SQL CLI

Change to the Flink directory with the following command

cd flink-1.16.0

Start the Flink cluster with the following command

./bin/start-cluster.sh

If the startup is successful, you can access the Flink Web UI at http://localhost:8081/:

Start the Flink SQL CLI with the following command

./bin/sql-client.sh

Create tables using Flink DDL in Flink SQL CLI

First, enable checkpointing, with a checkpoint every 3 seconds

-- Flink SQL              
Flink SQL> SET execution.checkpointing.interval = 3s;

Then, in the Flink SQL CLI, create a Flink table corresponding to the MySQL table products to capture changes from the underlying database table

-- Flink SQL
Flink SQL> CREATE TABLE products (
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = '123456',
    'database-name' = 'mydb',
    'table-name' = 'products',
    'server-time-zone' = 'UTC'
);
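Once the source table is created, a quick query in the Flink SQL CLI can confirm that the connector works; this sketch assumes the Flink cluster and the MySQL container from the steps above are running:

```sql
-- Flink SQL: sanity-check the CDC source. This starts a streaming job
-- that first emits the current snapshot of mydb.products and then
-- continues to stream change events.
SELECT * FROM products;
```

Cancel the query (press Q in the CLI result view) before continuing with the sink table.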

Finally, create the table d_products, which writes the product data into Databend

-- Flink SQL
Flink SQL> CREATE TABLE d_products (
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
    'connector' = 'databend',
    'url' = 'databend://localhost:8000',
    'username' = 'databend',
    'password' = 'databend',
    'database-name' = 'default',
    'table-name' = 'bend_products',
    'sink.batch-size' = '5',
    'sink.flush-interval' = '1000',
    'sink.max-retries' = '3'
);

Use Flink SQL to synchronize the data in the products table to Databend's d_products table:

INSERT INTO d_products SELECT * FROM products;
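To verify the synchronization, you can query Databend directly. A minimal check, assuming the table names used in this tutorial, is to compare the contents against the 10 rows inserted into MySQL above:

```sql
-- Run in Databend (for example via its MySQL-compatible endpoint
-- exposed on port 3307 in the docker-compose file above).
SELECT COUNT(*) FROM bend_products;
SELECT * FROM bend_products ORDER BY id;
```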

At this point, the Flink job is submitted successfully. Open the Flink Web UI and you can see the running job:

At the same time, you can see that the data in MySQL has been synchronized to Databend:

Synchronize Insert/Update data

Now insert 10 more rows into MySQL:

INSERT INTO products VALUES 
(default,"scooter","Small 2-wheel scooter"),
(default,"car battery","12V car battery"),
(default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),        
(default,"hammer","12oz carpenter's hammer"),        
(default,"hammer","14oz carpenter's hammer"),        
(default,"hammer","16oz carpenter's hammer"),        
(default,"rocks","box of assorted rocks"),        
(default,"jacket","water resistent black wind breaker"),
(default,"cloud","test for databend"),        
(default,"spare tire","24 inch spare tire");

These rows will be synchronized to Databend immediately.

If a row is then updated in MySQL:
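For example, with a hypothetical update statement (the new value here is purely illustrative):

```sql
-- Run in MySQL: change one row; the CDC pipeline picks up the update.
UPDATE products SET name = 'e-scooter' WHERE id = 10;
```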

Then the row with id=10 is updated in Databend immediately:

Environment cleanup

After the operation is over, execute the following command in the directory where the docker-compose.yaml file is located to stop all containers:

docker-compose down

Execute the following command in the Flink directory flink-1.16.0 to stop the Flink cluster:

./bin/stop-cluster.sh

Conclusion

The above is the whole process of building real-time data synchronization from MySQL to Databend based on Flink CDC. Flink CDC connectors can replace the data-acquisition module of Debezium + Kafka, integrating acquisition, computation, and transmission in Flink SQL. This reduces the number of components to maintain and simplifies the real-time pipeline, lowering deployment cost while still providing exactly-once semantics.


Origin my.oschina.net/u/5489811/blog/10084237