Author: Han Shanjie, Databend Cloud R&D Engineer
https://github.com/hantmac
This tutorial shows how to quickly build real-time data synchronization from MySQL to Databend based on Flink CDC. All demonstrations are carried out in the Flink SQL CLI and involve only SQL — no Java/Scala code and no IDE installation required.
Suppose we run an e-commerce business whose product data is stored in MySQL, and we need to synchronize it to Databend in real time.
The following sections show how to achieve this with the Flink MySQL CDC connector and the Flink Databend connector. The overall architecture of the system is shown in the following figure:
Preparation Phase
Prepare a Linux or macOS machine with Docker and docker-compose installed.
Components needed for this tutorial
All required components will be prepared with docker-compose.
Debezium MySQL
docker-compose.yaml
version: '2.1'
services:
  postgres:
    image: debezium/example-postgres:1.1
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  mysql:
    image: debezium/example-mysql:1.1
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=123456
      - MYSQL_USER=mysqluser
      - MYSQL_PASSWORD=mysqlpw
Databend
docker-compose.yaml
version: '3'
services:
  databend:
    image: datafuselabs/databend
    volumes:
      - /Users/hanshanjie/databend/local-test/databend/databend-query.toml:/etc/databend/query.toml
    environment:
      QUERY_DEFAULT_USER: databend
      QUERY_DEFAULT_PASSWORD: databend
      MINIO_ENABLED: 'true'
    ports:
      - '8000:8000'
      - '9000:9000'
      - '3307:3307'
      - '8124:8124'
Execute the following command in the directory containing docker-compose.yml to start the components needed for this tutorial:
docker-compose up -d
This command automatically starts all containers defined in the Docker Compose configuration in detached mode. You can use docker ps to check whether the containers started normally.
Download Flink and the required dependencies
- Download Flink 1.16.0 and extract it to the flink-1.16.0 directory.
- Download the dependencies listed below and place them in flink-1.16.0/lib/. Note that download links are only valid for released versions; SNAPSHOT versions need to be compiled locally.
Compile flink-connector-databend
git clone https://github.com/databendcloud/flink-connector-databend
cd flink-connector-databend
mvn clean install -DskipTests
Copy target/flink-connector-databend-1.16.0-SNAPSHOT.jar to flink-1.16.0/lib/.
Prepare Data
Prepare data in MySQL database
Enter the MySQL container:
docker-compose exec mysql mysql -uroot -p123456
Create the database mydb and the table products, and insert data:
CREATE DATABASE mydb;
USE mydb;
CREATE TABLE products (
  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(512)
);
ALTER TABLE products AUTO_INCREMENT = 10;
INSERT INTO products VALUES (default,"scooter","Small 2-wheel scooter"),
(default,"car battery","12V car battery"),
(default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
(default,"hammer","12oz carpenter's hammer"),
(default,"hammer","14oz carpenter's hammer"),
(default,"hammer","16oz carpenter's hammer"),
(default,"rocks","box of assorted rocks"),
(default,"jacket","water resistent black wind breaker"),
(default,"cloud","test for databend"),
(default,"spare tire","24 inch spare tire");
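You can verify the inserts with a quick check inside the same MySQL session; if only the ten rows above were inserted, the count is 10 and the first id is 10 (because of the AUTO_INCREMENT offset set earlier):

```sql
-- Run in the MySQL CLI inside the container
SELECT COUNT(*) AS cnt, MIN(id) AS first_id FROM products;
```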
Create table in Databend
CREATE TABLE bend_products (id INT NOT NULL, name VARCHAR(255) NOT NULL, description VARCHAR(512) );
Start Flink cluster and Flink SQL CLI
Change to the Flink directory:
cd flink-1.16.0
Start the Flink cluster with the following command
./bin/start-cluster.sh
If the startup is successful, you can access the Flink Web UI at http://localhost:8081/ , as follows:
Start the Flink SQL CLI with the following command
./bin/sql-client.sh
Create tables using Flink DDL in Flink SQL CLI
First, enable checkpointing, with a checkpoint every 3 seconds:
-- Flink SQL
Flink SQL> SET execution.checkpointing.interval = 3s;
Then, for the products table in the database, use the Flink SQL CLI to create a corresponding table that synchronizes the data of the underlying database table:
-- Flink SQL
Flink SQL> CREATE TABLE products (id INT,name STRING,description STRING,PRIMARY KEY (id) NOT ENFORCED)
WITH ('connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'products',
'server-time-zone' = 'UTC'
);
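Before wiring up the sink, you can optionally verify that the source table reads correctly; a query like the following in the Flink SQL CLI should stream back the ten rows inserted earlier (this check is not part of the original walkthrough, just a sanity test):

```sql
-- Flink SQL
SELECT * FROM products;
```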
Finally, create the d_products table to write the product data into Databend:
-- Flink SQL
Flink SQL> CREATE TABLE d_products (id INT, name STRING, description STRING, PRIMARY KEY (`id`) NOT ENFORCED)
WITH ('connector' = 'databend',
'url'='databend://localhost:8000',
'username'='databend',
'password'='databend',
'database-name'='default',
'table-name'='bend_products',
'sink.batch-size' = '5',
'sink.flush-interval' = '1000',
'sink.max-retries' = '3');
Use Flink SQL to synchronize the data in the products table to Databend's d_products table:
INSERT INTO d_products SELECT * FROM products;
At this point, the Flink job is submitted successfully. Open the Flink UI and you can see the running job:
At the same time, you can see that the data in MySQL has been synchronized to Databend:
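To confirm this on the Databend side, you can connect with any MySQL-compatible client (the docker-compose file above maps Databend's MySQL-compatible port to 3307) and query the sink table; the ten product rows should appear:

```sql
-- Run against Databend (e.g. mysql -h127.0.0.1 -P3307 -udatabend -pdatabend)
SELECT * FROM default.bend_products ORDER BY id;
```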
Synchronize Insert/Update data
Now insert 10 more rows into MySQL:
INSERT INTO products VALUES
(default,"scooter","Small 2-wheel scooter"),
(default,"car battery","12V car battery"),
(default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
(default,"hammer","12oz carpenter's hammer"),
(default,"hammer","14oz carpenter's hammer"),
(default,"hammer","16oz carpenter's hammer"),
(default,"rocks","box of assorted rocks"),
(default,"jacket","water resistent black wind breaker"),
(default,"cloud","test for databend"),
(default,"spare tire","24 inch spare tire");
These rows will be synchronized to Databend immediately.
If a row is then updated in MySQL:
the row with id=10 will be updated in Databend immediately:
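The original post demonstrates the update with a screenshot; a statement like the following (the new value here is illustrative) would produce such a change on the row with id=10:

```sql
-- Run in the MySQL CLI; the CDC pipeline propagates the change to Databend
UPDATE products SET description = 'Big 3-wheel scooter' WHERE id = 10;
```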
Environment Cleanup
After finishing, execute the following command in the directory containing docker-compose.yml to stop all containers:
docker-compose down
Execute the following command in the flink-1.16.0 directory to stop the Flink cluster:
./bin/stop-cluster.sh
Conclusion
The above is the whole process of building real-time data synchronization from MySQL to Databend based on Flink CDC. Flink CDC connectors can replace the data-acquisition module of Debezium + Kafka, integrating acquisition, computation, and transmission in Flink SQL. This reduces the number of components to maintain and simplifies the real-time pipeline, lowering deployment cost while still achieving exactly-once semantics.