Official tutorial: https://ververica.github.io/flink-cdc-connectors/release-2.3/content/%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8B/ (the mysql-postgres-tutorial-zh.html page). The official tutorial has a few pitfalls, so these are my notes after working through it myself.
Server environment:
- VM: CentOS 7.9
- Docker: version 24.0.5, build ced0996
- Docker Compose: 2.19
- JDK: 1.8
- VM IP: 192.168.122.131
- Memory: 16G (must be at least 16G)
- CPU: 4 cores
- Disk: >= 60G
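Before starting, it may be worth sanity-checking the machine against these requirements. A minimal sketch (Linux-only; `meets_min` is just a helper defined here, not a standard command):

```shell
# Rough pre-flight check for the requirements above (>= 16G RAM, >= 60G free disk).
# Note: `free -g` rounds down, so a 16G machine may report 15.
meets_min() { [ "${1:-0}" -ge "$2" ]; }   # success if value >= minimum

mem_gb=$(free -g 2>/dev/null | awk '/^Mem:/ {print $2}')
disk_gb=$(df -Pk / 2>/dev/null | awk 'NR==2 {print int($4/1048576)}')

meets_min "${mem_gb:-0}" 15 && echo "memory: OK" || echo "memory: below 16G"
meets_min "${disk_gb:-0}" 60 && echo "disk: OK" || echo "disk: below 60G"
```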
1. Docker compose installation
DOCKER_CONFIG=${DOCKER_CONFIG:-/usr/local/lib/docker/cli-plugins}
mkdir -p $DOCKER_CONFIG/cli-plugins
curl -SL https://github.com/docker/compose/releases/download/v2.19.1/docker-compose-linux-x86_64 -o $DOCKER_CONFIG/cli-plugins/docker-compose
Apply executable permissions to the file:
chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
Test whether the installation was successful
docker compose version  # the old v1 command was: docker-compose --version
Reference: https://blog.csdn.net/qq_40099908/article/details/131611496
2. Hands-on practice
This tutorial will show how to quickly build streaming ETL for MySQL and Postgres based on Flink CDC. The demonstrations in this tutorial will be carried out in the Flink SQL CLI, involving only SQL, without a single line of Java/Scala code, and no need to install an IDE.
Assume that we are running an e-commerce business. The data of goods and orders are stored in MySQL, and the logistics information corresponding to the orders is stored in Postgres. For the order table, in order to facilitate analysis, we hope to associate it with its corresponding product and logistics information to form a wide table, and write it to ElasticSearch in real time.
The following content will introduce how to use Flink Mysql/Postgres CDC to achieve this requirement. The overall architecture of the system is shown in the figure below:
1. Prepare the components required for the tutorial
The following tutorial prepares the required components using docker-compose.
Create a file named docker-compose.yml with the following content:
version: '2.1'
services:
  postgres:
    image: debezium/example-postgres:1.1
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
  mysql:
    image: debezium/example-mysql:1.1
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=123456
      - MYSQL_USER=mysqluser
      - MYSQL_PASSWORD=mysqlpw
  elasticsearch:
    image: elastic/elasticsearch:7.6.0
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  kibana:
    image: elastic/kibana:7.6.0
    ports:
      - "5601:5601"
The containers included in this Docker Compose file are:
- MySQL: the product table products and the order table orders are stored in this database. These two tables will be joined with the logistics table shipments in the Postgres database to produce an order table enriched_orders containing more information.
- Postgres: the logistics table shipments is stored in this database.
- Elasticsearch: the final order table enriched_orders is written to Elasticsearch.
- Kibana: used to visualize the Elasticsearch data.
Execute the following command in the directory containing docker-compose.yml to start all the components required for this tutorial:
docker compose up -d
This command will automatically start all containers defined in the Docker Compose configuration in detached mode. You can use docker ps to see whether the above containers are started normally, or you can check whether Kibana is running normally by visiting http://192.168.122.131:5601 .
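`docker ps` only shows that the containers exist; Elasticsearch and Kibana take a while to actually accept connections. A small retry helper can poll them (a sketch; `wait_for` is a helper defined here, and the URLs are the ones published by the compose file above):

```shell
# Retry a command until it succeeds, or give up after N attempts.
wait_for() {
  cmd=$1; attempts=${2:-30}; delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if $cmd >/dev/null 2>&1; then return 0; fi
    i=$((i + 1)); sleep "$delay"
  done
  return 1
}

# Example: wait until Elasticsearch and Kibana answer on their published ports.
# wait_for "curl -fsS http://192.168.122.131:9200" 30 2 && echo "Elasticsearch is up"
# wait_for "curl -fsS http://192.168.122.131:5601" 30 2 && echo "Kibana is up"
```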
2. Download Flink and the required dependencies
Download Flink 1.16.0 and extract it to the flink-1.16.0 directory.
Download the dependency packages listed below and place them in the flink-1.16.0/lib/ directory:
Note: the download links are only valid for released versions; SNAPSHOT versions require local compilation.
3. Prepare data
Prepare data in MySQL database
Enter the MySQL container
docker compose exec mysql mysql -uroot -p123456
Create the database mydb and the tables products and orders, and insert data:
-- MySQL
CREATE DATABASE mydb;
USE mydb;
CREATE TABLE products (
id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description VARCHAR(512)
);
ALTER TABLE products AUTO_INCREMENT = 101;
INSERT INTO products
VALUES (default,"scooter","Small 2-wheel scooter"),
(default,"car battery","12V car battery"),
(default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
(default,"hammer","12oz carpenter's hammer"),
(default,"hammer","14oz carpenter's hammer"),
(default,"hammer","16oz carpenter's hammer"),
(default,"rocks","box of assorted rocks"),
(default,"jacket","water resistent black wind breaker"),
(default,"spare tire","24 inch spare tire");
CREATE TABLE orders (
order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
order_date DATETIME NOT NULL,
customer_name VARCHAR(255) NOT NULL,
price DECIMAL(10, 5) NOT NULL,
product_id INTEGER NOT NULL,
order_status BOOLEAN NOT NULL -- Whether order has been placed
) AUTO_INCREMENT = 10001;
INSERT INTO orders
VALUES (default, '2020-07-30 10:08:22', 'Jark', 50.50, 102, false),
(default, '2020-07-30 10:11:09', 'Sally', 15.00, 105, false),
(default, '2020-07-30 12:00:30', 'Edward', 25.25, 106, false);
Note: MySQL may raise a time-zone error.
Adjust the time zone inside the MySQL container:
set time_zone='+8:00';
SET GLOBAL time_zone = '+8:00';
flush privileges;
SELECT @@global.time_zone;
show variables like '%time_zone%';
Prepare data in Postgres database
Enter the Postgres container
docker compose exec postgres psql -h localhost -U postgres
Create the table shipments and insert data:
-- PG
CREATE TABLE shipments (
shipment_id SERIAL NOT NULL PRIMARY KEY,
order_id SERIAL NOT NULL,
origin VARCHAR(255) NOT NULL,
destination VARCHAR(255) NOT NULL,
is_arrived BOOLEAN NOT NULL
);
ALTER SEQUENCE public.shipments_shipment_id_seq RESTART WITH 1001;
ALTER TABLE public.shipments REPLICA IDENTITY FULL;
INSERT INTO shipments
VALUES (default,10001,'Beijing','Shanghai',false),
(default,10002,'Hangzhou','Shanghai',false),
(default,10003,'Shanghai','Hangzhou',false);
4. Start the Flink cluster and the Flink SQL CLI
Use the following command to change into the Flink directory:
cd flink-1.16.0
Use the following command to start the Flink cluster
./bin/start-cluster.sh
If the startup is successful, you can access the Flink Web UI at http://192.168.122.131:8081/ , as shown below:
Note: If the Web UI cannot be reached from a machine other than the VM, adjust the flink-1.16.0/conf/flink-conf.yaml file:
Change the rest.address value to 0.0.0.0.
Open the port in the firewall (the firewall must be restarted for the change to take effect):
firewall-cmd --zone=public --add-port=8081/tcp --permanent
Restart the firewall: systemctl restart firewalld
Also: the taskmanager.numberOfTaskSlots parameter should generally be set to a larger value, such as 50.
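The flink-conf.yaml changes just described would look like this (0.0.0.0 exposes the REST endpoint on all interfaces; 50 slots is simply the generous value suggested above):

```yaml
rest.address: 0.0.0.0
taskmanager.numberOfTaskSlots: 50
```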
Start Flink SQL CLI using the following command
./bin/sql-client.sh
After successful startup, you can see the following page:
5. Create tables using Flink DDL in the Flink SQL CLI
First, enable checkpointing, with a checkpoint every 3 seconds.
-- Flink SQL
Flink SQL> SET execution.checkpointing.interval = 3s;
Then, use the Flink SQL CLI to create tables corresponding to the products, orders, and shipments tables in the databases, so that the data of these underlying tables is synchronized.
-- Flink SQL
Flink SQL> CREATE TABLE products (
id INT,
name STRING,
description STRING,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'products'
);
Flink SQL> CREATE TABLE orders (
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'orders'
);
Flink SQL> CREATE TABLE shipments (
shipment_id INT,
order_id INT,
origin STRING,
destination STRING,
is_arrived BOOLEAN,
PRIMARY KEY (shipment_id) NOT ENFORCED
) WITH (
'connector' = 'postgres-cdc',
'hostname' = 'localhost',
'port' = '5432',
'username' = 'postgres',
'password' = 'postgres',
'database-name' = 'postgres',
'schema-name' = 'public',
'table-name' = 'shipments'
);
Finally, create a table enriched_orders to write the joined order data into Elasticsearch.
-- Flink SQL
Flink SQL> CREATE TABLE enriched_orders (
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
product_name STRING,
product_description STRING,
shipment_id INT,
origin STRING,
destination STRING,
is_arrived BOOLEAN,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'elasticsearch-7',
'hosts' = 'http://localhost:9200',
'index' = 'enriched_orders'
);
6. Join the order data and write it to Elasticsearch
Use Flink SQL to join the orders table with the products table and the shipments logistics table, and write the enriched order rows into Elasticsearch:
-- Flink SQL
Flink SQL> INSERT INTO enriched_orders
SELECT o.*, p.name, p.description, s.shipment_id, s.origin, s.destination, s.is_arrived
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
LEFT JOIN shipments AS s ON o.order_id = s.order_id;
Now, you can see the order data including product and logistics information in Kibana.
First visit http://192.168.122.131:5601/app/kibana#/management/kibana/index_pattern to create index pattern enriched_orders.
Then you can see the written data at http://192.168.122.131:5601/app/kibana#/discover.
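You can also query Elasticsearch directly over HTTP instead of going through Kibana. A sketch (the host, port, and index name come from the setup above; `es_search_url` is a hypothetical helper defined here):

```shell
# Build a search URL for the Elasticsearch sink index and print it.
es_search_url() { printf 'http://%s/%s/_search?pretty' "$1" "$2"; }

url=$(es_search_url "192.168.122.131:9200" "enriched_orders")
echo "$url"
# curl -s "$url"    # shows the documents the Flink job has written so far
```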
Next, modify the data in the tables in the MySQL and Postgres databases, and the order data displayed in Kibana will also be updated in real time:
Insert a row into the MySQL orders table:
--MySQL
INSERT INTO orders
VALUES (default, '2020-07-30 15:22:00', 'Jark', 29.71, 104, false);
Insert a row into the Postgres shipments table:
--PG
INSERT INTO shipments
VALUES (default,10004,'Shanghai','Beijing',false);
Update the order status in the MySQL orders table:
--MySQL
UPDATE orders SET order_status = true WHERE order_id = 10004;
Update the shipping status in the Postgres shipments table:
--PG
UPDATE shipments SET is_arrived = true WHERE shipment_id = 1004;
Delete a row from the MySQL orders table:
--MySQL
DELETE FROM orders WHERE order_id = 10004;
Refresh Kibana after each step, and you will see the order data displayed in Kibana update in real time, as shown below:
7. Clean up the environment
After finishing this tutorial, execute the following command in the directory containing docker-compose.yml to stop all containers:
docker compose down
Execute the following command in the flink-1.16.0 directory to stop the Flink cluster:
./bin/stop-cluster.sh
Troubleshooting
If the data looks wrong, check the job's error messages in the Flink Web UI.
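The log files under the Flink installation are also worth scanning. A minimal sketch, assuming the flink-1.16.0 directory layout used above (`scan_logs` is just a helper defined here):

```shell
# Print the most recent error/exception lines from a Flink log directory.
scan_logs() {
  dir=$1
  grep -rniE 'error|exception' "$dir" 2>/dev/null | tail -n 20
}

scan_logs flink-1.16.0/log
```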