Flink CDC Series: Building Streaming ETL for MySQL and Postgres based on Flink CDC
- 1. Technical route
- 2. MySQL database table creation
- 3. PostgreSQL database table creation
- 4. Create tables using Flink DDL in Flink SQL CLI
- 5. Associate order data and write it into Elasticsearch
- 6. Kibana checks the order data of goods and logistics information
- 7. Modify the data in the table in the database, and Kibana checks the update
1. Technical route
2. MySQL database table creation
In MySQL, create the database mydb and the tables products and orders.
Create the products table
-- MySQL
CREATE DATABASE mydb;
USE mydb;

CREATE TABLE products (
  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(512)
);
ALTER TABLE products AUTO_INCREMENT = 101;
Create the orders table
CREATE TABLE orders (
order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
order_date DATETIME NOT NULL,
customer_name VARCHAR(255) NOT NULL,
price DECIMAL(10, 5) NOT NULL,
product_id INTEGER NOT NULL,
order_status BOOLEAN NOT NULL -- Whether order has been placed
) AUTO_INCREMENT = 10001;
Insert data into the products table
INSERT INTO products
VALUES (default,"scooter","Small 2-wheel scooter"),
(default,"car battery","12V car battery"),
(default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
(default,"hammer","12oz carpenter's hammer"),
(default,"hammer","14oz carpenter's hammer"),
(default,"hammer","16oz carpenter's hammer"),
(default,"rocks","box of assorted rocks"),
(default,"jacket","water resistent black wind breaker"),
(default,"spare tire","24 inch spare tire");
Insert data into the orders table
INSERT INTO orders
VALUES (default, '2020-07-30 10:08:22', 'Jark', 50.50, 102, false),
(default, '2020-07-30 10:11:09', 'Sally', 15.00, 105, false),
(default, '2020-07-30 12:00:30', 'Edward', 25.25, 106, false);
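Since the mysql-cdc connector used in step 4 reads row-level changes from the MySQL binlog, it is worth confirming that binary logging is enabled in ROW format (the default in MySQL 8.0; older versions may require my.cnf changes):

-- MySQL
SHOW VARIABLES LIKE 'log_bin';        -- expect ON
SHOW VARIABLES LIKE 'binlog_format';  -- expect ROW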
3. PostgreSQL database table creation
Create the shipments table
-- PG
CREATE TABLE shipments (
shipment_id SERIAL NOT NULL PRIMARY KEY,
order_id SERIAL NOT NULL,
origin VARCHAR(255) NOT NULL,
destination VARCHAR(255) NOT NULL,
is_arrived BOOLEAN NOT NULL);
Adjust the sequence start value, set the replica identity, and insert data
ALTER SEQUENCE public.shipments_shipment_id_seq RESTART WITH 1001;
ALTER TABLE public.shipments REPLICA IDENTITY FULL;
INSERT INTO shipments
VALUES (default,10001,'Beijing','Shanghai',false),
(default,10002,'Hangzhou','Shanghai',false),
(default,10003,'Shanghai','Hangzhou',false);
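Setting REPLICA IDENTITY to FULL makes Postgres log the complete old row for UPDATE and DELETE events, which the CDC connector needs in order to produce correct change records. You can verify the setting with a catalog query ('f' means FULL):

-- PG
SELECT relreplident FROM pg_class WHERE relname = 'shipments';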
4. Create tables using Flink DDL in Flink SQL CLI
First, enable checkpointing with an interval of 3 seconds
-- Flink SQL
Flink SQL> SET execution.checkpointing.interval = 3s;
Then, in the Flink SQL CLI, create tables corresponding to the products, orders, and shipments tables in the underlying databases, so that their data can be synchronized
-- Flink SQL
Flink SQL> CREATE TABLE products (
id INT,
name STRING,
description STRING,
PRIMARY KEY (id) NOT ENFORCED) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'products'
);
Flink SQL> CREATE TABLE orders (
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
PRIMARY KEY (order_id) NOT ENFORCED) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'localhost',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'database-name' = 'mydb',
'table-name' = 'orders'
);
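The shipments table in Postgres also needs a corresponding Flink source table so it can take part in the join later. The following is a sketch using the postgres-cdc connector; the hostname, port, credentials, and slot name are assumptions that should be adapted to your own Postgres setup:

-- Flink SQL
Flink SQL> CREATE TABLE shipments (
shipment_id INT,
order_id INT,
origin STRING,
destination STRING,
is_arrived BOOLEAN,
PRIMARY KEY (shipment_id) NOT ENFORCED) WITH (
'connector' = 'postgres-cdc',
'hostname' = 'localhost',
'port' = '5432',
'username' = 'postgres',
'password' = 'postgres',
'database-name' = 'postgres',
'schema-name' = 'public',
'table-name' = 'shipments',
'slot.name' = 'flink'
);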
Finally, create the enriched_orders table to write the associated order data into Elasticsearch
-- Flink SQL
Flink SQL> CREATE TABLE enriched_orders (
order_id INT,
order_date TIMESTAMP(0),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
product_name STRING,
product_description STRING,
shipment_id INT,
origin STRING,
destination STRING,
is_arrived BOOLEAN,
PRIMARY KEY (order_id) NOT ENFORCED) WITH (
'connector' = 'elasticsearch-7',
'hosts' = 'http://localhost:9200',
'index' = 'enriched_orders'
);
5. Associate order data and write it into Elasticsearch
Use Flink SQL to join the orders table with the products table and the shipments logistics table, and write the enriched order records into Elasticsearch
-- Flink SQL
Flink SQL> INSERT INTO enriched_orders
SELECT o.*, p.name, p.description, s.shipment_id, s.origin, s.destination, s.is_arrived
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
LEFT JOIN shipments AS s ON o.order_id = s.order_id;
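Before submitting the INSERT job, you can sanity-check the join interactively in the SQL CLI by running the same SELECT without the sink:

-- Flink SQL
Flink SQL> SELECT o.*, p.name, p.description, s.shipment_id, s.origin, s.destination, s.is_arrived
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
LEFT JOIN shipments AS s ON o.order_id = s.order_id;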
6. Kibana checks the order data of goods and logistics information
Create the index pattern enriched_orders in Kibana to view the written data.
7. Modify the data in the table in the database, and Kibana checks the update
Modify the data in the tables in the MySQL and Postgres databases, and the order data displayed in Kibana will also be updated in real time:
Insert a row into the MySQL orders table
--MySQL
INSERT INTO orders
VALUES (default, '2020-07-30 15:22:00', 'Jark', 29.71, 104, false);
Insert a row into the Postgres shipments table
--PG
INSERT INTO shipments
VALUES (default,10004,'Shanghai','Beijing',false);
Update the status of an order in the MySQL orders table
--MySQL
UPDATE orders SET order_status = true WHERE order_id = 10004;
Update the logistics status in the Postgres shipments table
--PG
UPDATE shipments SET is_arrived = true WHERE shipment_id = 1004;
Delete a row from the MySQL orders table
--MySQL
DELETE FROM orders WHERE order_id = 10004;
Refresh Kibana after each step, and you will see the order data displayed in Kibana update in real time.