Flink CDC Series: Building Streaming ETL for MySQL and Postgres based on Flink CDC

1. Technical route

(Architecture diagram: Flink CDC reads change streams from MySQL and Postgres, joins them in Flink SQL, and writes the enriched orders to Elasticsearch.)

2. MySQL database table creation

In MySQL, create the database mydb and the tables products and orders.

Create the products table

-- MySQL
CREATE DATABASE mydb;
USE mydb;

CREATE TABLE products (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description VARCHAR(512)
);

ALTER TABLE products AUTO_INCREMENT = 101;

Create the orders table

CREATE TABLE orders (
    order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    order_date DATETIME NOT NULL,
    customer_name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 5) NOT NULL,
    product_id INTEGER NOT NULL,
    order_status BOOLEAN NOT NULL -- Whether order has been placed
 ) AUTO_INCREMENT = 10001;

Insert data into the products table

INSERT INTO products
VALUES (default,"scooter","Small 2-wheel scooter"),
       (default,"car battery","12V car battery"),
       (default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
       (default,"hammer","12oz carpenter's hammer"),
       (default,"hammer","14oz carpenter's hammer"),
       (default,"hammer","16oz carpenter's hammer"),
       (default,"rocks","box of assorted rocks"),
       (default,"jacket","water resistent black wind breaker"),
       (default,"spare tire","24 inch spare tire");

Insert data into the orders table

INSERT INTO orders
VALUES (default, '2020-07-30 10:08:22', 'Jark', 50.50, 102, false),
(default, '2020-07-30 10:11:09', 'Sally', 15.00, 105, false),
(default, '2020-07-30 12:00:30', 'Edward', 25.25, 106, false);

3. PostgreSQL database table creation

Create the shipments table

-- PG
CREATE TABLE shipments (
    shipment_id SERIAL NOT NULL PRIMARY KEY,
    order_id SERIAL NOT NULL,
    origin VARCHAR(255) NOT NULL,
    destination VARCHAR(255) NOT NULL,
    is_arrived BOOLEAN NOT NULL
);

Reset the sequence start value, set the replica identity to FULL (required so the postgres-cdc connector receives complete before-images for UPDATE and DELETE events), and insert data

ALTER SEQUENCE public.shipments_shipment_id_seq RESTART WITH 1001;


ALTER TABLE public.shipments REPLICA IDENTITY FULL;

INSERT INTO shipments
VALUES (default,10001,'Beijing','Shanghai',false),
       (default,10002,'Hangzhou','Shanghai',false),   
       (default,10003,'Shanghai','Hangzhou',false);
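Note: the postgres-cdc connector used later reads Postgres's logical replication stream, so the instance must run with wal_level = logical. The sketch below shows how to check and change this on a self-managed instance (assuming superuser access; a server restart is needed for the change to take effect):

```sql
-- Check the current WAL level; logical decoding requires 'logical'
SHOW wal_level;

-- If it is not 'logical', change it and restart the server
ALTER SYSTEM SET wal_level = 'logical';
```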

4. Create tables using Flink DDL in Flink SQL CLI

First, enable checkpointing with a 3-second interval

-- Flink SQL                   
Flink SQL> SET execution.checkpointing.interval = 3s;

Then, in the Flink SQL CLI, create tables corresponding to the underlying products, orders, and shipments tables to synchronize their data

-- Flink SQL

Flink SQL> CREATE TABLE products (
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = '123456',
    'database-name' = 'mydb',
    'table-name' = 'products'
);

Flink SQL> CREATE TABLE orders (
    order_id INT,
    order_date TIMESTAMP(0),
    customer_name STRING,
    price DECIMAL(10, 5),
    product_id INT,
    order_status BOOLEAN,
    PRIMARY KEY (order_id) NOT ENFORCED) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = '123456',
    'database-name' = 'mydb',
    'table-name' = 'orders'
);
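The text above mentions three source tables, but the DDL only covers products and orders; a shipments source table is also needed for the join in section 5. The sketch below follows the same pattern with the postgres-cdc connector (the hostname, credentials, database name, and replication slot name are assumptions to adapt to your environment):

```sql
Flink SQL> CREATE TABLE shipments (
    shipment_id INT,
    order_id INT,
    origin STRING,
    destination STRING,
    is_arrived BOOLEAN,
    PRIMARY KEY (shipment_id) NOT ENFORCED) WITH (
    'connector' = 'postgres-cdc',
    'hostname' = 'localhost',
    'port' = '5432',
    'username' = 'postgres',
    'password' = 'postgres',
    'database-name' = 'postgres',
    'schema-name' = 'public',
    'table-name' = 'shipments',
    'slot.name' = 'flink'
);
```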

Finally, create the enriched_orders sink table to write the joined order data into Elasticsearch

-- Flink SQL
Flink SQL> CREATE TABLE enriched_orders (
    order_id INT,
    order_date TIMESTAMP(0),
    customer_name STRING,
    price DECIMAL(10, 5),
    product_id INT,
    order_status BOOLEAN,
    product_name STRING,
    product_description STRING,
    shipment_id INT,
    origin STRING,
    destination STRING,
    is_arrived BOOLEAN,
    PRIMARY KEY (order_id) NOT ENFORCED) WITH (
    'connector' = 'elasticsearch-7',
    'hosts' = 'http://localhost:9200',
    'index' = 'enriched_orders'
);

5. Join the order data and write it into Elasticsearch

Use Flink SQL to join the orders table with the products table and the shipments logistics table, and write the enriched order data into Elasticsearch

-- Flink SQL
Flink SQL> INSERT INTO enriched_orders
SELECT o.*, p.name, p.description, s.shipment_id, s.origin, s.destination, s.is_arrived
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
LEFT JOIN shipments AS s ON o.order_id = s.order_id;
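Before (or instead of) submitting the INSERT, the same join can be sanity-checked in the SQL CLI with a plain SELECT, which runs a temporary job and streams the changing result back to the client:

```sql
Flink SQL> SELECT o.order_id, p.name, s.origin, s.destination, s.is_arrived
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id
LEFT JOIN shipments AS s ON o.order_id = s.order_id;
```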

6. View the enriched order and logistics data in Kibana

Create an index pattern named enriched_orders, then view the written data. (screenshots omitted)

7. Modify data in the databases and watch Kibana pick up the updates

Modify the data in the MySQL and Postgres tables; the order data displayed in Kibana updates in real time:

Insert a row into the MySQL orders table

--MySQL

INSERT INTO orders
VALUES (default, '2020-07-30 15:22:00', 'Jark', 29.71, 104, false);

Insert a row into the Postgres shipments table

--PG
INSERT INTO shipments
VALUES (default,10004,'Shanghai','Beijing',false);

Update the status of an order in the MySQL orders table

--MySQL
UPDATE orders SET order_status = true WHERE order_id = 10004;

Update the shipment status in the Postgres shipments table

--PG
UPDATE shipments SET is_arrived = true WHERE shipment_id = 1004;

Delete a row from the MySQL orders table

--MySQL
DELETE FROM orders WHERE order_id = 10004;

Refresh Kibana after each step, and you can see the order data displayed there update in real time. (screenshot omitted)

Origin blog.csdn.net/zhengzaifeidelushang/article/details/132256089