Flink CDC Series: Importing TiDB CDC into Elasticsearch
- 1. Start the TiDB cluster through docker
- 2. Download Flink and the required dependencies
- 3. Create tables and prepare data in TiDB database
- 4. Start the Flink cluster, and then start the SQL CLI
- 5. Create tables using Flink DDL in Flink SQL CLI
- 6. View Elasticsearch data in Kibana
- 7. Add, delete, and modify data in TiDB, and observe the results in Elasticsearch
1. Start the TiDB cluster through docker
git clone https://github.com/pingcap/tidb-docker-compose.git
Replace the docker-compose.yml file in the tidb-docker-compose directory with the following content:
version: "2.1"
services:
  pd:
    image: pingcap/pd:v5.3.1
    ports:
      - "2379:2379"
    volumes:
      - ./config/pd.toml:/pd.toml
      - ./logs:/logs
    command:
      - --client-urls=http://0.0.0.0:2379
      - --peer-urls=http://0.0.0.0:2380
      - --advertise-client-urls=http://pd:2379
      - --advertise-peer-urls=http://pd:2380
      - --initial-cluster=pd=http://pd:2380
      - --data-dir=/data/pd
      - --config=/pd.toml
      - --log-file=/logs/pd.log
    restart: on-failure
  tikv:
    image: pingcap/tikv:v5.3.1
    ports:
      - "20160:20160"
    volumes:
      - ./config/tikv.toml:/tikv.toml
      - ./logs:/logs
    command:
      - --addr=0.0.0.0:20160
      - --advertise-addr=tikv:20160
      - --data-dir=/data/tikv
      - --pd=pd:2379
      - --config=/tikv.toml
      - --log-file=/logs/tikv.log
    depends_on:
      - "pd"
    restart: on-failure
  tidb:
    image: pingcap/tidb:v5.3.1
    ports:
      - "4000:4000"
    volumes:
      - ./config/tidb.toml:/tidb.toml
      - ./logs:/logs
    command:
      - --store=tikv
      - --path=pd:2379
      - --config=/tidb.toml
      - --log-file=/logs/tidb.log
      - --advertise-address=tidb
    depends_on:
      - "tikv"
    restart: on-failure
  elasticsearch:
    image: elastic/elasticsearch:7.6.0
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
  kibana:
    image: elastic/kibana:7.6.0
    container_name: kibana
    ports:
      - "5601:5601"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
The containers included in this Docker Compose are:
- TiDB cluster: tikv, pd, tidb.
- Elasticsearch: The orders table will be joined with the products table, and the result of the join will be written into Elasticsearch.
- Kibana: Visualize data in Elasticsearch.
On the host machine, add entries mapping the hostnames pd and tikv to 127.0.0.1 (e.g. in /etc/hosts). Then run the following command in the directory containing docker-compose.yml to start all containers:
docker-compose up -d
mysql -h 127.0.0.1 -P 4000 -u root   # Optional: if a MySQL client is installed locally, use it to check that the TiDB cluster is ready.
This command starts all containers defined in the Docker Compose configuration in detached mode. Run docker ps to check whether the containers started normally; you can also visit http://localhost:5601/ to verify that Kibana is running.
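The host-mapping step above can be sketched with a small helper (`tidb_host_entries` is a hypothetical function name; actually appending to /etc/hosts requires root):

```shell
# Entries that let a Flink job running on the host resolve the hostnames
# pd and tikv, which the containers advertise to clients.
tidb_host_entries() {
  printf '127.0.0.1 pd\n127.0.0.1 tikv\n'
}
tidb_host_entries
```

Append them with `tidb_host_entries | sudo tee -a /etc/hosts`.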
In addition, you can stop and delete all containers with the following command:
docker-compose down
2. Download Flink and the required dependencies
Download Flink 1.17.1 and extract it to the directory flink-1.17.1
https://archive.apache.org/dist/flink/flink-1.17.1/flink-1.17.1-bin-scala_2.12.tgz
Download the following dependencies and put them in the directory flink-1.17.1/lib/:
- flink-sql-connector-tidb-cdc (the TiDB CDC source connector)
- flink-sql-connector-elasticsearch7 (the Elasticsearch 7 sink connector)
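The connector jars can be fetched from Maven Central. Here is a sketch that prints the download URLs; the coordinates and versions below are assumptions, so check the Flink CDC documentation for the releases matching your Flink 1.17.1 install:

```shell
# Print Maven Central URLs for the two SQL connector jars (assumed versions).
connector_urls() {
  base=https://repo1.maven.org/maven2
  echo "$base/com/ververica/flink-sql-connector-tidb-cdc/2.4.2/flink-sql-connector-tidb-cdc-2.4.2.jar"
  echo "$base/org/apache/flink/flink-sql-connector-elasticsearch7/3.0.1-1.17/flink-sql-connector-elasticsearch7-3.0.1-1.17.jar"
}
connector_urls
```

Download them with `connector_urls | xargs -n1 wget -P flink-1.17.1/lib`; the jars are loaded when the Flink cluster starts.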
3. Create tables and prepare data in TiDB database
Create the database mydb, create the tables products and orders, and insert some data:
-- TiDB
CREATE DATABASE mydb;
USE mydb;
CREATE TABLE products (
id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description VARCHAR(512)
) AUTO_INCREMENT = 101;
INSERT INTO products
VALUES (default,"scooter","Small 2-wheel scooter"),
(default,"car battery","12V car battery"),
(default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
(default,"hammer","12oz carpenter's hammer"),
(default,"hammer","14oz carpenter's hammer"),
(default,"hammer","16oz carpenter's hammer"),
(default,"rocks","box of assorted rocks"),
(default,"jacket","water resistent black wind breaker"),
(default,"spare tire","24 inch spare tire");
CREATE TABLE orders (
order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
order_date DATETIME NOT NULL,
customer_name VARCHAR(255) NOT NULL,
price DECIMAL(10, 5) NOT NULL,
product_id INTEGER NOT NULL,
order_status BOOLEAN NOT NULL -- Whether order has been placed
) AUTO_INCREMENT = 10001;
INSERT INTO orders
VALUES (default, '2020-07-30 10:08:22', 'Jark', 50.50, 102, false),
(default, '2020-07-30 10:11:09', 'Sally', 15.00, 105, false),
(default, '2020-07-30 12:00:30', 'Edward', 25.25, 106, false);
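With the tables seeded, a quick row-count sanity check can be run from the shell. This is a sketch assuming a local MySQL client; `count_rows` is a hypothetical helper:

```shell
# Count rows in the two seeded tables; -N suppresses column headers.
count_rows() {
  mysql -h 127.0.0.1 -P 4000 -u root mydb -N \
    -e 'SELECT COUNT(*) FROM products; SELECT COUNT(*) FROM orders;' || true
}
# Only attempt the check if a MySQL client is available.
if command -v mysql >/dev/null 2>&1; then count_rows; fi
```

With the inserts above, it should report 9 rows in products and 3 in orders.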
4. Start the Flink cluster, and then start the SQL CLI
Use the following command to jump to the Flink directory
cd flink-1.17.1
Start the Flink cluster with the following command
./bin/start-cluster.sh
If the startup is successful, you can access the Flink Web UI at http://localhost:8081/.
Use the following command to start the Flink SQL CLI
./bin/sql-client.sh
After the startup is successful, the SQL CLI welcome screen is displayed.
5. Create tables using Flink DDL in Flink SQL CLI
First, enable checkpointing, with a checkpoint every 3 seconds:
-- Flink SQL
Flink SQL> SET execution.checkpointing.interval = 3s;
In the Flink SQL CLI, create tables corresponding to the underlying TiDB tables so their data can be synchronized:
Flink SQL> CREATE TABLE products (
id INT,
name STRING,
description STRING,
PRIMARY KEY (id) NOT ENFORCED
) WITH (
'connector' = 'tidb-cdc',
'tikv.grpc.timeout_in_ms' = '20000',
'pd-addresses' = '127.0.0.1:2379',
'database-name' = 'mydb',
'table-name' = 'products'
);
Flink SQL> CREATE TABLE orders (
order_id INT,
order_date TIMESTAMP(3),
customer_name STRING,
price DECIMAL(10, 5),
product_id INT,
order_status BOOLEAN,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'tidb-cdc',
'tikv.grpc.timeout_in_ms' = '20000',
'pd-addresses' = '127.0.0.1:2379',
'database-name' = 'mydb',
'table-name' = 'orders'
);
Flink SQL> CREATE TABLE enriched_orders (
order_id INT,
order_date DATE,
customer_name STRING,
order_status BOOLEAN,
product_name STRING,
product_description STRING,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'elasticsearch-7',
'hosts' = 'http://localhost:9200',
'index' = 'enriched_orders_1'
);
Write the join result into Elasticsearch:
Flink SQL> INSERT INTO enriched_orders
SELECT o.order_id, o.order_date, o.customer_name, o.order_status, p.name, p.description
FROM orders AS o
LEFT JOIN products AS p ON o.product_id = p.id;
6. View Elasticsearch data in Kibana
Check that the join result has been written to Elasticsearch; you can inspect the data in Kibana.
First visit http://localhost:5601/app/kibana#/management/kibana/index_pattern to create index pattern enriched_orders.
Then you can see the written data at http://localhost:5601/app/kibana#/discover.
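Besides Kibana, the index can also be queried directly over the Elasticsearch REST API. A sketch assuming `curl` is installed and the compose stack from step 1 is running (`es_search_url` is a hypothetical helper):

```shell
# Build the search URL for an index, then fetch its documents.
es_search_url() {
  printf 'http://localhost:9200/%s/_search?pretty' "$1"
}
if command -v curl >/dev/null 2>&1; then
  curl -s "$(es_search_url enriched_orders_1)" || true
fi
```

The response lists the enriched order documents under `hits.hits`.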
7. Add, delete, and modify data in TiDB, and observe the results in Elasticsearch
Make some changes in the TiDB database with the following SQL statements. After each statement executes, you can see the data in Elasticsearch update in real time.
INSERT INTO orders
VALUES (default, '2020-07-30 15:22:00', 'Jark', 29.71, 104, false);
UPDATE orders SET order_status = true WHERE order_id = 10004;
DELETE FROM orders WHERE order_id = 10004;
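Because the sink table declares order_id as its primary key, each order maps to one Elasticsearch document id, so a change can be confirmed by fetching the affected document directly. A sketch assuming `curl` and the running stack (`es_doc_url` is a hypothetical helper):

```shell
# Build the document URL for an order id, then fetch it; after the
# DELETE above, Elasticsearch reports the document as "found": false.
es_doc_url() {
  printf 'http://localhost:9200/enriched_orders_1/_doc/%s' "$1"
}
if command -v curl >/dev/null 2>&1; then
  curl -s "$(es_doc_url 10004)" || true
fi
```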