Flink CDC Series: Importing TiDB CDC into Elasticsearch

1. Start the TiDB cluster with Docker Compose

git clone https://github.com/pingcap/tidb-docker-compose.git

Replace the docker-compose.yml file in the tidb-docker-compose directory with the following content:

version: "2.1"

services:
  pd:
    image: pingcap/pd:v5.3.1
    ports:
      - "2379:2379"
    volumes:
      - ./config/pd.toml:/pd.toml
      - ./logs:/logs
    command:
      - --client-urls=http://0.0.0.0:2379
      - --peer-urls=http://0.0.0.0:2380
      - --advertise-client-urls=http://pd:2379
      - --advertise-peer-urls=http://pd:2380
      - --initial-cluster=pd=http://pd:2380
      - --data-dir=/data/pd
      - --config=/pd.toml
      - --log-file=/logs/pd.log
    restart: on-failure

  tikv:
    image: pingcap/tikv:v5.3.1
    ports:
      - "20160:20160"
    volumes:
      - ./config/tikv.toml:/tikv.toml 
      - ./logs:/logs           
    command:
      - --addr=0.0.0.0:20160
      - --advertise-addr=tikv:20160
      - --data-dir=/data/tikv
      - --pd=pd:2379
      - --config=/tikv.toml
      - --log-file=/logs/tikv.log
    depends_on:
      - "pd"
    restart: on-failure

  tidb:
    image: pingcap/tidb:v5.3.1
    ports:
      - "4000:4000"
    volumes:
      - ./config/tidb.toml:/tidb.toml
      - ./logs:/logs
    command:
      - --store=tikv
      - --path=pd:2379
      - --config=/tidb.toml
      - --log-file=/logs/tidb.log
      - --advertise-address=tidb
    depends_on:
      - "tikv"
    restart: on-failure
    
  elasticsearch:
    image: elastic/elasticsearch:7.6.0
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536

  kibana:
    image: elastic/kibana:7.6.0
    container_name: kibana
    ports:
      - "5601:5601"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

The containers included in this Docker Compose are:

  • TiDB cluster: tikv, pd, tidb.
  • Elasticsearch: The orders table will be joined with the products table, and the result of the join will be written into Elasticsearch.
  • Kibana: Visualize data in Elasticsearch.

Because Flink and the MySQL client run on the host machine, outside the Docker network, the hostnames pd and tikv advertised by the cluster must resolve to 127.0.0.1 on this machine.
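
A minimal host-mapping sketch, assuming a Linux or macOS host where /etc/hosts is used for name resolution (on Windows the equivalent file is C:\Windows\System32\drivers\etc\hosts):

127.0.0.1 pd
127.0.0.1 tikv

Run the following command in the directory where docker-compose.yml is located to start all containers: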

docker-compose up -d
mysql -h 127.0.0.1 -P 4000 -u root   # optional: verify that the TiDB cluster is ready (requires a local MySQL client)

This command starts all containers defined in the Docker Compose configuration in detached mode. You can use docker ps to check whether the containers started normally, and visit http://localhost:5601/ to verify that Kibana is running properly.
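
If you want to confirm that Elasticsearch itself is reachable before moving on, you can query it directly; a quick check (assumes curl is installed on the host):

curl http://localhost:9200

A JSON response containing the cluster name docker-cluster indicates that Elasticsearch is up.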

In addition, you can stop and delete all containers with the following command:

docker-compose down

2. Download Flink and the required dependencies

Download Flink 1.17.1 and extract it to the directory flink-1.17.1

https://archive.apache.org/dist/flink/flink-1.17.1/flink-1.17.1-bin-scala_2.12.tgz

Download the dependencies below and put them in the directory flink-1.17.1/lib/. This walkthrough needs the TiDB CDC SQL connector (flink-sql-connector-tidb-cdc) and the Elasticsearch 7 SQL connector (flink-sql-connector-elasticsearch7); choose releases that match your Flink and Flink CDC versions.
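
A download sketch, assuming both JARs are fetched from Maven Central; the version numbers are assumptions and should be replaced with the releases you actually use:

# assumed versions -- adjust to your Flink / Flink CDC releases
TIDB_CDC_VERSION=2.4.1
ES7_CONNECTOR_VERSION=3.0.1-1.17
wget -P flink-1.17.1/lib/ "https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-tidb-cdc/${TIDB_CDC_VERSION}/flink-sql-connector-tidb-cdc-${TIDB_CDC_VERSION}.jar"
wget -P flink-1.17.1/lib/ "https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-elasticsearch7/${ES7_CONNECTOR_VERSION}/flink-sql-connector-elasticsearch7-${ES7_CONNECTOR_VERSION}.jar"

The JARs must be in place before the Flink cluster is started in step 4, because lib/ is only scanned at startup.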

3. Create tables and prepare data in the TiDB database

Create a database, create the products and orders tables, and insert some data:

-- TiDB
CREATE DATABASE mydb;
USE mydb;
CREATE TABLE products (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description VARCHAR(512)
) AUTO_INCREMENT = 101;

INSERT INTO products
VALUES (default,"scooter","Small 2-wheel scooter"),
      (default,"car battery","12V car battery"),
      (default,"12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3"),
      (default,"hammer","12oz carpenter's hammer"),
      (default,"hammer","14oz carpenter's hammer"),
      (default,"hammer","16oz carpenter's hammer"),
      (default,"rocks","box of assorted rocks"),
      (default,"jacket","water resistent black wind breaker"),
      (default,"spare tire","24 inch spare tire");

CREATE TABLE orders (
    order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    order_date DATETIME NOT NULL,
    customer_name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 5) NOT NULL,
    product_id INTEGER NOT NULL,
    order_status BOOLEAN NOT NULL -- Whether order has been placed
) AUTO_INCREMENT = 10001;

INSERT INTO orders
VALUES (default, '2020-07-30 10:08:22', 'Jark', 50.50, 102, false),
      (default, '2020-07-30 10:11:09', 'Sally', 15.00, 105, false),
      (default, '2020-07-30 12:00:30', 'Edward', 25.25, 106, false);
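
Optionally, verify the inserted rows from the MySQL client before moving on; a quick check:

-- TiDB
SELECT COUNT(*) FROM products;   -- expected: 9
SELECT COUNT(*) FROM orders;     -- expected: 3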

4. Start the Flink cluster, and then start the SQL CLI

Use the following command to change to the Flink directory:

cd flink-1.17.1

Start the Flink cluster with the following command

./bin/start-cluster.sh

If the startup is successful, you can access the Flink Web UI at http://localhost:8081/.
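
You can also confirm from the command line that the standalone cluster processes are running; a quick check (jps ships with the JDK):

jps

Both StandaloneSessionClusterEntrypoint (the JobManager) and TaskManagerRunner (the TaskManager) should appear in the output.
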
Use the following command to start the Flink SQL CLI

./bin/sql-client.sh

After the startup is successful, the Flink SQL CLI interactive prompt is displayed.

5. Create tables using Flink DDL in Flink SQL CLI

First, enable checkpointing, with a checkpoint taken every 3 seconds:

-- Flink SQL                   
Flink SQL> SET execution.checkpointing.interval = 3s;
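
Equivalently, the same interval can be set cluster-wide before the cluster is started; a sketch, assuming the default conf/flink-conf.yaml is used:

# conf/flink-conf.yaml
execution.checkpointing.interval: 3s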

Use the Flink SQL CLI to create tables that correspond to the underlying TiDB tables, so that their data can be synchronized:

Flink SQL> CREATE TABLE products (
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
  ) WITH (
    'connector' = 'tidb-cdc',
    'tikv.grpc.timeout_in_ms' = '20000',
    'pd-addresses' = '127.0.0.1:2379',
    'database-name' = 'mydb',
    'table-name' = 'products'
  );

Flink SQL> CREATE TABLE orders (
   order_id INT,
   order_date TIMESTAMP(3),
   customer_name STRING,
   price DECIMAL(10, 5),
   product_id INT,
   order_status BOOLEAN,
   PRIMARY KEY (order_id) NOT ENFORCED
 ) WITH (
    'connector' = 'tidb-cdc',
    'tikv.grpc.timeout_in_ms' = '20000',
    'pd-addresses' = '127.0.0.1:2379',
    'database-name' = 'mydb',
    'table-name' = 'orders'
);

Flink SQL> CREATE TABLE enriched_orders (
   order_id INT,
   order_date DATE,
   customer_name STRING,
   order_status BOOLEAN,
   product_name STRING,
   product_description STRING,
   PRIMARY KEY (order_id) NOT ENFORCED
 ) WITH (
     'connector' = 'elasticsearch-7',
     'hosts' = 'http://localhost:9200',
     'index' = 'enriched_orders_1'
 );
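
Before wiring up the sink, you can optionally preview a CDC source table directly in the CLI; a quick check (the result opens in the client's interactive view):

Flink SQL> SELECT * FROM orders;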

Join orders with products and write the enriched result into Elasticsearch:

Flink SQL> INSERT INTO enriched_orders
  SELECT o.order_id, o.order_date, o.customer_name, o.order_status, p.name, p.description
  FROM orders AS o
  LEFT JOIN products AS p ON o.product_id = p.id;
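
The INSERT statement submits a continuous streaming job to the cluster. You can confirm that it is running in the Flink Web UI, or from the command line; a quick check:

./bin/flink list
# should list one running job for the enriched_orders insert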

6. View Elasticsearch data in Kibana

Check whether the final result has been written to Elasticsearch; you can view the data in Kibana.
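
You can also query Elasticsearch directly instead of going through Kibana; a sketch against the index configured in the sink table:

curl "http://localhost:9200/enriched_orders_1/_search?pretty"

Three documents, one per order, should be returned.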

First, visit http://localhost:5601/app/kibana#/management/kibana/index_pattern and create an index pattern named enriched_orders_1 (the index configured in the sink table).

Then you can see the written data at http://localhost:5601/app/kibana#/discover.

7. Add, delete, and modify data in TiDB, and observe the results in Elasticsearch

Execute the following SQL statements against TiDB; after each statement, you can see the data in Elasticsearch update in real time.

INSERT INTO orders
VALUES (default, '2020-07-30 15:22:00', 'Jark', 29.71, 104, false);

UPDATE orders SET order_status = true WHERE order_id = 10004;

DELETE FROM orders WHERE order_id = 10004;
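
Because order_id is the primary key of the sink table, the Elasticsearch connector uses it as the document id, so the effect of each statement can also be checked with a direct document lookup; a sketch:

curl "http://localhost:9200/enriched_orders_1/_doc/10004?pretty"
# present after the INSERT and UPDATE; returns "found": false after the DELETE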

Original article: https://blog.csdn.net/zhengzaifeidelushang/article/details/132244913