This article follows the official documentation, adapted to the hosts actually at hand, to replicate TiDB incremental data to Kafka through TiCDC on CentOS 7.
Prerequisites
For installing TiDB and Kafka, refer to the following posts:
Deploying TiDB with TiUP on CentOS 7
Installing Zookeeper and Kafka on CentOS 7
Installing TiCDC
On the control machine, write the TiCDC deployment topology (cdc.yaml):
global:
  user: "tidb"
  ssh_port: 2333
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

cdc_servers:
  - host: 192.168.58.10
    gc-ttl: 86400
    data_dir: /data/deploy/install/data/cdc-8300
Run the installation commands.
# Install TiCDC
$ tiup cluster scale-out tidb-test cdc.yaml -u root -p
# Check the status of the cluster components
$ tiup cluster display tidb-test
tiup is checking updates for component cluster ...
Starting component `cluster`: /root/.tiup/components/cluster/v1.11.3/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v6.5.0
Deploy user: root
SSH type: builtin
Dashboard URL: http://192.168.58.10:2379/dashboard
Grafana URL: http://192.168.58.10:3000
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
192.168.58.10:8300 cdc 192.168.58.10 8300 linux/x86_64 Up /data/deploy/install/data/cdc-8300 /tidb-deploy/cdc-8300
192.168.58.10:3000 grafana 192.168.58.10 3000 linux/x86_64 Up - /tidb-deploy/grafana-3000
192.168.58.10:2379 pd 192.168.58.10 2379/2380 linux/x86_64 Up|L|UI /tidb-data/pd-2379 /tidb-deploy/pd-2379
192.168.58.10:9090 prometheus 192.168.58.10 9090/12020 linux/x86_64 Up /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
192.168.58.10:4000 tidb 192.168.58.10 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
192.168.58.10:20160 tikv 192.168.58.10 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
192.168.58.10:20161 tikv 192.168.58.10 20161/20181 linux/x86_64 Up /tidb-data/tikv-20161 /tidb-deploy/tikv-20161
192.168.58.10:20162 tikv 192.168.58.10 20162/20182 linux/x86_64 Up /tidb-data/tikv-20162 /tidb-deploy/tikv-20162
Total nodes: 8
Creating a changefeed to replicate incremental data to Kafka
On the host where TiCDC is installed, create the changefeed:
$ cd /tidb-deploy/cdc-8300/bin
# Create the replication task
$ ./cdc cli changefeed create --server=http://192.168.58.10:8300 --sink-uri="kafka://192.168.58.10:9092/test-topic?protocol=canal-json&kafka-version=3.4.0&partition-num=1&max-message-bytes=67108864&replication-factor=1" --changefeed-id="simple-replication-task"
Parameter notes
--server: host IP and port of the TiCDC server.
--changefeed-id: ID of the replication task; it must match the regular expression ^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$. If omitted, TiCDC generates a UUID (version 4) as the ID.
--sink-uri: downstream address of the replication task, where:
192.168.58.10:9092: host address and port of the Kafka broker;
test-topic: name of the Kafka topic;
protocol: message protocol written to Kafka; valid values are canal-json, open-protocol, canal, avro, and maxwell;
partition-num: number of downstream Kafka partitions (optional; must not exceed the actual number of partitions, otherwise creating the changefeed fails; default 3);
max-message-bytes: maximum amount of data sent to a Kafka broker per message (optional; default 10MB); increasing it is recommended;
replication-factor: replication factor of the Kafka messages (optional; default 1).
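The --sink-uri value above is an ordinary URL whose query string carries these parameters. As a minimal sketch (the broker address, topic name, and parameter values are simply the ones used in this example), the URI can be assembled like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SinkUriBuilder {
    // Assemble a TiCDC Kafka sink URI from its parts:
    // kafka://<broker>/<topic>?<key>=<value>&...
    public static String build(String broker, String topic, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "kafka://" + broker + "/" + topic + (query.isEmpty() ? "" : "?" + query);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the query parameters
        Map<String, String> params = new LinkedHashMap<>();
        params.put("protocol", "canal-json");
        params.put("kafka-version", "3.4.0");
        params.put("partition-num", "1");
        params.put("max-message-bytes", "67108864");
        params.put("replication-factor", "1");
        // Prints the same --sink-uri value passed to the create command above
        System.out.println(build("192.168.58.10:9092", "test-topic", params));
    }
}
```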
Verifying the incremental replication
Parsing the incremental data
{
  "id": 0,
  "database": "test",
  "table": "student",
  "pkNames": ["s_id"],
  "isDdl": false,
  "type": "INSERT",
  "es": 1678934551053,
  "ts": 1678934551348,
  "sql": "",
  "sqlType": { "s_id": 4, "s_name": 12 },
  "mysqlType": { "s_id": "int", "s_name": "varchar" },
  "old": null,
  "data": [{ "s_id": "2", "s_name": "Jack" }]
}
database: database name
table: table name
pkNames: array of primary key names
isDdl: whether the event is a schema (DDL) change
type: type of the change; when an update modifies the primary key, the event is split into one DELETE and one INSERT;
sqlType: type codes of the table's columns in TiDB;
mysqlType: the corresponding MySQL data types;
old: the previous row data; null when type is DELETE or INSERT;
data: the latest row data.
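The numeric codes in sqlType appear to follow the JDBC type constants in java.sql.Types, as in the canal protocol that canal-json mimics: in the sample above, 4 for the int column matches Types.INTEGER and 12 for the varchar column matches Types.VARCHAR. A quick check:

```java
import java.sql.Types;

public class SqlTypeCodes {
    public static void main(String[] args) {
        // The canal-json sqlType codes line up with JDBC type constants
        System.out.println("s_id (int)       -> " + Types.INTEGER); // 4
        System.out.println("s_name (varchar) -> " + Types.VARCHAR); // 12
    }
}
```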
Viewing the incremental data with a Kafka consumer
# Start a consumer on the Kafka host
$ ./kafka-console-consumer.sh --bootstrap-server 192.168.58.10:9092 --topic test-topic --from-beginning
# Now go to the TiDB database and perform a data operation, e.g. insert a row; the consumer console prints:
{"id":0,"database":"test","table":"student","pkNames":["s_id"],"isDdl":false,"type":"INSERT","es":1678934551053,"ts":1678934551348,"sql":"","sqlType":{"s_id":4,"s_name":12},"mysqlType":{"s_id":"int","s_name":"varchar"},"old":null,"data":[{"s_id":"2","s_name":"Jack"}]}
Reading the Kafka consumer messages from Java code
package kafka;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Date;
import java.util.Properties;

public class KafkaDemo {
    public static void main(String[] args) throws Exception {
        Properties prop = new Properties();
        prop.put("bootstrap.servers", "192.168.58.10:9092");
        prop.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.put("group.id", "con-1");
        prop.put("auto.offset.reset", "latest");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop);
        ArrayList<String> topics = new ArrayList<>();
        topics.add("test-topic");
        consumer.subscribe(topics);
        while (true) {
            ConsumerRecords<String, String> poll = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : poll) {
                System.out.println("Change time: " + new Date(record.timestamp()));
                System.out.println("Raw message: " + record.value());
                JSONObject object = JSONObject.parseObject(record.value());
                // Skip schema (DDL) changes
                if (object.getBoolean("isDdl")) {
                    continue;
                }
                System.out.println("Change type: " + object.get("type"));
                System.out.println("Database: " + object.get("database"));
                System.out.println("Table: " + object.get("table"));
                JSONArray pkArray = object.getJSONArray("pkNames");
                String pkName = pkArray.get(0).toString();
                System.out.println("Primary key name: " + pkName);
                JSONArray oldArray = object.getJSONArray("old");
                JSONArray dataArray = object.getJSONArray("data");
                String newPkValue = dataArray.getJSONObject(0).getString(pkName);
                System.out.println("Primary key value: " + newPkValue);
                if (oldArray != null) {
                    JSONObject old = oldArray.getJSONObject(0);
                    String oldPkValue = old.getString(pkName);
                    // "old" may not contain the primary key column; guard against null
                    if (oldPkValue != null && !oldPkValue.equals(newPkValue)) {
                        System.out.println("Primary key changed, old value: " + oldPkValue);
                    }
                }
            }
        }
    }
}
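The demo above needs kafka-clients and fastjson on the classpath. A minimal Maven dependency sketch; the version numbers here are assumptions (pick a kafka-clients version matching your broker):

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>3.4.0</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.83</version>
    </dependency>
</dependencies>
```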