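Component versions:
PostgreSQL: 9.6
Debezium: debezium-0.8.3
Sqoop: 1.4.7
Goals:
1. Synchronize data from PostgreSQL to HBase in real time using streaming replication
2. Preserve data integrity when the two PostgreSQL nodes fail over
The flow is shown in the figure below:
Problem to solve: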
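As the figure above shows, Postgresql1 and Postgresql2 form a hot-standby pair: if Postgresql1 goes down, Postgresql2 is automatically promoted to primary. We use the Debezium connector to stream changes from PostgreSQL to HBase, but Debezium can only stream from the primary (master) node. After Postgresql2 is promoted, Debezium's replication connection therefore also has to move from Postgresql1 to Postgresql2. Data written to Postgresql2 after it becomes primary but before Debezium's switchover completes would be lost.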
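Because Debezium can only stream from the primary, it helps to confirm that Postgresql2 has actually been promoted before doing anything else. The following is a minimal sketch, not part of the original procedure: `is_primary` and `wait_for_primary` are hypothetical helper names, the `postgres` database and the `POLL_INTERVAL` variable are assumptions, and the password is assumed to come from `~/.pgpass` or `PGPASSWORD`. It relies only on `pg_is_in_recovery()`, which returns `f` on a primary and `t` on a standby.

```shell
#!/usr/bin/env bash
# Sketch: check whether a PostgreSQL node is currently the primary.
# Helper names, database name and POLL_INTERVAL are illustrative assumptions.

is_primary() {
  local host="$1"
  local in_recovery
  # -A: unaligned output, -t: tuples only, so the result is just "t" or "f"
  in_recovery=$(psql -h "$host" -p 5432 -U repmgr -d postgres -Atc 'SELECT pg_is_in_recovery()') || return 2
  [ "$in_recovery" = "f" ]
}

# Poll until the node reports itself as primary, or give up after N attempts.
wait_for_primary() {
  local host="$1" attempts="${2:-30}"
  local i
  for ((i = 0; i < attempts; i++)); do
    if is_primary "$host"; then
      return 0
    fi
    sleep "${POLL_INTERVAL:-2}"
  done
  return 1
}
```

A call such as `wait_for_primary xxxx 60` would then gate the connector creation in the solution steps.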
Solution:
1. After Postgresql2 has been promoted to primary, create and start the streaming replication connector
curl 'http://xxxx:8083/connectors' -X POST -i -H "Content-Type:application/json" -d '{
  "name": "SUB-HDI-REPLICATION-TEST-A",
  "config": {
    "slot.name": "debezium",
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "xxxx",
    "database.port": "5432",
    "database.user": "repmgr",
    "database.password": "xxxx",
    "database.dbname": "xxxx",
    "database.history.kafka.bootstrap.servers": "xxxx:9092",
    "database.server.name": "xxxx",
    "table.whitelist": "public.nt_test_a",
    "time.precision.mode": "connect",
    "decimal.handling.mode": "string",
    "plugin.name": "wal2json"
  }
}'
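The connector above uses a replication slot named `debezium`, and that slot is what makes PostgreSQL retain WAL for the connector. As a hedged sketch, one could check that the slot exists on the new primary by querying the `pg_replication_slots` view. The helper name `slot_exists` and the `postgres` database are assumptions for illustration; the user and slot name are taken from the configuration above.

```shell
#!/usr/bin/env bash
# Sketch: verify that the connector's replication slot exists on a node.
# slot_exists is a hypothetical helper; the database name is an assumption.

slot_exists() {
  local host="$1" slot="${2:-debezium}"
  local count
  count=$(psql -h "$host" -p 5432 -U repmgr -d postgres -Atc \
    "SELECT count(*) FROM pg_replication_slots WHERE slot_name = '${slot}'") || return 2
  [ "$count" = "1" ]
}
```

For example, `slot_exists xxxx debezium` returns success once the slot has been created.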
2. Pause the connector
curl -X PUT 'http://localhost:8083/connectors/SUB-HDI-REPLICATION-TEST-A/pause'
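The Kafka Connect pause request is asynchronous, so it may be worth confirming that the connector and its tasks really report `PAUSED` before starting the Sqoop import. A minimal sketch using the `GET /connectors/{name}/status` endpoint follows. The helper names, `CONNECT_URL` and `POLL_INTERVAL` are assumptions, and the parsing assumes the compact single-line JSON that Kafka Connect normally returns (a JSON tool such as jq would be more robust).

```shell
#!/usr/bin/env bash
# Sketch: wait until a Kafka Connect connector and all its tasks reach a state.
# CONNECT_URL, POLL_INTERVAL and the helper names are illustrative assumptions.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Print the distinct states (connector and tasks) found in the status response.
connector_states() {
  local name="$1"
  curl -s "${CONNECT_URL}/connectors/${name}/status" \
    | grep -o '"state" *: *"[A-Z_]*"' \
    | sed 's/.*"\([A-Z_]*\)"$/\1/' \
    | sort -u
}

# Succeed once the only state reported is the wanted one.
wait_for_state() {
  local name="$1" want="$2" attempts="${3:-30}"
  local i
  for ((i = 0; i < attempts; i++)); do
    if [ "$(connector_states "$name")" = "$want" ]; then
      return 0
    fi
    sleep "${POLL_INTERVAL:-1}"
  done
  return 1
}
```

A call such as `wait_for_state SUB-HDI-REPLICATION-TEST-A PAUSED` could follow the pause request, and the same helper with `RUNNING` could follow the resume request.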
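3. Use Sqoop to pull the data from the previous hour (note that the query below, as written, has no time filter and imports the whole table)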
sqoop import \
--connect jdbc:postgresql://xxxxxx:5432/nexttaodb \
--username xxxx --password xxxx \
--query "select * from (select id as ID,name as NAME,code as CODE,state as STATE,info as INFO,to_char(created_at,'yyyy-mm-dd hh24:mi:ss') as CREATED_AT,to_char(updated_at,'yyyy-mm-dd hh24:mi:ss') as UPDATED_AT,create_uid as CREATE_UID,to_char(create_date,'yyyy-mm-dd hh24:mi:ss') as CREATE_DATE,to_char(write_date,'yyyy-mm-dd hh24:mi:ss') as WRITE_DATE,write_uid as WRITE_UID from nt_test_a)x where \$CONDITIONS" \
--hbase-create-table --hbase-table TEST_2 --hbase-row-key id --column-family info --split-by id -m 20 -z
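To limit the import to the previous hour, the free-form query needs a time predicate. The sketch below is one possible way to do that and is not taken from the original setup: it assumes `updated_at` is set on every insert and update (so deletes are not covered), and `build_backfill_query`, `run_backfill` and the `PG_*` variables are hypothetical names. The literal `$CONDITIONS` token must reach Sqoop unchanged, because Sqoop substitutes its split predicates there.

```shell
#!/usr/bin/env bash
# Sketch: Sqoop backfill restricted to rows modified in the last N hours.
# Assumes updated_at tracks every write; helper names and PG_* are placeholders.

build_backfill_query() {
  local hours="${1:-1}"
  # \$CONDITIONS stays literal so that Sqoop can replace it per mapper.
  printf '%s' "select * from (select id, name, code, state, info, \
to_char(created_at,'yyyy-mm-dd hh24:mi:ss') as created_at, \
to_char(updated_at,'yyyy-mm-dd hh24:mi:ss') as updated_at, \
create_uid, \
to_char(create_date,'yyyy-mm-dd hh24:mi:ss') as create_date, \
to_char(write_date,'yyyy-mm-dd hh24:mi:ss') as write_date, \
write_uid from nt_test_a \
where updated_at >= now() - interval '${hours} hours') x where \$CONDITIONS"
}

run_backfill() {
  sqoop import \
    --connect "jdbc:postgresql://${PG_HOST}:5432/${PG_DB}" \
    --username "$PG_USER" --password "$PG_PASSWORD" \
    --query "$(build_backfill_query "${1:-1}")" \
    --hbase-create-table --hbase-table TEST_2 --hbase-row-key id \
    --column-family info --split-by id -m 20 -z
}
```

Running `build_backfill_query 1` on its own prints the generated SQL, which makes it easy to review the time window before calling `run_backfill 1`.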
4. Resume the previously paused connector
curl -X PUT 'http://localhost:8083/connectors/SUB-HDI-REPLICATION-TEST-A/resume'
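This completes the switchover. No data is lost, provided the Sqoop import covers everything written to Postgresql2 between its promotion and the creation of the connector.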