Setting Up Debezium for MySQL + Kafka + Confluent Schema Registry

Our business needed MySQL binlog events pushed to Kafka so that downstream applications could consume them and react to data changes.
The software versions I used are:

  • Kafka: kafka_2.11-2.0.0.tgz
  • Confluent: confluent-oss-5.0.0-2.11.tar.gz
  • Debezium: debezium-connector-mysql-0.8.1.Final-plugin.tar.gz

The three machines are as follows (the real IPs are masked):
ali-18 *.*.*.18
ali-36 *.*.*.36
ali-37 *.*.*.37

Extract the debezium-connector-mysql archive into Confluent's plugin directory (share/java):

Extraction command: tar -xzf debezium-connector-mysql-0.8.1.Final-plugin.tar.gz

[root@ali-37 java]# pwd
/opt/confluent-5.0.0/share/java
[root@ali-37 java]# ls
confluent-common                                      kafka                        kafka-connect-s3              ksql
debezium-connector-mongodb-0.8.1.Final-plugin.tar.gz  kafka-connect-elasticsearch  kafka-connect-storage-common  rest-utils
debezium-connector-mysql                              kafka-connect-hdfs           kafka-rest                    schema-registry
debezium-connector-mysql-0.8.1.Final-plugin.tar.gz    kafka-connect-jdbc           kafka-serde-tools
[root@ali-37 java]# 

Note: the debezium-connector-mysql folder containing the MySQL connector jars must sit directly under the plugin directory. Keeping each plugin's jars in its own folder prevents jar conflicts, because the jars of different plugins are classloader-isolated from one another. The plugin folder must exist under the Confluent plugin directory on all three Kafka Connect worker machines, because once a connector is submitted to a distributed worker cluster there is no guarantee which worker its tasks will be scheduled on.
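A minimal sketch for distributing the extracted plugin folder to the other two workers (assuming passwordless SSH from ali-37 and the same install path on every machine):

cd /opt/confluent-5.0.0/share/java
for host in ali-18 ali-36; do
  # Copy the extracted connector folder; the target path must match the plugin directory on each worker.
  scp -r debezium-connector-mysql root@${host}:/opt/confluent-5.0.0/share/java/
done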

Since I need Avro-formatted Kafka messages and a distributed Kafka Connect cluster, the configuration under Confluent's etc/schema-registry directory has to be modified:

[root@ali-37 schema-registry]# pwd
/opt/confluent-5.0.0/etc/schema-registry
[root@ali-37 schema-registry]# ls
connect-avro-distributed.properties  connect-avro-standalone.properties  log4j.properties  schema-registry.properties
[root@ali-37 schema-registry]# 

The files to configure are connect-avro-distributed.properties and schema-registry.properties shown above.
In schema-registry.properties, only the listeners and the ZooKeeper cluster information are configured:

listeners=http://0.0.0.0:18081
# ZooKeeper cluster info
kafkastore.connection.url=ali-18:2182,ali-36:2182,ali-37:2182

The Kafka Connect worker configuration file connect-avro-distributed.properties is as follows:

# Sample configuration for a distributed Kafka Connect worker that uses Avro serialization and
# integrates with the Schema Registry. This sample configuration assumes a local installation of
# Confluent Platform with all services running on their default ports.

# Bootstrap Kafka servers. If multiple servers are specified, they should be comma-separated.
bootstrap.servers=ali-18:9092,ali-36:9092,ali-37:9092

# The group ID is a unique identifier for the set of workers that form a single Kafka Connect
# cluster
group.id=connect-cluster

# The converters specify the format of data in Kafka and how to translate it into Connect data.
# Every Connect user will need to configure these based on the format they want their data in
# when loaded from or stored into Kafka
# Note: every address in schema.registry.url must include the http:// prefix
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://ali-18:18081,http://ali-36:18081,http://ali-37:18081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://ali-18:18081,http://ali-36:18081,http://ali-37:18081

# Internal Storage Topics.
#
# Kafka Connect distributed workers store the connector and task configurations, connector offsets,
# and connector statuses in three internal topics. These topics MUST be compacted.
# When the Kafka Connect distributed worker starts, it will check for these topics and attempt to create them
# as compacted topics if they don't yet exist, using the topic name, replication factor, and number of partitions
# as specified in these properties, and other topic-specific settings inherited from your brokers'
# auto-creation settings. If you need more control over these other topic-specific settings, you may want to
# manually create these topics before starting Kafka Connect distributed workers.
#
# The following properties set the names of these three internal topics for storing configs, offsets, and status.
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-statuses

# The following properties set the replication factor for the three internal topics, defaulting to 3 for each
# and therefore requiring a minimum of 3 brokers in the cluster. Since we want the examples to run with
# only a single broker, we set the replication factor here to just 1. That's okay for the examples, but
# ALWAYS use a replication factor of AT LEAST 3 for production environments to reduce the risk of
# losing connector offsets, configurations, and status.
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1

# The config storage topic must have a single partition, and this cannot be changed via properties.
# Offsets for all connectors and tasks are written quite frequently and therefore the offset topic
# should be highly partitioned; by default it is created with 25 partitions, but adjust accordingly
# with the number of connector tasks deployed to a distributed worker cluster. Kafka Connect records
# the status less frequently, and so by default the topic is created with 5 partitions.
#offset.storage.partitions=25
#status.storage.partitions=5

# The offsets, status, and configurations are written to the topics using converters specified through
# the following required properties. Most users will always want to use the JSON converter without schemas.
# Offset and config data is never visible outside of Connect in this format.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

# Confluent Control Center Integration -- uncomment these lines to enable Kafka client interceptors
# that will report audit data that can be displayed and analyzed in Confluent Control Center
# producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
# consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor

# These are provided to inform the user about the presence of the REST host and port configs
# Hostname & Port for the REST API to listen on. If this is set, it will bind to the interface used to listen to requests.
#rest.host.name=0.0.0.0
rest.port=18083

# The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.
#rest.advertised.host.name=0.0.0.0
#rest.advertised.port=8083

# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of:
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
# Replace the relative path below with an absolute path if you are planning to start Kafka Connect from within a
# directory other than the home directory of Confluent Platform.
plugin.path=share/java

In short, this configures the Kafka cluster, the schema.registry.url, and opens REST port 18083 (used to submit connector configurations to the Kafka Connect workers).
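One point worth noting from the comments in the file: plugin.path is relative (share/java), so a worker started from a directory other than the Confluent home will not find the Debezium jars. A safer variant, assuming the install path used in this article:

# Absolute path, so the plugin is found regardless of the working directory connect-distributed is started from.
plugin.path=/opt/confluent-5.0.0/share/java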

Next, write the MySQL connector configuration. First create a directory to hold it (the connector configuration only needs to exist on one machine; once it has been submitted to the Kafka Connect workers with curl it could even be deleted, but I recommend keeping it for later reference or modification):

[root@ali-37 etc]# pwd
/opt/confluent-5.0.0/etc
[root@ali-37 etc]# mkdir kafka-connect-debezium
[root@ali-37 etc]# ls
confluent-common  kafka-connect-debezium       kafka-connect-hdfs  kafka-connect-s3              kafka-rest  rest-utils
kafka             kafka-connect-elasticsearch  kafka-connect-jdbc  kafka-connect-storage-common  ksql        schema-registry
[root@ali-37 etc]# 

The Debezium MySQL connector configuration is as follows:

{
  "name": "debezium-mysql-source-3308",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "172.16.6.21",
    "database.port": "3308",
    "database.user": "**",
    "database.password": "**",
    "database.server.id": "184000",
    "database.server.name": "prod",
    "table.whitelist": "simu_affair_release.affair_member,simu_task_release.*\\.(task_member),simu_ann_release.*\\.(announcement_member|announcement_follower)",
    "database.history.kafka.bootstrap.servers": "ali-18:9092,ali-36:9092,ali-37:9092",
    "database.history.kafka.topic": "schema-changes.prod",
    "include.schema.changes": "true",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "database.history.skip.unparseable.ddl": "true"
  }
}
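For reference, the account given in database.user needs the binlog-related privileges that the Debezium MySQL connector documentation requires. A minimal sketch of granting them (the 'debezium'@'%' account name is a placeholder, since the real credentials are masked above; it assumes the account already exists and that you can log in as an administrator):

mysql -h 172.16.6.21 -P 3308 -u root -p -e "
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;"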

Now start everything:
1. Start the ZooKeeper and Kafka clusters.
2. Start the Schema Registry (run on all three machines):
cd /opt/confluent-5.0.0/ && ./bin/schema-registry-start -daemon ./etc/schema-registry/schema-registry.properties
After it starts, jps should show a SchemaRegistryMain process.
3. Start the Kafka Connect workers (run on all three machines):
cd /opt/confluent-5.0.0/ && ./bin/connect-distributed -daemon ./etc/schema-registry/connect-avro-distributed.properties
After it starts, jps should show a ConnectDistributed process. (See the quick REST checks right after this list.)
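A quick sanity check against the REST ports configured earlier (a sketch; any of the three hosts works):

# Schema Registry should answer on port 18081 (an empty subject list on a fresh install).
curl http://ali-37:18081/subjects
# The Connect worker should answer on port 18083 and list the Debezium MySQL connector plugin.
curl http://ali-37:18083/connector-plugins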


[root@ali-37 kafka-connect-debezium]# cd /opt/confluent-5.0.0/etc/kafka-connect-debezium
[root@ali-37 kafka-connect-debezium]# curl -X POST -H "Content-Type: application/json" --data @debezium_mysql_source_affair.json http://ali-36:18083/connectors
{"name":"debezium-mysql-source-affair","config":{"connector.class":"io.debezium.connector.mysql.MySqlConnector","database.hostname":"**","database.port":"3306","database.user":"**","database.password":"**","database.server.id":"184000","database.server.name":"prod","table.whitelist":"simu_affair_release.affair_member,simu_affair_release.role","database.history.kafka.bootstrap.servers":"ali-18:9092,ali-36:9092,ali-37:9092","database.history.kafka.topic":"schema-changes.prod","include.schema.changes":"true","mode":"incrementing","incrementing.column.name":"id","database.history.skip.unparseable.ddl":"true","snapshot.mode":"schema_only_recovery","name":"debezium-mysql-source-affair"},"tasks":[],"type":null}

If everything works, a number of new topics named after the captured tables should appear on the Kafka cluster, following the naming convention serverName.databaseName.tableName from the Debezium MySQL connector configuration (for example, prod.simu_affair_release.affair_member).
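One way to confirm that change events are flowing is Confluent's Avro console consumer (a sketch; the topic name below assumes the naming convention just described):

cd /opt/confluent-5.0.0/ && ./bin/kafka-avro-console-consumer \
  --bootstrap-server ali-18:9092 \
  --topic prod.simu_affair_release.affair_member \
  --from-beginning \
  --property schema.registry.url=http://ali-18:18081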

Reposted from blog.csdn.net/lzufeng/article/details/81708106