Table of contents
Apache Hadoop Ecology - Directory Summary - Continuous Update
1: Start directly from the command line (used in development environments)
1.1: Create a topic (optional; one is created automatically by default)
1.2: Start the Maxwell capture channel from the command line
1.3: Test process
2: Start Maxwell with a configuration file (used in production environments)
2.1: Kafka partition control
2.2: Start the Maxwell capture channel
2.3: Test process
System environment: CentOS 7
Java environment: Java 8
1: Start directly from the command line (used in development environments)
1.1: Create a topic (optional; one is created automatically by default)
If the Kafka topic does not exist, a topic with a single partition is created automatically.
Command to create the topic manually:
kafka-topics.sh --zookeeper node100:2181/kafka --create --replication-factor 1 --partitions 1 --topic test_topic_db
List topics:
kafka-topics.sh --zookeeper node100:2181/kafka --list
1.2: Start the Maxwell capture channel from the command line
cd /usr/local/maxwell-1.29.2
[root@node01 maxwell-1.29.2]# bin/maxwell --user='maxwell' --password='pw_maxwell' --host='192.168.1.100' --producer=kafka --kafka.bootstrap.servers=192.168.1.100:9092 --kafka_topic=test_topic_db
Parameters:
--kafka.bootstrap.servers=the Kafka broker addresses (not ZooKeeper), separated by commas
--kafka.bootstrap.servers=192.168.1.104:9092,192.168.1.102:9092
--filter 'exclude: *.*, include:test_maxwell.test'
1.3: Test process
Open a Kafka console consumer to read the Maxwell channel data:
kafka-console-consumer.sh --bootstrap-server 192.168.1.100:9092 --topic test_topic_db
Modify data in a database with binlog enabled; the change events should appear in the consumer.
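Each change event Maxwell publishes to the topic is a single JSON document. A minimal sketch of what the consumer sees for an insert; the concrete values (database, table, ts, xid, data) are made up for this example:

```python
import json

# An illustrative insert event in Maxwell's JSON format; the field
# values here are invented for demonstration.
raw = ('{"database": "test_maxwell", "table": "test", '
       '"type": "insert", "ts": 1700000000, "xid": 12345, '
       '"commit": true, "data": {"id": 1, "name": "alice"}}')

event = json.loads(raw)
print(event["type"], event["database"], event["table"])  # insert test_maxwell test
print(event["data"])                                     # {'id': 1, 'name': 'alice'}
```

Updates additionally carry an `old` field with the previous values of the changed columns, so a downstream consumer can tell inserts, updates, and deletes apart by the `type` field alone.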
2: Start Maxwell with a configuration file (used in production environments)
Configuration files make the setup easier to manage.
[root@node100 ~]# cd /usr/local/maxwell-1.29.2
[root@node100 maxwell-1.29.2]# sudo mkdir project_v3
[root@node100 maxwell-1.29.2]# sudo cp config.properties.example project_v3/kafka_config.properties
[root@node100 maxwell-1.29.2]# sudo vim project_v3/kafka_config.properties
log_level=info
# *** kafka ***
producer=kafka
# list of kafka brokers
kafka.bootstrap.servers=192.168.1.100:9092
# kafka.bootstrap.servers=hosta:9092,hostb:9092  (format for multiple brokers)
# filter controlling which data is sent to Kafka
# filter= exclude: *.*, include: flink_gmall.test_table  (exclude everything, capture only the flink_gmall.test_table table)
kafka_topic=test_topic_db_02
# partition mode [database, table, primary_key, transaction_id, column]
# rows from the same table go to the same partition
producer_partition_by=table
# MySQL configuration
host=192.168.1.100
user=maxwell
password=pw_maxwell
jdbc_options=useSSL=false&serverTimezone=Asia/Shanghai
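The file above is plain `key=value` lines with `#` comments; note that a value such as `jdbc_options` may itself contain `=` characters, so only the first `=` separates key from value. A minimal parser sketch (`load_properties` is a hypothetical helper for illustration, not part of Maxwell):

```python
import io

def load_properties(lines) -> dict:
    # Minimal key=value parser in the spirit of kafka_config.properties:
    # skip blank lines and '#' comments, split on the FIRST '=' only.
    props = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

sample = io.StringIO(
    "log_level=info\n"
    "# list of kafka brokers\n"
    "kafka.bootstrap.servers=192.168.1.100:9092\n"
    "jdbc_options=useSSL=false&serverTimezone=Asia/Shanghai\n"
)
props = load_properties(sample)
print(props["kafka.bootstrap.servers"])  # 192.168.1.100:9092
print(props["jdbc_options"])             # useSSL=false&serverTimezone=Asia/Shanghai
```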
2.1: Kafka partition control
We typically use Maxwell to monitor several MySQL databases and send all of their change data to a single Kafka topic. To get useful parallelism downstream, that topic should have multiple partitions.
# data from the same database goes to the same partition
producer_partition_by=database
Command to create the topic manually:
kafka-topics.sh --zookeeper node100:2181/kafka --create --replication-factor 1 --partitions 3 --topic test_topic_db_02
[root@node100 ~]# sudo vim project_v3/kafka_config.properties
# *** kafka ***
kafka_topic=test_topic_db_02
# partition mode; options: database name, table name, primary key, column name
#producer_partition_by=database # [database, table, primary_key, transaction_id, column]
# data from the same database goes to the same partition
producer_partition_by=database
# to partition by a column instead, a column name must be given, and that column must exist:
# producer_partition_by=column
# producer_partition_columns=name   (put the column name here)
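The effect of producer_partition_by can be pictured as ordinary key-based partitioning: Maxwell derives a partition key from the chosen attribute (database name, table name, primary key, or a column value), and all events with the same key land in the same partition. A simplified sketch of the principle; partition_for is a stand-in, not the actual hash function Maxwell or Kafka uses:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in partitioner: a stable hash of the key modulo the
    # partition count. Equal keys always map to the same partition,
    # which is what preserves per-table (or per-database) ordering.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# With producer_partition_by=table, the table name is the key, so every
# change event from "orders" lands in the same partition:
for table in ["orders", "users", "orders"]:
    print(table, "-> partition", partition_for(table, 3))
```

This is also why ordering is only guaranteed within one partition: events keyed by different tables may be interleaved across partitions.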
2.2: Start the Maxwell capture channel
cd /usr/local/maxwell-1.29.2
bin/maxwell --config ./project_v3/kafka_config.properties
2.3: Test process
Open a Kafka console consumer to read the Maxwell channel data:
kafka-console-consumer.sh --bootstrap-server 192.168.1.100:9092 --topic test_topic_db_02
Modify data in a database with binlog enabled; the change events should appear in the consumer.