Three formats of the MySQL binlog:
- 1) STATEMENT mode: statement-based replication (SBR). Every SQL statement that modifies data is recorded in the binlog.
-
Advantages: the binlog contains fewer entries, which reduces disk IO and improves performance.
-
Disadvantages: certain statements cause data inconsistency between master and slave, e.g. the sleep() function, last_insert_id(), and user-defined functions (UDFs).
-
-
- 2) ROW mode: row-based replication (RBR). Records the final change made to each row of data.
-
Advantages: avoids the inconsistency problems of STATEMENT mode.
-
Disadvantages: generates a large volume of logs; in particular, an ALTER TABLE makes the log size balloon, since every affected row is recorded.
-
-
- 3) MIXED mode: mixed-based replication (MBR), a combination of the two modes above. Ordinary replication uses STATEMENT mode to save the binlog; for operations that cannot be replicated correctly in STATEMENT mode, ROW mode is used instead. MySQL chooses the log format based on the SQL statement being executed.
-
Note: because STATEMENT mode records only the SQL text, not the data, the original change events cannot be recovered from it, so ROW mode is generally recommended.
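The difference is easy to see with a nondeterministic function such as UUID(); the table `t` below is hypothetical, for illustration only:

```sql
-- STATEMENT mode logs only this text; the slave re-executes it,
-- so UUID() is re-evaluated there and can yield a different value.
INSERT INTO t (id, token) VALUES (1, UUID());
-- ROW mode instead logs the actual row values written on the master,
-- so the slave applies identical data.
```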
Comparison of the various ways of parsing the binlog:
Enable the MySQL binlog and configure Maxwell:
- 1) Create an ordinary MySQL user maxwell:
## enter the mysql client, then run the following commands to grant privileges
mysql -uroot -p
set global validate_password_policy=LOW;
set global validate_password_length=6;
CREATE USER 'maxwell'@'%' IDENTIFIED BY '123456';
GRANT ALL ON maxwell.* TO 'maxwell'@'%';
GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'%';
flush privileges;
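To confirm the grants took effect, you can check from the same client; this standard MySQL statement prints the GRANT lines recorded for the user:

```sql
SHOW GRANTS FOR 'maxwell'@'%';
```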
- 2) Enable the binlog mechanism of MySQL:
## shell: edit the MySQL configuration file my.cnf on the node
sudo vim /etc/my.cnf
## set the binlog format to ROW
log-bin=/var/lib/mysql/mysql-bin
binlog-format=ROW
server_id=1
## shell: restart the mysql service
sudo service mysqld restart
## shell: verify
mysql -uroot -p
mysql> show variables like '%log_bin%';
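With the binlog enabled, the query should report log_bin as ON; the output looks roughly like this (paths vary by installation):

```
+---------------------------------+--------------------------------+
| Variable_name                   | Value                          |
+---------------------------------+--------------------------------+
| log_bin                         | ON                             |
| log_bin_basename                | /var/lib/mysql/mysql-bin       |
| log_bin_index                   | /var/lib/mysql/mysql-bin.index |
| ...                             | ...                            |
+---------------------------------+--------------------------------+
```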
- 3) Install Maxwell to collect MySQL data in real time:
-
Download the Maxwell installation package, download address:
https://github.com/zendesk/maxwell/releases/download/v1.21.1/maxwell-1.21.1.tar.gz
- Unzip it and modify the configuration file:
-
## edit the configuration file
cd xx/maxwell-1.21.1
cp config.properties.example config.properties
vim config.properties
## set the data sink to kafka
producer=kafka
## adjust to your actual kafka setup
kafka.bootstrap.servers=node01:9092,node02:9092,node03:9092
host=node03.liz.com
port=3306
user=maxwell
password=123456
## this topic must be created manually
kafka_topic=maxwell_kafka
- 4) Start the service:
- Start ZooKeeper and Kafka in turn (installation steps omitted here)
- Create the topic: kafka-topics.sh --create --topic maxwell_kafka --partitions 3 --replication-factor 2 --zookeeper node01:2181
- Start Maxwell: cd xx/maxwell-1.21.1 && bin/maxwell
- 5) Insert data and test
- Create a table in mysql:
-
CREATE DATABASE IF NOT EXISTS `test` DEFAULT CHARACTER SET utf8;
USE `test`;

/* Table structure for table `myuser` */
DROP TABLE IF EXISTS `myuser`;
CREATE TABLE `myuser` (
  `id` int(12) NOT NULL,
  `name` varchar(32) DEFAULT NULL,
  `age` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

/* Data for the table `myuser` */
insert into `myuser`(`id`,`name`,`age`) values
  (1,'zhangsan',NULL),(2,'xxx',NULL),(3,'ggg',NULL),(5,'xxxx',NULL),
  (8,'skldjlskdf',NULL),(10,'ggggg',NULL),(99,'ttttt',NULL),
  (114,NULL,NULL),(121,'xxx',NULL);
-
- Start a Kafka console consumer to consume the data in Kafka (adjust --bootstrap-server to your own Kafka broker addresses): kafka-console-consumer.sh --topic maxwell_kafka --from-beginning --bootstrap-server node01:9092,node02:9092,node03:9092
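Each inserted row should arrive in the consumer as one JSON event. Maxwell's message shape is roughly as follows; the ts and xid values here are illustrative:

```json
{"database":"test","table":"myuser","type":"insert","ts":1560000000,"xid":8712,"commit":true,"data":{"id":1,"name":"zhangsan","age":null}}
```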
- 6) Sample Maxwell-to-Kafka configuration used in the project (/xxx/maxwell-1.22.1/my_project.properties):
-
## my_project.properties
log_level=INFO
producer=kafka
kafka.bootstrap.servers=node01:9092,node02:9092,node03:9092
## mysql server host
host=yourhost.com
user=maxwell
password=123456
producer_ack_timeout=600000
port=3306

######### output format stuff ###############
output_binlog_position=true
output_server_id=true
output_thread_id=true
output_commit_info=true
output_row_query=true
output_ddl=false
output_nulls=true
output_xoffset=true
output_schema_id=true
######### output format stuff ###############

kafka_topic=yourtopic
## how the kafka record key is generated; supports array and hash
kafka_key_format=hash
kafka.compression.type=snappy
kafka.retries=5
kafka.acks=all
## kafka partitioning: by table primary key
producer_partition_by=primary_key
kafka_partition_hash=murmur3

############## misc stuff ###########
## whether bootstrap processing blocks normal binlog parsing; async does not block
bootstrapper=async
############## misc stuff ##########

############## filter ###############
## filtering: database.table; the config below collects only three tables from db1
## regex filters are supported: include:db1:/table\\d{1}/
filter=exclude: *.*, include: db1.table1, include: db1.table2, include: db1.table3
############## filter ###############
-
- 7) Maxwell-kafka startup script:
-
#!/bin/bash
case $1 in
"start"){
  nohup /xxx/maxwell-1.22.1/bin/maxwell --daemon --config /xxx/maxwell-1.22.1/my_project.properties >> /xxx/maxwell-1.22.1/maxwell.log 2>&1 &
};;
"stop"){
  ps -ef | grep Maxwell | grep -v grep | awk '{print $2}' | xargs kill
};;
esac
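Assuming the script above is saved as maxwell.sh and made executable (chmod +x maxwell.sh), it is invoked as ./maxwell.sh start and ./maxwell.sh stop. The case-based dispatch it relies on can be sanity-checked in isolation; in this sketch echo stands in for the real nohup/kill commands:

```shell
#!/bin/bash
# Minimal, self-contained sketch of the start/stop dispatch pattern.
# echo replaces the real maxwell/nohup/kill calls so it runs anywhere.
dispatch() {
  case $1 in
    "start") echo "starting maxwell" ;;
    "stop")  echo "stopping maxwell" ;;
    *)       echo "usage: {start|stop}" ;;
  esac
}

dispatch start   # prints: starting maxwell
dispatch stop    # prints: stopping maxwell
```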
-