Kafka + Canal + MySQL cluster deployment

Table of contents

1. What is Canal?

 The background behind canal:

 Canal's working principle mainly relies on MySQL's master-slave replication mechanism:

 How canal works:

Lab environment: 

Purpose:

2. Installation and deployment of mysql

 mysql download path:

Enable binary logging

Configure the permissions of mysql slave

3. Installation and deployment of kafka

Kafka download path:

Kafka configuration:

Start the Kafka cluster: start Kafka on all three servers

Create topics test and example

Enter the ZooKeeper client to view the newly created topics

Producer/consumer test

4. Installation and deployment of Canal

Canal download path:

https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz

5. Test: Kafka consumes MySQL data in real time

1. What is Canal?

 The background behind canal:

In the early days, Alibaba's B2B business needed cross-datacenter synchronization because it was deployed in dual data centers in Hangzhou and the United States. Early database synchronization was mainly based on triggers to capture incremental changes. Starting in 2010, Alibaba gradually switched to obtaining incremental changes by parsing database logs, which gave rise to the business of incremental-change subscription & consumption and opened a new era.

 The word canal means a waterway, pipe, or ditch. Canal is developed in Java, and its positioning is to parse incremental database logs and provide incremental data subscription & consumption. It currently mainly supports MySQL/MariaDB; the supported source MySQL versions include 5.1.x, 5.5.x, 5.6.x, 5.7.x, and 8.0.x.

 Canal's working principle mainly relies on MySQL's master-slave replication mechanism:

A brief outline of master-slave replication: when data changes, the master records the change in its binary log (these records are called binary log events and can be viewed with show binlog events); the slave copies the master's binary log into its relay log; finally, the slave replays the events in the relay log, applying the changes to its own data.
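
Once the MySQL instance from section 2 is up, these binary log events can be inspected directly. A minimal sketch, assuming the root password '123456' set by the install script below (file names and positions will differ per host):

# Show the current binlog file and position on the master
mysql -uroot -p'123456' -e "show master status;"
# Show the first few binary log events
mysql -uroot -p'123456' -e "show binlog events limit 10;"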

 How canal works:

1. Canal emulates the MySQL slave interaction protocol: it pretends to be a MySQL slave and sends the dump protocol to the MySQL master.

2. The MySQL master receives the dump request and starts pushing binary logs to the slave (that is, canal).

3. Canal parses the binary log objects (originally a byte stream).

Lab environment: 

 The purpose of this experiment is to use canal to parse MySQL changes and deliver them to Kafka, so the environment is MySQL 5.7.34 + Canal 1.1.5 + Kafka 2.1.1 (the kafka_2.12-2.1.1 build).

Deploy the ZooKeeper cluster before deploying the Kafka cluster. 

Purpose:

Kafka consumes data from MySQL in real time through canal: whenever the databases, tables, structure, or contents in MySQL change, canal pushes the change to Kafka, so Kafka consumers can consume MySQL data in real time.

################################################### 

2. Installation and deployment of mysql

mysql deployment version: 5.7.34
mysql deployment machine: node1

This MySQL deployment uses the binary tarball installation. Disable the firewall and SELinux before running the installation and deployment script. 

 mysql download path:

https://downloads.mysql.com/archives/get/p/23/file/mysql-5.7.34-linux-glibc2.12-x86_64.tar.gz

MySQL installation and deployment script:

[root@node1 lianxi]# cat install_mysql.sh 
#!/bin/bash
 
# Install software dependencies
yum  install cmake ncurses-devel gcc  gcc-c++  vim  lsof bzip2 openssl-devel ncurses-compat-libs -y

# Download the installation package
wget https://downloads.mysql.com/archives/get/p/23/file/mysql-5.7.34-linux-glibc2.12-x86_64.tar.gz

# Extract the MySQL binary tarball
tar  xf  mysql-5.7.34-linux-glibc2.12-x86_64.tar.gz

# Move the extracted files to /usr/local and rename the directory to mysql
mv mysql-5.7.34-linux-glibc2.12-x86_64 /usr/local/mysql

# Create the mysql group and user
groupadd mysql
# The mysql user's shell is /bin/false and it belongs to the mysql group
useradd -r -g mysql -s /bin/false mysql


# Create the data directory
mkdir  /data/mysql -p
# Change ownership of /data/mysql to the mysql user and group so that the mysql user can read and write it
chown mysql:mysql /data/mysql/
# Only the mysql user and group may access it; everyone else is denied
chmod 750 /data/mysql/

# Enter the /usr/local/mysql/bin directory
cd /usr/local/mysql/bin/

# Initialize MySQL
./mysqld  --initialize --user=mysql --basedir=/usr/local/mysql/  --datadir=/data/mysql  &>passwd.txt

# Set up SSL so that MySQL supports SSL logins
./mysql_ssl_rsa_setup --datadir=/data/mysql/

# Get the temporary password
tem_passwd=$(cat passwd.txt |grep "temporary"|awk '{print $NF}')
  # $NF is the last field
  # abc=$(command) runs the command first and assigns its output to abc

# Modify PATH to include the mysql bin directory
# Temporarily modify PATH for this session
export PATH=/usr/local/mysql/bin/:$PATH
# Make it permanent so it still applies after a reboot
echo  'PATH=/usr/local/mysql/bin:$PATH' >>/root/.bashrc

# Copy support-files/mysql.server to /etc/init.d/ as mysqld
cp  ../support-files/mysql.server   /etc/init.d/mysqld

# Change the datadir value in the /etc/init.d/mysqld script
sed  -i '70c  datadir=/data/mysql'  /etc/init.d/mysqld

# Generate the /etc/my.cnf configuration file
cat  >/etc/my.cnf  <<EOF
[mysqld_safe]
[client]
socket=/data/mysql/mysql.sock
[mysqld]
socket=/data/mysql/mysql.sock
port = 3306
open_files_limit = 8192
innodb_buffer_pool_size = 512M
character-set-server=utf8
[mysql]
auto-rehash
prompt=\\u@\\d \\R:\\m  mysql>
EOF

# Raise the kernel open-file limit
ulimit -n 1000000
# Make the setting take effect at boot as well
echo "ulimit -n 1000000" >>/etc/rc.local
chmod +x /etc/rc.d/rc.local


# Start the mysqld process
service mysqld start

# Register mysqld with the system service manager
/sbin/chkconfig --add mysqld
# Enable the mysqld service at boot
/sbin/chkconfig mysqld on

# The first password change requires the --connect-expired-password option
# -e executes the given statement inside mysql
# set password='123456'; changes the root password to 123456
mysql -uroot -p$tem_passwd --connect-expired-password   -e  "set password='123456';"


# Verify the password change: if the databases inside MySQL are listed, it succeeded.
mysql -uroot -p'123456'  -e "show databases;"

Enable binary logging

vim /etc/my.cnf

# Enable the binary log
log_bin
server_id=1

Restart mysqld afterwards (service mysqld restart) so that binary logging takes effect.
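
A quick sanity check after the restart (standard MySQL statements; note that Canal's own QuickStart expects binlog_format=ROW, so if the second check shows a different value, add binlog_format=ROW under [mysqld] in /etc/my.cnf and restart again):

# Verify that the binary log is on
mysql -uroot -p'123456' -e "show variables like 'log_bin';"
# Check the binlog format; Canal expects ROW
mysql -uroot -p'123456' -e "show variables like 'binlog_format';"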

Configure the permissions of mysql slave

Canal works by simulating a MySQL slave, so you need to create a user, grant it the permissions a MySQL slave needs, and thereby authorize canal to connect to MySQL as a slave.

root@(none) 11:37  mysql>create user canal identified by 'canal';
Query OK, 0 rows affected (0.00 sec)

root@(none) 11:38  mysql>grant select,replication slave,replication client on *.* to 'canal'@'%';
Query OK, 0 rows affected (0.00 sec)

root@(none) 11:42  mysql>flush privileges;
Query OK, 0 rows affected (0.00 sec)

root@(none) 11:42  mysql>show grants for 'canal';
+---------------------------------------------------------------------------+
| Grants for canal@%                                                        |
+---------------------------------------------------------------------------+
| GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%' |
+---------------------------------------------------------------------------+
1 row in set (0.00 sec)
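
Before moving on, an optional quick check that the new account works. A sketch, assuming 192.168.20.11 is node1's address (the same address used later in the canal configuration):

# Log in as canal from any host that can reach node1 and confirm its grants
mysql -ucanal -pcanal -h 192.168.20.11 -e "show grants;"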

###################################################  

3. Installation and deployment of kafka

Kafka download path:

https://archive.apache.org/dist/kafka/2.1.1/kafka_2.12-2.1.1.tgz

Deployment nodes: node1, node2, node3

Upload the kafka_2.12-2.1.1 installation package to the /usr/local directory, then extract it, make a soft link, and change ownership

[root@node031 local]# tar -xvf kafka_2.12-2.1.1.tgz
[root@node031 local]# ln -s kafka_2.12-2.1.1 kafka
[root@node031 local]# chown -R hadoop:hadoop kafka
[root@node031 local]# chown -R hadoop:hadoop kafka_2.12-2.1.1

Kafka configuration:

Do this on node1, node2, and node3 respectively; note that broker.id must be different on each node

[hadoop@node1 config]$ vim server.properties

broker.id=1
zookeeper.connect=node1:2181,node2:2181,node3:2181
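
A per-node sketch of the relevant server.properties lines; the listeners and log.dirs values are assumptions for this lab, so adjust them to your hosts and disks:

# node1
broker.id=1
listeners=PLAINTEXT://node1:9092
log.dirs=/usr/local/kafka/logs
zookeeper.connect=node1:2181,node2:2181,node3:2181

# node2: broker.id=2, listeners=PLAINTEXT://node2:9092, everything else the same
# node3: broker.id=3, listeners=PLAINTEXT://node3:9092, everything else the same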

Start the Kafka cluster: start Kafka on all three servers

[hadoop@node1 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties 

[hadoop@node1 kafka]$ ps -ef |grep kafka
hadoop    50006      1  1 17:11 pts/1    00:00:07 /usr/local/jdk/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xloggc:/usr/local/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/usr/local/kafka/bin/../logs -Dlog4j.configuration=file:bin/../config/log4j.properties -cp /usr/local/kafka/bin/../libs/activation-1.1.1.jar:/usr/local/kafka/bin/../libs/aopalliance-repackaged-2.6.1.jar:/usr/local/kafka/bin/../libs/argparse4j-0.7.0.jar:/usr/local/kafka/bin/../libs/audience-annotations-0.5.0.jar:/usr/local/kafk

Create topics test and example

Create the topics test and example:

[root@node1 bin]# ./kafka-topics.sh --create --zookeeper node1:2181 --replication-factor 1 --partitions 1 --topic test
Created topic "test".
[root@node1 bin]# ./kafka-topics.sh --create --zookeeper node1:2181 --replication-factor 1 --partitions 1 --topic example
Created topic "example".

Enter the ZooKeeper client to view the newly created topics

[root@node1 bin]# ./zkCli.sh 
[zk: localhost:2181(CONNECTED) 0] ls /
[admin, brokers, cluster, config, consumers, controller, controller_epoch, feature, hbase, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]
[zk: localhost:2181(CONNECTED) 0] ls /brokers/topics
[example, test]
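
While in the ZooKeeper client, it is also worth confirming that all three brokers registered; with the broker.id values set above, the list should contain 1, 2 and 3, and a missing id means that node's Kafka did not start:

ls /brokers/ids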

Producer/consumer test

# Create a producer
[root@node1 bin]# ./kafka-console-producer.sh --broker-list 192.168.20.11:9092 --topic test
# Create a consumer
[root@node1 bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.20.11:9092 --topic test
# Produce some data in the producer and check whether the consumer can consume it.
[root@node1 bin]# ./kafka-console-producer.sh --broker-list 192.168.20.11:9092 --topic test
>hello world
>this is kafka test
>fef
>fef
# The data is consumed normally
[root@node1 bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.20.11:9092 --topic test
hello world
this is kafka test
fef
fef

################################################### 

4. Installation and deployment of Canal

Canal download path:


https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz

Canal deployment node: node1

Upload the canal 1.1.5 binary package to /usr/local and extract it

[root@node1 local]# tar -xvf canal.deployer-1.1.5.tar.gz

Make a soft link

[root@node1 local]# ln -s canal-1.1.5 canal

Change permissions

[root@node1 local]# chown -R hadoop:hadoop canal
[root@node1 local]# chown -R hadoop:hadoop canal-1.1.5

Change canal configuration:

Only two canal configuration files need to be modified. In canal.properties, change the canal server mode to kafka and set the Kafka address:

[root@node1 conf]# vim canal.properties
canal.serverMode = kafka
kafka.bootstrap.servers = node1:9092,node2:9092,node3:9092

In example/instance.properties, configure the MySQL user, password, Kafka topic, and so on. 

cd /usr/local/canal/conf/example
vim instance.properties

# Database address
canal.instance.master.address=192.168.20.11:3306
# Database user
canal.instance.dbUsername=canal
# Database password
canal.instance.dbPassword=canal
# database name.table name to monitor
canal.instance.filter.regex=test\..*
# topic name: data that canal obtains from mysql will be written to this topic
canal.mq.topic=canal_test
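
For reference, the filter follows Canal's schema.table regex convention. Hypothetical variations, written in the same escaping style as the value above (check them against Canal's filter documentation before relying on them):

# Monitor all tables in all databases
canal.instance.filter.regex=.*\..*
# Monitor only a single table in the test database
canal.instance.filter.regex=test\.company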

Start canal

[root@node1 bin]# ./startup.sh
[root@node1 bin]# jps
42848 ResourceManager
67395 Kafka
68775 ConsoleConsumer
66167 QuorumPeerMain
69110 CanalLauncher
40824 DataNode
69135 Jps
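
Two optional sanity checks: canal's own logs (the paths below are the default layout of the canal deployer) and the binlog dump thread that canal opens on the MySQL master:

# Check the canal server and instance logs
tail -n 20 /usr/local/canal/logs/canal/canal.log
tail -n 20 /usr/local/canal/logs/example/example.log

# On MySQL, the canal user should show up as a "Binlog Dump" thread
mysql -uroot -p'123456' -e "show processlist;" | grep -i "binlog dump"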

###################################################  

5. Test: Kafka consumes MySQL data in real time

Create a topic canal_test

./kafka-topics.sh --create --zookeeper node1:2181,node2:2181,node3:2181 --replication-factor 1 --partitions 1 --topic canal_test

Create a Kafka consumer to consume the canal_test topic.

[root@node1 bin]# ./kafka-console-consumer.sh --bootstrap-server node1:9092 --topic canal_test --from-beginning

Log in to MySQL, make changes in the test database, and wait for canal to push the data from MySQL to Kafka.

root@(none) 17:15  mysql>use test;
Database changed
root@test 17:15  mysql>create table company(id int);
Query OK, 0 rows affected (0.01 sec)

root@test 17:15  mysql>insert into company(id) values(1);
Query OK, 1 row affected (0.01 sec)

root@test 17:16  mysql>select * from company;
+------+
| id   |
+------+
|    1 |
+------+
1 row in set (0.01 sec)
root@test 17:16  mysql>insert into company(id) values(2);
Query OK, 1 row affected (0.00 sec)

The data is successfully consumed.

[root@node1 bin]# ./kafka-console-consumer.sh --bootstrap-server node1:9092 --topic canal_test --from-beginning
{"data":null,"database":"test","es":1680599305000,"id":1,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"DROP TABLE `grands` /* generated by server */","sqlType":null,"table":"grands","ts":1680599658035,"type":"ERASE"}
{"data":null,"database":"test","es":1680599735000,"id":2,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"create table company(id int)","sqlType":null,"table":"company","ts":1680599735417,"type":"CREATE"}
{"data":[{"id":"1"}],"database":"test","es":1680599764000,"id":3,"isDdl":false,"mysqlType":{"id":"int"},"old":null,"pkNames":null,"sql":"","sqlType":{"id":4},"table":"company","ts":1680599764271,"type":"INSERT"}
