canal overview
Purpose
The main purpose of canal is to parse the incremental logs (binlog) of a MySQL database and provide subscription and consumption of the incremental data. Simply put, it can synchronize MySQL's incremental data in real time, and supports syncing to storage systems such as MySQL, Elasticsearch, and HBase.
How it works
Canal simulates the interaction protocol between a MySQL master and its slaves: it pretends to be a MySQL slave and sends a dump request to the MySQL master. On receiving the dump request, the master pushes its binlog to Canal, which parses the binlog and syncs the data to other storage.
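The same replication dump mechanism can be observed with the stock mysqlbinlog client, which can also connect to a master as a pseudo-replica. This is only an illustration of the mechanism Canal relies on; the host and the binlog file name are placeholders to adapt:

```shell
# Stream the master's binlog over the replication protocol,
# much like canal does internally (host and log file name are placeholders).
mysqlbinlog --read-from-remote-server \
  --host=127.0.0.1 --port=3306 \
  --user=canal --password \
  --stop-never mysql-bin.000001
```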
1. Versions
Here both my MySQL and ES are installed on an Alibaba Cloud server; the MySQL version is 5.7 and the ES version is 7.14.
2. Download
The download address is GitHub. The version I downloaded is v1.1.5-alpha-2; just download the first three components (canal.deployer, canal.adapter, and canal.admin).
3. Upload
Since Canal needs a fair amount of memory to start and my Alibaba Cloud server is a low-spec machine, I install and configure Canal on a local Linux server instead.
Upload the three components to the Linux server.
4. Configure MySQL
Since canal synchronizes data by subscribing to MySQL's binlog, we need to enable MySQL's binlog writing and set binlog-format to ROW mode.
My MySQL runs in Docker on Alibaba Cloud with its config directory mounted, so I just modify the external my.cnf file and add the following configuration:
[mysqld]
## server_id must be unique within the same network
server_id=101
## database(s) that should not be replicated
binlog-ignore-db=mysql
## enable binary logging
log-bin=mall-mysql-bin
## memory used for the binlog cache (per transaction)
binlog_cache_size=1M
## binlog format to use (mixed, statement, row)
binlog_format=row
## days before binlogs are purged; the default 0 means never purge automatically
expire_logs_days=7
## skip all, or the listed, replication errors to avoid interrupting replication on the slave,
## e.g. error 1062 is a duplicate primary key, error 1032 means master/slave data are inconsistent
slave_skip_errors=1062
After the configuration is complete, restart MySQL. After restarting, use the following commands to check whether binlog is enabled:
# check whether binlog is enabled
show variables like '%log_bin%';
# check MySQL's binlog format
show variables like 'binlog_format%';
Next, you need to create an account with slave (replica) privileges to subscribe to the binlog. The account created here is canal:canal.
CREATE USER canal IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
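Optionally, you can verify that the grants took effect (a quick sanity check; canal itself does not require this step):

```sql
-- verify the replication account's privileges
SHOW GRANTS FOR 'canal'@'%';
-- expected to include: GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%'
```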
Create the database bookstore used for the test, and then create a table book:
create table book(
    id int(32) auto_increment primary key,
    book_name varchar(32),
    price double,
    introduce varchar(2048),
    press varchar(32),
    author varchar(32),
    publish_date DATE
);
5. Configure canal-server
Unzip the uploaded canal.deployer-1.1.5-SNAPSHOT.tar.gz into a directory of your choice:
tar -zxvf canal.deployer-1.1.5-SNAPSHOT.tar.gz -C /path/to/target
The directory layout after extraction:
├── bin
│ ├── restart.sh
│ ├── startup.bat
│ ├── startup.sh
│ └── stop.sh
├── conf
│ ├── canal_local.properties
│ ├── canal.properties
│ └── example
│ └── instance.properties
├── lib
├── logs
│ ├── canal
│ │ └── canal.log
│ └── example
│ └── example.log
└── plugin
Modify the configuration
Modify the configuration file conf/example/instance.properties as follows; mainly change the database-related settings:
# address of the MySQL instance whose data will be synced
canal.instance.master.address=<MySQL host>:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=
# database account used for syncing
canal.instance.dbUsername=canal
# database password used for syncing
canal.instance.dbPassword=canal
# database connection charset
canal.instance.connectionCharset = UTF-8
# regex filtering the tables whose binlog is subscribed
canal.instance.filter.regex=.*\\..*
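The filter regex is matched against `schema.table` names (canal's filter syntax accepts comma-separated Java regexes). A minimal Python sketch of how such a filter behaves; note the `\\` in the properties file becomes a single escaped dot after properties-file unescaping:

```python
import re

# ".*\\..*" in instance.properties becomes the regex ".*\..*": every table in every schema.
match_all = re.compile(r".*\..*")

# A narrower filter would subscribe only to the bookstore.book table.
only_book = re.compile(r"bookstore\.book")

for name in ["bookstore.book", "mysql.user"]:
    print(name, bool(match_all.fullmatch(name)), bool(only_book.fullmatch(name)))
```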
Modify runtime parameters
Canal's default settings require a large amount of memory, so we lower them here (this step is very important). Edit startup.sh in the bin directory; here I changed the heap to 512M.
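For reference, the memory settings live in the JAVA_OPTS line of startup.sh. A sketch of the kind of change meant here; the stock flags differ across canal versions, so treat the values as an example:

```shell
# lowered heap for a small machine (original defaults are several GB)
JAVA_OPTS="-server -Xms256m -Xmx512m"
```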
Start
- Enter the canal-server directory and run the startup script:
sh bin/startup.sh
6. Configure canal-adapter
As above, unzip the installation first. The directory after decompression is as follows.
├── bin
│ ├── adapter.pid
│ ├── restart.sh
│ ├── startup.bat
│ ├── startup.sh
│ └── stop.sh
├── conf
│ ├── application.yml
│ ├── es6
│ ├── es7
│ │ ├── biz_order.yml
│ │ ├── customer.yml
│ │ └── product.yml
│ ├── hbase
│ ├── kudu
│ ├── logback.xml
│ ├── META-INF
│ │ └── spring.factories
│ └── rdb
├── lib
├── logs
│ └── adapter
│ └── adapter.log
└── plugin
Modify the configuration file conf/application.yml as follows; mainly change the canal-server settings, the data source settings, and the client adapter settings.
Note: port 11111 must be open on the Linux server.
canal.conf:
  mode: tcp # client mode: tcp, kafka, or rocketMQ
  flatMessage: true # flat-message switch: deliver data as JSON strings (only effective in kafka/rocketMQ mode)
  zookeeperHosts: # zk address in cluster mode
  syncBatchSize: 1000 # number of records per sync batch
  retries: 0 # retry count; -1 means retry forever
  timeout: # sync timeout, in milliseconds
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 127.0.0.1:11111 # address of canal-server
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
  srcDataSources: # source database configuration
    defaultDS:
      url: jdbc:mysql://<MySQL host>:3306/bookstore?useUnicode=true
      username: canal
      password: canal
  canalAdapters: # adapter list
  - instance: example # canal instance name or MQ topic name
    groups: # group list
    - groupId: g1 # group id; used in MQ mode
      outerAdapters:
      - name: logger # logging adapter
      - name: es7 # ES sync adapter
        hosts: <ES host>:9200 # ES connection address
        properties:
          mode: rest # transport (port 9300) or rest (port 9200)
          # security.auth: test:123456 # only used for rest mode
          cluster.name: elasticsearch # ES cluster name
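The logger adapter prints each change as a JSON message. A hypothetical example of the general shape of a canal change message (the field set follows canal's flatMessage format: database, table, type, data, old; the values here are illustrative, not captured output):

```python
import json

# A canal-style change message for an UPDATE on bookstore.book;
# "data" holds the new row values, "old" holds the changed columns' old values.
raw = """
{
  "database": "bookstore",
  "table": "book",
  "type": "UPDATE",
  "data": [{"id": "1", "book_name": "《Python》"}],
  "old": [{"book_name": "《Java》"}]
}
"""
msg = json.loads(raw)
changed = {k: v for row in msg["old"] for k, v in row.items()}
print(msg["type"], "changed columns:", list(changed))
```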
Add the mapping
Enter the canal-adapter/conf/es7 directory, create a book.yml, and configure the mapping between the table in MySQL and the index in Elasticsearch:
dataSourceKey: defaultDS # key of the source data source, matching srcDataSources above
destination: example # canal instance name or MQ topic
groupId: g1 # groupId in MQ mode; only data for this groupId is synced
esMapping:
  _index: book # ES index name
  _id: _id # ES _id; if this is not configured, the pk option must be set and ES assigns the _id automatically
  sql: "SELECT
        b.id AS _id, # maps the MySQL id onto the ES _id
        b.id, # maps the MySQL id onto the id field of the ES document
        b.book_name,
        b.price,
        b.introduce,
        b.press,
        b.author,
        b.publish_date
        FROM
        book b" # sql mapping
  etlCondition: "where a.c_time>={}" # etl condition parameter (left over from the template; the book table has no c_time column, so adapt or drop it)
  commitBatch: 3000 # commit batch size
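Conceptually, the adapter runs the mapping SQL for each changed row and indexes the result into ES, using the `_id` column as the document id. A minimal, hypothetical Python sketch of that transformation (not canal's actual code; `row_to_es_doc` is a made-up helper):

```python
# Hypothetical sketch: the SQL mapping selects b.id twice,
# once aliased to _id (the ES document id) and once as a document field.
def row_to_es_doc(row: dict) -> tuple:
    """Return (es _id, document body) for one row of the `book` table."""
    doc = {
        "id": row["id"],
        "book_name": row["book_name"],
        "price": row["price"],
        "introduce": row["introduce"],
        "press": row["press"],
        "author": row["author"],
        "publish_date": row["publish_date"],
    }
    return str(row["id"]), doc  # _id comes from `b.id AS _id`

doc_id, body = row_to_es_doc({
    "id": 1, "book_name": "《Java》", "price": 2323.2,
    "introduce": "Java从入门到放弃", "press": "机械工业出版社",
    "author": "James", "publish_date": "2012-12-11",
})
print(doc_id, body["book_name"])
```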
Modify runtime parameters
Go to the canal-adapter/bin directory and modify the startup.sh file; I set the heap to 512M as above. Then run startup.sh to start the adapter.
7. Create an index in ES
PUT /book
{
"mappings" : {
"properties" : {
"author" : {
"type" : "keyword"
},
"book_name" : {
"type" : "text",
"analyzer": "ik_max_word" //使用分词器
},
"id" : {
"type" : "integer"
},
"introduce" : {
"type" : "text",
"analyzer": "ik_max_word" //使用分词器
},
"press" : {
"type" : "keyword"
},
"price" : {
"type" : "double"
},
"publish_date" : {
"type" : "date"
}
}
}
}
View the index to confirm it was created.
8. Test
Create a record in the database with a SQL statement:
insert into book values(1, "《Java》", 2323.2, "Java从入门到放弃", "机械工业出版社", "James", "2012-12-11");
After the insertion is successful, we search in ES and find that the data has been synchronized.
Test updates
update book set book_name = "《Python》" where id = 1
Search again and find that book_name has been synchronized.
Test deletes
delete from book where id = 1
The deletion syncs successfully.
9. canal-admin use
Unzip canal.admin-1.1.5-SNAPSHOT.tar.gz; the directory after extraction is as follows:
├── bin
│ ├── restart.sh
│ ├── startup.bat
│ ├── startup.sh
│ └── stop.sh
├── conf
│ ├── application.yml
│ ├── canal_manager.sql
│ ├── canal-template.properties
│ ├── instance-template.properties
│ ├── logback.xml
│ └── public
│ ├── avatar.gif
│ ├── index.html
│ ├── logo.png
│ └── static
├── lib
└── logs
Execute the SQL file
Executing the conf/canal_manager.sql script creates a canal_manager database and several tables.
Modify configuration
Modify the configuration file conf/application.yml as follows; mainly change the data source configuration and the canal-admin management account configuration. Note that you need a database account with read and write permissions, such as the management account root:root.
server:
  port: 8089
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8

spring.datasource:
  address: <MySQL host>:3306
  database: canal_manager
  username: root
  password: <root password>
  driver-class-name: com.mysql.jdbc.Driver
  url: jdbc:mysql://${spring.datasource.address}/${spring.datasource.database}?useUnicode=true&characterEncoding=UTF-8&useSSL=false
  hikari:
    maximum-pool-size: 30
    minimum-idle: 1

canal:
  adminUser: admin
  adminPasswd: admin
canal-server (canal_deployer)
Next, configure the conf/canal_local.properties file of the canal-server built earlier, mainly the canal-admin related settings; after the modification is complete, restart canal-server with sh bin/startup.sh local:
# register ip
canal.register.ip =
# canal admin config
canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =
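The canal.admin.passwd value is not a plaintext password but a MySQL-style password hash (SHA1 applied twice, uppercase hex); the value above corresponds to the default password admin. It can be reproduced like this:

```python
import hashlib

def mysql_password_hash(password: str) -> str:
    """MySQL old-style PASSWORD(): uppercase hex of SHA1(SHA1(password))."""
    once = hashlib.sha1(password.encode()).digest()
    return hashlib.sha1(once).hexdigest().upper()

print(mysql_password_hash("admin"))
```

If you change the admin password in canal-admin, regenerate this value accordingly.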
Similarly, modify the runtime parameters in bin/startup.sh, then run the startup script to start canal-admin.
Access the canal-admin web interface at http://192.168.80.100:8089 and log in with the account admin:123456.
After successful login, you can use the web interface to operate the canal-server.
10. Problems encountered
- Memory: this problem is quite tricky. At first I ran Canal on the cloud server without changing the runtime parameters; the result was that whenever Canal ran for a while, ES went down, or Canal failed to start at all because the memory was too small. Be sure to modify the JVM parameters.
- Ports: make sure Linux opens ports 11111, 11110, and 8089.
- Startup order: start canal_deployer first, then canal_adapter.
- hs_err_xxx files: if you find JVM crash log files like this in the bin directory, delete them, then continue configuring and starting.
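On systems using firewalld, the three ports mentioned above can be opened like this (adjust for your firewall; on cloud servers the security-group rules must allow them too):

```shell
firewall-cmd --zone=public --add-port=11111/tcp --permanent
firewall-cmd --zone=public --add-port=11110/tcp --permanent
firewall-cmd --zone=public --add-port=8089/tcp --permanent
firewall-cmd --reload
```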