Alibaba Canal usage in detail (pitfall edition): synchronizing MySQL data to ES

Canal overview

Purpose

Canal's main purpose is to parse the incremental logs (binlog) of a MySQL database and to provide subscription to and consumption of that incremental data. Put simply, it synchronizes MySQL's incremental changes in real time and supports targets such as MySQL, Elasticsearch, and HBase.

How it works

Canal simulates the interaction protocol between a MySQL master and its slave, pretending to be a MySQL slave, and sends a dump request to the MySQL master. On receiving the dump request, the master pushes its binlog to Canal, and Canal parses the binlog and synchronizes the data to other storage.
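The idea of replaying row changes into another store can be sketched in a few lines of Python. This is only a toy model: the event dictionaries (`type`, `pk`, `row`) and the function names are made up for illustration and are not Canal's real message format or API.

```python
from typing import Any, Dict, List


def apply_row_event(index: Dict[Any, dict], event: dict) -> None:
    """Apply one row-change event to an in-memory stand-in for the target store.

    The event shape is hypothetical: {"type": ..., "pk": ..., "row": {...}}.
    """
    kind = event["type"]
    key = event["pk"]
    if kind == "INSERT":
        index[key] = dict(event["row"])
    elif kind == "UPDATE":
        # Merge the changed columns into the existing document.
        index.setdefault(key, {}).update(event["row"])
    elif kind == "DELETE":
        index.pop(key, None)
    else:
        raise ValueError(f"unsupported event type: {kind}")


def replay(events: List[dict]) -> Dict[Any, dict]:
    """Replay events in order, as a consumer of parsed binlog rows would."""
    index: Dict[Any, dict] = {}
    for event in events:
        apply_row_event(index, event)
    return index
```

Replaying an INSERT, then an UPDATE, then a DELETE for the same key leaves the target empty, which mirrors the insert, update, and delete tests in section 8.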

1. Versions

Both my MySQL and ES are installed on an Alibaba Cloud server. The MySQL version is 5.7 and the ES version is 7.14.

2. Download

The download address is GitHub. The version I downloaded is v1.1.5-alpha-2; only the first three components are needed (the deployer, adapter, and admin packages used in the steps below).

3. Upload

Canal needs a lot of memory to start and my Alibaba Cloud server has a low-end configuration, so I install and configure Canal on a local Linux server instead.

Upload the three components to the Linux server.

4. Configure MySQL

Since Canal synchronizes data by subscribing to MySQL's binlog, we need to enable binlog writing in MySQL and set binlog-format to ROW mode.

My MySQL runs in Docker on Alibaba Cloud with its configuration mounted from the host, so I only need to edit the external my.cnf file and add the following configuration:

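[mysqld]
## Set server_id; it must be unique within the same LAN
server_id=101
## Name of the database that does not need to be synchronized
binlog-ignore-db=mysql
## Enable binary logging
log-bin=mall-mysql-bin
## Memory used by the binary log cache (per transaction)
binlog_cache_size=1M
## Binary log format to use (mixed, statement, row)
binlog_format=row
## Days before binary logs expire and are purged. The default value 0 means no automatic purging.
expire_logs_days=7
## Skip all errors, or errors of the given types, encountered during master-slave replication, so that replication on the slave is not interrupted.
## For example, error 1062 is a duplicate primary key, and error 1032 means the master and slave data are inconsistent
slave_skip_errors=1062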
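Before restarting MySQL, the settings Canal depends on can be sanity-checked with a short script. The following is a minimal sketch, assuming a simple my.cnf like the one above; the function name `check_binlog_settings` is hypothetical, and the script only reads the text it is given.

```python
import configparser


def check_binlog_settings(my_cnf_text: str) -> list:
    """Return a list of problems found in the [mysqld] section (empty if none)."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(my_cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]

    mysqld = parser["mysqld"]
    problems = []
    # MySQL accepts both dashes and underscores in option names.
    if "log-bin" not in mysqld and "log_bin" not in mysqld:
        problems.append("binary logging (log-bin) is not enabled")
    fmt = mysqld.get("binlog_format") or mysqld.get("binlog-format") or ""
    if fmt.strip().lower() != "row":
        problems.append("binlog_format is not ROW")
    if not (mysqld.get("server_id") or mysqld.get("server-id")):
        problems.append("server_id is not set")
    return problems
```

Calling `check_binlog_settings(open("my.cnf").read())` on the file above should return an empty list; any returned string names a setting to fix first.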

After the configuration is complete, restart MySQL. After the restart, use the following commands to check whether binlog is enabled:

Commands

# Check whether binlog is enabled
show variables like '%log_bin%';

# Check MySQL's binlog format
show variables like 'binlog_format%';

Next, create an account with slave (replication) privileges for subscribing to the binlog. The account created here is canal:canal.

CREATE USER canal IDENTIFIED BY 'canal';  
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;

Create the database bookstore used for the test, and then create a table named book:

create table book(
    id int(32) auto_increment primary key,
    book_name varchar(32),
    price double,
    introduce varchar(2048),
    press varchar(32),
    author varchar(32),
    publish_date DATE
);

5. Configure canal-server

Extract the uploaded canal.deployer-1.1.5-SNAPSHOT.tar.gz to the target directory:

tar -zxvf canal.deployer-1.1.5-SNAPSHOT.tar.gz -C /target/path

Directory layout after extraction:

├── bin
│   ├── restart.sh
│   ├── startup.bat
│   ├── startup.sh
│   └── stop.sh
├── conf
│   ├── canal_local.properties
│   ├── canal.properties
│   └── example
│       └── instance.properties
├── lib
├── logs
│   ├── canal
│   │   └── canal.log
│   └── example
│       ├── example.log
│       └── example.log
└── plugin

Modify the configuration

Edit the configuration file conf/example/instance.properties as follows. The main changes are the database-related settings:

# Address of the MySQL instance to synchronize data from
canal.instance.master.address=<mysql-ip>:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=
# Database account used for synchronization
canal.instance.dbUsername=canal
# Database password used for synchronization
canal.instance.dbPassword=canal
# Database connection encoding
canal.instance.connectionCharset = UTF-8
# Regular expression that filters the tables whose binlog is subscribed to
canal.instance.filter.regex=.*\\..*
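The filter value is matched against schema.table names, and the doubled backslash in the properties file becomes a single backslash in the actual regular expression. The sketch below approximates that matching with Python's `re.fullmatch`, assuming several patterns can be separated by commas. Canal itself uses Java regular expressions, so edge cases may differ. The function name and the narrower example pattern `bookstore\.book` are illustrative only.

```python
import re


def table_is_subscribed(filter_regex: str, schema: str, table: str) -> bool:
    """Check a schema.table name against a comma-separated list of regexes.

    filter_regex is the value after properties-file unescaping, so the value
    written with a doubled backslash in instance.properties is passed here
    with a single backslash.
    """
    full_name = f"{schema}.{table}"
    patterns = [p.strip() for p in filter_regex.split(",") if p.strip()]
    return any(re.fullmatch(p, full_name) is not None for p in patterns)


# The default value above subscribes to every table in every schema.
print(table_is_subscribed(r".*\..*", "bookstore", "book"))
# A narrower, hypothetical value that would only cover the test table.
print(table_is_subscribed(r"bookstore\.book", "mysql", "user"))
```

Narrowing the filter to the tables that actually need synchronizing reduces the volume of events the adapter has to process.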

Modify the runtime parameters

By default Canal requires a large amount of memory to run, so we lower it here (this step is very important).

Edit startup.sh in the bin directory and reduce the JVM memory settings. Here I changed them to 512M.

Start up

  • From the canal-server directory, run startup.sh:
sh bin/startup.sh
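Once canal-server is running, it should be listening on port 11111, the port the adapter connects to in the next section. A minimal sketch for checking that from another machine, assuming plain TCP reachability is a sufficient test; `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example: port_is_open("127.0.0.1", 11111)
```

If this returns False while the process is running, the firewall or security group rules for the port are the first thing to check.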

6. Configure canal-adapter

As above, unzip the installation first. The directory after decompression is as follows.

├── bin
│   ├── adapter.pid
│   ├── restart.sh
│   ├── startup.bat
│   ├── startup.sh
│   └── stop.sh
├── conf
│   ├── application.yml
│   ├── es6
│   ├── es7
│   │   ├── biz_order.yml
│   │   ├── customer.yml
│   │   └── product.yml
│   ├── hbase
│   ├── kudu
│   ├── logback.xml
│   ├── META-INF
│   │   └── spring.factories
│   └── rdb
├── lib
├── logs
│   └── adapter
│       └── adapter.log
└── plugin

Edit the configuration file conf/application.yml as follows. The main changes are the canal-server settings, the data source settings, and the client adapter settings.

Note: port 11111 must be open on the Linux server.

canal.conf:
  mode: tcp # client mode; one of tcp, kafka, rocketMQ
  flatMessage: true # flat message switch: whether to deliver data as JSON strings; only effective in kafka/rocketMQ mode
  zookeeperHosts:    # ZooKeeper address used in cluster mode
  syncBatchSize: 1000 # batch size for each synchronization
  retries: 0 # number of retries; -1 means retry forever
  timeout: # synchronization timeout, in milliseconds
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 127.0.0.1:11111 # address of canal-server
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:

  srcDataSources: # source database configuration
    defaultDS:
      url: jdbc:mysql://<your-mysql-ip>:3306/bookstore?useUnicode=true
      username: canal
      password: canal
  canalAdapters: # list of adapters
  - instance: example # Canal instance name or MQ topic name
    groups: # list of groups
    - groupId: g1 # group id; used in MQ mode
      outerAdapters:
      - name: logger # adapter that prints logs
      - name: es7 # ES synchronization adapter
        hosts: <your-es-ip>:9200 # ES connection address
        properties:
          mode: rest # either transport (9300) or rest (9200)
          # security.auth: test:123456 #  only used for rest mode
          cluster.name: elasticsearch # ES cluster name

Add the mapping

Go to the canal-adapter/conf/es7 directory, create a file named book.yml, and configure the mapping between the MySQL table and the Elasticsearch index:

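dataSourceKey: defaultDS # key of the source data source; matches an entry under srcDataSources above
destination: example  # Canal instance or MQ topic
groupId: g1 # groupId used in MQ mode; only data for the matching groupId is synchronized
esMapping:
  _index: book # ES index name
  _id: _id  # ES _id; if this item is not configured, the pk item must be configured instead and _id is assigned automatically by ES
  # SQL mapping. b.id is selected twice: once as _id, to map the MySQL id to the ES _id,
  # and once as id, to map the MySQL id to the id property of the ES document.
  sql: "SELECT
            b.id AS _id,
            b.id,
            b.book_name,
            b.price,
            b.introduce,
            b.press,
            b.author,
            b.publish_date
        FROM
            book b"
  etlCondition: "where a.c_time>={}"   # ETL condition parameter (template value; the query above has no alias a or c_time column)
  commitBatch: 3000   # commit batch size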
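Selecting b.id twice in this mapping means one copy becomes the document id and the other stays in the document body. The Python sketch below is a rough model of that intent as described in the mapping comments. It is not the adapter's actual code, and `row_to_es_document` is a hypothetical name.

```python
from typing import Tuple


def row_to_es_document(row: dict, id_alias: str = "_id") -> Tuple[str, dict]:
    """Split one selected row into (document id, document source).

    The column aliased as id_alias is used as the ES _id and is left out of
    the source; every other column becomes a field of the document.
    """
    if id_alias not in row:
        raise KeyError(f"the SQL must select a column aliased as {id_alias}")
    doc_id = str(row[id_alias])
    source = {col: value for col, value in row.items() if col != id_alias}
    return doc_id, source


doc_id, source = row_to_es_document(
    {"_id": 1, "id": 1, "book_name": "Java", "price": 2323.2}
)
print(doc_id, source)
```

Without the second b.id column, the document would still be addressable by its _id but would have no id field to search or sort on.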

Modify the runtime parameters

Go to the canal_adapter/bin directory and edit the startup.sh file. As above, I set the memory to 512M.

Then run startup.sh to start the adapter.

7. Create an index in ES

The book_name and introduce fields use the ik_max_word analyzer, which requires the IK analysis plugin to be installed in ES.

PUT /book
{
    "mappings" : {
      "properties" : {
        "author" : {
          "type" : "keyword"
        },
        "book_name" : {
          "type" : "text",
          "analyzer": "ik_max_word"
        },
        "id" : {
          "type" : "integer"
        },
        "introduce" : {
          "type" : "text",
          "analyzer": "ik_max_word"
        },
        "press" : {
          "type" : "keyword"
        },
        "price" : {
          "type" : "double"
        },
        "publish_date" : {
          "type" : "date"
        }
      }
    }
}

View the index to confirm that it was created.

8. Test

Create a record in the database using a SQL statement

insert into book values(1, "《Java》", 2323.2,"Java从入门到放弃", "机械工业出版社", "James","2012-12-11")

After the insertion is successful, we search in ES and find that the data has been synchronized.

Test an update

update book set book_name = "《Python》" where id = 1

Search again and find that book_name has been synchronized.

Test a delete

delete from book where id = 1

The deletion is synchronized successfully.
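The same checks can be scripted instead of done through a search UI. This is a minimal sketch using only the Python standard library, assuming ES is reachable over plain HTTP without authentication. The helper names are hypothetical, and the host is a placeholder to replace.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def document_url(es_host: str, index: str, doc_id) -> str:
    """Build the URL of the ES get-document endpoint for one document."""
    return f"http://{es_host}/{index}/_doc/{doc_id}"


def source_from_response(body: dict) -> Optional[dict]:
    """Return _source if the response says the document exists, else None."""
    return body.get("_source") if body.get("found") else None


def fetch_document(es_host: str, index: str, doc_id, timeout: float = 5.0) -> Optional[dict]:
    """Fetch one document; None means it is not in the index."""
    try:
        with urllib.request.urlopen(document_url(es_host, index, doc_id), timeout=timeout) as resp:
            return source_from_response(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # ES answers 404 for a missing document
            return None
        raise


# Example: fetch_document("<your-es-ip>:9200", "book", 1)
```

After the insert and update tests the call should return the row's fields, and after the delete test it should return None.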

9. Using canal-admin

  • Extract canal.admin-1.1.5-SNAPSHOT.tar.gz. The directory layout after extraction is as follows:
├── bin
│   ├── restart.sh
│   ├── startup.bat
│   ├── startup.sh
│   └── stop.sh
├── conf
│   ├── application.yml
│   ├── canal_manager.sql
│   ├── canal-template.properties
│   ├── instance-template.properties
│   ├── logback.xml
│   └── public
│       ├── avatar.gif
│       ├── index.html
│       ├── logo.png
│       └── static
├── lib
└── logs

Run the SQL script

Executing the conf/canal_manager.sql file creates a canal_manager database and several tables.

Modify the configuration

Edit the configuration file conf/application.yml as follows. The main changes are the data source settings and the canal-admin management account settings. Note that you need a database account with read and write permissions, such as the administrator account root:root:

server:
  port: 8089
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8

spring.datasource:
  address: <mysql-ip>:3306
  database: canal_manager
  username: root
  password: <root-password>
  driver-class-name: com.mysql.jdbc.Driver
  url: jdbc:mysql://${spring.datasource.address}/${spring.datasource.database}?useUnicode=true&characterEncoding=UTF-8&useSSL=false
  hikari:
    maximum-pool-size: 30
    minimum-idle: 1

canal:
  adminUser: admin
  adminPasswd: admin
  • Next, edit the conf/canal_local.properties file of the canal-server (canal_deployer) set up earlier. The main changes are the canal-admin settings. After the changes, restart canal-server with sh bin/startup.sh local:
# register ip
canal.register.ip =

# canal admin config
canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =
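The canal.admin.passwd value is a hash rather than the plaintext password. It appears to follow the scheme of MySQL's PASSWORD() function (SHA-1 applied twice, uppercase hex, without the leading asterisk) applied to the adminPasswd value admin configured in application.yml. The sketch below rests on that assumption; `mysql_password_hash` is a hypothetical helper name.

```python
import hashlib


def mysql_password_hash(plaintext: str) -> str:
    """Hash a password the way MySQL's PASSWORD() does, minus the leading '*'.

    Assumption: this is the scheme behind canal.admin.passwd.
    """
    first_pass = hashlib.sha1(plaintext.encode("utf-8")).digest()
    return hashlib.sha1(first_pass).hexdigest().upper()


print(mysql_password_hash("admin"))
```

If adminPasswd in application.yml is changed, the value of canal.admin.passwd would need to be regenerated in the same way so that the two stay in step.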

Similarly, modify the runtime parameters in bin/startup.sh.

Then run the startup script to start canal-admin.

Open the canal-admin web interface at http://192.168.80.100:8089 and log in with the account and password admin:123456.

After successful login, you can use the web interface to operate the canal-server.

10. Problems encountered

  • Memory

    This one is a real trap. At first I ran Canal on the cloud server without changing the runtime parameters. As a result, either Canal started and then ES crashed, or Canal could not start at all. So if memory is limited, folks, be sure to modify the parameters.

  • Ports

    Make sure ports 11111, 11110, and 8089 are open on the Linux server.

  • Startup order

    Start canal_deployer first, then start canal_adapter.

  • If you find hs_error_xxx files in the bin directory, delete them, then continue with configuration and startup.

Origin blog.csdn.net/qq_46312987/article/details/125588273