Incrementally synchronizing MySQL data to Elasticsearch

  

 

  1. Introduction

  1.1 Canal

  Canal is an incremental data synchronization system based on the MySQL binary log. It is widely used within Alibaba (including https://www.taobao.com) to provide reliable, low-latency incremental data pipelines. GitHub address: https://github.com/alibaba/canal

  Canal Server parses the MySQL binlog and subscribes to data changes, and Canal Client can broadcast those changes anywhere, for example to databases or to Apache Kafka.

  It has the following features:

  Supports all platforms.

  Fine-grained monitoring powered by Prometheus.

  Supports parsing and subscribing to the MySQL binlog in different ways, for example by GTID.

  Supports high-performance, real-time data synchronization. (See Performance.)

  Canal Server and Canal Client support HA/scalability, backed by Apache ZooKeeper.

  Docker support.

  Disadvantages:

  Full synchronization is not supported; only incremental updates are.

  The full wiki is at: https://github.com/alibaba/canal/wiki

  1.2 How it works

  The principle is simple:

  Canal emulates the interaction protocol of a MySQL slave, disguises itself as a MySQL slave, and sends the dump protocol to the MySQL master.

  The MySQL master receives the dump request and starts pushing the binary log to the slave (i.e. Canal).

  Canal parses the binary log objects (originally raw byte streams) into its own data types, as shown below:

  

 

  1.3 Synchronizing to Elasticsearch

  To synchronize data to Elasticsearch you need an adapter: canal-adapter. The latest version is 1.1.3, download address: https://github.com/alibaba/canal/releases.

  Apparently only Elasticsearch 6.x is currently supported; 7.x is not supported!

  2. Preparation

  2.1 Elasticsearch and the JDK

  For installing Elasticsearch, you can refer to: https://www.dalaoyang.cn/article/78

  For installing the JDK, you can refer to: https://www.dalaoyang.cn/article/16
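  One prerequisite worth calling out: since canal works by pretending to be a MySQL slave, the source MySQL must have the binary log enabled in ROW format (in my.cnf: log-bin, binlog-format=ROW, and a server-id), and the account canal connects with needs replication privileges. A sketch of the setup, where the user name and password 'canal' are just examples:

```sql
-- verify the binary log is enabled and in ROW format
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';

-- create a dedicated account for canal with replication privileges
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
```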

  2.2 Installing canal-server

  Download canal.deployer-1.1.3.tar.gz:

  wget https://github.com/alibaba/canal/releases/download/canal-1.1.3/canal.deployer-1.1.3.tar.gz

  Extract the archive:

  tar -zxvf canal.deployer-1.1.3.tar.gz

  Change into the extracted directory:

  cd canal.deployer-1.1.3

  Modify the conf/example/instance.properties file. The main properties to pay attention to are the following:

  canal.instance.master.address: database address, for example 127.0.0.1:3306

  canal.instance.dbUsername: database username

  canal.instance.dbPassword: database password
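  The original post shows the complete file as a screenshot, which is not reproduced here. As a sketch, the lines that typically need editing in conf/example/instance.properties look like this (all values are examples for a local MySQL; canal.instance.connectionCharset and canal.instance.filter.regex are standard entries shipped in the file):

```properties
# MySQL master address
canal.instance.master.address=127.0.0.1:3306
# credentials of the account canal connects with (needs replication privileges)
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
# connection charset
canal.instance.connectionCharset=UTF-8
# subscribe to all schemas and tables by default
canal.instance.filter.regex=.*\\..*
```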

  

 

  

 

  Back in the canal.deployer-1.1.3 directory, start canal:

  sh bin/startup.sh

  View the log:

  vi logs/canal/canal.log

  View the log of a specific instance:

  vi logs/example/example.log

  To stop canal-server:

  sh bin/stop.sh

  2.3 Installing canal-adapter

  Download canal.adapter-1.1.3.tar.gz

  wget https://github.com/alibaba/canal/releases/download/canal-1.1.3/canal.adapter-1.1.3.tar.gz

  Extract the archive:

  tar -zxvf canal.adapter-1.1.3.tar.gz

  Change into the extracted directory:

  cd canal.adapter-1.1.3

  Modify the conf/application.yml file. The main properties to pay attention to are the following; since this is a YAML file, they are described here by their full property paths:

  server.port: canal-adapter port number

  canal.conf.canalServerHost: canal-server ip address and port

  canal.conf.srcDataSources.defaultDS.url: database address

  canal.conf.srcDataSources.defaultDS.username: database username

  canal.conf.srcDataSources.defaultDS.password: database password

  canal.conf.canalAdapters.groups.outerAdapters.hosts: es host address and tcp port

  The complete contents are as follows:

  server:
    port: 8081
  spring:
    jackson:
      date-format: yyyy-MM-dd HH:mm:ss
      time-zone: GMT+8
      default-property-inclusion: non_null

  canal.conf:
    mode: tcp
    canalServerHost: 127.0.0.1:11111
    batchSize: 500
    syncBatchSize: 1000
    retries: 0
    timeout:
    accessKey:
    secretKey:
    srcDataSources:
      defaultDS:
        url: jdbc:mysql://127.0.0.1:3306/test?useUnicode=true
        username: root
        password: 12345678
    canalAdapters:
    - instance: example
      groups:
      - groupId: g1
        outerAdapters:
        - name: es
          hosts: 127.0.0.1:9300
          properties:
            cluster.name: elasticsearch

  You also need to configure the conf/es/*.yml files; the adapter automatically loads every configuration file ending in .yml under conf/es. Before going into that configuration, here is the table structure used in this example:

  CREATE TABLE `test` (
    `id` int(11) NOT NULL,
    `name` varchar(200) NOT NULL,
    `address` varchar(1000) DEFAULT NULL,
    PRIMARY KEY (`id`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

  The index needs to be created in Elasticsearch manually; for example, here it is created with es-head, as shown below:

  

 

  The structure of the test index is as follows:

  {
    "mappings": {
      "_doc": {
        "properties": {
          "name": {
            "type": "text"
          },
          "address": {
            "type": "text"
          }
        }
      }
    }
  }

  Next, create test.yml (the file name is arbitrary). The content is easy to understand: _index is the index name and sql is the corresponding statement. The contents are as follows:

  dataSourceKey: defaultDS
  destination: example
  groupId:
  esMapping:
    _index: test
    _type: _doc
    _id: _id
    upsert: true
    sql: select a.id as _id, a.name, a.address from test a
    commitBatch: 3000
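  Conceptually, for each changed row the adapter evaluates the mapping's sql and writes the resulting document into _index, keyed by the column aliased to _id. A rough Python illustration of that row-to-document mapping (this is an illustration only, not canal-adapter code; the field names follow the test table above):

```python
# Illustration of the test.yml mapping:
#   sql: select a.id as _id, a.name, a.address from test a
# A changed MySQL row becomes an Elasticsearch document keyed by _id.

def row_to_es_doc(row):
    """Map a row from the `test` table to (doc_id, document) for the `test` index."""
    doc_id = row["id"]  # `id` is aliased to _id in the mapping's sql
    document = {"name": row["name"], "address": row["address"]}
    return doc_id, document

doc_id, doc = row_to_es_doc({"id": 7, "name": "Beijing", "address": "Chaoyang"})
print(doc_id, doc)  # → 7 {'name': 'Beijing', 'address': 'Chaoyang'}
```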

  After the configuration is done, go back to the canal-adapter root directory and start it:

  bin/startup.sh

  View the log:

  vi logs/adapter/adapter.log

  To stop canal-adapter:

  bin/stop.sh

  3. Testing

  After everything has started successfully, first take a look at es-head; as shown in the figure, there is no data yet.

  

 

  Next, insert a row into the database to test. The statement is as follows:

  INSERT INTO `test`.`test`(`id`, `name`, `address`) VALUES (7, '北京', '北京市朝阳区');

  Then look at es-head again, as shown below:

  

 

  Next, take a look at the log:

  2019-06-22 17:54:15.385 [pool-2-thread-1] DEBUG c.a.otter.canal.client.adapter.es.service.ESSyncService - DML: {data:[{id:7,name:北京,address:北京市朝阳区}],database:test,destination:example,es:1561197255000,groupId:null,isDdl:false,old:null,pkNames:[id],sql:,table:test,ts:1561197255384,type:INSERT}

  Affected indexes: test

  A small tip: the way of viewing logs described above may not be very convenient. I recommend the following syntax instead, for example to follow the last 200 lines of the log:

  tail -200f logs/adapter/adapter.log

  4. Summary

  1. Full synchronization cannot be done, but inserts, deletes, and updates all work.

  2. Be sure to create the index in advance.

  3. What you configure for Elasticsearch is its TCP transport port, for example the default 9300.
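  As summary point 1 says, inserts, updates, and deletes are all synchronized incrementally. For example, after the INSERT in section 3 you could also run the following and watch the test index change accordingly (statements are illustrative):

```sql
-- an update to the same row should be reflected in the 'test' index
UPDATE `test`.`test` SET `address` = 'Beijing Haidian District' WHERE `id` = 7;
-- a delete should remove the corresponding document
DELETE FROM `test`.`test` WHERE `id` = 7;
```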


Origin blog.csdn.net/qianfeng_dashuju/article/details/93496608