In Practice | canal: Real-Time Incremental Synchronization from MySQL to Elasticsearch

Preface

Incremental synchronization from a relational database (MySQL/Oracle) to Elasticsearch is a perennial concern and one of the most frequently discussed topics in the community and in QQ groups. Typical questions include, but are not limited to: 1. How do I synchronize MySQL to Elasticsearch? 2. How do Logstash, kafka_connector, and canal differ, and how should I choose among them? 3. Can inserts, deletes, and updates all be kept in sync? This article answers these questions.

1. Synchronizing with canal

1.1 canal officially supports synchronizing MySQL to Elasticsearch 6.x

For the synchronization principle, see the earlier article "Dry goods | Debezium realizes efficient real-time synchronization of MySQL to Elasticsearch".

Since version 1.1.1, canal has shipped a client adapter that writes (lands) the consumed data into target stores and can be started on its own. The Elasticsearch adapter supports ES 6.x and above, and synchronization to Elasticsearch must go through this adapter.

1.2 Synchronization effect

1) Verified: only incremental synchronization is supported; existing data is not synchronized in full. This matches canal's original purpose, which is to be "an incremental subscription & consumption component for Alibaba's MySQL binlog".

2) Verified: because canal is driven by the binlog, rows inserted, updated, and deleted in MySQL are inserted, updated, and deleted in the corresponding Elasticsearch index in real time.
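To make this mapping concrete, here is a minimal Python sketch of how row-level changes could be turned into Elasticsearch _bulk request lines. The event shape (an event type string plus a dict of column values) is a hypothetical simplification, not canal's actual message format; the index and type names are the ones used in the adapter configuration later in this article.

```python
import json


def to_bulk_lines(event_type, row, index="baidu_index", doc_type="_doc", id_field="id"):
    """Translate one row-level change into Elasticsearch _bulk NDJSON lines.

    The (event_type, row) shape is a hypothetical simplification of a binlog event.
    """
    meta = {"_index": index, "_type": doc_type, "_id": str(row[id_field])}
    if event_type == "INSERT":
        # "index" creates the document, or replaces it if it already exists
        return [json.dumps({"index": meta}), json.dumps(row)]
    if event_type == "UPDATE":
        # "update" merges the new column values into the existing document
        return [json.dumps({"update": meta}), json.dumps({"doc": row})]
    if event_type == "DELETE":
        # "delete" has no source line
        return [json.dumps({"delete": meta})]
    raise ValueError(f"unsupported event type: {event_type}")


def to_bulk_body(events):
    """Join several events into one _bulk body; the API requires a trailing newline."""
    lines = []
    for event_type, row in events:
        lines.extend(to_bulk_lines(event_type, row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_bulk_body([
        ("INSERT", {"id": 1, "title": "hello"}),
        ("UPDATE", {"id": 1, "title": "hello again"}),
        ("DELETE", {"id": 1}),
    ]))
```

Using "index" rather than "create" for inserts makes replaying the same event harmless: the document is simply overwritten instead of the request failing.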

3) Recommended scenarios: canal suits business scenarios in which inserts, updates, and deletes in MySQL must be reflected in Elasticsearch in real time. Where real-time requirements are relaxed, logstash-input-jdbc can also do the job.

Evaluate your requirements carefully before choosing a tool.

2. Versions used

3. Synchronization steps explained

3.1 Start canal (it can run in the background as a resident process)

The official QuickStart (https://github.com/alibaba/canal/wiki/QuickStart) describes this in detail; only the key points to watch are listed below.

Download file: canal.deployer-1.1.3-SNAPSHOT.tar.gz. Keep an eye on new releases and use the latest version where possible.

3.1.1 Enable binlog

canal works by reading the MySQL binlog, so binlog writing must be enabled in MySQL. Setting the binlog format to ROW is recommended.

[mysqld]
log-bin=mysql-bin # just add this line to enable the binlog
binlog-format=ROW # use ROW mode
server_id=1 # required for MySQL replication; must not equal canal's slaveId
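A quick way to catch mistakes in these settings before starting canal is to check the [mysqld] section programmatically. The sketch below uses Python's configparser; the canal slaveId passed in is whatever you configure on the canal side (1234 in the usage is purely illustrative), and real my.cnf files with !include directives or duplicate keys would need more handling.

```python
import configparser

SAMPLE_MY_CNF = """
[mysqld]
log-bin=mysql-bin # enable the binlog
binlog-format=ROW # use ROW mode
server_id=1 # must differ from canal's slaveId
"""


def check_binlog_settings(text, canal_slave_id):
    """Return a list of problems found in the [mysqld] section of a my.cnf text."""
    parser = configparser.ConfigParser(allow_no_value=True, inline_comment_prefixes=("#",))
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]
    # MySQL treats '-' and '_' in option names as equivalent, so normalize them
    opts = {key.replace("-", "_"): value for key, value in parser.items("mysqld", raw=True)}
    problems = []
    if "log_bin" not in opts:
        problems.append("binlog is not enabled (log-bin missing)")
    if (opts.get("binlog_format") or "").upper() != "ROW":
        problems.append("binlog_format should be ROW")
    server_id = opts.get("server_id")
    if server_id is None:
        problems.append("server_id is not set")
    elif int(server_id) == canal_slave_id:
        problems.append("server_id must differ from canal's slaveId")
    return problems


if __name__ == "__main__":
    print(check_binlog_settings(SAMPLE_MY_CNF, canal_slave_id=1234))
```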

3.1.2 Modify the configuration file

vi conf/example/instance.properties

Fill in the basic connection information for the source database.
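The key names below are the ones we recall from the example instance.properties shipped with canal 1.1.x (canal.instance.master.address, canal.instance.dbUsername, canal.instance.dbPassword, canal.instance.mysql.slaveId); verify them against your own copy. This sketch reads the file with a deliberately minimal parser (no line continuations, escapes, or ':' separators) and reports missing connection settings; the sample values are illustrative.

```python
SAMPLE_INSTANCE_PROPERTIES = """
# source database connection (illustrative values)
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.mysql.slaveId=1234
"""

REQUIRED_KEYS = (
    "canal.instance.master.address",
    "canal.instance.dbUsername",
    "canal.instance.dbPassword",
)


def parse_properties(text):
    """Minimal .properties reader: 'key=value' lines, '#'/'!' comments only."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_instance(props, mysql_server_id):
    """Report missing connection keys and a slaveId that clashes with MySQL's server_id."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not props.get(key)]
    slave_id = props.get("canal.instance.mysql.slaveId")
    if slave_id is not None and int(slave_id) == mysql_server_id:
        problems.append("canal.instance.mysql.slaveId must differ from MySQL server_id")
    return problems
```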

3.1.3 Start canal

Run bin/startup.sh; if startup fails, check the logs for the error.
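As described in the QuickStart, the server log is logs/canal/canal.log and the instance log is logs/example/example.log. The sketch below pulls ERROR lines and Java exception lines out of those files; the line layout it matches ("... [thread] LEVEL logger - message") is an assumption about canal's default logging pattern, so adjust the regular expression if your logs look different.

```python
import re
from pathlib import Path

# Log locations relative to the canal.deployer directory, as given in the QuickStart
LOG_FILES = ("logs/canal/canal.log", "logs/example/example.log")

# Assumed layout: "<date> <time> [thread] LEVEL logger - message"
ERROR_LINE = re.compile(r"\]\s+ERROR\s")


def find_problem_lines(lines):
    """Keep ERROR-level lines and lines naming a Java exception."""
    return [
        line.rstrip("\n")
        for line in lines
        if ERROR_LINE.search(line) or "Exception" in line
    ]


def scan_canal_logs(base_dir="."):
    """Map each existing log file to its problem lines; missing files are skipped."""
    report = {}
    for rel_path in LOG_FILES:
        path = Path(base_dir) / rel_path
        if path.exists():
            with path.open(encoding="utf-8", errors="replace") as handle:
                report[rel_path] = find_problem_lines(handle)
    return report


if __name__ == "__main__":
    for log_file, problems in scan_canal_logs().items():
        print(log_file, len(problems), "problem line(s)")
```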

3.2 Configure the Elasticsearch adapter and run the synchronization

The official wiki describes this in detail: https://github.com/alibaba/canal/wiki/Sync-ES . Only the pitfalls hit during deployment are covered below.

3.2.1 Deployment version

canal.adapter-1.1.3-SNAPSHOT.tar.gz; if a newer release is available, use the latest version.

3.2.2 Core configuration

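[root@localhost es]# cat mytest_user.yml
dataSourceKey: defaultDS        # key of the source data source (srcDataSources in the adapter's application.yml)
destination: example            # canal instance to consume (conf/example)
esMapping:
  _index: baidu_index           # target index
  _type: _doc
  _id: _id                      # take the document ID from the column aliased as _id below
  pk: id
  sql: "select a.id as _id, a.title, a.url, a.publish_time, a.content from baidu_info as a"
  # objFields:
  #   _labels: array:;
  etlCondition: "where a.id >= 1"
  commitBatch: 3000             # batch size for each commit to ES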

Purpose: the table's id column is used as the Elasticsearch _id, so document IDs follow the table's auto-increment primary key.
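As a rough model of what this configuration does (not canal's actual code), each result row of the sql query becomes one document: the column aliased as _id supplies the document ID and, in this sketch, the remaining columns form the document body. Documents are then flushed in groups of at most commitBatch. The function names and sample values are hypothetical.

```python
from itertools import islice


def row_to_document(row, id_column="_id"):
    """Split one result row (column alias -> value) into (document id, source)."""
    if row.get(id_column) is None:
        raise ValueError(f"row has no '{id_column}' value; alias the primary key as _id")
    source = {column: value for column, value in row.items() if column != id_column}
    return str(row[id_column]), source


def batches(items, commit_batch=3000):
    """Yield lists of at most commit_batch items, mirroring the commitBatch setting."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, commit_batch))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    rows = [{"_id": i, "title": f"title {i}", "url": f"https://example.com/{i}"} for i in range(1, 8)]
    for chunk in batches(rows, commit_batch=3):
        print([row_to_document(row)[0] for row in chunk])
```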

4. Multi-table joins

Refer to the official wiki (https://github.com/alibaba/canal/wiki/Sync-ES), which supports the following relationships:

  • One to one
  • One to many
  • Many to many
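For one-to-many relations, the Sync-ES wiki pattern, as we understand it, is to concatenate the child rows into one string in the SQL (for example a subquery using group_concat(... separator ';') aliased as _labels) and declare objFields with _labels: array:; so the adapter splits that string into an array; this is the commented-out objFields block in mytest_user.yml above. The Python sketch below imitates only that splitting step; the apply_obj_fields name and the dropping of empty pieces are our own choices.

```python
def apply_obj_fields(source, obj_fields):
    """Turn concatenated strings into lists for fields declared as 'array:<separator>'.

    Hypothetical SQL producing such a field:
      (select group_concat(l.label separator ';') from label l where l.info_id = a.id) as _labels
    """
    converted = dict(source)
    for field, spec in obj_fields.items():
        kind, _, separator = spec.partition(":")
        value = converted.get(field)
        if kind == "array" and separator and isinstance(value, str):
            # drop empty pieces, e.g. from a trailing separator
            converted[field] = [part for part in value.split(separator) if part]
    return converted


if __name__ == "__main__":
    print(apply_obj_fields({"title": "t", "_labels": "java;es;canal"}, {"_labels": "array:;"}))
```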

5. Pitfalls

Pitfall 1: canal.adapter-1.1.2 fails to start

See https://github.com/alibaba/canal/issues/1513 for the startup failure. It was fixed in version 1.1.3.

Pitfall 2: Full synchronization is not supported

Use logstash or another tool for the full (initial) synchronization.
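If you prefer to script the initial load yourself rather than use logstash, a common technique is keyset pagination: read the table in primary-key order, one page at a time, remembering the last ID seen. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL (a MySQL driver would use %s placeholders instead of ?) and a write_batch callback where the bulk write to Elasticsearch would go; it assumes a unique, positive integer id column.

```python
import sqlite3


def full_sync(conn, write_batch, batch_size=3000):
    """Copy every row of baidu_info in id order, one page per query."""
    last_id = 0
    copied = 0
    while True:
        rows = conn.execute(
            "SELECT id, title, url FROM baidu_info WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        # the id column becomes the document _id, matching the adapter config
        write_batch([{"_id": row[0], "title": row[1], "url": row[2]} for row in rows])
        last_id = rows[-1][0]
        copied += len(rows)
    return copied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baidu_info (id INTEGER PRIMARY KEY, title TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO baidu_info (id, title, url) VALUES (?, ?, ?)",
        [(i, f"title {i}", f"https://example.com/{i}") for i in range(1, 8)],
    )
    sizes = []
    print(full_sync(conn, lambda docs: sizes.append(len(docs)), batch_size=3), sizes)
```

Unlike LIMIT/OFFSET paging, each page here is an index range scan starting after the last ID, so later pages do not get slower as the offset grows.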

Pitfall 3: The index mapping must be created in ES first

Otherwise the index is not recognized and writes fail with an error.
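The mapping can be created with a single PUT request on the index before the adapter starts. The sketch below only builds the request body for the baidu_index used in mytest_user.yml; the field types and the publish_time date format (assuming MySQL DATETIME values arrive as "yyyy-MM-dd HH:mm:ss" strings) are assumptions to adapt to your data, as is the localhost:9200 address in the comment. The mappings, then type name, then properties nesting is the ES 6.x form; ES 7 dropped the type level.

```python
import json


def build_index_body(doc_type="_doc"):
    """Request body for creating baidu_index on Elasticsearch 6.x (field types are assumptions)."""
    properties = {
        "title": {"type": "text"},
        "url": {"type": "keyword"},
        # accept MySQL-style DATETIME strings as well as ISO dates and epoch millis
        "publish_time": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time||epoch_millis",
        },
        "content": {"type": "text"},
    }
    # ES 6.x nests properties under the single mapping type name
    return {"mappings": {doc_type: {"properties": properties}}}


if __name__ == "__main__":
    # Save the output as mapping.json, then for example:
    # curl -X PUT -H 'Content-Type: application/json' http://localhost:9200/baidu_index -d @mapping.json
    print(json.dumps(build_index_body(), indent=2))
```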

Pitfall 4: How do I synchronize multiple tables?

Add one *.yml file per table under canal.adapter-1.1.3/conf/es. In other words, each MySQL table can have its own configuration file.
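When there are many tables, the per-table files can be generated from a template. The sketch below writes one <table>.yml per table into a conf/es directory; the template carries only the keys used in mytest_user.yml above, and the write_table_configs name and the table, index, and column names in any call to it are hypothetical.

```python
from pathlib import Path
from string import Template

# Mirrors the keys of mytest_user.yml; extend with objFields/etlCondition as needed
YML_TEMPLATE = Template(
    """dataSourceKey: defaultDS
destination: example
esMapping:
  _index: $index
  _type: _doc
  _id: _id
  sql: "select a.$pk as _id, $columns from $table as a"
  commitBatch: 3000
"""
)


def write_table_configs(conf_dir, tables):
    """tables maps table name -> (index name, primary key column, list of columns)."""
    conf_path = Path(conf_dir)
    conf_path.mkdir(parents=True, exist_ok=True)
    written = []
    for table, (index, pk, columns) in tables.items():
        column_list = ", ".join(f"a.{column}" for column in columns)
        text = YML_TEMPLATE.substitute(index=index, pk=pk, columns=column_list, table=table)
        target = conf_path / f"{table}.yml"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

For example, write_table_configs("canal.adapter-1.1.3/conf/es", {"baidu_info": ("baidu_index", "id", ["title", "url"])}) would produce conf/es/baidu_info.yml.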

Pitfall 5: NullPointerException

Solution: in the SQL statement, alias the table's ID column as _id for ES; otherwise an error is reported. For example:

select sx_sid as _id, name from baidu_info
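Because this error only shows up at runtime, a lightweight pre-flight check on the mapping SQL can help. The sketch below looks for the explicit "<expression> AS _id" form with a regular expression; it is not a SQL parser, and it will not recognize an alias written without AS (such as "select a.id _id ...").

```python
import re

# "<expression> AS _id": the AS keyword may be in any case, the alias must be exactly _id
ID_ALIAS = re.compile(r"\b(?i:as)\s+_id\b")


def has_id_alias(sql):
    """True if the mapping SQL aliases some column as _id (explicit AS form only)."""
    return ID_ALIAS.search(sql) is not None


if __name__ == "__main__":
    print(has_id_alias("select sx_sid as _id, name from baidu_info"))
```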

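Pitfall 6: Does a ROW-mode binlog record the values both before and after a change?

Yes. In ROW format, an UPDATE event carries both the before image and the after image of each modified row; with MySQL's default binlog_row_image=FULL, every column appears in both images.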
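Given the before and after images of one updated row, the columns that actually changed can be computed and sent to Elasticsearch as a partial update. The sketch below models each image as a plain dict of column to value; that representation and the function names are illustrative, not canal's API.

```python
def changed_columns(before, after):
    """Return {column: (old, new)} for every column whose value differs."""
    columns = set(before) | set(after)
    return {
        column: (before.get(column), after.get(column))
        for column in sorted(columns)
        if before.get(column) != after.get(column)
    }


def partial_update_doc(before, after):
    """Build an Elasticsearch partial-update body containing only the changed columns."""
    return {"doc": {column: new for column, (_, new) in changed_columns(before, after).items()}}


if __name__ == "__main__":
    print(partial_update_doc({"id": 1, "title": "old"}, {"id": 1, "title": "new"}))
```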

6. Summary of synchronization options


Each of the options above has its own strengths and weaknesses; choose according to your actual business needs. You are welcome to share your own synchronization approaches and thoughts.

Source: blog.51cto.com/15050720/2562053