Preface
Incrementally synchronizing a relational database (MySQL or Oracle) to Elasticsearch is a question of continuing interest, and one of the most discussed topics in the community and in QQ groups. Typical questions include, but are not limited to: (1) How do I synchronize MySQL to Elasticsearch? (2) How do Logstash, kafka_connector, and canal differ, and how do I choose between them? (3) Can inserts, deletes, updates, and queries all be kept in sync? This article gives the answers.
1. Canal synchronization
1.1 Canal officially supports synchronizing MySQL to ES 6.x
For the synchronization principle, see the earlier article: "Dry goods | Debezium realizes efficient real-time synchronization of Mysql to Elasticsearch".
Since version 1.1.1, canal has included client-side adapters for landing data in a target store, together with a launcher to start them. The Elasticsearch adapter supports ES 6.x and above, so synchronization to ES is done through the adapter.
1.2 Synchronization effect
1) Verified: only incremental synchronization is supported; full synchronization of existing data is not. This follows from canal's stated purpose: it describes itself as the "incremental subscription & consumption component of Alibaba mysql database binlog".
2) Verified: because canal reads the binlog, inserts, updates, and deletes in MySQL are applied to the corresponding Elasticsearch documents in real time.
3) Recommended use: canal suits business scenarios that need MySQL inserts, deletes, and updates reflected in Elasticsearch in real time. For scenarios without strict real-time requirements, logstash-input-jdbc is also sufficient.
Weigh the options against your own requirements before choosing.
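To make the effect in 2) concrete, here is a minimal sketch of how one row change could be translated into Elasticsearch bulk-API lines. It is only an illustration, not canal-adapter's actual implementation: the function name `to_bulk_lines`, the event-type strings, and the assumption that the primary key column is called `id` are all hypothetical.

```python
import json


def to_bulk_lines(event_type, row, index="baidu_index", doc_type="_doc"):
    """Translate one row change into Elasticsearch bulk-API lines (NDJSON).

    Assumption: `row` is a dict of column values whose primary key is "id".
    """
    meta = {"_index": index, "_type": doc_type, "_id": str(row["id"])}
    # Everything except the primary key becomes the document body.
    source = {col: val for col, val in row.items() if col != "id"}

    if event_type == "INSERT":
        return [json.dumps({"index": meta}), json.dumps(source)]
    if event_type == "UPDATE":
        return [json.dumps({"update": meta}), json.dumps({"doc": source})]
    if event_type == "DELETE":
        # A delete action has no body line.
        return [json.dumps({"delete": meta})]
    raise ValueError(f"unsupported event type: {event_type}")
```

Joining the returned lines with newlines (plus a trailing newline) gives a body in the shape the `_bulk` endpoint expects.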
2. Versions used
- ES: 6.6.1
- MySQL: 5.7.25
- canal: v1.1.3-alpha-2
- canal-adapter: v1.1.3-alpha-2
- canal download address: https://github.com/alibaba/canal/releases
3. Synchronization steps explained
3.1 Start canal, which can run in the background as a resident process.
The official documentation describes this in detail: https://github.com/alibaba/canal/wiki/QuickStart. Only the key points to watch are listed below.
Download file: canal.deployer-1.1.3-SNAPSHOT.tar.gz. Check the releases page for newer versions.
3.1.1 Enable binlog
canal works by reading the MySQL binlog, so binlog writing must be enabled in MySQL first. ROW is the recommended binlog format.
[mysqld]
log-bin=mysql-bin # adding this line is enough to enable the binlog
binlog-format=ROW # use ROW mode
server_id=1 # required for MySQL replication; must not be the same as canal's slaveId
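As a quick sanity check of the three settings above, the following sketch parses a my.cnf-style text and reports what is missing. It is an assumption-laden helper, not part of canal or MySQL: the function name `check_binlog_settings` is hypothetical, and it only handles simple files (no `!include` directives).

```python
import configparser


def check_binlog_settings(cnf_text, canal_slave_id):
    """Return a list of problems found in the [mysqld] section of cnf_text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,            # options such as a bare "log-bin" have no value
        inline_comment_prefixes=("#",),  # strip trailing "# ..." comments
        interpolation=None,
        strict=False,
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section found"]

    # MySQL accepts both dashes and underscores in option names.
    opts = {key.replace("-", "_"): val for key, val in parser.items("mysqld")}

    problems = []
    if "log_bin" not in opts:
        problems.append("log-bin is not set")
    if (opts.get("binlog_format") or "").upper() != "ROW":
        problems.append("binlog-format should be ROW")
    server_id = opts.get("server_id")
    if not server_id:
        problems.append("server_id is not set")
    elif server_id == str(canal_slave_id):
        problems.append("server_id must differ from canal's slaveId")
    return problems


sample = """[mysqld]
log-bin=mysql-bin # enable the binlog
binlog-format=ROW # use ROW mode
server_id=1
"""
problems = check_binlog_settings(sample, canal_slave_id=1234)
```

An empty list means the three settings look consistent. The value passed as `canal_slave_id` is whatever slaveId you configured for the canal instance; 1234 here is only a placeholder.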
3.1.2 Modify configuration file
vi conf/example/instance.properties
Configure the basic information of the database.
3.1.3 Start canal
Run bin/startup.sh. If startup fails, check the logs for the error.
3.2 Configure the Elasticsearch adapter and run synchronization.
The official documentation describes this in detail: https://github.com/alibaba/canal/wiki/Sync-ES. Only the pitfalls encountered during deployment are described below.
3.2.1 Deployment version
canal.adapter-1.1.3-SNAPSHOT.tar.gz. If a newer release is available, use the latest version.
3.2.2 Core configuration
[root@localhost es]# cat mytest_user.yml
dataSourceKey: defaultDS
destination: example
esMapping:
  _index: baidu_index
  _type: _doc
  _id: _id
  pk: id
  sql: "select a.id as _id, a.title, a.url, a.publish_time, a.content
        from baidu_info as a"
#  objFields:
#    _labels: array:;
  etlCondition: "where a.id >= 1"
  commitBatch: 3000
Purpose: the id column of the database table is used as the Elasticsearch _id, so that document IDs follow the table's auto-increment key.
4. Multi-table joins
See the official documentation: https://github.com/alibaba/canal/wiki/Sync-ES. Supported relationships:
- One to one
- One to many
- Many to many
5. Pitfalls
Pitfall 1: canal.adapter-1.1.2 fails to start
The startup failure is tracked at https://github.com/alibaba/canal/issues/1513. The issue has been fixed in version 1.1.3.
Pitfall 2: Full synchronization is not supported
Use Logstash or another tool for full synchronization.
Pitfall 3: The mapping for the target index must be created in ES first
Otherwise the index is not recognized and writes fail with an error.
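A minimal sketch of creating the index with an explicit mapping before starting the adapter, using only the Python standard library. The field types below are assumptions chosen for illustration (the original does not state them), as are the ES address and the function name `build_create_index_request`. In particular, the `date` field may need an explicit `format` depending on how publish_time values are written.

```python
import json
import urllib.request


def build_create_index_request(es_url="http://localhost:9200", index="baidu_index"):
    """Build (but do not send) a PUT request that creates the index in ES 6.x."""
    body = {
        "mappings": {
            "_doc": {  # ES 6.x mappings are keyed by the type name
                "properties": {
                    "title": {"type": "text"},         # assumed type
                    "url": {"type": "keyword"},        # assumed type
                    "publish_time": {"type": "date"},  # assumed type
                    "content": {"type": "text"},       # assumed type
                }
            }
        }
    }
    return urllib.request.Request(
        url=f"{es_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request()
# To actually create the index, send it:
# urllib.request.urlopen(request)
```

The request is built separately from sending so the body can be inspected or adjusted before anything touches the cluster.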
Pitfall 4: How to synchronize multiple tables?
Add more *.yml configuration files under canal.adapter-1.1.3/conf/es. In other words, each MySQL table can have its own configuration file.
Pitfall 5: Null pointer exception
Solution: in the SQL statement, alias the table's ID column as _id for ES; otherwise an error is reported. For example:
select sx_sid as _id, name from baidu_info
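A small lint of this kind can catch the missing alias before the adapter is started. This is only a rough sketch: the function name `selects_id_alias` is hypothetical, and the regular expression handles simple single-table statements like the one above, not arbitrary SQL.

```python
import re

# Matches "as _id" (case-insensitive) as whole words.
_ID_ALIAS = re.compile(r"\bas\s+_id\b", re.IGNORECASE)


def selects_id_alias(sql):
    """Return True if the SELECT list aliases some column as _id."""
    # Only look at the part before the first FROM, i.e. the select list.
    select_list = re.split(r"\bfrom\b", sql, maxsplit=1, flags=re.IGNORECASE)[0]
    return bool(_ID_ALIAS.search(select_list))


ok = selects_id_alias("select sx_sid as _id, name from baidu_info")
missing = selects_id_alias("select sx_sid, name from baidu_info")
```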
Pitfall 6: Does a row-mode binlog record the values before and after a change?
- INSERT: only the values after the change (the new row).
- UPDATE: the values both before and after the change.
- DELETE: only the values before the change (the deleted row).
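The three cases above can be sketched as a small helper that works out which column values a consumer would need to write for each event type. The function name `changed_columns` and the dict-based before/after images are hypothetical simplifications, not canal's data model.

```python
def changed_columns(event_type, before, after):
    """Return the column values that changed, given the row images of one event.

    INSERT has only an `after` image, DELETE has only a `before` image,
    and UPDATE has both.
    """
    if event_type == "INSERT":
        return dict(after)  # every column is new
    if event_type == "DELETE":
        return {}           # nothing to write; the row is gone
    if event_type == "UPDATE":
        # Keep only the columns whose value differs between the two images.
        return {col: val for col, val in after.items() if before.get(col) != val}
    raise ValueError(f"unsupported event type: {event_type}")


diff = changed_columns(
    "UPDATE",
    before={"id": 1, "title": "old", "url": "http://example.com"},
    after={"id": 1, "title": "new", "url": "http://example.com"},
)
```

For the UPDATE above, only the `title` column differs between the two images, so it is the only one returned.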
About full synchronization: https://github.com/alibaba/canal/issues/376
6. Summary: choosing a synchronization approach
Each of the options above has its own advantages and disadvantages, so choose according to your actual business needs. You are welcome to share your own synchronization approach and thoughts.