Four solutions and practical demonstrations for data synchronization between MySQL and ES

1. Synchronous dual writing

This is the simplest approach: a synchronous call. Whenever the application writes data to MySQL, it writes the same data to ES in the same operation.

Advantages

1. Simple business logic
2. High real-time performance

Disadvantages

1. Hard coding: every place that writes to MySQL also needs code that writes to ES.
2. Strong business coupling.
3. Risk of data loss if one of the two writes fails.
4. Poor performance: MySQL write performance is already limited, and adding an ES write to every operation inevitably slows the system down further.

Risk of double-write failure

1. The ES system is unavailable.
2. There is a network failure between the program and ES.
3. The program restarts before it has had time to write to ES.

In these cases, if strong data consistency is required, the double write must be performed within a transaction. Once transactions are used, performance drops even more noticeably.
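The failure mode described above can be illustrated with a minimal sketch. It is not the author's project code: `FakeEs`, `dual_write`, and `dual_write_with_rollback` are hypothetical names, and plain dictionaries stand in for MySQL and ES so the example runs without any external service.

```python
class FakeEs:
    """In-memory stand-in for an ES index; can be switched to 'unavailable'."""

    def __init__(self):
        self.docs = {}
        self.available = True

    def index(self, doc_id, doc):
        if not self.available:
            raise ConnectionError("ES unavailable")
        self.docs[doc_id] = doc


def dual_write(mysql_rows, es, row_id, row):
    """Plain double write: 'MySQL' first, then 'ES' in the same call path."""
    mysql_rows[row_id] = row   # step 1: primary write succeeds
    es.index(row_id, row)      # step 2: may raise and leave the stores diverged


def dual_write_with_rollback(mysql_rows, es, row_id, row):
    """Transaction-like variant: undo the 'MySQL' write if the 'ES' write fails."""
    previous = mysql_rows.get(row_id)
    mysql_rows[row_id] = row
    try:
        es.index(row_id, row)
    except ConnectionError:
        # Restore the previous state so both stores stay consistent.
        if previous is None:
            del mysql_rows[row_id]
        else:
            mysql_rows[row_id] = previous
        raise


def demo():
    mysql_rows, es = {}, FakeEs()
    dual_write(mysql_rows, es, 1, {"name": "a"})

    es.available = False  # simulate an ES outage
    try:
        dual_write(mysql_rows, es, 2, {"name": "b"})
    except ConnectionError:
        pass
    # Row ids that exist in 'MySQL' but never reached 'ES'.
    diverged = sorted(set(mysql_rows) - set(es.docs))

    try:
        dual_write_with_rollback(mysql_rows, es, 3, {"name": "c"})
    except ConnectionError:
        pass
    return diverged, sorted(mysql_rows)
```

In `demo()`, the plain double write leaves row 2 in the MySQL stand-in only, while the rollback variant keeps the two stores consistent by discarding row 3. The rollback adds extra work to every write, which is the performance cost mentioned above.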

Project demonstration

See: Synchronous call for data synchronization between MySQL and ES

2. Asynchronous double writing (MQ mode)

In multi-data-source write scenarios, MQ can be used to implement asynchronous multi-source writes. The write logic of each data source is independent, so a failed or slow write to one data source does not affect writes to the others. Overall write throughput increases, but because MQ consumption is asynchronous, this approach is not suitable for business scenarios that require real-time visibility.
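A minimal sketch of this flow follows, under stated assumptions: `queue.Queue` stands in for the message queue, dictionaries stand in for MySQL and ES, and the names `FlakyEs`, `save_row`, and `consume_all` are hypothetical. Redelivery is simulated by putting a failed message back on the queue; a real broker provides its own acknowledgement and retry mechanism.

```python
import queue


class FlakyEs:
    """In-memory ES stand-in that fails its first `failures` index calls."""

    def __init__(self, failures):
        self.docs = {}
        self.failures = failures

    def index(self, doc_id, doc):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("ES write failed")
        self.docs[doc_id] = doc


def save_row(mysql_rows, mq, row_id, row):
    """Producer side: write 'MySQL', publish a message, return immediately."""
    mysql_rows[row_id] = row
    mq.put((row_id, row))


def consume_all(mq, es, max_attempts=5):
    """Consumer side: apply each message to 'ES'; re-queue it on failure."""
    failed_attempts = {}
    while not mq.empty():
        row_id, row = mq.get()
        try:
            es.index(row_id, row)
        except ConnectionError:
            failed_attempts[row_id] = failed_attempts.get(row_id, 0) + 1
            if failed_attempts[row_id] < max_attempts:
                mq.put((row_id, row))  # simulate redelivery of the message
            # Beyond max_attempts the message is dropped in this sketch;
            # a real system would typically park it for manual handling.
    return failed_attempts


def demo():
    mysql_rows, mq, es = {}, queue.Queue(), FlakyEs(failures=2)
    save_row(mysql_rows, mq, 1, {"name": "a"})
    save_row(mysql_rows, mq, 2, {"name": "b"})
    visible_before = dict(es.docs)       # nothing is in 'ES' yet: the delay
    failed_attempts = consume_all(mq, es)
    return visible_before, failed_attempts, es.docs == mysql_rows
```

The producer returns as soon as the message is queued, so the ES stand-in is empty until the consumer runs. Both messages fail once and are applied on redelivery, after which the two stores match.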

Advantages

1. High performance.
2. Data loss is unlikely, thanks to the consumption guarantee mechanism of MQ messages. For example, if ES is down or a write fails, the MQ message can be consumed again.
3. Writes to different data sources are isolated from each other, which makes it easy to add more data sources.

Disadvantages

1. Hard coding: connecting a new data source requires new consumer code.
2. Increased system complexity: message middleware is introduced.
3. Possible delay: MQ uses an asynchronous consumption model, so data written by the user may not be visible immediately.

Project demonstration

See: Asynchronous call for MySQL and ES data synchronization

3. Synchronization based on DataX

DataX is an offline data synchronization tool/platform that is widely used within the Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), DRDS, and others.

Core components

1. Reader: the data acquisition module, responsible for collecting data from the source.
2. Writer: the data writing module, responsible for writing data to the target database.
3. Framework: the data transmission channel between Reader and Writer, responsible for buffering and related processing.

To support a new data source, only the Reader and Writer plug-ins need to be implemented.

The core modules of DataX can be understood by following a Job:

1. A single data synchronization run in DataX is called a Job. The Job is responsible for data cleaning, task splitting, and so on.
2. After the Job starts, it is split into multiple Tasks according to the splitting strategy of the source, and the Tasks run concurrently. A Task is the smallest unit of execution in a Job.
3. After splitting, the Scheduler module combines the Tasks into TaskGroups. Each TaskGroup runs its assigned Tasks with a certain degree of concurrency.
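The Job, Task, and TaskGroup relationship can be pictured with a toy model. This is not DataX code and does not use the DataX plug-in API: `split_job`, `group_tasks`, `run_task`, and `run_job` are hypothetical names, the splitting strategy (equal slices) and the grouping strategy (round-robin) are assumptions chosen for simplicity, and a dictionary stands in for the target.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def split_job(rows, task_count):
    """Job: split the source rows into roughly equal slices (Tasks)."""
    size = max(1, -(-len(rows) // task_count))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def group_tasks(tasks, group_count):
    """Scheduler: distribute Tasks round-robin into TaskGroups."""
    groups = [[] for _ in range(group_count)]
    for i, task in enumerate(tasks):
        groups[i % group_count].append(task)
    return groups


def run_task(task, target):
    """Task: Reader -> channel (buffer) -> Writer for one slice of rows."""
    channel = queue.Queue()
    for row in task:                 # Reader: collect rows from the source
        channel.put(row)
    while not channel.empty():       # Writer: drain the channel into the target
        row = channel.get()
        target[row["id"]] = row


def run_group(group, target):
    """TaskGroup: run every Task assigned to this group."""
    for task in group:
        run_task(task, target)


def run_job(rows, target, task_count=4, group_count=2):
    """Run one Job: split, group, then execute the groups concurrently."""
    tasks = split_job(rows, task_count)
    groups = group_tasks(tasks, group_count)
    with ThreadPoolExecutor(max_workers=group_count) as pool:
        list(pool.map(lambda group: run_group(group, target), groups))
    return len(tasks), len(groups)


def demo():
    rows = [{"id": i} for i in range(10)]
    target = {}
    counts = run_job(rows, target)
    return counts, sorted(target)
```

With ten rows and the default settings, the Job is split into four Tasks that are spread over two TaskGroups, and every row ends up in the target regardless of which group processed it.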

Architecture diagram

[Figure: DataX architecture diagram]

Supported data sources and operations

[Figures: tables of supported data sources and operations]

Project demonstration

See: DataX implements data synchronization between MySQL and ElasticSearch (ES)

4. Real-time synchronization based on Binlog

Implementation principle

The specific steps are as follows:

1. Read the MySQL binlog and extract the log entries for the specified table.
2. Convert the entries into MQ messages.
3. Write an MQ consumer program.
4. Continuously consume from MQ, writing each message to ES as it is consumed.
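The four steps above can be sketched as a small pipeline. This is an illustration under assumptions, not a real binlog parser: the event dictionaries (`table`, `type`, `id`, `row`) are a hypothetical simplified format, `queue.Queue` stands in for MQ, and a dictionary stands in for the ES index.

```python
import queue


def publish_events(binlog_events, table, mq):
    """Steps 1-2: keep the events of one table and publish them as messages."""
    for event in binlog_events:
        if event["table"] == table:
            mq.put(event)


def apply_to_es(mq, es_docs):
    """Steps 3-4: consume messages in order and apply each one to 'ES'."""
    while not mq.empty():
        event = mq.get()
        if event["type"] == "DELETE":
            es_docs.pop(event["id"], None)
        else:
            # INSERT or UPDATE: index the latest row image under the row id.
            es_docs[event["id"]] = event["row"]


def demo():
    # Hypothetical, already-parsed row change events in commit order.
    binlog_events = [
        {"table": "user", "type": "INSERT", "id": 1, "row": {"name": "alice"}},
        {"table": "order", "type": "INSERT", "id": 9, "row": {"total": 30}},
        {"table": "user", "type": "UPDATE", "id": 1, "row": {"name": "bob"}},
        {"table": "user", "type": "INSERT", "id": 2, "row": {"name": "carol"}},
        {"table": "user", "type": "DELETE", "id": 2, "row": None},
    ]
    mq, es_docs = queue.Queue(), {}
    publish_events(binlog_events, "user", mq)
    apply_to_es(mq, es_docs)
    return es_docs
```

The application that writes to MySQL does not appear anywhere in this sketch, which reflects the decoupling this approach aims for: the pipeline only needs the change events. Applying events in commit order matters here, since the update and delete must land after the corresponding inserts.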

Advantages

1. No code intrusion and no hard coding.
2. The original system needs no changes and is unaware of the synchronization.
3. High performance.
4. Business decoupling: there is no need to care about the business logic of the original system.

Disadvantages

1. Building a binlog system is complicated.
2. If MQ is used to consume the parsed binlog information, there is the same risk of MQ delay as in the second solution.

The most popular solution in the industry is to use Canal to monitor the binlog and synchronize data to ES.

Project demonstration

See:
Docker deploys Canal to monitor MySQL binlog
SpringBoot integrates Canal to achieve MySQL and ES data synchronization


End~

Origin blog.csdn.net/m0_68681879/article/details/132837139