ES data synchronization schemes

As business volume grows, some SQL queries in the system often become slow, because MySQL's support for full-text search and fuzzy queries is weak. These slow queries drag down other system modules and degrade overall performance.

As ES (Elasticsearch) has become more widely used, it has proven to be an effective complement to MySQL. We can send data to a search engine such as ES and let the search engine provide specialized search services.

Next, drawing on scenarios from real work, this article analyzes several ways of synchronizing data from MySQL to ES.

In practice, I have summarized the following approaches.

Option 1: synchronous dual-write

This is the simplest approach: whenever data is written to MySQL, it is written to ES at the same time, so every write is performed twice.
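
To make the idea concrete, here is a minimal sketch of a synchronous dual write. It is only an illustration under stated assumptions: `sqlite3` stands in for MySQL, and `FakeSearchIndex` is a hypothetical in-memory stand-in for ES, not a real ES client. The `product` table and `create_product` function are likewise invented for the example.

```python
import sqlite3


class FakeSearchIndex:
    """Hypothetical in-memory stand-in for an ES index (not a real ES client)."""

    def __init__(self):
        self.docs = {}
        self.available = True

    def index(self, doc_id, doc):
        if not self.available:
            raise ConnectionError("search index unavailable")
        self.docs[doc_id] = doc


def create_product(db, es, product_id, name):
    """Write to the database, then to the search index, in the same call path."""
    with db:  # commits on success, rolls back if an exception escapes the block
        db.execute("INSERT INTO product (id, name) VALUES (?, ?)", (product_id, name))
    # If this second write fails, the row above is already committed,
    # so the two stores no longer agree.
    es.index(product_id, {"name": name})


db = sqlite3.connect(":memory:")  # sqlite3 stands in for MySQL here
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
es = FakeSearchIndex()

create_product(db, es, 1, "keyboard")

es.available = False  # simulate the search engine being down
try:
    create_product(db, es, 2, "mouse")
except ConnectionError:
    pass  # row 2 now exists in the database but not in the index

rows_in_db = db.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

After the simulated outage the database holds two rows while the index holds one document, which is the kind of divergence a dual write can produce when one of the two writes fails.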

Advantages:

Business logic is simple.

Disadvantages:

1) Hard-coded: every place that writes to MySQL also needs code that writes to ES.

2) Strong coupling with the business code.

3) Risk of losing data when one of the two writes fails.

4) Poor performance: MySQL's performance is not very high to begin with, and adding a write to ES inevitably lowers system performance further.

Note:

The dual-write failure risk mentioned in point 3 includes the following three cases:

1) The ES system is unavailable.

2) The network between the application and ES fails.

3) The application restarts before it has had time to write to ES, and similar situations.

Where strong data consistency is required, the dual write must be placed in a transaction, but once a transaction is used, the drop in performance becomes even more noticeable.

Option 2: asynchronous dual-write (MQ)

To address the performance and data-loss problems of the first scheme (synchronous dual-write), consider introducing an MQ (message queue) to form an asynchronous dual-write scheme.

Because an MQ's performance is roughly an order of magnitude higher than MySQL's, write performance can improve significantly.
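
The asynchronous variant can be sketched roughly as follows. The original diagram is not available, so the layout here is an assumption: the application only publishes a message, and two separate consumers apply it to MySQL and to ES. Python's `queue.Queue` stands in for the MQ, and the dictionaries `mysql_rows` and `es_docs` are hypothetical stand-ins for the two stores.

```python
import queue
import threading

mysql_queue = queue.Queue()  # stand-in for the MQ subscription feeding MySQL
es_queue = queue.Queue()     # stand-in for the MQ subscription feeding ES
mysql_rows = {}              # stand-in for a MySQL table
es_docs = {}                 # stand-in for an ES index


def publish(message):
    """Fan the message out to both subscriptions; the caller does not wait."""
    mysql_queue.put(message)
    es_queue.put(message)


def consume(source, store):
    """Apply messages to one store until a None sentinel arrives."""
    while True:
        message = source.get()
        try:
            if message is None:
                return
            store[message["id"]] = message["data"]
        finally:
            source.task_done()


workers = [
    threading.Thread(target=consume, args=(mysql_queue, mysql_rows)),
    threading.Thread(target=consume, args=(es_queue, es_docs)),
]
for worker in workers:
    worker.start()

publish({"id": 1, "data": {"name": "keyboard"}})
publish({"id": 2, "data": {"name": "mouse"}})

# Until the consumers catch up, a reader may not see the new data yet.
mysql_queue.join()
es_queue.join()

publish(None)  # sentinel: stop both consumers
for worker in workers:
    worker.join()
```

The window between `publish` returning and the consumers finishing is where a user may briefly fail to see data that was just written.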

Advantages:

1) High performance.

2) Data is not easily lost.

Disadvantages:

1) The hard-coding problem remains.

2) Strong coupling with the business code still exists.

3) Increased complexity: MQ code is added to the system.

4) Possible delay: the program's write performance improves, but because MQ consumption can lag due to network or other causes, data a user has just written may not be visible right away.

Option 3: asynchronous dual-write (Worker)

Both schemes above have the hard-coding problem: every place that creates, updates, or deletes MySQL data must either have ES code embedded in it or be changed to MQ code, which is too invasive.

If real-time requirements are not strict, a timer-based approach can be considered. The steps are as follows:

1) Add a timestamp field to the relevant tables in the database, so that any CRUD operation changes the value of this field.

2) Leave the original CRUD code unchanged.

3) Add a timer program (known internally at Jingdong as a Worker) that scans the specified tables at a fixed interval and extracts the data that changed during that interval.

4) Write the extracted data into ES one record at a time.
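
The Worker steps above can be sketched as a single polling pass. This is an illustration under assumptions: `sqlite3` stands in for MySQL, the dictionary `es_docs` stands in for ES, and the `product` table, `updated_at` column, and `sync_changes` function are hypothetical names. Integer timestamps are set by hand so the example is deterministic. In MySQL the column could instead be declared with `ON UPDATE CURRENT_TIMESTAMP` so that it changes automatically.

```python
import sqlite3


def sync_changes(db, es_docs, last_sync):
    """Copy rows changed after last_sync into the index; return the new checkpoint."""
    rows = db.execute(
        "SELECT id, name, updated_at FROM product "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        es_docs[row_id] = {"name": name}  # one record at a time
        last_sync = updated_at
    return last_sync


db = sqlite3.connect(":memory:")  # sqlite3 stands in for MySQL here
db.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
db.execute("INSERT INTO product VALUES (1, 'keyboard', 100)")
db.execute("INSERT INTO product VALUES (2, 'mouse', 105)")

es_docs = {}  # stand-in for an ES index

# First pass: both rows are newer than the initial checkpoint of 0.
checkpoint = sync_changes(db, es_docs, 0)

# The application changes a row; only the timestamp tells the Worker about it.
db.execute("UPDATE product SET name = 'wireless mouse', updated_at = 110 WHERE id = 2")

# Second pass: only the row changed since the last checkpoint is copied.
checkpoint = sync_changes(db, es_docs, checkpoint)
```

A real Worker would call `sync_changes` in a loop on a timer and persist the checkpoint between runs. Note that a row that is physically deleted leaves nothing for the scan to find, so a sketch like this only covers inserts and updates.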

Advantages:

1) The original code is unchanged: non-invasive and not hard-coded.

2) No strong coupling with the business code.

3) The performance of the original program is unaffected.

4) The Worker code is simple to write and does not need to consider CRUD.

Disadvantages:

1) Poor timeliness: the timer's cycle cannot practically be set at the second level, so real-time behavior is worse than in the two schemes above.

2) Polling puts some pressure on the database. One improvement is to run the polling against a database instance that is not under heavy load.

Option 4: binlog synchronization

The three schemes above are either invasive and hard-coded or suffer from delay. The fourth scheme uses MySQL's binlog for synchronization.

Specific steps are as follows:

1) Read MySQL's binlog and obtain the log entries for the specified tables.

2) Push the entries that were read into the MQ.

3) Write an MQ consumer program.

4) Consume the MQ continuously; each time a message is consumed, write it into ES.
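
The four steps above can be sketched end to end as follows. Everything here is a simplified assumption for illustration: the event dictionaries are a hypothetical format (a real binlog is a binary log that needs a dedicated reader), `queue.Queue` stands in for the MQ, and the dictionary `es_docs` stands in for ES.

```python
import queue

# Hypothetical, simplified row-change events; not the real binlog format.
binlog_events = [
    {"table": "product", "type": "insert", "id": 1, "row": {"name": "keyboard"}},
    {"table": "audit_log", "type": "insert", "id": 9, "row": {"msg": "login"}},
    {"table": "product", "type": "update", "id": 1, "row": {"name": "mechanical keyboard"}},
    {"table": "product", "type": "insert", "id": 2, "row": {"name": "mouse"}},
    {"table": "product", "type": "delete", "id": 2, "row": None},
]


def forward_events(events, table, mq):
    """Steps 1 and 2: keep only events for the specified table and push them to the MQ."""
    for event in events:
        if event["table"] == table:
            mq.put(event)


def consume(mq, es_docs):
    """Steps 3 and 4: drain the MQ and apply each message to the index."""
    while not mq.empty():
        event = mq.get()
        if event["type"] == "delete":
            es_docs.pop(event["id"], None)
        else:  # insert or update: store the latest version of the row
            es_docs[event["id"]] = event["row"]


mq = queue.Queue()  # stand-in for a real message queue
es_docs = {}        # stand-in for an ES index

forward_events(binlog_events, "product", mq)
consume(mq, es_docs)
```

The application that produced the changes does not appear anywhere in this code, which is the decoupling this scheme is meant to provide. Unlike a timestamp scan, the event stream also carries deletes.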

Advantages:

1) No code intrusion and no hard-coding.

2) The original system needs no changes and is unaware of the synchronization.

3) High performance.

4) Decoupled from the business: there is no need to be concerned with the original system's business logic.

Disadvantages:

1) Building the binlog system is complex.

2) As with the second scheme (MQ), there is a risk of MQ delay.

Origin www.cnblogs.com/zeenzhou/p/12125634.html