How do you solve the problem of data consistency in distributed systems?

1. The concept of consistency:

Consistency here refers to the weak consistency between the services of a distributed system, including application system consistency and data consistency.
Internet-scale requirements such as large data volumes, high concurrency, strong computing power, and fast response times led to service nodes being pooled and containerized, and to applications and data being split. This is the divide-and-conquer idea, applied as horizontal splitting and vertical splitting.

2. Modes and ideas for solving consistency problems

(1) Acid-base balance theory
①ACID (acid)
Atomicity, consistency, isolation, and durability.
Relational databases guarantee strong consistency through transaction processing, which is usually implemented with multi-version concurrency control (MVCC).
The inconsistency between placing an order and deducting inventory can be avoided by putting both in the same database shard and handling them in one relational database transaction. The four basic properties of a transaction, ACID, resolve this inconsistency.
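
As a minimal sketch of the point above, the order insert and the inventory deduction can share one local transaction. The example uses Python's built-in sqlite3 module; the table names, column names, and the `place_order` function are assumptions made for illustration, not part of the original text.

```python
import sqlite3


def place_order(conn, item, qty):
    """Insert the order and deduct inventory in one transaction (hypothetical schema)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE item = ? AND stock >= ?",
                (qty, item, qty),
            )
            if cur.rowcount != 1:
                # Not enough stock: raising here also undoes the order insert.
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('book', 5)")
conn.commit()
```

If the deduction fails, the order row disappears with it, so the two tables never disagree.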

②CAP (hat principle): the CAP principle of distributed systems.
Consistency, availability, and partition tolerance.
A distributed service system has to satisfy partition tolerance (tolerating the loss of some messages on the network), yet it also wants consistency (the data read from every node at the same moment is the latest copy) and availability (good response performance: under any failure, the service still processes and responds within a bounded time). Only two of the three can be satisfied at once, not all three.

③BASE (base)
The BASE idea addresses the conflict between consistency and availability in distributed systems that CAP identifies.
BA: Basically Available; S: Soft state, meaning the state may be out of sync for a period of time; E: Eventual consistency.
Soft state is the means of realizing the BASE idea; basic availability and eventual consistency are the goals.
To solve the consistency problem between orders and inventory, break the complex distributed transaction into steps, record the soft state of every intermediate step, and when a problem occurs, continue the task from the recorded state until eventual consistency is reached.
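
A minimal sketch of recording soft state and resuming from it, under stated assumptions: the step names, the `run_steps` function, and the in-memory `state` dict are hypothetical, and a real system would persist the state in a database.

```python
STEPS = ["create_order", "deduct_inventory", "notify_user"]


def run_steps(state, actions):
    """Run every step not yet marked done, recording progress after each one."""
    for step in STEPS:
        if state.get(step) == "done":
            continue  # already completed in an earlier run
        state[step] = "pending"  # soft state: started, not yet consistent
        actions[step]()          # may raise; the step then stays "pending"
        state[step] = "done"
    return state


# Demo: deduct_inventory fails once, then succeeds on the second run.
calls = []
failures = {"deduct_inventory": 1}


def make_action(name):
    def action():
        if failures.get(name, 0) > 0:
            failures[name] -= 1
            raise RuntimeError(name + " failed")
        calls.append(name)
    return action


actions = {name: make_action(name) for name in STEPS}
state = {}
try:
    run_steps(state, actions)
except RuntimeError:
    state_after_failure = dict(state)  # create_order done, deduct_inventory pending

run_steps(state, actions)  # resumes from the recorded state
```

The second run skips `create_order` because its recorded state is already done, which is what lets the whole flow converge without repeating finished work.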

Summary:
Scale up: upgrade the hardware and use a relational database.
Use an open-source relational database, scale horizontally by sharding, and place related data in the same shard of the database so that transactions can be executed there.
If related data cannot be placed in the same shard, aim for eventual consistency and record the soft state of the transaction.

(2) Distributed consistency protocol
DTS distributed transaction processing model:
It contains four roles: application, transaction manager, resource manager, and communication manager.
The transaction manager is the coordinator that manages the overall transaction; the resource manager and the communication manager are the participants of the transaction.
The coordinator issues instructions to the participants.

① Two-phase commit protocol:
It has a prepare phase and a commit phase.
It has the following problems: blocking, single point of failure, and split-brain.
It guarantees strong consistency of the system, but when processing ends up in an error state, consistency and availability cannot both be had.
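
The two phases can be sketched as below. This is a simplified in-process illustration, assuming hypothetical `Participant` and `two_phase_commit` names; it leaves out the logging, locking, and timeout handling that cause the blocking and single-point-of-failure problems listed above.

```python
class Participant:
    """Hypothetical participant that votes in phase 1 and obeys in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: a real participant would lock resources and write a log here.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: prepare
    if all(votes):
        for p in participants:                   # phase 2: commit
            p.commit()
        return True
    for p in participants:                       # phase 2: roll back
        p.rollback()
    return False
```

A single "no" vote aborts every participant, which is how the protocol keeps all nodes in the same final state.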

②The three-phase protocol
It solves the blocking problem (a participant that times out commits the transaction automatically) and the problem of resources being locked forever.
Its phases are the inquiry phase, the prepare phase, and the commit phase.

③The TCC protocol (Try, Confirm, Cancel)
This is the recommended one. The two protocols above are almost never used in high-concurrency systems.
TCC is a simplified version of the three-phase protocol. In extreme cases, inconsistencies and split-brain problems may still occur. The benefit is a degree of self-repair: any participant can repair itself automatically through Cancel.
Try first; confirm if there is no problem; if there is a problem, perform the reverse operation, Cancel.
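
A minimal sketch of the Try, Confirm, Cancel flow, assuming a hypothetical `Resource` class that reserves an amount on Try, consumes it on Confirm, and releases it on Cancel. The `tcc` coordinator function is likewise an illustration, not a specific framework's API.

```python
class Resource:
    """Hypothetical resource (balance, stock, ...) with a reservation counter."""

    def __init__(self, total):
        self.total = total
        self.reserved = 0

    def try_reserve(self, amount):
        if self.total - self.reserved < amount:
            raise ValueError("not enough available")
        self.reserved += amount   # Try: hold the amount without consuming it

    def confirm(self, amount):
        self.reserved -= amount   # Confirm: consume what was held
        self.total -= amount

    def cancel(self, amount):
        self.reserved -= amount   # Cancel: reverse the Try


def tcc(reservations):
    """reservations is a list of (resource, amount) pairs."""
    tried = []
    try:
        for resource, amount in reservations:
            resource.try_reserve(amount)
            tried.append((resource, amount))
    except ValueError:
        for resource, amount in reversed(tried):
            resource.cancel(amount)  # undo only the Try calls that succeeded
        return False
    for resource, amount in tried:
        resource.confirm(amount)
    return True
```

For example, with a wallet of 100 and a stock of 1, a first purchase of 30 for one unit confirms on both resources, while a second attempt fails on the stock and cancels the wallet reservation.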

(3) Modes for guaranteeing eventual consistency
①Query mode
Every service operation needs to provide a query interface that exposes the status of the operation. Once the service consumer knows the execution status of an operation, it handles each status differently.
To make queries possible, each service operation needs a unique serial identifier, such as a request serial number or an order number.
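
As a sketch of the query mode, the service below stores a status per request ID and the consumer maps each status to a follow-up action. The `PaymentService` class, the status names, and the action names in `next_action` are assumptions chosen for illustration.

```python
from enum import Enum


class Status(Enum):
    UNKNOWN = "unknown"        # the service has never seen this request ID
    PROCESSING = "processing"
    SUCCESS = "success"
    FAILED = "failed"


class PaymentService:
    """Hypothetical service that exposes a query interface per request ID."""

    def __init__(self):
        self._status = {}

    def pay(self, request_id, amount):
        self._status[request_id] = Status.PROCESSING
        # Stand-in for real work: positive amounts succeed, others fail.
        self._status[request_id] = Status.SUCCESS if amount > 0 else Status.FAILED

    def query(self, request_id):
        return self._status.get(request_id, Status.UNKNOWN)


def next_action(status):
    """Consumer side: choose what to do for each queried status."""
    return {
        Status.SUCCESS: "continue",
        Status.FAILED: "cancel_order",
        Status.PROCESSING: "query_again_later",
        Status.UNKNOWN: "resend_request",
    }[status]
```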

②Compensation mode
With the query mode above, we can learn the status of any specific service operation. If the operation is in an abnormal state, we need to repair it so that the entire distributed system becomes consistent again. This effort to bring the system to a consistent state is called compensation.

Compensation operation classification:
1. Automatic recovery: depending on the inconsistency that occurred, the program automatically reaches a consistent state by continuing the unfinished operation or by rolling back the completed operation.
2. Notification operation
3. Technical operation
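
Automatic recovery, the first category above, can be sketched as follows for a transfer whose debit succeeded but whose credit did not. The record layout and the `compensate`, `credit`, and `refund` names are hypothetical.

```python
def compensate(record, credit, refund):
    """Bring a half-finished transfer back to a consistent state.

    record is a dict with boolean 'debited' and 'credited' flags;
    credit and refund are callables supplied by the caller.
    """
    if record["debited"] and not record["credited"]:
        try:
            credit()                    # first choice: continue the unfinished step
            record["credited"] = True
            return "rolled_forward"
        except Exception:
            refund()                    # otherwise: roll back the completed step
            record["debited"] = False
            return "rolled_back"
    return "consistent"


def failing_credit():
    raise RuntimeError("credit service unavailable")
```

Either outcome leaves the record consistent: both flags are true after rolling forward, both are false after rolling back.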

③Asynchronous guarantee mode
Usually this kind of operation is taken out of the main flow and processed asynchronously, and the result is delivered to the user through the notification system after processing. The biggest advantage is that it smooths out peaks of highly concurrent traffic.
The asynchronous operation to be executed is encapsulated and persisted to a database, and a scheduled job then regularly picks up the unfinished tasks and runs them as compensation. As long as the scheduling system is robust enough, every task will eventually be executed successfully.
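
A minimal sketch of persisting tasks and retrying the unfinished ones, using sqlite3 as a stand-in for the task database. The `tasks` schema and the `open_store`, `submit`, and `run_pending` functions are assumptions; in practice a scheduler would call `run_pending` periodically.

```python
import sqlite3


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, payload TEXT, "
        "status TEXT, attempts INTEGER)"
    )
    return conn


def submit(conn, payload):
    """Persist the task before doing any work, so it cannot be lost."""
    with conn:
        conn.execute(
            "INSERT INTO tasks (payload, status, attempts) VALUES (?, 'pending', 0)",
            (payload,),
        )


def run_pending(conn, handler):
    """One scheduler pass: try each pending task, return how many remain."""
    rows = conn.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending'"
    ).fetchall()
    for task_id, payload in rows:
        try:
            handler(payload)
            status = "done"
        except Exception:
            status = "pending"  # leave it for the next pass
        with conn:
            conn.execute(
                "UPDATE tasks SET status = ?, attempts = attempts + 1 WHERE id = ?",
                (status, task_id),
            )
    return conn.execute(
        "SELECT COUNT(*) FROM tasks WHERE status = 'pending'"
    ).fetchone()[0]


# Demo: the handler fails on its first call and succeeds on the second pass.
store = open_store()
submit(store, "send-receipt")
remaining_failures = [1]


def flaky_handler(payload):
    if remaining_failures:
        remaining_failures.pop()
        raise RuntimeError("downstream unavailable")


first_pass = run_pending(store, flaky_handler)
second_pass = run_pending(store, flaky_handler)
```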

④Regular proofreading mode
Regularly check the operation status of each system against the others. If an operation is found in an inconsistent state, perform a compensation operation.
A key to regular proofreading is that each request carries one unique ID through the distributed system from beginning to end. There are two ways to generate a unique ID:
1. Persistence type: generated from a database auto-increment column or a sequence; to improve efficiency, each application node can cache a batch of IDs.
2. Time type: generally composed of the machine number, the service number, the time, and an auto-incrementing counter within the single node.

Regular proofreading is mostly used for reconciliation between systems in the financial domain, such as cash reconciliation and financial reconciliation.
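
The time-type ID described above can be sketched as below. The field widths, the string layout, and the `IdGenerator` class are assumptions made for this example; production generators usually pack the same fields into a 64-bit integer.

```python
import threading
import time


class IdGenerator:
    """Hypothetical time-type ID: time + machine number + service number + counter."""

    def __init__(self, machine_id, service_id, clock=time.time):
        self._machine_id = machine_id
        self._service_id = service_id
        self._clock = clock
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            # Never move backwards, even if the wall clock does.
            now_ms = max(int(self._clock() * 1000), self._last_ms)
            if now_ms == self._last_ms:
                self._seq += 1  # same millisecond: bump the node-local counter
            else:
                self._last_ms = now_ms
                self._seq = 0
            return "%013d-%03d-%03d-%05d" % (
                now_ms, self._machine_id, self._service_id, self._seq
            )
```

Because the machine and service numbers are part of the ID, two nodes can generate IDs in the same millisecond without coordinating with each other.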

⑤Reliable message mode
1. Reliable message sending (two types).
The first type: the business service module persists the message itself before sending it.
The second type is similar to the first, but the database of persisted messages is independent and not coupled to the business system.

2. The idempotence of the message processor.
To ensure that a message is actually delivered, a retry mechanism is needed. With a retry mechanism, messages will sometimes be delivered more than once.
The best way to deal with duplicates is to make the operations idempotent.
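
A minimal sketch of retry plus idempotent processing: the consumer remembers the message IDs it has already handled, so a redelivered message has no further effect. The `InventoryConsumer` and `send_with_retry` names are hypothetical, and the in-memory set stands in for a persistent deduplication table.

```python
class InventoryConsumer:
    """Hypothetical consumer that deducts stock at most once per message ID."""

    def __init__(self, stock):
        self.stock = stock
        self._seen = set()

    def handle(self, message_id, qty):
        if message_id in self._seen:
            return False  # duplicate delivery: ignore it
        self.stock -= qty
        self._seen.add(message_id)
        return True


def send_with_retry(deliver, message_id, qty, max_attempts=3):
    """Retry delivery until it is acknowledged; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        try:
            deliver(message_id, qty)
            return attempt
        except ConnectionError:
            continue
    raise ConnectionError("gave up after %d attempts" % max_attempts)


# Demo: the first delivery is processed, but its acknowledgement is lost.
consumer = InventoryConsumer(stock=10)
lost_acks = [True]


def deliver(message_id, qty):
    consumer.handle(message_id, qty)
    if lost_acks:
        lost_acks.pop()
        raise ConnectionError("ack lost")


attempts = send_with_retry(deliver, "msg-1", 3)
```

The message is delivered twice here, yet the stock is deducted only once, which is the property that makes retries safe.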

⑥Cache consistency mode
Try to use distributed cache instead of local cache.
Ensure weak consistency.
Cached data must be complete and correct

(4) Timeout processing mode
1. Microservice interaction mode
(1) Synchronous call mode: suitable for large-scale, high-concurrency short request operations
(2) Interface asynchronous call mode: the request is accepted and an acceptance result is returned immediately; the processing result is returned asynchronously later.
(3) Message queue asynchronous processing mode: a message queue is used as the communication mechanism. Service 2 does not need to return the processing result to the caller, service 1; the caller only tells service 2 to handle the event.

2. The choice between synchronous and asynchronous.
Try to replace synchronous operations with asynchronous ones.
However, for problems that synchronous calls can solve, do not introduce asynchrony.

3. Solutions to the timeout problem in each interaction mode
Generally speaking, there are two ways to deal with it: fast failure and internal compensation.
(1) Solution in synchronous call mode
①Two-state synchronous interface: success, failure.
If an abnormal timeout is detected, fail fast and return failure.
②Three-state synchronous interface: success, failure, processing.
If an abnormal timeout is detected during processing, handle the user's request as far as possible and return the status "processing".
(2) Solution in asynchronous call mode
(3) Solution in message queue asynchronous processing mode
(4) Principle of timeout compensation
(5) Design of the migration switch

It is recommended to use an order-level switch: put the switch on the entity associated with the request, such as the order, to mark whether the old system or the new system should be called, instead of deciding by a global or configuration-based switch. This avoids the case where the switch is out of sync across nodes and both the old system and the new system end up being called.
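
A minimal sketch of an order-level switch, assuming hypothetical `create_order` and `route` functions and an order represented as a dict. The decision is made once when the order is created and travels with the order, so every node routes it the same way.

```python
def create_order(order_id, use_new_system):
    """Stamp the routing decision on the order itself at creation time."""
    return {"id": order_id, "system": "new" if use_new_system else "old"}


def route(order, old_handler, new_handler):
    """Any node can route the order by reading its own flag, not a global switch."""
    handler = new_handler if order["system"] == "new" else old_handler
    return handler(order)


def old_system(order):
    return "old:" + order["id"]


def new_system(order):
    return "new:" + order["id"]
```

An order created before the migration keeps going to the old system for its whole lifetime, even after newly created orders start going to the new one.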

Origin blog.csdn.net/phpCenter/article/details/105241941