ZooKeeper series, part 3: distributed consistency and the advantages and disadvantages of several implementations

Looking at this from the level of consensus algorithms:

Personally, I think it is more accurate to call it a fault-tolerant distributed consistency protocol. The word "fault-tolerant" matters here: it implies the strong consistency requirements (such as linearizability) of a so-called replicated state machine.

Consistency itself is a widely used concept that appears in concurrent programming, database transaction processing, cache consistency, and so on. Many other algorithms also address different consistency requirements (linearizable, sequential, causal, eventual, etc.) in particular scenarios, under stronger or weaker assumptions about the system model. For example:

2PC is not fault-tolerant, but it can provide strong consistency;

Memory ordering in concurrent programming generally guarantees happens-before (causal) ordering. Different application scenarios therefore have different requirements for consistency.

The name "Paxos" is commonly used loosely for any Paxos-like algorithm. In fact, concepts such as majorities existed before Paxos, and gbcast (from vsync) and Viewstamped Replication actually predate Leslie Lamport's Basic Paxos algorithm. Although gbcast and Basic Paxos can be converted into each other, there are essential differences between them, so strictly speaking gbcast cannot be called a "Paxos algorithm". The later Multi-Paxos, Zab, and Raft can be regarded as evolutions and optimizations of Basic Paxos for strong consistency under a system model with an asynchronous network and crash failures.

Therefore, I think consistency-related questions are best understood objectively, starting from the requirements of the scenario itself and from the history of the algorithms that address those requirements. That includes the applications to the Byzantine Generals problem mentioned in the other answers.


Foreword

In today's application systems, whether enterprise applications or Internet applications, eventual data consistency is a problem that every system has to face. There is no silver bullet: it cannot be solved just by introducing a particular middleware or open source framework. The solution depends on the business scenario and has to be designed for it. Based on the author's experience in recent years, this article summarizes several points. Application systems should pay more attention to data consistency at coding time so that the system is robust.

Basic theory

When it comes to transactions, there are several established theories: the ACID transaction properties, the CAP theorem for distributed systems, BASE, and so on. ACID is embodied in database transactions, while CAP and BASE are theories for distributed transactions. Business systems such as order management or warehouse management can draw on these theories to solve their problems.

  • ACID properties

    • A (atomicity): a transaction is an atomic unit of work; its modifications to the data are either all executed or none are executed;
    • C (consistency): at the beginning and at the completion of a transaction the data must be in a consistent state; the relevant data rules must be applied to the transaction's modifications to preserve data integrity, and at the end of the transaction all internal data structures must be correct;
    • I (isolation): a transaction is guaranteed to execute in an environment that is independent of, and unaffected by, external concurrent operations;
    • D (durability): after the transaction completes, its modifications to the data are permanent and survive even a system failure;
  • CAP

    • C (consistency): consistency refers to the atomicity of data, which classic databases guarantee through transactions. When a transaction completes, whether it succeeded or was rolled back, the data is in a consistent state. In a distributed environment, consistency refers to whether the data on multiple nodes is the same;
    • A (availability): the service remains available at all times. When a user sends a request, the service returns a result within a bounded period of time;
    • P (partition tolerance): in a distributed application, parts of the system may be unable to work together because of problems inherent to distribution, such as a network partition. With good partition tolerance, the application, although distributed, still appears to be one functioning whole;
  • BASE

    • BA: Basically Available, i.e. basic business availability;
    • S: Soft state, i.e. flexible state;
    • E: Eventual consistency;

Several practices for eventual consistency



 

Transactions in the single-database case

If the application system uses a single database, consistency is easy to guarantee: the transactional properties of the database ensure it, and the consistency in this case is strong consistency. In Java applications, transactions are rarely hard-coded with explicit begin, commit, and rollback calls; most rely on Spring's transaction template or declarative transactions.
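As a rough illustration of letting the database transaction carry the consistency guarantee, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the Java/Spring setup described above; the `account` table and the `transfer` function are hypothetical names invented for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic unit."""
    # `with conn` commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        row = conn.execute("SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
        if row is None or row[0] < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

transfer(conn, 1, 2, 30)        # succeeds and is committed
try:
    transfer(conn, 1, 2, 500)   # fails, so the debit is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The failed transfer leaves no partial debit behind, which is the behaviour a declarative transaction gives you without explicit begin, commit, and rollback calls.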

Eventual Consistency Based on Transactional Message Queuing

With the help of a message queue, a message is sent at the point where the business logic is processed. After the business logic succeeds, the message is committed so that it is guaranteed to be sent, and the message queue then delivers it for processing, retrying until it succeeds. This only applies to business logic where, once the first stage has succeeded, the second stage must succeed. It corresponds to process C in the figure above.
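The flow can be sketched with a toy in-memory broker. This is only an illustration under stated assumptions: `Broker`, `prepare`, `commit`, `rollback`, and `deliver` are hypothetical names, not the API of any real message queue.

```python
class Broker:
    """Toy in-memory stand-in for a queue that supports transactional messages."""

    def __init__(self):
        self.prepared = {}   # msg_id -> payload, not yet visible to consumers
        self.ready = []      # committed messages awaiting delivery

    def prepare(self, msg_id, payload):
        self.prepared[msg_id] = payload

    def commit(self, msg_id):
        self.ready.append(self.prepared.pop(msg_id))

    def rollback(self, msg_id):
        self.prepared.pop(msg_id, None)

    def deliver(self, consumer, max_attempts=10):
        """Deliver each committed message, retrying until the consumer succeeds."""
        while self.ready:
            payload = self.ready[0]
            for _ in range(max_attempts):
                try:
                    consumer(payload)
                    break
                except Exception:
                    continue   # stage two failed, try again
            else:
                raise RuntimeError("gave up delivering %r" % (payload,))
            self.ready.pop(0)

def place_order(broker, orders, order_id):
    broker.prepare(order_id, {"order_id": order_id})
    try:
        orders.add(order_id)          # stage one: local business logic
    except Exception:
        broker.rollback(order_id)
        raise
    broker.commit(order_id)           # only now can the message be delivered

broker, orders, shipped, attempts = Broker(), set(), [], []

def flaky_consumer(msg):
    attempts.append(msg["order_id"])
    if len(attempts) < 3:
        raise IOError("downstream temporarily unavailable")
    shipped.append(msg["order_id"])

place_order(broker, orders, "o-1")
broker.deliver(flaky_consumer)
```

The consumer fails twice and succeeds on the third attempt, so the second stage eventually completes without the producer doing anything further.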

Eventual Consistency Based on Message Queue + Timing Compensation Mechanism

The difference from the transactional-message approach above is that the retry in the second stage is no longer the retry logic of the message middleware itself, but a separate compensation task mechanism. In most business logic the probability of failure in the second stage is relatively small, so listing the compensation tasks separately makes things clearer, including how many tasks have failed so far. It corresponds to process E in the figure above.
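A minimal sketch of a separate compensation task, assuming failed second-stage work is recorded in a task table (here a plain dict) and retried by a scheduled job. All names are hypothetical.

```python
def run_stage_two(tasks, task_id, action):
    """Try stage two once; on failure, record the task for later compensation."""
    try:
        action()
        tasks[task_id] = "DONE"
    except Exception:
        tasks[task_id] = "FAILED"

def compensate(tasks, actions):
    """Scheduled job: retry every failed task, return how many are still failing."""
    for task_id, status in list(tasks.items()):
        if status == "FAILED":
            run_stage_two(tasks, task_id, actions[task_id])
    return sum(1 for s in tasks.values() if s == "FAILED")

calls = {"n": 0}

def credit_points():
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("points service down")

tasks = {}
actions = {"t-1": credit_points}
run_stage_two(tasks, "t-1", credit_points)      # first attempt fails
failed_before = sum(1 for s in tasks.values() if s == "FAILED")
failed_after = compensate(tasks, actions)        # the compensation job retries it
```

Because failures live in their own table, counting the outstanding ones is a simple query rather than something hidden inside the middleware's retry state.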

Commit/rollback mechanism in the business logic of the business system

This is not hard to explain. Commit and rollback are typical concepts in database transactions, but in a distributed system they have to be implemented in business code: commit on success, roll back on failure.
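One way to picture commit and rollback in business code is a reservation that is later confirmed or released. The sketch below is an assumption about how this might look; `Inventory` and `checkout` are hypothetical names.

```python
class Inventory:
    """Stock is reserved first, then the reservation is committed or rolled back."""

    def __init__(self, stock):
        self.stock = stock
        self.reserved = {}     # order_id -> quantity held for that order

    def reserve(self, order_id, qty):
        if qty > self.stock:
            raise ValueError("not enough stock")
        self.stock -= qty
        self.reserved[order_id] = qty

    def commit(self, order_id):
        self.reserved.pop(order_id, None)              # the reservation becomes final

    def rollback(self, order_id):
        self.stock += self.reserved.pop(order_id, 0)   # give the stock back

def checkout(inventory, order_id, qty, pay):
    inventory.reserve(order_id, qty)
    try:
        pay()
    except Exception:
        inventory.rollback(order_id)   # failure: undo in business code
        return False
    inventory.commit(order_id)         # success: make it permanent
    return True

def failing_payment():
    raise IOError("payment declined")

inv = Inventory(stock=10)
ok = checkout(inv, "o-1", 3, pay=lambda: None)
failed = checkout(inv, "o-2", 4, pay=failing_payment)
```

After both calls the stock is 7: the first order kept its 3 units and the second order's 4 units were returned by the business-level rollback.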

Idempotency Control of Business Application System

Why idempotency? The reason is simple: when a system call does not return the expected result, the caller retries, and the retry must not affect the business logic a second time. For example, when creating an order, the first call times out. The calling system does not know whether the timed-out call succeeded or failed, so it retries. If the first call actually did create the order, the retry obviously must not create the order again.

  • Queries

Query APIs are naturally idempotent: whether you query once or twice, no data in the system changes, so querying once is the same as querying many times.

  • MVCC scheme

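Multi-version concurrency control, that is, update with a condition. This is also a matter of choosing optimistic locking sensibly at system design time: use a version column or some other condition as the optimistic lock, so that updates do not cause serious problems even under concurrency. For example, update tablexxx set name=#name#,version=version+1 where version=#version# , or update tablexxx set quality=quality-#subQuality# where quality-#subQuality# >= 0 .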
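The two conditional updates can be tried out with a small sketch using Python's sqlite3 module. The table and column names follow the statements above; the `id` column, the extra `id = ?` condition, and the helper functions `rename` and `deduct` are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablexxx (id INTEGER PRIMARY KEY, name TEXT, version INTEGER, quality INTEGER)")
conn.execute("INSERT INTO tablexxx VALUES (1, 'old', 0, 5)")
conn.commit()

def rename(conn, row_id, new_name, expected_version):
    """Update only if nobody changed the row since `expected_version` was read."""
    with conn:
        cur = conn.execute(
            "UPDATE tablexxx SET name = ?, version = version + 1 WHERE id = ? AND version = ?",
            (new_name, row_id, expected_version))
    return cur.rowcount == 1

def deduct(conn, row_id, sub_quality):
    """Conditional decrement that refuses to drive `quality` below zero."""
    with conn:
        cur = conn.execute(
            "UPDATE tablexxx SET quality = quality - ? WHERE id = ? AND quality - ? >= 0",
            (sub_quality, row_id, sub_quality))
    return cur.rowcount == 1

first = rename(conn, 1, "alice", expected_version=0)   # matches, version becomes 1
stale = rename(conn, 1, "bob", expected_version=0)     # stale version, no row updated
took_3 = deduct(conn, 1, 3)                            # 5 -> 2
took_4 = deduct(conn, 1, 4)                            # only 2 left, refused
row = conn.execute("SELECT name, version, quality FROM tablexxx WHERE id = 1").fetchone()
```

The caller checks the affected row count: zero rows means the condition no longer held, and the caller can re-read and retry or give up.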

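  • Separate deduplication table

If deduplication is needed in many places, for example an ERP system with many kinds of business documents, each of which must be deduplicated, you can create a separate deduplication table. When inserting data, also insert a row into the deduplication table and rely on the database's unique index to enforce uniqueness.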
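A minimal sketch of a shared deduplication table, again with sqlite3. The `dedup` table, its columns, and `process_once` are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dedup (biz_type TEXT NOT NULL, biz_no TEXT NOT NULL, UNIQUE (biz_type, biz_no))")

def process_once(conn, biz_type, biz_no, handler):
    """Run `handler` only for the first request carrying this (biz_type, biz_no)."""
    try:
        # The dedup row and any writes the handler makes on this connection
        # are committed together, or rolled back together.
        with conn:
            conn.execute("INSERT INTO dedup VALUES (?, ?)", (biz_type, biz_no))
            handler()
    except sqlite3.IntegrityError:
        return False   # the unique index rejected a duplicate
    return True

handled = []
a = process_once(conn, "purchase_order", "PO-1", lambda: handled.append("PO-1"))
b = process_once(conn, "purchase_order", "PO-1", lambda: handled.append("PO-1"))
c = process_once(conn, "delivery_note", "PO-1", lambda: handled.append("DN PO-1"))
```

The same number under a different document type is not treated as a duplicate, which is what lets one table serve many kinds of business documents.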

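  • Distributed lock

Take inserting data again as the example. In a distributed system it can be hard to build a unique index, for instance when the fields that determine uniqueness cannot be identified. In that case you can introduce a distributed lock through a third-party system: when the business system inserts or updates data, it first acquires the distributed lock, performs the operation, and then releases the lock. This takes the idea of locking for multi-threaded concurrency and applies it across multiple systems, which is the approach used in distributed systems.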
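The acquire, operate, release sequence might look like the sketch below. `LockService` is an in-memory stand-in with a hypothetical interface; a real deployment would use an external lock system, and this toy version is neither thread-safe nor distributed.

```python
import time
import uuid

class LockService:
    """In-memory stand-in for a third-party lock system (hypothetical interface)."""

    def __init__(self):
        self._locks = {}   # key -> (owner token, expiry timestamp)

    def acquire(self, key, ttl_seconds=30.0):
        now = time.monotonic()
        holder = self._locks.get(key)
        if holder is not None and holder[1] > now:
            return None                        # someone else holds a live lock
        token = uuid.uuid4().hex
        self._locks[key] = (token, now + ttl_seconds)
        return token

    def release(self, key, token):
        holder = self._locks.get(key)
        if holder is not None and holder[0] == token:   # only the owner may release
            del self._locks[key]

def insert_exclusively(locks, key, do_insert):
    token = locks.acquire(key)
    if token is None:
        return False       # another caller is working on this key
    try:
        do_insert()
        return True
    finally:
        locks.release(key, token)

locks, rows = LockService(), []

def insert_customer():
    if "customer:42" not in rows:    # check-then-insert is safe while the lock is held
        rows.append("customer:42")

held = locks.acquire("customer:42")  # simulate another system holding the lock
blocked = insert_exclusively(locks, "customer:42", insert_customer)
locks.release("customer:42", held)
done = insert_exclusively(locks, "customer:42", insert_customer)
again = insert_exclusively(locks, "customer:42", insert_customer)
```

The expiry time is there so that a holder that crashes does not block everyone forever, and the owner token keeps one caller from releasing another caller's lock.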

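  • Deleting data

When deleting data, only the first delete actually operates on the data; the second or even third delete simply returns success. This guarantees idempotency.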
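A small sketch of a delete that reports success on every call, using sqlite3. The `orders` table and `delete_order` are hypothetical names.

```python
import sqlite3

def delete_order(conn, order_id):
    """Report success whether or not the row still exists."""
    with conn:
        cur = conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))
    return {"success": True, "deleted_rows": cur.rowcount}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'book')")
conn.commit()

first = delete_order(conn, 1)    # actually removes the row
second = delete_order(conn, 1)   # nothing left to remove, still a success
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```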

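  • Unique index on inserted data

Uniqueness of inserted data can be enforced through a business primary key. For example, if in a particular business scenario three fields together determine uniqueness, you can add a unique index on them in the database table.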

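  • API-level idempotency

Consider idempotency at the API level, for example controlling repeated submission of data. A unique identifier such as a UUID can be added to the submitted form or generated by the client software, and the server deduplicates on that UUID. This achieves unique identification at the API level reasonably well.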
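A minimal sketch of server-side deduplication on a client-supplied UUID. `SubmitApi` and its fields are hypothetical, and the in-memory dict stands in for whatever store the server would really use.

```python
import uuid

class SubmitApi:
    """Server side: remembers the response already given for each request id."""

    def __init__(self):
        self._results = {}   # request_id -> response already returned
        self.saved = []

    def submit(self, request_id, form):
        if request_id in self._results:
            return self._results[request_id]   # duplicate submission: replay the answer
        self.saved.append(form)
        response = {"status": "created", "record_no": len(self.saved)}
        self._results[request_id] = response
        return response

api = SubmitApi()
request_id = str(uuid.uuid4())                   # generated once by the form or client
r1 = api.submit(request_id, {"item": "book"})
r2 = api.submit(request_id, {"item": "book"})    # e.g. the user double-clicked
```

Replaying the stored response means the client sees the same result for the retry as for the original submission.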

  • State machine idempotency

When designing a business built around documents or tasks, a state machine is almost always involved: the business document carries a state, and the state changes under different conditions. This is generally a finite state machine. If the state machine has already moved to the next state and a change belonging to the previous state arrives, that change must not be applied. This guarantees the idempotency of the finite state machine.
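The rule can be sketched as an explicit transition table. The states and `apply_event` are hypothetical examples, not taken from a specific system.

```python
# Allowed transitions of a hypothetical order document.
TRANSITIONS = {
    "CREATED": {"PAID"},
    "PAID": {"SHIPPED"},
    "SHIPPED": {"COMPLETED"},
    "COMPLETED": set(),
}

def apply_event(current, target):
    """Return (new_state, changed); stale or repeated changes are ignored."""
    if target in TRANSITIONS[current]:
        return target, True
    return current, False

state = "CREATED"
state, paid = apply_event(state, "PAID")         # CREATED -> PAID
state, repeated = apply_event(state, "PAID")     # duplicate message, ignored
state, stale = apply_event(state, "CREATED")     # late change to an earlier state, ignored
```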

Introducing an asynchronous callback mechanism

Application A calls B, and B returns success to A in the result of the synchronous call. Normally the interaction ends here, and in 99.99% of cases that is fine. Sometimes, however, the system design has to aim for 100%. In that case system B calls back to A afterwards to tell A that its call to B's logic succeeded. This logic is very similar to the three-way handshake in the TCP protocol. It corresponds to process B in the figure above.
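A toy sketch of the synchronous return followed by a later callback. `SystemA`, `SystemB`, and the status values are hypothetical, and the callback is delivered by an explicit `flush_callbacks` call rather than a real asynchronous channel.

```python
class SystemA:
    def __init__(self):
        self.status = {}   # request_id -> "SENT" | "ACKED" | "CONFIRMED"

    def call_b(self, b, request_id):
        self.status[request_id] = "SENT"
        if b.handle(request_id, callback=self.on_callback):
            # Synchronous success; do not overwrite a callback that already arrived.
            if self.status[request_id] == "SENT":
                self.status[request_id] = "ACKED"

    def on_callback(self, request_id):
        self.status[request_id] = "CONFIRMED"

class SystemB:
    def __init__(self):
        self.pending_callbacks = []

    def handle(self, request_id, callback):
        self.pending_callbacks.append((request_id, callback))   # call back later
        return True                                              # synchronous success

    def flush_callbacks(self):
        while self.pending_callbacks:
            request_id, callback = self.pending_callbacks.pop(0)
            callback(request_id)

a, b = SystemA(), SystemB()
a.call_b(b, "r-1")
after_sync = a.status["r-1"]     # B has only answered synchronously so far
b.flush_callbacks()
after_callback = a.status["r-1"]
```

Requests that stay in the acknowledged state for too long are the ones A would need to investigate.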

Confirmation mechanism similar to a double check

This is still the asynchronous callback flow in the figure above. A calls B synchronously and B returns success, which ends the call. To be sure, after a period of time (a few seconds, or a scheduled job that runs every day), A calls B again to check whether the previous call really succeeded. For example, A calls B to update the order status and the call succeeds. After a delay of a few seconds, A queries B to confirm that the status is what it expects. This corresponds to process D in the figure above.
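The update-then-confirm pattern might be sketched as follows. `OrderServiceB`, `LossyServiceB`, and `update_and_verify` are hypothetical; the delay defaults to zero here so the example runs instantly.

```python
import time

class OrderServiceB:
    """Stand-in for system B."""

    def __init__(self):
        self.orders = {}

    def update_status(self, order_id, status):
        self.orders[order_id] = status
        return True

    def get_status(self, order_id):
        return self.orders.get(order_id)

class LossyServiceB(OrderServiceB):
    """Reports success but loses the write, to show what the second check catches."""

    def update_status(self, order_id, status):
        return True

def update_and_verify(b, order_id, status, delay_seconds=0.0):
    if not b.update_status(order_id, status):
        return False
    time.sleep(delay_seconds)                    # wait before the follow-up query
    return b.get_status(order_id) == status      # confirm B really holds the new status

verified = update_and_verify(OrderServiceB(), "o-1", "PAID")
caught = update_and_verify(LossyServiceB(), "o-1", "PAID")
```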

Summary

The points summarized above are mostly reflected in business systems. In very complex systems, data consistency cannot be solved simply by introducing middleware; it requires a flexible response based on the business scenario.

Origin http://43.154.161.224:23101/article/api/json?id=325652917&siteId=291194637