Theoretical foundations of distributed systems: BASE theory


BASE stands for Basically Available, Soft State, and Eventual Consistency. The core idea is that even when strong consistency cannot be achieved (the C in CAP means strong consistency), an application can adopt a suitable approach to reach eventual consistency.

Basically Available

When a failure occurs in a distributed system, it is acceptable to lose part of the system's availability, as long as the core functions remain available.
The key words here are "part" and "core". Deciding which functions may be lost and which must be preserved is a challenging task. For example, in a user management system, "login" is a core function, while "registration" can be treated as a non-core function. Unregistered users have not yet started using the system, so a registration failure costs at most some prospective users, and their number is small. If a registered user cannot log in, however, that user cannot use the system at all: a game that has already been paid for cannot be played, cloud storage cannot be accessed, and so on. These failures cause real losses to users, and because logged-in users far outnumber newly registering ones, the impact is also much wider.
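
The login/registration example can be sketched as a simple degradation switch. This is only an illustration under assumptions: the names `FEATURES`, `Priority`, `ServiceDegraded`, `handle` and `try_handle` are hypothetical, and a real system would decide the `degraded` flag from health checks rather than take it as an argument.

```python
from enum import Enum


class Priority(Enum):
    CORE = "core"
    NON_CORE = "non-core"


# Hypothetical feature table for the user management example.
FEATURES = {
    "login": Priority.CORE,
    "registration": Priority.NON_CORE,
}


class ServiceDegraded(Exception):
    """Raised when a non-core feature is switched off during a failure."""


def handle(feature: str, degraded: bool) -> str:
    """Serve a request, shedding non-core features while the system is degraded."""
    if degraded and FEATURES[feature] is Priority.NON_CORE:
        raise ServiceDegraded(f"{feature} is temporarily unavailable")
    return f"{feature} ok"


def try_handle(feature: str, degraded: bool) -> str:
    """Like handle(), but turns a shed request into a message for the caller."""
    try:
        return handle(feature, degraded)
    except ServiceDegraded as exc:
        return str(exc)
```

While the system is degraded, `handle("login", True)` still succeeds, whereas `try_handle("registration", True)` reports that registration is temporarily unavailable.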

Soft State

The system is allowed to be in intermediate states, as long as those states do not affect its overall availability. In other words, synchronizing data between the copies held on different nodes is allowed to take some time. The intermediate state here corresponds to the data inconsistency described in CAP theory.
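
A minimal sketch of such an intermediate state, assuming a single primary that replicates to one replica through an in-memory queue. The `Primary` and `Replica` classes are hypothetical and stand in for real nodes connected by a network.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    """Accepts writes immediately and replicates them to a replica later."""

    def __init__(self, replica: Replica):
        self.data = {}
        self.replica = replica
        self.pending = deque()  # writes not yet applied on the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # replication is deferred

    def replicate(self):
        """Apply all pending writes to the replica, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica.data[key] = value


replica = Replica()
primary = Primary(replica)
primary.write("nickname", "alice")

stale_read = replica.data.get("nickname")  # intermediate (soft) state: not there yet
primary.replicate()
fresh_read = replica.data.get("nickname")  # the two copies have converged
```

Between `write` and `replicate` the two copies disagree, yet both nodes keep answering requests. That window is the soft state.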

Eventual Consistency

After a certain period of time, all data copies in the system eventually reach a consistent state.
The key words here are "a certain time" and "eventually". "A certain time" depends heavily on the characteristics of the data, because different data can tolerate different periods of inconsistency. Take a Weibo-like system as an example. User account data should ideally become consistent within 1 minute: after registering or logging in at node A, a user is unlikely to switch to another node within that minute, but may well log in through another node 10 minutes later. A user's latest Weibo post, on the other hand, can tolerate up to 30 minutes of inconsistency: a reader who cannot yet see a celebrity's newest post is unaware of it and simply assumes the celebrity has not posted. "Eventually" means that no matter how long it takes, a consistent state must be reached in the end.

3 levels of distributed consistency

  • Strong consistency: Whatever is written to the system is what is read back.
  • Weak consistency: A read is not guaranteed to return the latest written value, and there is no guarantee on how long it takes before the latest value becomes readable; the system only tries to make the data consistent at some point.
  • Eventual consistency: A stronger form of weak consistency. The system guarantees that the data becomes consistent within a certain period of time.

The industry generally recommends eventual consistency. However, scenarios with very strict consistency requirements, such as bank transfers, still need strong consistency.

How to ensure eventual consistency?

  • Repair while reading: Detect data inconsistencies when reading data and repair them. Cassandra's Read Repair is an example: when a query finds that the copies held on different nodes are inconsistent, the system automatically repairs the data.
  • Repair on write: Detect data inconsistencies when writing data and repair them. Cassandra’s Hinted Handoff is an example: when a write to a remote node in the Cassandra cluster fails, the data is cached and retransmitted periodically until the inconsistency is repaired.
  • Asynchronous repair: This is the most commonly used method. Replica consistency is checked through periodic reconciliation, and any differences found are repaired.
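
The three approaches above can be sketched in a small in-memory model. This is not Cassandra's implementation: the `Node` and `Cluster` classes, the integer version numbers, and the "highest version wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)  # key -> (version, value)

    def put(self, key, version, value):
        # Keep the entry with the highest version (assumed conflict rule).
        if key not in self.store or self.store[key][0] < version:
            self.store[key] = (version, value)


class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes
        self.hints = []  # (target node, key, version, value) for failed writes

    def write(self, key, version, value):
        for node in self.nodes:
            if node.up:
                node.put(key, version, value)
            else:
                # Repair on write: remember the write for the unreachable node.
                self.hints.append((node, key, version, value))

    def replay_hints(self):
        # Retransmit cached writes; keep the ones whose target is still down.
        remaining = []
        for node, key, version, value in self.hints:
            if node.up:
                node.put(key, version, value)
            else:
                remaining.append((node, key, version, value))
        self.hints = remaining

    def read(self, key):
        # Repair while reading: find the newest copy and push it to stale replicas.
        live = [n for n in self.nodes if n.up]
        copies = [n.store[key] for n in live if key in n.store]
        if not copies:
            return None
        newest = max(copies, key=lambda c: c[0])
        for node in live:
            node.put(key, *newest)
        return newest[1]

    def reconcile(self):
        # Asynchronous repair: compare every known key across the live nodes.
        live = [n for n in self.nodes if n.up]
        for key in {k for n in live for k in n.store}:
            self.read(key)


a, b, c = Node("a"), Node("b"), Node("c")
cluster = Cluster([a, b, c])

c.up = False
cluster.write("balance", 1, 100)  # c misses the write, so a hint is stored
c.up = True
cluster.replay_hints()            # the cached write is delivered to c

b.store["balance"] = (0, 90)      # simulate a stale copy on b
value = cluster.read("balance")   # the read detects and repairs b

a.store["theme"] = (1, "dark")    # a key that exists on a only
cluster.reconcile()               # the reconciliation pass copies it to b and c
```

In a real deployment `replay_hints` and `reconcile` would run on a timer in the background; here they are called by hand so that the effect of each repair path is visible.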

Summary

BASE theory is essentially an extension of and supplement to CAP; more specifically, it supplements the AP scheme in CAP.
The previous article's analysis of CAP theory raised two points that relate to BASE:

  • CAP theory ignores latency, but in practice latency is unavoidable.
    This means that a perfect CP scheme does not exist. Even if data replication takes only a few milliseconds, the system does not satisfy CP during those milliseconds. The CP scheme in CAP therefore also achieves only eventual consistency, except that "a certain time" is just a few milliseconds.
  • The AP scheme sacrifices consistency only while the partition lasts; it does not give up consistency forever.
    This is exactly where BASE theory extends CAP. Consistency is sacrificed during the partition, but once the partition failure is recovered, the system should reach eventual consistency.

Origin blog.csdn.net/weixin_44816664/article/details/134359633