Learning Redis: the CAP principle for distributed databases

What is traditional ACID?

A: Atomicity
C: Consistency
I: Isolation
D: Durability

Relational databases follow the ACID rules. A database transaction is very similar to a transaction in the real world, and it has the following four characteristics:

1. A (Atomicity)

  Atomicity is easy to understand: the operations in a transaction either all complete or none of them do. A transaction succeeds only if every operation in it succeeds; if any one operation fails, the whole transaction fails and must be rolled back. Take a bank transfer of 100 yuan from account A to account B. It consists of two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. The two steps must either both complete or both not happen. If only the first step completes and the second fails, 100 yuan simply vanishes.

2. C (Consistency)

  Consistency is also fairly easy to understand: the database must always be in a consistent state, and running a transaction must not violate the database's existing integrity constraints.

3. I (Isolation)

  Isolation means that concurrent transactions do not affect each other. If the data one transaction wants to access is being modified by another transaction, then as long as that other transaction has not committed, the data seen is not affected by the uncommitted changes. For example, suppose a transaction is transferring 100 yuan from account A to account B. If B queries his account before the transaction completes, he will not see the newly added 100 yuan.

4. D (Durability)

  Durability means that once a transaction is committed, its changes are stored permanently in the database and are not lost even if the system goes down.
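
The bank transfer above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in relational database; the table layout and the helper names (open_bank, balance, transfer, demo) are assumptions made for this illustration and are not from the original article. A deposit into a non-existent account plays the role of the failing second step.

```python
import os
import sqlite3
import tempfile


def open_bank(path):
    """Open the database file and make sure the accounts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def balance(conn, name):
    """Return the balance that this connection currently sees."""
    rows = conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchall()
    return rows[0][0]


def transfer(conn, src, dst, amount):
    """Withdraw from src and deposit into dst as one transaction."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            deposited = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if deposited.rowcount != 1:
                raise LookupError("deposit failed: no such account " + dst)
        return True
    except LookupError:
        return False


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        writer = open_bank(path)
        with writer:
            writer.executemany(
                "INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 0)]
            )

        # Atomicity: step 2 fails, so the withdrawal in step 1 is undone.
        results["failed_transfer_ok"] = transfer(writer, "A", "nobody", 100)
        results["a_after_failure"] = balance(writer, "A")

        # Isolation: a second connection does not see uncommitted changes.
        reader = sqlite3.connect(path)
        writer.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'A'")
        writer.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'B'")
        results["b_before_commit"] = balance(reader, "B")
        writer.commit()
        results["b_after_commit"] = balance(reader, "B")

        # Durability: the committed transfer is still there after reopening.
        writer.close()
        reader.close()
        reopened = sqlite3.connect(path)
        results["b_after_reopen"] = balance(reopened, "B")
        reopened.close()
    return results


if __name__ == "__main__":
    print(demo())
```

Closing and reopening the file only approximates a restart; it does not simulate a real crash or power loss.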

CAP

C: Consistency (strong consistency)
A: Availability
P: Partition tolerance

The CAP theorem says that a distributed storage system can satisfy at most two of the three properties above at the same time.
Strong consistency: what you read is what was written. Do all replicas of the data in the distributed system hold the same value at the same moment? (Equivalent to every node accessing the same, latest copy of the data.)
Availability: for example, it is unacceptable for Taobao to be unusable on Double Eleven. After some nodes in the cluster fail, can the cluster as a whole still respond to client read and write requests? (High availability for data updates.)
Partition tolerance: in practical terms, a partition amounts to a time limit on communication. If the system cannot reach data consistency within that limit, a partition has occurred, and the current operation must choose between C and A.
Example: a bag listed on Taobao.
For strong consistency, we would require that the bag's like count be exactly 141, with no error allowed. That precision is hard to guarantee across nodes under high concurrency.
For high availability, weak consistency is acceptable: the like and view counts may be slightly off, but the website must not go down.
So most website architectures choose AP: weak consistency plus high availability.

For NoSQL, partition tolerance is a must. The nodes of a distributed system may not even be in the same city. Taobao, for example, serves content from the location closest to you, and may have some servers in Hangzhou and others in Shanghai and Suzhou.
Because real networks always suffer from delay and packet loss, partition tolerance is something we must have. So we can only trade off between consistency and availability; no NoSQL system can guarantee all three at the same time.
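
To make the choice between C and A concrete, here is a toy sketch of the like counter from the bag example, replicated to two sites. The class TwoReplicaStore, the Unavailable exception, and the version-based reconciliation in heal are all invented for this illustration; real systems use far more careful protocols. During a partition, the CP variant refuses writes so the replicas never disagree, while the AP variant accepts the write locally and lets the replicas diverge until the partition heals.

```python
class Unavailable(Exception):
    """Raised when the CP store refuses a write during a partition."""


class TwoReplicaStore:
    """One counter replicated to two sites; mode is 'CP' or 'AP'."""

    def __init__(self, mode, initial):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        # Each replica stores a (version, value) pair.
        self.replicas = {"hangzhou": (0, initial), "shanghai": (0, initial)}
        self.partitioned = False

    def read(self, site):
        """Serve the read from the local replica."""
        return self.replicas[site][1]

    def write(self, site, value):
        version = self.replicas[site][0] + 1
        if not self.partitioned:
            # Healthy network: update every replica before returning.
            for name in self.replicas:
                self.replicas[name] = (version, value)
        elif self.mode == "CP":
            # Keep consistency, give up availability.
            raise Unavailable("cannot reach the other replica")
        else:
            # Keep availability, give up consistency for now.
            self.replicas[site] = (version, value)

    def heal(self):
        """End the partition; crude rule: the highest version wins."""
        self.partitioned = False
        newest = max(self.replicas.values())
        for name in self.replicas:
            self.replicas[name] = newest


if __name__ == "__main__":
    ap = TwoReplicaStore("AP", 141)
    ap.partitioned = True
    ap.write("hangzhou", 142)
    print(ap.read("hangzhou"), ap.read("shanghai"))  # replicas disagree
    ap.heal()
    print(ap.read("hangzhou"), ap.read("shanghai"))  # replicas agree again
```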

CA: traditional databases such as Oracle
AP: the choice of most website architectures
CP: Redis, MongoDB
Note: in a distributed architecture, a trade-off must be made.

Consistency and availability have to be balanced. Most web applications do not need strong consistency, so giving up strong consistency (C) in exchange for availability, while keeping partition tolerance (P), is the direction of current distributed database products.

Choosing between consistency and availability
  For Web 2.0 websites, many of the main features of relational databases often go unused:
Database transaction consistency requirements
  Many real-time web systems do not require strict database transactions. Their read-consistency requirements are low, and in some cases the write-consistency requirements are not high either, so eventual consistency is acceptable.
Database real-time write and read requirements
  In a relational database, if you insert a row and query it immediately, you are guaranteed to read it back. Many web applications do not need that level of real-time behavior. For example, after I post a message on Weibo, it is perfectly acceptable for my subscribers to see it a few seconds or even ten-odd seconds later.
Requirements for complex SQL queries, especially multi-table joins
  Any web system with a large data volume strongly avoids joins across several large tables, as well as complex analytical report queries. This is especially true of SNS-type websites, which avoid such situations already at the requirements and product-design stage. Most queries are primary-key lookups on a single table or simple conditional paged queries on a single table, so much of SQL's power goes unused.
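
The two query shapes mentioned last, a primary-key lookup and a simple conditional paged query on a single table, look roughly like this. The sketch uses Python's sqlite3 module with an in-memory database; the posts table, its columns, and the helper names get_post and list_posts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, author TEXT NOT NULL, body TEXT NOT NULL)"
)
# Ten sample posts: odd ids belong to alice, even ids to bob.
conn.executemany(
    "INSERT INTO posts (id, author, body) VALUES (?, ?, ?)",
    [(i, "alice" if i % 2 else "bob", "post %d" % i) for i in range(1, 11)],
)
conn.commit()


def get_post(conn, post_id):
    """Primary-key lookup on a single table."""
    return conn.execute(
        "SELECT id, author, body FROM posts WHERE id = ?", (post_id,)
    ).fetchone()


def list_posts(conn, author, page, page_size=3):
    """Simple conditional paged query on a single table, newest first."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, body FROM posts WHERE author = ? "
        "ORDER BY id DESC LIMIT ? OFFSET ?",
        (author, page_size, offset),
    ).fetchall()


if __name__ == "__main__":
    print(get_post(conn, 4))
    print(list_posts(conn, "alice", page=1))
```

Neither query needs a join, which is why this access pattern maps easily onto key-value style stores.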

Classic CAP diagram

The core of the CAP theorem is that a distributed system cannot satisfy consistency, availability, and partition tolerance all at once; it can satisfy at most two of them at the same time.
Accordingly, the CAP principle divides NoSQL databases into three categories: those satisfying CA, those satisfying CP, and those satisfying AP:

CA: single-site clusters that satisfy consistency and availability; usually not very strong in scalability.
CP: systems that satisfy consistency and partition tolerance; performance is usually not especially high.
AP: systems that satisfy availability and partition tolerance; consistency requirements are usually somewhat lower.

BASE

BASE is an approach proposed to address the loss of availability caused by the strong consistency of relational databases.

  BASE is an abbreviation of the following three terms:

  Basically Available
  Soft state
  Eventually consistent
The idea is to relax the requirement that data be consistent at every moment, in exchange for better overall scalability and performance. Why? Large systems are geographically distributed and have extremely high performance demands, so distributed transactions cannot meet these targets. Another approach is needed to meet them, and BASE is that approach.
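
Eventual consistency can be pictured with a minimal sketch of asynchronous replication. The Primary and Follower classes and the key name are hypothetical and do not model any particular database: the primary acknowledges a write immediately, the follower applies it only when it next syncs, and a read from the follower in between returns stale data.

```python
from collections import deque


class Primary:
    """Accepts writes and queues them for later replication."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes the follower has not applied yet

    def set(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Follower:
    """Serves reads from its own copy, which may lag behind the primary."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def sync(self):
        """Apply all queued changes; returns how many were applied."""
        applied = 0
        while self.primary.pending:
            key, value = self.primary.pending.popleft()
            self.data[key] = value
            applied += 1
        return applied


if __name__ == "__main__":
    primary = Primary()
    follower = Follower(primary)
    primary.set("weibo:1", "hello")
    print(follower.get("weibo:1"))  # stale read before replication
    follower.sync()
    print(follower.get("weibo:1"))  # consistent after replication
```

The window between set and sync is the "soft state": the system is still available, but not every copy agrees yet.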

Introduction to distributed systems and clusters

A distributed system consists of multiple computers and communicating software components connected by a computer network (a local area network or a wide area network). A distributed system is a software system built on top of the network. Because of this software layer, a distributed system is highly cohesive and transparent. The difference between a network and a distributed system therefore lies more in the high-level software (especially the operating system) than in the hardware. Distributed systems can run on different platforms such as PCs, workstations, local area networks, and wide area networks.
To put it simply:
Distributed: different service modules (projects) are deployed on different servers; they communicate with and call each other over RPC/RMI, providing services externally and collaborating internally.
Cluster: the same service module is deployed on several servers, and distributed scheduling software dispatches requests across them in a unified way to provide services and access externally.
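
The difference between the two terms can be sketched in a few lines. Everything here is illustrative: the Cluster and DistributedSystem classes, the service names, and the server names are invented, ordinary method calls stand in for RPC/RMI, and round-robin is just one possible scheduling policy.

```python
from itertools import cycle


class Cluster:
    """The same service deployed on several servers, scheduled round-robin."""

    def __init__(self, service, servers):
        self.service = service
        self._next_server = cycle(servers)

    def handle(self, request):
        server = next(self._next_server)
        return "%s@%s handled %s" % (self.service, server, request)


class DistributedSystem:
    """Different service modules, each deployed as its own cluster."""

    def __init__(self, clusters):
        self._clusters = {c.service: c for c in clusters}

    def call(self, service, request):
        # Stands in for a remote call to whichever module owns the service.
        return self._clusters[service].handle(request)


if __name__ == "__main__":
    system = DistributedSystem([
        Cluster("orders", ["server-1", "server-2"]),
        Cluster("users", ["server-3"]),
    ])
    for request in ("r1", "r2", "r3"):
        print(system.call("orders", request))
    print(system.call("users", "r4"))
```

Splitting orders and users into separate modules is the "distributed" part; running orders on two identical servers is the "cluster" part.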

Origin blog.csdn.net/qq_39736597/article/details/110959226