CAP and data consistency

A distributed system consists of multiple nodes that cooperate over a network, and any of these nodes can become unstable for a variety of reasons. One very important concept in this context is the CAP principle, which guides the design of most distributed systems. It states that a distributed system has three desirable properties, consistency (Consistency), partition tolerance (Partition tolerance) and availability (Availability), and that no design can fully satisfy all three at the same time.

For example, consider a distributed system of N nodes that work together over network links. First, you cannot store the only complete copy of data X on a single node: if that node stops working for any reason, X can no longer be accessed, which clearly violates the availability requirement, and if the node can never be restored, X is lost for good. X should therefore be stored as multiple copies on different nodes. The more copies there are, the safer X is, and the more likely it is that X can still be accessed even when several nodes are unavailable at the same time. This is the partition tolerance requirement; as a rule of thumb, X should have at least three copies.

How, then, should these copies be updated when X changes? Ideally, once a client issues an update request for X, a read of X from any node returns the latest value; this is the consistency requirement. That ideal is too theoretical, though, because a network-based distributed system is affected by many external factors. What if a node holding a copy cannot be reached during synchronization? What if another client updates X again in the meantime? To truly meet this theoretical consistency requirement, every client that reads or writes X would have to wait until all copies of X have been synchronized before getting a response, and that clearly fails the availability requirement.
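The trade-off described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `Replica` and `write_all` are hypothetical names, the replicas are in-memory objects rather than real nodes, and a failure is simulated with a flag. A write that insists on updating every copy keeps the copies identical, but it is refused as soon as one copy cannot be reached.

```python
class Replica:
    """One node's copy of X (hypothetical in-memory stand-in for a real node)."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.reachable = True


def write_all(replicas, value):
    """Apply the write only if every replica can be reached.

    Returns True on success. If any replica is unreachable the write is
    refused, so no replica ever holds a value the others lack.
    """
    if not all(r.reachable for r in replicas):
        return False  # copies stay consistent, but writes are unavailable
    for r in replicas:
        r.value = value
    return True


replicas = [Replica("n1"), Replica("n2"), Replica("n3")]
ok_first = write_all(replicas, "v1")    # all three nodes are up
replicas[2].reachable = False           # simulate a node or link failure
ok_second = write_all(replicas, "v2")   # refused: one copy is unreachable
values = [r.value for r in replicas]    # every copy still holds "v1"
```

The second write fails even though two of the three nodes are healthy, which is exactly the loss of availability the paragraph describes.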

As another example, relational databases are typically designed around the ACID principle: atomicity (Atomicity), consistency (Consistency), isolation (Isolation) and durability (Durability). Relational databases implement ACID with transactions. Each transaction is the smallest atomic unit of work, and transactions can run at different isolation levels, such as read uncommitted and repeatable read. A relational database also rolls back the data when a transaction cannot be committed correctly; to continue modifying the data, you have to start a new transaction. All of this exists to make the database a strongly consistent system. A distributed transaction mechanism built on top of several relational database instances pursues the same goal: if any database instance involved in a distributed transaction fails, the whole distributed transaction cannot commit, and the write is not completed. The old data can still be read, however, and if one database node has a problem, another node can take over read operations, because reads carry no risk of changing the data and breaking consistency.
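The all-or-nothing behavior of a distributed transaction can be sketched as follows. This is a simplified, two-phase-commit-style illustration and not the protocol of any particular database: `Participant` and `run_distributed_tx` are hypothetical names, and a failing instance is simulated with a `healthy` flag.

```python
class Participant:
    """Hypothetical stand-in for one database instance in the transaction."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.committed = {}   # data visible to readers
        self.pending = None   # change staged by prepare()

    def prepare(self, key, value):
        """Stage the change; an unhealthy instance votes no."""
        if not self.healthy:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        key, value = self.pending
        self.committed[key] = value
        self.pending = None

    def rollback(self):
        self.pending = None


def run_distributed_tx(participants, key, value):
    """Commit on all participants, or on none if any of them fails."""
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


dbs = [Participant("db1"), Participant("db2"), Participant("db3")]
first = run_distributed_tx(dbs, "X", 1)     # all healthy: committed everywhere
dbs[1].healthy = False                      # one instance becomes abnormal
second = run_distributed_tx(dbs, "X", 2)    # whole transaction is rolled back
old_values = [p.committed["X"] for p in dbs]  # old data is still readable
```

After the failed second transaction every instance still returns the old value of X, so reads remain possible while the write is rejected.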

In principle, no distributed system can make all three CAP properties primary design goals at the same time. To achieve strong consistency together with high partition tolerance, some availability has to be sacrificed (sacrificed, not given up entirely). Partition tolerance is the foundation of a distributed system: a system with no partition tolerance can hardly be called distributed at all. Availability cannot be sacrificed too much either, especially under heavy load: a system with 99.99% availability and one with 99.999% availability are in completely different classes.
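To make the gap between 99.99% and 99.999% concrete, the following sketch converts an availability percentage into the downtime it allows per year. It assumes a 365-day year, and `downtime_minutes_per_year` is a name chosen here for illustration; `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year for an availability given as a string, e.g. "99.99"."""
    unavailable = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable * MINUTES_PER_YEAR


four_nines = downtime_minutes_per_year("99.99")    # 52.56 minutes per year
five_nines = downtime_minutes_per_year("99.999")   # 5.256 minutes per year
```

One extra nine shrinks the yearly downtime budget from roughly 53 minutes to roughly 5 minutes, a tenfold difference.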

1-2. Data consistency
Design approaches that put excessive emphasis on data consistency, such as distributed transaction mechanisms, are therefore not favored by mainstream distributed system design, at least judging from the published design principles of various distributed systems. Take a distributed file system such as HDFS: it emphasizes a degree of high availability and partition tolerance first and data consistency second, and it guarantees consistency through replicas. HDFS does not wait until every replica has been written before it treats the data as written; as soon as some of the replicas have completed the write, the data is considered successfully written to HDFS and clients can read the new data. The replicas that have not yet been synchronized are brought up to date afterwards, so the data eventually becomes consistent. DNS is another distributed system that must consider high availability and partition tolerance. Because DNS services span different networks, a change to a domain's resolution configuration cannot wait until it has taken effect on every DNS node worldwide before the service continues to work, but the organizational structure of DNS ensures that the DNS nodes eventually resolve the domain name www.XXXX.com to the same result. This design approach, which sacrifices some consistency to secure availability and partition tolerance, has a specific name in the field of distributed systems: BASE, which stands for basically available (Basically Available), soft state (Soft state) and eventual consistency (Eventual consistency).
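The write path described for HDFS can be illustrated with a small sketch. It is not HDFS code and uses no HDFS API: `EventualStore`, `min_acks` and `sync` are hypothetical names, and the replicas are plain dictionaries. A write is acknowledged once a minimum number of replicas hold it, and the remaining replicas catch up later.

```python
class EventualStore:
    """Hypothetical replicated store: a write succeeds once `min_acks`
    replicas have it; the remaining replicas are synchronized later."""

    def __init__(self, replica_count=3, min_acks=1):
        self.replicas = [dict() for _ in range(replica_count)]
        self.min_acks = min_acks
        self.backlog = []  # (replica_index, key, value) still to be applied

    def write(self, key, value):
        for i, replica in enumerate(self.replicas):
            if i < self.min_acks:
                replica[key] = value                   # written immediately
            else:
                self.backlog.append((i, key, value))   # deferred
        return True  # acknowledged before all replicas are up to date

    def sync(self):
        """Background synchronization: apply deferred writes in order."""
        for i, key, value in self.backlog:
            self.replicas[i][key] = value
        self.backlog.clear()

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)


store = EventualStore(replica_count=3, min_acks=1)
store.write("X", "new")
before = [store.read("X", i) for i in range(3)]  # replicas disagree for a while
store.sync()
after = [store.read("X", i) for i in range(3)]   # replicas have converged
```

Between `write` and `sync` the replicas are in the "soft state" that BASE refers to; after `sync` they agree again.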

The previous section briefly described the CAP principle with examples and mentioned two notions of consistency: strong consistency and eventual consistency. Strong consistency can be summarized as follows: at any point in time, no matter which node of the distributed system a client reads from, the data X it gets is the same. By this definition, a distributed transaction mechanism is one implementation of strong consistency. Weak consistency is defined in contrast to strong consistency. It does not mean that data consistency is not maintained; it means that consistency is not guaranteed at every moment, and that the system makes no promise about when every node will return consistent data. Eventual consistency is a special case of weak consistency: on top of the weak consistency guarantee, it additionally promises that after a time window of inconsistency the data will eventually become consistent. To a client, this window usually appears very short, and a distributed system can also hide the inconsistent data from clients in various ways, for example with a master-slave replica scheme.
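One way to hide the inconsistency window from clients, assuming the master-slave scheme mentioned above, is to serve client reads from the master only. The sketch below is an illustration with hypothetical names (`MasterSlaveStore`, `replicate`, `read_from_slave`), not a description of any specific product.

```python
class MasterSlaveStore:
    """Hypothetical master-slave store in which clients only read the master."""

    def __init__(self, slave_count=2):
        self.master = {}
        self.slaves = [dict() for _ in range(slave_count)]

    def write(self, key, value):
        self.master[key] = value   # slaves are updated later by replicate()

    def replicate(self):
        """Copy the master's current state to every slave."""
        for slave in self.slaves:
            slave.update(self.master)

    def read(self, key):
        """Client-facing read: always served by the master."""
        return self.master.get(key)

    def read_from_slave(self, key, index):
        """Internal view of one slave, which may lag behind the master."""
        return self.slaves[index].get(key)


ms = MasterSlaveStore()
ms.write("X", 42)
client_view = ms.read("X")                 # client sees the new value at once
lagging_view = ms.read_from_slave("X", 0)  # slave has not caught up yet
ms.replicate()
caught_up_view = ms.read_from_slave("X", 0)
```

The slaves are only eventually consistent, but because clients never read them directly in this sketch, the window of inconsistency is not visible from the outside. The price is that all read traffic lands on the master.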
----------------
Disclaimer: This is an original article by the CSDN blogger "can not say good fight face", published under the CC 4.0 BY-SA license. When reproducing it, please include the original source link and this statement.
Original link: https://blog.csdn.net/yinwenjie/article/details/60584554

Origin www.cnblogs.com/sylar5/p/11525147.html