Consistency in distributed systems

Table of Contents

Foreword

1, CAP principle

2, BASE and eventual consistency

3, Introduction to the Paxos algorithm

The basic roles in Paxos are as follows:

The following roles may also exist in Paxos.

The specific process of Paxos is as follows.

(1) The first phase: initiating the proposal

(2) The second phase: approving the resolution


Foreword

In relational database theory, ACID requires consistency within a transaction: the integrity constraints of the database must not be broken before or after the transaction takes place.

In distributed systems, the term "consistency" covers two aspects.

(1) The content of multiple copies of the same data is identical (this can also be viewed as an atomicity requirement). If the copies are required to hold the same content at all times, this can also be seen as a transactional requirement: an update to the data must be applied to all copies simultaneously, and it either succeeds on all of them or on none.

(2) While the system performs a series of related operations, the state of the system remains intact.

For example, users A and B update data D at the same time. Let the original data be D1.0. In the correct order, A modifies the data to D2.0, and B then updates it to D3.0 on the basis of A's change. But because A and B update at the same time, both may base their updates on D1.0. In this case, the version written later overwrites the version written first (or the later write fails). If A and B write to different copies, two conflicting versions appear: D2.a and D2.b.
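The lost-update scenario above can be sketched in a few lines. The sketch below is only an illustration under assumed names (`VersionedStore`, `read`, `write`); it uses a version check, which is one common way to make the later write fail instead of silently overwriting, and is not taken from any particular database.

```python
class VersionedStore:
    """A single copy of data D guarded by a version check (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        # Reject the write if D has changed since the caller read it.
        if expected_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


store = VersionedStore("D1.0")

# A and B both read D1.0 before either of them writes.
_, version_seen_by_a = store.read()
_, version_seen_by_b = store.read()

a_ok = store.write("D2.0", version_seen_by_a)  # accepted
b_ok = store.write("D3.0", version_seen_by_b)  # rejected: B's view is stale

# B re-reads and retries on top of A's result, restoring the intended order.
_, fresh_version = store.read()
b_retry_ok = store.write("D3.0", fresh_version)
```

Without the version check, B's write would simply replace A's, which is the overwrite case described above.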

However, because common data applications have little need for traditional strong transactional consistency, many practical NoSQL databases do not support distributed transactions, or support them only through complex secondary development. In that case, consistency concerns only the synchronization of multiple copies.

1, CAP principle

CAP refers to three properties of a distributed system: Consistency, Availability and Partition tolerance.

The CAP theorem states that a distributed system cannot have all three properties at once; it can satisfy at most two of them.

Consistency: all nodes in a distributed system can agree on the data. For NoSQL systems specifically, the main concern is whether multiple copies of the data hold the same content. If there is only one copy of the data, agreement is easy to reach, but with multiple copies a consistency policy has to be designed into writing, reading and other processes. In addition, NoSQL databases are concerned with the "strength" of consistency, such as how long data is allowed to remain inconsistent.

Availability: the system can give feedback on user operations. Most software systems give feedback on user operations, so availability here usually refers to how promptly that feedback arrives. Some NoSQL systems give no feedback on operations such as data deletion, and the user has to query the result.

Partition tolerance (also called partition fault tolerance): a partition can be understood as a system failure in which some nodes are unreachable or some messages are lost, so that the system is effectively divided into several regions. Partition tolerance means that when some nodes fail or messages are lost, the cluster can still provide data access. Others understand "partition" as data partitioning: partition tolerance is then fault tolerance achieved through data partitioning, that is, multiple copies, so that the complete data remains accessible when some nodes fail. Under either interpretation, it can be regarded as a multi-copy strategy in the system.

According to CAP theory, a distributed system can satisfy only two of the properties, which gives three cases: CA, CP and AP. A system that chooses CA cannot use multiple copies; a system that chooses CP must tolerate slow responses; a system that chooses AP must tolerate possible inconsistencies among its copies.

For example, when a user attempts to access data, strong consistency requires the system to write all copies of the data, or to check that all copies are consistent. Availability requires the system to complete the operation quickly and give the user feedback. But if some nodes are unreachable, the consistency of all copies cannot be guaranteed. If consistency of all data is mandatory, the system cannot return a result to the user until it recovers.
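A minimal sketch may make the trade-off concrete. The `Replica` class, the `write` function and the "CP"/"AP" mode labels below are assumptions made for illustration only; real systems expose this choice in many different ways.

```python
class Replica:
    """One copy of the data on one node (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.reachable = True
        self.value = None


def write(replicas, value, mode):
    """Write to the copies; mode is "CP" or "AP" (labels assumed for this sketch)."""
    up = [r for r in replicas if r.reachable]
    if mode == "CP" and len(up) < len(replicas):
        # Consistency first: refuse the write rather than let copies diverge.
        return "unavailable"
    # Availability first: update whatever can be reached and answer at once.
    for r in up:
        r.value = value
    return "ok"


replicas = [Replica("n1"), Replica("n2"), Replica("n3")]
write(replicas, "v1", "CP")            # all nodes reachable: every copy holds v1

replicas[2].reachable = False          # n3 is cut off by a partition
cp_result = write(replicas, "v2", "CP")
ap_result = write(replicas, "v2", "AP")
values = [r.value for r in replicas]
```

In CP mode the user gets no result while n3 is unreachable; in AP mode the user gets an answer, but n3 still holds the old value.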

In practice, the CAP principle should not be understood as an either-or choice. Usually a trade-off is made according to the actual situation, or the software is made configurable so that users can choose a policy. For example, suppose a bank ATM loses its network connection to the main data center. Should it still be allowed to dispense cash? Allowing it causes data inconsistency and may lead to financial loss and abuse of the service; disallowing it makes the service unavailable, which hurts the user experience and the corporate image. In practice, a limit can be set on withdrawals while the ATM is disconnected, so that the data mismatch stays within an acceptable range and the user experience improves somewhat.
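The ATM compromise can be sketched as follows. The `Atm` class, its `offline_limit` parameter and the reconciliation step are hypothetical and chosen only to illustrate the idea of bounding the inconsistency; they do not describe how any real bank works.

```python
class Atm:
    """An ATM that keeps dispensing, up to a limit, while disconnected (sketch)."""

    def __init__(self, bank_balances, offline_limit):
        self.bank = bank_balances      # balances held by the main data center
        self.offline_limit = offline_limit
        self.online = True
        self.pending = []              # offline withdrawals awaiting reconciliation

    def withdraw(self, account, amount):
        if self.online:
            if self.bank[account] < amount:
                return False
            self.bank[account] -= amount
            return True
        # Offline: the balance cannot be checked, so cap the total exposure.
        already = sum(a for acc, a in self.pending if acc == account)
        if already + amount > self.offline_limit:
            return False
        self.pending.append((account, amount))
        return True

    def reconnect(self):
        self.online = True
        for account, amount in self.pending:
            # The balance may go negative: this is the accepted, bounded mismatch.
            self.bank[account] -= amount
        self.pending.clear()


bank = {"alice": 100}
atm = Atm(bank, offline_limit=200)
atm.online = False
first = atm.withdraw("alice", 150)     # within the offline limit
second = atm.withdraw("alice", 100)    # would exceed the limit, refused
atm.reconnect()
```

The limit caps how far the ATM's view and the bank's records can drift apart before the connection is restored.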

The CAP principle should also not be understood only as a design principle for the distributed software as a whole. Local strategies can be designed according to CAP at different levels, subsystems or modules. For example, a single node in the system may satisfy CA when managing its own data, while the cluster as a whole satisfies CP or AP.

Finally, the CAP principle and the consistency principles of distributed systems apply not only to big data and NoSQL, but also to the distributed architecture design and business process design of websites.

2, BASE and eventual consistency

According to the CAP principle, a distributed system cannot perfectly satisfy consistency, availability and partition tolerance at the same time. The following questions therefore arise when designing a NoSQL database.

(1) Strong consistency is an advantage of traditional relational databases and is reflected in the four aspects of ACID. Many people think that a database should by definition be strongly consistent. But should a NoSQL database design still maintain this characteristic?

(2) Availability (which here can be seen as response latency) is a very important indicator for many distributed systems. The design requirements of NoSQL differ from those of large e-commerce sites, but if NoSQL is applied to the back end of such a system, it has to ensure very short response times even when operating on large data sets.

(3) Partition tolerance is something many NoSQL systems are bound to take into account. People regard big data as an "asset", which inevitably requires that data must not be lost and that all of it is kept online rather than stored offline, so that it can be used to create value. Support for distribution and multiple copies is therefore mandatory for most NoSQL systems.

To address these questions, a distributed system can make certain compromises on consistency according to actual business requirements. This does not mean giving up consistency guarantees, but providing weaker ones. The specific requirements can be described by the three terms of BASE.

(1) Basically Available: some nodes or functions of the distributed system are allowed to fail, while the core parts of the system and its data remain available. For example, some e-commerce sites temporarily shut down non-essential functions such as product reviews during busy trading periods such as "Double Eleven".

(2) Soft state (also called flexible transactions): the system is allowed to be in an "intermediate state". In NoSQL this may appear as temporary inconsistency among multiple copies.

(3) Eventual consistency: the system is allowed to have intermediate states or temporary inconsistencies among copies, but over time the copies always become the same. The period of inconsistency is generally not long, but it depends on the situation. Eventual consistency resembles a non-real-time bank transfer: the sender's money is deducted right away, but it may take up to 24 hours to reach the recipient's account, and during that time the states of the two accounts are inconsistent.
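The convergence described in (3) can be sketched with a toy synchronization step. The last-write-wins rule and the logical timestamps below are assumptions made for the sketch; they are one of several possible strategies, and the names `Replica` and `sync` are hypothetical.

```python
class Replica:
    """One copy of the data, tagged with the timestamp of its latest write."""

    def __init__(self):
        self.value = None
        self.timestamp = 0

    def write(self, value, timestamp):
        # Keep only the newest write (last-write-wins, assumed here).
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


def sync(a, b):
    """Background exchange: both copies adopt whichever write is newer."""
    newer = a if a.timestamp >= b.timestamp else b
    a.write(newer.value, newer.timestamp)
    b.write(newer.value, newer.timestamp)


r1, r2, r3 = Replica(), Replica(), Replica()
r1.write("balance=90", timestamp=1)    # the write reaches only one copy at first

before = [r.value for r in (r1, r2, r3)]   # soft state: copies disagree
sync(r1, r2)
sync(r2, r3)
after = [r.value for r in (r1, r2, r3)]    # copies have converged
```

Between the write and the last `sync` call, a reader may see either the old or the new value, which is the temporary inconsistency that BASE accepts.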

Eventual consistency can be seen as the core of BASE theory: by weakening the consistency requirement, the system gains better scalability, reliability (multiple copies) and responsiveness. The different choices that NoSQL and relational databases make about consistency also show that neither can replace the other.

In practical applications, ACID and BASE are not absolute opposites; different modules and subsystems of a distributed system can adopt different principles according to the actual situation. Since most real NoSQL software has dropped support for distributed transactions, its focus is mainly on the eventual consistency of multiple copies: whether copies are allowed to be inconsistent for a short period or during a failure, provided that every copy is synchronized in the end. This differs from scenarios such as websites.

3, Introduction to the Paxos algorithm

In a distributed system, multiple nodes sometimes need to reach a consensus on some issue. For example, several nodes need to update a configuration property together or jointly execute an instruction; or, in a master-slave architecture, when the master node fails, the slave nodes must elect a new master (that is, reach a consensus on who the new master is).

Because of network latency, packet loss and possible interruptions, a node may not receive messages in time, and nodes may hold different opinions on a proposal, for example when different nodes in a cluster propose different configuration parameters at the same time. A distributed consensus algorithm is therefore needed so that the nodes can vote on a proposal and reach a consensus quickly.

The Paxos algorithm is a message-passing consensus algorithm proposed by Leslie Lamport. It is also called a distributed consensus algorithm and is considered one of the most effective algorithms of its kind. Its main purpose is to bring multiple nodes to a consensus on a particular proposal.

The basic roles in Paxos are as follows:

(1) Several proposers: a proposer puts forward a proposal to be voted on and gives a suggested resolution (also called the value).

(2) Several (usually three or more) acceptors (voters): after receiving a proposal, the acceptors vote, deciding by majority rule whether to accept the proposal and whether to approve the value.

The following roles may also exist in Paxos.

(1) Several clients: the originators of proposals. A client can submit a proposal to any proposer, which then submits it for a vote.

(2) Several learners: learners have no voting rights but care about the proposal. They can only observe the voting results and update their own knowledge to obtain the approved resolution (value).

(3) Several coordinators and one leader: these roles exist in improved Paxos mechanisms, in order to better coordinate the process of initiating proposals.

In a real system there are usually only the concepts of client and server. The client usually plays the roles of client, proposer and learner, while the server plays the roles of acceptor, coordinator and leader. In addition, one node may assume several roles.

The specific process of Paxos is as follows.

The actual Paxos algorithm is divided into several stages, referred to here as prepare, promise, accept and accepted. If the system has learners, a learn stage may be added. The details of the algorithm are described below.

(1) The first phase: initiating the proposal

The proposer sends a prepare request to a majority (more than half) of the acceptors. Because several proposers may want their proposals voted on at the same time, and to ensure that only one proposal is processed, each proposer assigns a number to its proposal. The number can be understood as an increasing number or a timestamp. The proposer sends the proposal and its number to each acceptor.
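One simple way to produce such numbers is sketched below. The scheme (a round counter combined with a proposer id) is an assumption for illustration; the text above only requires that numbers increase, and the class name `ProposalNumbers` is hypothetical.

```python
class ProposalNumbers:
    """Generates increasing proposal numbers that are unique per proposer."""

    def __init__(self, proposer_id, total_proposers):
        # proposer_id is assumed to lie in the range 0 .. total_proposers - 1.
        self.proposer_id = proposer_id
        self.total = total_proposers
        self.round = 0

    def next(self, highest_seen=0):
        # Jump past any number that another proposer is known to have used.
        self.round = max(self.round, highest_seen // self.total) + 1
        return self.round * self.total + self.proposer_id


p0 = ProposalNumbers(proposer_id=0, total_proposers=3)
p1 = ProposalNumbers(proposer_id=1, total_proposers=3)

a = p0.next()                  # 3
b = p1.next()                  # 4
c = p0.next(highest_seen=b)    # 6, newer than anything p0 has seen
```

Because each number leaves a different remainder for each proposer, two proposers can never generate the same number, and a proposer that learns of a newer number can always move past it.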

Each acceptor decides whether to accept the proposal and replies to the proposer with a promise. An acceptor that receives a proposal handles it in one of the following ways.

① If the acceptor finds that the number of this proposal is older than the number of a proposal it received earlier, it makes no response.

② If the acceptor finds that this proposal has the newest number (greater than the newest number it has received before), it accepts the proposal, records the number, and pledges not to accept any proposal or resolution with an older number. The acceptor then sends a promise response to the proposer. If the acceptor has previously approved a resolution, it attaches that resolution and its number to the response. Note that if the acceptor holds a historical resolution, that resolution may already have been approved by more than half of the acceptors. If the acceptor has no historical resolution, it puts a null value in the response.

The proposer collects the acceptors' responses within a time limit.

① If more than half of the acceptors have responded (that is, agreed to vote on this proposal), the algorithm enters the second phase.

② If responses do not come from more than half of the acceptors, possibly because of a network failure or because the proposal number was too old, the proposer has to choose a new, larger proposal number and repeat the first phase.

(2) The second phase: approving the resolution

The proposer sends an accept request to a majority (more than half) of the acceptors. Because the acceptors attached a historical resolution or a null value to their responses in the first phase, the proposer now faces one of the following cases.

① If one or more acceptors returned a historical resolution, the proposer picks the resolution with the newest number (in theory, this should be the one held by the majority) and sends it to all acceptors, together with the number it used in the first phase.

② If no acceptor returned a resolution, the proposer sends its own proposed resolution to all acceptors, together with the number it used in the first phase.

Each acceptor replies to the proposer with an accepted response. When an acceptor receives the resolution sent by the proposer in the second phase, it checks whether the number satisfies the newest-number rule.

① If the number is the newest (greater than or equal to the newest number received before), the acceptor approves the resolution, writes it to persistent storage, and confirms.

② If the number is not the newest (for example, another proposer has set a newer number in the meantime), the acceptor rejects the resolution and attaches the newest number it currently holds.

After the proposer collects the accepted responses, one of the following situations applies.

① If a response carries a newer number, another proposer has refreshed the newest number in the meantime. The proposer updates its own number, returns to the first phase and proposes again.

② If no newer number is found and more than half of the acceptors have accepted, the resolution is considered to have reached consensus.

③ If responses come from no more than half of the acceptors, probably because of a network or node failure, the proposer updates its number, returns to the first phase and proposes again.

④ If the system has learners, they learn the current resolution of the proposal from the acceptors, either actively or passively, and update their own knowledge.
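The two phases can be put together in a compact single-process sketch. It is a simplification under stated assumptions: messages are replaced by direct method calls, there are no failures or timeouts, persistent storage and learners are omitted, and the names `Acceptor`, `on_prepare`, `on_accept` and `propose` are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # newest proposal number seen so far
        self.accepted = None   # (number, value) of the approved resolution, if any

    def on_prepare(self, n):
        if n <= self.promised:
            return None                        # older number: no response
        self.promised = n
        return ("promise", self.accepted)      # history, or None as the null value

    def on_accept(self, n, value):
        if n < self.promised:
            return ("rejected", self.promised) # report the newest number held
        self.promised = n
        self.accepted = (n, value)
        return ("accepted", n)


def propose(acceptors, n, value):
    """One attempt; returns the agreed value, or None if a retry is needed."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: prepare / promise
    promises = [r for r in (a.on_prepare(n) for a in acceptors) if r is not None]
    if len(promises) < majority:
        return None
    history = [acc for _, acc in promises if acc is not None]
    if history:
        # Adopt the historical resolution with the newest number.
        value = max(history, key=lambda item: item[0])[1]

    # Phase 2: accept / accepted
    replies = [a.on_accept(n, value) for a in acceptors]
    accepted = [r for r in replies if r[0] == "accepted"]
    return value if len(accepted) >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "config=A")    # nothing decided yet: A is chosen
second = propose(acceptors, 2, "config=B")   # history wins: A is returned again
stale = propose(acceptors, 1, "config=C")    # old number: must retry
```

The second call shows why the proposer has to carry a historical resolution forward: once a value has been approved, later proposals with newer numbers end up confirming the same value instead of replacing it.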

In the classic Paxos process, when a proposer cannot collect enough promise or accepted responses, it in theory increases its proposal number and resubmits the prepare request. So when several clients want to propose, they may keep raising the epoch number to seize the right to have their proposal voted on, which can reduce efficiency and even cause livelock.

To solve this problem, the coordinator role is introduced into the two-phase process. There may be several coordinators, but exactly one of them, the most authoritative, is called the leader. When a client or proposer wants to propose, it can submit the proposal to any coordinator. The coordinator forwards it to the leader, and the leader decides which resolution is put to the vote. In the second phase, the leader passes the value to be approved to each acceptor. This mechanism avoids the livelock caused by proposers raising their numbers on their own.

The following diagram depicts the Paxos protocol flow after the coordinator and leader are introduced.

Note that in the figure, server1 first plays the role of proposer and then plays the role of acceptor during voting. In addition, if the leader goes down, the coordinators can detect this through mechanisms such as heartbeats and elect a new leader among themselves. The election can be implemented through various mechanisms, which are not discussed here.

Well-known software systems that use the Paxos protocol or closely related protocols include Google's Chubby and ZooKeeper, which is maintained by the Apache Software Foundation. In addition, many NoSQL databases implement multi-copy consistency, master node election and other functions based on the ideas of Paxos.

Reference: "The basic principle of NoSQL databases."


Origin blog.csdn.net/tiankong_12345/article/details/101156486