Distributed Systems Study Notes 001

From distributed consistency to CAP theory and BASE theory

Problem statement

In computer science, distributed consistency is a very important problem that has been widely explored and studied. Let us first look at three business scenarios.

1. Train station ticketing

Suppose our end user is a frequent train traveler. Usually he goes to the station ticket office to buy a ticket, takes it to the ticket gate, boards the train, and begins a pleasant trip; everything seems harmonious. Now imagine he chooses Hangzhou as his destination, and only one ticket is left for a particular train to Hangzhou. At the same moment, another passenger at a different ticket window may be buying that very same ticket. If the ticketing system had no consistency guarantees, both purchases would succeed, and at the ticket gate one of the two passengers would be told that his ticket is invalid. Of course, such problems are now rarely seen in China's modern railway ticketing system. But the example shows that what the end user demands of the system is very simple:

"Please sell me a ticket. If there are none left, tell me so when I buy it, not when I reach the gate."

The system therefore places strict consistency requirements on its ticket data: the system's data (here, the number of remaining tickets on that train to Hangzhou) must be accurate at every moment, no matter which ticket window is serving the request.
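One way to meet this requirement inside a single process is to make "check the remaining tickets" and "sell one" a single atomic step. The sketch below assumes all windows share one in-memory counter guarded by a lock; `TicketInventory` and the window names are hypothetical, and a real railway system would need the same guarantee across many machines:

```python
import threading


class TicketInventory:
    """Hypothetical shared ticket counter for one train (illustrative only)."""

    def __init__(self, remaining: int) -> None:
        self.remaining = remaining
        self._lock = threading.Lock()

    def sell(self, window: str) -> bool:
        # The check and the decrement happen under one lock, so two windows
        # can never both observe "1 left" and both succeed.
        with self._lock:
            if self.remaining > 0:
                self.remaining -= 1
                return True
            return False  # sold out: the buyer is told at the window


inventory = TicketInventory(remaining=1)
results = []
threads = [
    threading.Thread(target=lambda w=w: results.append(inventory.sell(w)))
    for w in ("window-1", "window-2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [False, True]: exactly one sale succeeds
```

Whichever window acquires the lock first sells the last ticket; the other is refused immediately, which is exactly the passenger's demand.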

2. Bank transfer

Suppose our end user is a new college graduate who, on receiving his first month's salary, decides to send some money home. At the bank counter he completes the transfer, and the teller kindly reminds him: "Your transfer will arrive within N working days!" The graduate feels a little disappointed, but tells the teller: "Well, it does not matter how long it takes, as long as not a cent goes missing!" This has become the basic requirement that almost all users have of modern banking systems: a delay is acceptable, but the money must arrive in full.

3. Online shopping

Suppose our end user is an online shopper who sees that a favorite product shows an inventory of 5. He quickly confirms the purchase, enters a delivery address, and places the order. However, at the very next moment the system may tell him: "Insufficient inventory!" Most consumers will then blame themselves for being too slow and letting someone else snap up the item they loved.

In fact, engineers who have built online shopping systems know that the inventory displayed on a product details page is usually not the item's true inventory; only when an order is actually placed does the system check the real stock. But who really cares?
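A minimal sketch of that pattern, assuming the product page reads a lazily refreshed copy of the stock while only the order path consults the authoritative figure (the `Store` class and its methods are hypothetical):

```python
class Store:
    """Hypothetical store: a cached stock figure for product pages plus the real stock."""

    def __init__(self, stock: int) -> None:
        self.true_stock = stock    # authoritative inventory
        self.cached_stock = stock  # copy shown on the product page, refreshed lazily

    def displayed_stock(self) -> int:
        # Product pages read the cache, which may be stale.
        return self.cached_stock

    def place_order(self, quantity: int) -> str:
        # Only the order path checks the real inventory.
        if quantity > self.true_stock:
            return "insufficient inventory"
        self.true_stock -= quantity
        return "ok"

    def refresh_cache(self) -> None:
        self.cached_stock = self.true_stock


store = Store(stock=5)
print(store.place_order(5))     # another shopper buys all 5: ok
print(store.displayed_stock())  # the page still shows 5 (stale cache)
print(store.place_order(1))     # insufficient inventory
```

The page tolerates "wrong" data for speed, while the order step guarantees that nobody buys stock that does not exist.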

 

Interpreting the problem

These three examples show that, when using different computer products, end users have different demands on data consistency:

1. Some systems must respond to users quickly while also guaranteeing that the data seen by any client is accurate, like the train station ticketing system.

2. Some systems must guarantee that users' data is reliable and safe; consistency may be reached with a delay, but strict consistency must ultimately be guaranteed, like the bank transfer system.

3. Some systems may show users data that is, strictly speaking, "wrong", but at a certain step of the overall process they verify the accuracy of the data so that users suffer no unnecessary loss, like the online shopping system.

 

Where distributed consistency comes from

One important problem that distributed systems must solve is data replication. In everyday development, many developers have probably run into this situation: client C1 updates the value of key K in the system from V1 to V2, but client C2 cannot immediately read the latest value of K and only sees it some time later. This is normal, because database replication takes time.
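The C1/C2 scenario can be simulated with a primary and one asynchronously updated replica. This is a toy model under stated assumptions: C1 writes to the primary, C2 reads from the replica, and `replicate()` stands in for the replication delay ending (all names are hypothetical):

```python
from __future__ import annotations

from collections import deque


class ReplicatedStore:
    """Hypothetical primary/replica pair with asynchronous replication."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}
        self.replica: dict[str, str] = {}
        self._log: deque[tuple[str, str]] = deque()  # writes not yet on the replica

    def write(self, key: str, value: str) -> None:
        # C1 writes to the primary; the change is only queued for the replica.
        self.primary[key] = value
        self._log.append((key, value))

    def read_from_replica(self, key: str) -> str | None:
        return self.replica.get(key)

    def replicate(self) -> None:
        # Apply pending writes in order, i.e. the replication delay has passed.
        while self._log:
            key, value = self._log.popleft()
            self.replica[key] = value


store = ReplicatedStore()
store.write("K", "V1")
store.replicate()
store.write("K", "V2")               # C1 updates K from V1 to V2
print(store.read_from_replica("K"))  # V1: C2 still sees the old value
store.replicate()
print(store.read_from_replica("K"))  # V2: C2 sees the update later
```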

In distributed systems, the need for data replication generally has two sources:

1. To increase system availability and avoid the unavailability caused by a single point of failure

2. To improve overall system performance: with load balancing, replicas of the data in different locations can all serve users

The great benefits that data replication brings to the availability and performance of distributed systems are self-evident, but the consistency challenge it introduces is something every system developer has to face.

So-called distributed consistency refers to this: once data replication is introduced in a distributed environment, the data on different nodes may become inconsistent, and the applications on each individual computer cannot resolve this on their own. Simply put, data consistency means that when one copy of the data is updated, the other copies must be updated as well; otherwise the copies become inconsistent with one another.

How can this problem be solved? One idea is: "Since the problem is caused by replication delay, I can block the write operation until the data has been copied, and only then treat the write as complete." This does seem to solve the problem, and some system architectures do use this idea directly. But while it solves the consistency problem, it introduces a new one: write performance. If your application handles many write requests, then under this scheme each write is blocked behind the previous one, and overall system performance drops sharply.
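A back-of-the-envelope model of that trade-off. The delays are made-up numbers, copies to different replicas are assumed to run in parallel, and writes are assumed to be processed one after another as the paragraph describes; no real I/O happens:

```python
REPLICA_DELAYS_MS = [2.0, 5.0, 30.0]  # hypothetical copy time to each replica
LOCAL_WRITE_MS = 0.5                  # hypothetical time to write on the primary


def write_latency_ms(synchronous: bool) -> float:
    """Latency the client sees for one write (simulated)."""
    if synchronous:
        # Block until every replica has the data; with parallel copies,
        # the slowest replica dominates.
        return LOCAL_WRITE_MS + max(REPLICA_DELAYS_MS)
    # Asynchronous: acknowledge after the local write, copy in the background.
    return LOCAL_WRITE_MS


def time_to_finish_ms(n_writes: int, synchronous: bool) -> float:
    # Writes are serialized, so each one waits behind the previous one.
    return n_writes * write_latency_ms(synchronous)


print(time_to_finish_ms(100, synchronous=True))   # 3050.0
print(time_to_finish_ms(100, synchronous=False))  # 50.0
```

Under these assumptions, blocking on replication makes a burst of writes about 60 times slower, and a single slow replica sets the pace for everyone.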

Generally speaking, no single solution can satisfy all the properties we want from a distributed system at once. How to guarantee data consistency without hurting the system's runtime performance is therefore something every distributed system must carefully consider and weigh. This is how consistency levels came about:

1. Strong consistency

This is the most intuitive level for users: whatever the system writes is exactly what it reads back. It gives a good user experience, but it often has a large impact on system performance.

2. Weak consistency

At this level, after a successful write the system does not promise that the written value can be read immediately, nor how long it will take for the data to become consistent; it only tries, as far as possible, to bring the data to a consistent state within some time bound (for example, within seconds).

3. Eventual consistency

Eventual consistency is a special case of weak consistency: the system guarantees that the data will reach a consistent state within a certain time. It is singled out here because it is the most widely respected of the weak consistency models, and it is also the model the industry favors for data consistency in large distributed systems.

 

Problems in distributed environments

From the moment distributed system architectures appeared, they have come with many difficulties and challenges:

1. Communication failures

Evolving from centralized to distributed systems inevitably introduces the network, and because the network itself is unreliable, it brings additional problems. A distributed system needs network communication between its nodes, and every such communication carries the risk that the network is unavailable: a failure of an optical fiber, a router, another hardware device, or a system such as DNS will ultimately prevent the distributed system from completing a network communication. Moreover, even when the nodes can communicate normally, the latency is far greater than that of a stand-alone operation. In modern computer architectures, a single memory access typically takes on the order of nanoseconds (about 10 ns), while a normal network communication takes about 0.1 to 1 ms, roughly 10^4 to 10^5 times the memory access latency. Such a huge latency gap also affects how messages are sent and received, so lost and delayed messages become very common.
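The ratio quoted above follows from converting both latencies to seconds (10 ns = 10^-8 s):

```latex
\frac{0.1\,\text{ms}}{10\,\text{ns}} = \frac{10^{-4}\,\text{s}}{10^{-8}\,\text{s}} = 10^{4},
\qquad
\frac{1\,\text{ms}}{10\,\text{ns}} = \frac{10^{-3}\,\text{s}}{10^{-8}\,\text{s}} = 10^{5}
```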

2. Network partition

When the network behaves abnormally, the latency between some nodes of the distributed system keeps growing, until eventually only some of the nodes can still communicate normally with each other while the others cannot. We call this phenomenon a network partition. When a network partition occurs, the distributed system splits into small local clusters; in extreme cases, these local clusters independently carry out functions that originally required the whole distributed system, including transactional processing of data. This poses a very big challenge to distributed consistency.

3. Three states

From the two points above we know that a wide range of network problems can occur in a distributed environment, so every request and response in a distributed system has a distinctive three-state concept: success, failure, and timeout. In a traditional stand-alone system, an application that calls a function gets a very clear response: success or failure. In a distributed system the network is unreliable. In most cases a network communication does receive a success or failure response, but when the network is abnormal a timeout may occur, usually in one of two situations:

(1) Because of a network problem, the request never reached the receiver: the message was lost in transit.

(2) The receiver received and successfully processed the request, but the response was lost on its way back to the sender.

When such a timeout occurs, the sender cannot determine whether the request was processed successfully.
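A toy simulation of the two timeout cases. `Server`, `call`, and the `lose` fault-injection parameter are hypothetical; the point is that the caller observes the same TIMEOUT in both cases even though the server's state differs:

```python
from __future__ import annotations

import enum


class Outcome(enum.Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"  # the caller cannot tell whether the request was processed


class Server:
    def __init__(self) -> None:
        self.processed: list[str] = []

    def handle(self, request_id: str) -> bool:
        self.processed.append(request_id)
        return True


def call(server: Server, request_id: str, lose: str | None = None) -> Outcome:
    """Simulated RPC; `lose` injects a fault: "request" or "response"."""
    if lose == "request":
        return Outcome.TIMEOUT  # case (1): the server never saw the request
    ok = server.handle(request_id)
    if lose == "response":
        return Outcome.TIMEOUT  # case (2): the work was done, the reply was lost
    return Outcome.SUCCESS if ok else Outcome.FAILURE


server = Server()
print(call(server, "req-1", lose="request"), server.processed)   # Outcome.TIMEOUT []
print(call(server, "req-2", lose="response"), server.processed)  # Outcome.TIMEOUT ['req-2']
```

Because both cases look identical to the caller, blindly retrying after case (2) would make the server process the same operation twice.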

4. Node failure

Node failure is another fairly common problem in distributed environments. It refers to a server node of the distributed system crashing or becoming unresponsive ("dead"). Experience shows that every node can fail, and such failures happen every day.
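Failed nodes are commonly detected with heartbeats. The sketch below swaps in a simple timeout-based detector (`HeartbeatMonitor` and the node names are hypothetical, and timestamps are passed in explicitly instead of read from a clock):

```python
from __future__ import annotations


class HeartbeatMonitor:
    """Hypothetical failure detector: a node that has not sent a heartbeat
    within `timeout` seconds is suspected to have failed."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def suspected(self, now: float) -> list[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("node-a", now=0.0)
monitor.heartbeat("node-b", now=0.0)
monitor.heartbeat("node-a", now=4.0)  # node-b has gone silent
print(monitor.suspected(now=5.0))     # ['node-b']
```

A timeout can only make a node *suspected*: given the three-state problem above, a slow or partitioned node looks exactly like a crashed one.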

 

Distributed transactions

With the development of distributed computing, transactions have also come into wide use in this field. In a stand-alone database we can fairly easily implement a transaction processing system that satisfies the ACID properties; in a distributed database, however, data is scattered across different machines, and processing distributed transactions over that data is a very big challenge.

A distributed transaction is one in which the transaction participants, the servers supporting the transaction, the resource servers, and the transaction manager are located on different nodes of a distributed system. A distributed transaction typically involves operations on multiple data sources or business systems.

Consider one of the most typical distributed transaction scenarios: a cross-bank transfer. It involves two remote banking services: the withdrawal service provided by the local bank's ATM, and the deposit service provided by the target bank. The two services are themselves stateless and independent of each other, and together they form one complete distributed transaction. If the withdrawal from the local bank succeeds but the deposit service fails for some reason, the system must roll back to the state before the withdrawal; otherwise the user will find that their money has disappeared.

As this example shows, a distributed transaction can be viewed as a sequence of distributed operations, such as the withdrawal service and the deposit service above; each operation in the sequence is usually called a sub-transaction. A distributed transaction can therefore also be defined as a nested transaction, which must likewise have the ACID properties. However, because each sub-transaction executes on a different node, implementing a transaction processing system that guarantees the ACID properties is especially complex.
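A minimal sketch of the rollback requirement in the transfer example. It swaps in a simple compensation step (re-crediting the source account when the deposit fails) rather than a full ACID protocol, and it does not handle a crash between the two steps or a failed compensation; `Bank` and `transfer` are hypothetical:

```python
from __future__ import annotations


class Bank:
    """Hypothetical in-memory bank service (one per bank in the example)."""

    def __init__(self, balances: dict[str, int], available: bool = True) -> None:
        self.balances = balances
        self.available = available  # False simulates the service failing

    def withdraw(self, account: str, amount: int) -> None:
        if not self.available or self.balances[account] < amount:
            raise RuntimeError("withdrawal failed")
        self.balances[account] -= amount

    def deposit(self, account: str, amount: int) -> None:
        if not self.available:
            raise RuntimeError("deposit service unavailable")
        self.balances[account] = self.balances.get(account, 0) + amount


def transfer(src: Bank, src_acct: str, dst: Bank, dst_acct: str, amount: int) -> bool:
    src.withdraw(src_acct, amount)     # sub-transaction 1
    try:
        dst.deposit(dst_acct, amount)  # sub-transaction 2
    except RuntimeError:
        src.deposit(src_acct, amount)  # compensate: undo the withdrawal
        return False
    return True


local = Bank({"alice": 100})
target = Bank({"home": 0}, available=False)
print(transfer(local, "alice", target, "home", 30), local.balances)  # False {'alice': 100}
```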

 

CAP theory

CAP is a classic theory of distributed systems. It tells us that a distributed system cannot simultaneously satisfy the three basic requirements of consistency (C: Consistency), availability (A: Availability), and partition tolerance (P: Partition Tolerance); it can satisfy at most two of them.

1. Consistency

In a distributed environment, consistency is the property that data stays identical across its multiple replicas. Under the consistency requirement, when an update is performed on a system whose data is in a consistent state, the system must ensure that the data is still in a consistent state afterward.

Consider a data item replicated on different nodes of a distributed system. If an update on the first node succeeds but the corresponding data on the second node is not updated, a read on the second node still returns the old (dirty) data; this is a typical case of distributed data inconsistency. In a distributed system, if every user can read the latest value of a data item as soon as an update to it has succeeded, the system is considered strongly consistent.

2. Availability

Availability means that the service provided by the system must always be usable, and every user request must receive a result within a finite time. The key phrases here are "finite time" and "returns a result".

"Finite time" means that the system must return the result of a user's request within a specified time; if it exceeds that bound, the system is considered unavailable. This "finite time" is an operating metric fixed when the system is designed, and it usually differs greatly from one system to another. In any case, the system must answer user requests within a reasonable response time; otherwise users will be disappointed with it.

"Returns a result" is another very important indicator of availability: after processing a user request, the system must return a normal response. A normal response clearly reflects the outcome of the request, either success or failure, rather than leaving the user confused.

3. Partition tolerance

Partition tolerance requires the following of a distributed system: when it encounters any network partition failure, it must still be able to provide services that satisfy consistency and availability, unless the entire network has failed.

A network partition happens when the nodes of a distributed system sit in different sub-networks (machine rooms or remote networks) and, for some reason, the sub-networks can no longer communicate with each other while communication inside each sub-network is still normal, so the system's network is split into several isolated regions. Note that each node joining or leaving the distributed system can also be viewed as a special kind of network partition.

Since a distributed system cannot satisfy consistency, availability, and partition tolerance at the same time, we must give one of them up.

The table below summarizes the three choices:

Choice | Description
CA | Give up partition tolerance in favor of consistency and availability; in practice this means choosing a traditional stand-alone database
AP | Give up consistency (here meaning strong consistency) to pursue partition tolerance and availability; this is the choice of many distributed system designs, for example many NoSQL systems
CP | Give up availability to pursue consistency and partition tolerance; this is rarely chosen, because a network problem makes the whole system unavailable

It should be clear that for a distributed system, partition tolerance is a basic requirement. A distributed system must deploy its components on different nodes (otherwise it would not be a distributed system at all), so sub-networks are unavoidable. And network problems are abnormal situations that will certainly occur in distributed systems, so partition tolerance is a problem every distributed system must face and solve. System architects therefore usually spend their effort on finding, according to the characteristics of the business, a balance between C (consistency) and A (availability).
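A minimal sketch of the C-versus-A decision once P is a given, assuming a replica that can tell when it is cut off from its peer. `Replica` and the mode strings are hypothetical, and replication to the peer in the normal case is omitted:

```python
from __future__ import annotations


class Replica:
    """Hypothetical replica that knows whether it can reach its peer."""

    def __init__(self, name: str, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.name = name
        self.mode = mode
        self.data: dict[str, str] = {}
        self.partitioned = False  # True when the peer is unreachable

    def write(self, key: str, value: str) -> str:
        if self.partitioned and self.mode == "CP":
            # Cannot confirm the peer has the write, so refuse it: keep C, lose A.
            return "error: unavailable during partition"
        # AP (or no partition): accept locally; during a partition the peer
        # may diverge until the network heals. Replication itself is omitted.
        self.data[key] = value
        return "ok"


cp = Replica("cp-node", "CP")
ap = Replica("ap-node", "AP")
cp.partitioned = ap.partitioned = True
print(cp.write("x", "1"))  # error: unavailable during partition
print(ap.write("x", "1"))  # ok
```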

 

BASE theory

BASE is an abbreviation of three phrases: Basically Available, Soft state, and Eventually consistent. BASE theory is the result of trading off consistency against availability in CAP; it summarizes the practice of large-scale Internet distributed systems and evolved gradually from the CAP theorem. The core idea of BASE is that even if strong consistency cannot be achieved, each application can, in a way suited to its business characteristics, bring the system to eventual consistency. Let us look at the three elements of BASE:

1. Basically available

Basically available means that when a distributed system encounters unpredictable failures, it is allowed to lose part of its availability; note that this is not the same as the system being unavailable. For example:

(1) Loss of response time: normally an online search engine must return query results to the user within 0.5 seconds, but because of a failure the response time may increase to 1 to 2 seconds.

(2) Loss of system functionality: normally, consumers shopping on an e-commerce website can complete almost every order smoothly, but during big holiday promotions, shopping activity surges, and to protect the stability of the shopping system some consumers may be directed to a degraded page.
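The degraded-page behavior in (2) can be sketched as a simple load-shedding rule. The threshold, the function name, and the page texts are hypothetical; real systems use far richer signals:

```python
def handle_checkout(current_load: int, capacity: int) -> str:
    """Hypothetical rule: above capacity, serve a degraded page instead of
    failing outright, so the core shopping system stays stable."""
    if current_load <= capacity:
        return "full checkout page"
    return "degraded page: high traffic, please try again shortly"


print(handle_checkout(current_load=800, capacity=1000))   # full checkout page
print(handle_checkout(current_load=5000, capacity=1000))  # degraded page: high traffic, please try again shortly
```

Every request still gets a clear answer, which is what separates "basically available" from unavailable.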

2. Soft state

Soft state means the system allows its data to be in an intermediate state, and this intermediate state does not affect the overall availability of the system; that is, the system tolerates delay in synchronizing data between the replicas on different nodes.

3. Eventual consistency

Eventual consistency emphasizes that all replicas of the data, after a period of synchronization, eventually reach a consistent state. Its essence is therefore that the system must guarantee the data becomes consistent in the end, without having to guarantee strong consistency in real time.
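A toy model of soft state followed by eventual consistency. It swaps in two common techniques that BASE itself does not prescribe: last-writer-wins timestamps for resolving conflicts, and periodic pairwise synchronization (gossip-style anti-entropy). `Node` and its methods are hypothetical:

```python
from __future__ import annotations


class Node:
    """Hypothetical replica storing key -> (timestamp, value)."""

    def __init__(self) -> None:
        self.store: dict[str, tuple[int, str]] = {}

    def put(self, key: str, value: str, ts: int) -> None:
        self._merge_entry(key, (ts, value))

    def _merge_entry(self, key: str, entry: tuple[int, str]) -> None:
        # Last-writer-wins: keep the entry with the larger timestamp.
        if key not in self.store or entry > self.store[key]:
            self.store[key] = entry

    def sync_from(self, other: Node) -> None:
        # One anti-entropy step: pull every entry from a peer and merge it.
        for key, entry in other.store.items():
            self._merge_entry(key, entry)


a, b, c = Node(), Node(), Node()
a.put("K", "V1", ts=1)
b.put("K", "V2", ts=2)     # concurrent writes land on different replicas
print(a.store != b.store)  # True: soft state, the replicas disagree for now
for _ in range(2):         # a few synchronization rounds
    for x in (a, b, c):
        for y in (a, b, c):
            if x is not y:
                x.sync_from(y)
print(a.store == b.store == c.store == {"K": (2, "V2")})  # True
```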

Overall, BASE theory targets large-scale, highly available, scalable distributed systems and stands opposite to the ACID properties of traditional transactions. It is entirely different from ACID's strong consistency model: it gains availability by sacrificing strong consistency, allowing data to be inconsistent for a period of time while eventually reaching a consistent state. In real distributed scenarios, however, different business units and components have different requirements for data consistency, so the design of a particular distributed system architecture often combines ACID properties with BASE theory.


Source: blog.csdn.net/weixin_37641163/article/details/90755303