The Road to Becoming an Architect: Understanding Distributed Architecture

Understanding distributed architecture

As computer systems grow ever larger, the traditional centralized architecture, in which all business units are deployed on one or a few mainframes, can no longer meet the needs of today's systems, especially fast-growing large Internet systems. Flexible new architectural models have emerged one after another, and the distributed approach is increasingly favored by the industry: computer systems are undergoing an unprecedented transformation from centralized to distributed architecture.


Centralized and distributed

Centralized system

A centralized system is built around a central node, or host computer: the data is stored centrally on that node, all business units of the system are deployed on it, and it performs all of the system's functions.

The greatest advantage of a centralized deployment is its simple structure. The underlying hardware is usually an expensive mainframe purchased from vendors such as IBM or HP, so there is no need to think about deploying services across multiple nodes or about distributed coordination between nodes. However, because everything runs on a single machine, the system tends to become large, complex, and hard to maintain; it has a single point of failure (a failure at that one point spreads to the entire system or network and paralyzes it); and it scales poorly.

Distributed Systems

A distributed system is a system whose hardware or software components are located on different networked computers and communicate and coordinate with one another only by passing messages. Put simply, it is a group of independent computers that together provide a service, yet to its users the system looks like a single computer. Going distributed means that a cluster of ordinary commodity computers (as opposed to expensive mainframes) can serve external requests. The more computers there are, the more CPU, memory, and storage resources are available, and the more concurrent requests the system can handle.

From this definition, the hosts of a distributed system communicate and coordinate mainly over the network, so the system is essentially unconstrained in space: its machines may sit in different racks, different machine rooms, or different cities, and for large sites even in different countries and regions. However they are spread out, a typical distributed system has the following main characteristics:

Distribution

The computers in a distributed system can be placed anywhere in space, and their distribution may change at any time.

Peer equivalence

A distributed system has no master/slave distinction: there is no host that controls the whole system and no slave that is controlled by it, so all the nodes that make up the system are peers. The replica is one of the most common concepts in distributed systems; it refers to the redundancy a distributed system provides for its data and services. To keep services highly available, a typical distributed system keeps replicas of both data and services. A data replica means persisting the same data on different nodes, so that when the data stored on one node is lost, it can still be read from a replica; this is the most effective way for a distributed system to deal with data loss. A service replica means that multiple nodes provide the same service, and each of them can accept external requests and process them accordingly.
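As an illustration of both kinds of replica, the following minimal in-process sketch models three hypothetical nodes as Python objects: a write is copied to every live node (data replicas), and a read is answered by any live node that holds the key (service replicas). `Node` and `ReplicatedStore` are illustrative names invented for this example; real systems add networking, failure detection, and re-replication, all omitted here.

```python
class Node:
    """A hypothetical storage node; `alive` simulates whether it is reachable."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Hypothetical store that keeps a copy of every value on each live node."""
    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        # Data replicas: copy the value to every node that is currently alive.
        written = 0
        for node in self.nodes:
            if node.alive:
                node.data[key] = value
                written += 1
        return written  # number of replicas now holding the value

    def get(self, key):
        # Service replicas: any live node holding the key can answer the read.
        for node in self.nodes:
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("n1"), Node("n2"), Node("n3")]
store = ReplicatedStore(nodes)
store.put("balance:alice", 100)
nodes[0].alive = False              # n1 crashes; its copy is unreachable
print(store.get("balance:alice"))   # still served by n2
```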

Concurrency

In a computer network, concurrently running processes are very common. For example, multiple nodes in a distributed system may operate on some shared resource at the same time, and coordinating such distributed concurrent operations accurately and efficiently has become one of the biggest challenges in designing distributed architectures.
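One common way to coordinate concurrent updates to a shared resource, not described in the original text, is optimistic concurrency control: each record carries a version number, and a write succeeds only if the version has not changed since it was read. The sketch below assumes the storage node offers an atomic compare-and-set, simulated here with a local lock; `VersionedRecord` and `increment` are hypothetical names, and threads stand in for concurrent clients.

```python
import threading


class VersionedRecord:
    """Hypothetical shared record guarded by a version number."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        # Stands in for the storage node's atomic compare-and-set (an assumption).
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        with self._lock:
            if self.version != expected_version:
                return False  # someone else updated first; caller must retry
            self.value = new_value
            self.version += 1
            return True


def increment(record, times):
    """Each client reads, computes, and retries until its write wins."""
    for _ in range(times):
        while True:
            value, version = record.read()
            if record.compare_and_set(version, value + 1):
                break


stock = VersionedRecord(0)
workers = [threading.Thread(target=increment, args=(stock, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(stock.read())  # (4000, 4000): no update was lost
```

A writer that loses the race simply rereads and retries, so no update is lost even though no client holds a lock on the record across its read and write.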

Lack of a global clock

In a distributed system it is hard to determine which of two events happened first, simply because the system lacks a global clock to control the ordering of events.
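A widely used answer to the missing global clock, which the original does not name, is the Lamport logical clock: each node keeps a counter, increments it for every local event or message it sends, and on receiving a message moves its counter past the sender's timestamp. The sketch below is a minimal illustration of that idea; it guarantees that if one event causally precedes another, the earlier one gets a smaller timestamp, but it cannot order truly concurrent events by real time.

```python
class LamportClock:
    """Logical clock: orders causally related events without a shared physical clock."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or message send: advance the counter.
        self.time += 1
        return self.time

    def receive(self, message_time):
        # On receipt, jump past the sender's timestamp.
        self.time = max(self.time, message_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                     # a: local event, time 1
sent = a.tick()              # a: sends a message stamped 2
b.tick()                     # b: unrelated local event, time 1
received = b.receive(sent)   # b: receives the message, max(1, 2) + 1 = 3
print(sent, received)        # 2 3
```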

Failure always happens

Any component of any computer in a distributed system may fail in any way. Unless the requirements explicitly allow it, the system design must not overlook any possible failure.

Problems faced by distributed systems

Communication failures

Every node in a distributed system must communicate with the others over the network, so the system carries the risk of network failures: when the network is unavailable, the distributed system may be unable to complete its communication. Even when communication between nodes works normally, its latency is much greater than that of operations on a single machine, which affects how messages are sent and received; as a result, lost and delayed messages become very common.

Network partition

When the network misbehaves, latency between some nodes of the distributed system keeps growing until only some of the nodes can still communicate with each other while the rest cannot. We call this phenomenon a network partition, also known as "split brain". When a partition occurs, the distributed system breaks into small local clusters; in extreme cases each of these clusters tries to carry out on its own the functions that originally required the whole distributed system, which poses a very serious challenge for distributed consistency.
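A common defense against split brain, assumed here rather than taken from the original, is a majority quorum, as used by consensus protocols such as Raft: a group of nodes may keep accepting writes only if it can reach a strict majority of the cluster. Since two disjoint groups cannot both hold a majority, at most one side of a partition stays active. The function and node names below are hypothetical.

```python
def has_quorum(reachable_nodes, cluster_size):
    """True if this group of nodes can see a strict majority of the cluster."""
    return len(reachable_nodes) > cluster_size // 2


cluster = {"n1", "n2", "n3", "n4", "n5"}
# A partition splits the five nodes into two isolated groups.
left, right = {"n1", "n2", "n3"}, {"n4", "n5"}

for name, group in (("left", left), ("right", right)):
    role = "keeps serving writes" if has_quorum(group, len(cluster)) else "steps down"
    print(name, role)
# left keeps serving writes
# right steps down
```

Using an odd cluster size avoids the case where an even split leaves neither side with a majority.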

Tri-state

Every request and response in a distributed system has a characteristic three-state outcome: success, failure, or timeout. When a timeout occurs, the initiator of the communication cannot tell whether the request was processed successfully.
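The following sketch shows how a caller ends up with three outcomes rather than two. It assumes a remote call can be simulated by a local function run on a worker thread with a deadline; `call_remote`, `Outcome`, and the three sample functions are hypothetical. The key point is the `TIMEOUT` branch: the slow request may still complete after the caller has given up, so the caller cannot know whether it took effect.

```python
import concurrent.futures
import enum
import time


class Outcome(enum.Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"  # unknown: the remote side may or may not have applied it


def call_remote(func, timeout_s):
    """Run a (simulated) remote call and classify the result into three states."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        future.result(timeout=timeout_s)
        return Outcome.SUCCESS
    except concurrent.futures.TimeoutError:
        return Outcome.TIMEOUT
    except Exception:
        return Outcome.FAILURE
    finally:
        pool.shutdown(wait=False)  # do not block on a request we gave up on


def fast_ok():
    return "done"


def broken():
    raise ConnectionError("refused")


def slow():
    time.sleep(0.5)  # the request may still complete after the caller gives up
    return "done"


for f in (fast_ok, broken, slow):
    print(f.__name__, call_remote(f, 0.2).value)
```

Because a timeout leaves the result unknown, a caller that retries usually needs the remote operation to be idempotent, or needs a way to query its status afterwards.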

Node failure

Node failure is another common problem in distributed environments: a server node in the distributed system crashes or hangs and becomes "dead".
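Failed nodes are commonly detected with heartbeats, a technique assumed here rather than described in the original: every node reports periodically, and a monitor suspects any node it has not heard from within a timeout. In this minimal sketch, time is passed in explicitly to keep it deterministic, and `HeartbeatMonitor` is a hypothetical name. A timeout cannot distinguish a crashed node from a merely slow or hung one, which is the same uncertainty as the timeout outcome described above.

```python
class HeartbeatMonitor:
    """Hypothetical detector: suspects nodes silent for longer than `timeout` seconds."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now):
        # Record the time of the most recent heartbeat from this node.
        self.last_seen[node] = now

    def suspected(self, now):
        # Nodes whose last heartbeat is older than the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("n1", now=0.0)
monitor.heartbeat("n2", now=0.0)
monitor.heartbeat("n1", now=4.0)     # n1 keeps reporting, n2 has gone quiet
print(monitor.suspected(now=5.0))    # ['n2']
```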

The Road to Becoming an Architect

Distributed Transaction

On a single-machine database we can easily build a transaction processing system that satisfies the ACID properties, but in a distributed database the data is spread across different machines, and processing that data in distributed transactions is a major challenge. Yet in distributed computing, distributed transactions cannot be avoided if distributed applications are to be reliable.

A distributed transaction is one in which the transaction participants, the servers supporting the transaction, the resource servers, and the transaction manager are located on different nodes of a distributed system. A distributed transaction usually involves operations on multiple data sources or multiple business systems. The most typical scenario is an interbank transfer, which calls two remote banking services: one is the withdrawal service provided by the local bank, and the other is the deposit service provided by the destination bank. The two services are stateless and independent of each other, and together they form one complete distributed transaction.
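A classic protocol for this kind of transaction, not named in the original text, is two-phase commit (2PC): a coordinator first asks every participant to prepare (validate and reserve the change), and only if all of them vote yes does it tell them to commit; otherwise all abort. The sketch below applies it to the interbank transfer, with each bank as an in-memory participant. `Bank` and `transfer` are hypothetical names, and the sketch omits the durable logging, coordinator crash recovery, and timeouts that a real 2PC implementation needs.

```python
class Bank:
    """Hypothetical 2PC participant holding account balances in memory."""
    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)
        self.pending = None

    def prepare(self, account, delta):
        # Phase 1: validate and reserve the change without applying it.
        if self.balances.get(account, 0) + delta < 0:
            return False  # vote "no": insufficient funds
        self.pending = (account, delta)
        return True

    def commit(self):
        # Phase 2: apply the reserved change.
        account, delta = self.pending
        self.balances[account] = self.balances.get(account, 0) + delta
        self.pending = None

    def abort(self):
        self.pending = None


def transfer(steps):
    """Hypothetical coordinator: commit only if every participant votes yes."""
    prepared = []
    for bank, account, delta in steps:
        if not bank.prepare(account, delta):
            for p in prepared:
                p.abort()
            return False
        prepared.append(bank)
    for bank in prepared:
        bank.commit()
    return True


local = Bank("local", {"alice": 100})
remote = Bank("remote", {"bob": 0})
print(transfer([(local, "alice", -30), (remote, "bob", 30)]))    # True
print(local.balances, remote.balances)  # {'alice': 70} {'bob': 30}
print(transfer([(local, "alice", -500), (remote, "bob", 500)]))  # False
```

The second transfer fails in the prepare phase, so neither bank's balance changes, which is the all-or-nothing behavior the transfer requires.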

For a high-traffic, high-concurrency Internet system, if we try to implement distributed transactions that strictly satisfy the ACID properties, we are likely to run into a conflict between availability and strict consistency: when a distributed system is required to be strictly consistent, it may have to sacrifice availability. Yet one thing is beyond doubt: availability is a property consumers will not let us bargain away. An online shopping site such as Taobao, for example, must provide service 7×24 without interruption. Consistency, in turn, is an even more basic requirement that every consumer has of a software system. Since we can never have the best of both worlds, building a distributed system that balances availability and consistency has become a problem countless engineers have explored, and it gave rise to classic distributed-systems theories such as CAP and BASE.

CAP theorem

The CAP theorem tells us that a distributed system cannot simultaneously satisfy the three basic requirements of consistency (C: Consistency), availability (A: Availability), and partition tolerance (P: Partition tolerance); it can satisfy at most two of them.

Consistency

In a distributed environment, consistency refers to whether the data in multiple replicas stays consistent. Under the consistency requirement, when an update is performed on a system whose data is in a consistent state, the system should guarantee that the data is still in a consistent state afterwards. If, once an update to a data item has completed successfully, every user can read the latest value, the system is said to have strong consistency (or strict consistency).

Availability

Availability means that the service provided by the system must always be in a usable state and must return a result for every user request within a bounded time.

Partition tolerance

Partition tolerance requires a distributed system to have the following property: whenever it encounters a network partition failure, it must still be able to provide service that satisfies consistency and availability, unless the entire network environment has failed. Note that a node joining or leaving the distributed system can itself be viewed as a special kind of network partition.

When applying the CAP theorem, we have to give up one of the three properties. The table below describes what it means to give up each of them.

Giving up a property of the CAP theorem

Give up P: To avoid partition-tolerance problems altogether, a simple approach is to put all the data on a single node. This cannot guarantee that the system never fails, but at least it will not suffer the negative effects of network partitions. Note, however, that giving up P also means giving up the system's scalability.

Give up A: Once the system encounters a network partition or another failure, the affected services must wait for a period of time, during which the system cannot serve requests normally; that is, it is unavailable.

Give up C: Giving up consistency actually means giving up strong consistency while retaining eventual consistency. Such a system cannot guarantee that its data is consistent in real time, but it promises that the data will eventually reach a consistent state. This introduces the notion of a time window: how long it takes to reach consistency depends on the system design, mainly on how long data takes to replicate between nodes.

For a distributed system, partition tolerance is a basic requirement. A distributed system by definition deploys its components on different nodes (otherwise it would not be distributed at all), so subnetworks are inevitable, and network problems are abnormal situations that will certainly occur. Partition tolerance is therefore a problem every distributed system must face and solve, and system architects usually spend their effort on finding, according to the business characteristics, a balance between C (consistency) and A (availability).
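To make the C-versus-A choice concrete, the sketch below models a replica that has lost contact with the rest of the cluster. It assumes a simple `mode` flag (a hypothetical parameter, not a real configuration option) and assumes the replica is up to date whenever it is not partitioned: a CP replica refuses to answer rather than risk returning stale data, while an AP replica keeps answering with data that may be out of date.

```python
class PartitionedReplica:
    """Hypothetical replica that may be cut off from the rest of the cluster."""
    def __init__(self, last_value, mode):
        self.last_value = last_value  # local copy; may be stale during a partition
        self.mode = mode              # "CP" or "AP" (hypothetical flag)
        self.partitioned = False

    def read(self):
        if not self.partitioned:
            return self.last_value  # assumed in sync when connected
        if self.mode == "CP":
            # Keep consistency: refuse rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # Keep availability: answer with data that may be stale.
        return self.last_value


cp, ap = PartitionedReplica(42, "CP"), PartitionedReplica(42, "AP")
cp.partitioned = ap.partitioned = True
print("AP replica:", ap.read())  # answers, possibly with stale data
try:
    cp.read()
except RuntimeError as err:
    print("CP replica:", err)
```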

BASE theory

BASE is an acronym of three phrases, Basically Available, Soft state, and Eventually consistent, and was proposed by architects at eBay. BASE is the result of trading off consistency against availability in CAP. It is distilled from practical experience with large-scale Internet distributed systems and evolved gradually from the CAP theorem. Its core idea is that even when strong consistency cannot be achieved, each application can use an approach suited to its own business characteristics to bring the system to eventual consistency.

Basically available

Basic availability means that when an unpredictable failure occurs, a distributed system is allowed to lose part of its availability; note that this is by no means the same as the system being unavailable. Typical examples of basic availability:

1. Loss of response time: normally an online search engine must return query results to the user within 0.5 seconds, but because of a failure, the response time rises to 1 to 2 seconds.

2. Loss of function: normally, almost every consumer shopping on an e-commerce site can complete an order successfully, but during a major shopping festival, when shopping activity surges, some consumers may be directed to a degraded page to protect the stability of the shopping system.
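Degrading to a reduced page, as in the second example, is often implemented as a fallback wrapped around the full-featured path. The following is a minimal sketch under that assumption; `with_fallback`, `full_checkout`, and `degraded_page` are hypothetical names, and the overload is simulated by raising `TimeoutError`.

```python
def with_fallback(primary, fallback, *args):
    """Try the full-featured path; if it fails, serve a degraded response instead."""
    try:
        return primary(*args)
    except Exception:
        return fallback(*args)


def full_checkout(order_id):
    # Hypothetical checkout service, simulated as overloaded at peak time.
    raise TimeoutError("checkout service overloaded")


def degraded_page(order_id):
    # Hypothetical reduced response: the site stays up with less functionality.
    return f"order {order_id}: high traffic, please retry shortly"


print(with_fallback(full_checkout, degraded_page, 7))
```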

Soft state

Soft state, also called weak state, is the opposite of hard state. It means the system's data is allowed to be in an intermediate state, and the existence of that intermediate state does not affect the overall availability of the system; in other words, the system tolerates delay while data replicas on different nodes are being synchronized.

Eventual consistency

Eventual consistency emphasizes that all data replicas in the system eventually reach a consistent state after a period of synchronization. The essence of eventual consistency, therefore, is that the system must guarantee its data eventually becomes consistent, without having to guarantee strong consistency in real time.
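The following sketch illustrates soft state and eventual consistency together. It assumes a simple scheme not described in the original: each replica stores a value with a timestamp, replicas may temporarily disagree after independent writes, and a background anti-entropy round copies newer entries in both directions using last-writer-wins. `LwwReplica` and `anti_entropy` are hypothetical names; real systems also need a tie-breaker for equal timestamps (for example, a node ID), which this sketch leaves out.

```python
class LwwReplica:
    """Hypothetical replica storing key -> (timestamp, value), last writer wins."""
    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        self.merge_entry(key, (ts, value))

    def merge_entry(self, key, entry):
        # Keep the entry only if it is newer than what we already hold.
        current = self.store.get(key)
        if current is None or entry[0] > current[0]:
            self.store[key] = entry

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """One background sync round: each replica adopts the other's newer entries."""
    for key, entry in list(a.store.items()):
        b.merge_entry(key, entry)
    for key, entry in list(b.store.items()):
        a.merge_entry(key, entry)


r1, r2 = LwwReplica(), LwwReplica()
r1.write("cart", ["book"], ts=1)
r2.write("cart", ["book", "pen"], ts=2)
print(r1.read("cart"), r2.read("cart"))  # replicas disagree: soft state
anti_entropy(r1, r2)
print(r1.read("cart"), r2.read("cart"))  # both converge to the newer value
```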

Author: codersm

Source: www.cnblogs.com/xiaoshen666/p/11118608.html