Large-scale distributed systems: learn how to implement a distributed system from an Alibaba expert

Distributed Systems

Distributed systems have evolved from the original CORBA to EJB, Web services, and SOA, and from clusters to today's large distributed systems such as NoSQL stores, Hadoop, and cloud data platforms. Horizontal scaling (scale out/in) is a defining characteristic of distributed system design, and reliability and fault tolerance are its two key quality attributes.

What is a distributed system?

  1. A collection of a large number of servers that users still see as a single, coherent system.
  2. A. Tanenbaum's definition: components located on networked computers communicate and coordinate their actions only by passing messages.
  3. G. Coulouris's definition: a system in which you know that some computer has crashed, yet your software never stops running.
  4. Leslie Lamport's definition: a distributed system is designed to support the development of applications and services that can exploit a physical architecture consisting of multiple autonomous processing elements that do not share main memory but cooperate by sending asynchronous messages over a network.
  5. Difference from application layering: layering an application (e.g., a three-tier architecture) divides the application logic logically, not physically, whereas a distributed system is distributed physically and is tied to actual deployment.

Compared with the traditional centralized system:

A centralized system scales up/down (vertical scaling). It does this by upgrading a server toward a mainframe, or by upgrading to more cores. Vertical scaling of a centralized system suits computations over highly aggregated data, whereas distributed computing suits loosely related, unstructured, or semi-structured data. Which scaling approach to take depends on the characteristics of the business data.

Any distributed system always needs to accomplish two tasks: computation and storage. Separating computation from storage is an important feature of distributed systems. In a centralized or standalone system the two are often combined. For example, an SQL statement with sorting retrieves data from storage (the query) and then sorts it (a computation), so the statement couples computation and storage. With large data volumes or high concurrency, this convenient bundling becomes a performance problem. Distributed computing and distributed storage bring complexity, but they also open up room to scale the system's processing capacity.
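The sorting example can be split across a storage side and a compute side. The following is a minimal sketch, assuming the data is already partitioned across storage nodes. Each (hypothetical) storage node scans and sorts only its own partition, and a separate compute step merges the sorted streams. The names `scan_partition` and `distributed_order_by` are illustrative, not from any real engine.

```python
import heapq


def scan_partition(partition, key):
    """Storage side (hypothetical): return one partition's rows sorted by `key`."""
    return sorted(partition, key=key)


def distributed_order_by(partitions, key):
    """Compute side: k-way merge of the per-partition sorted streams."""
    sorted_streams = [scan_partition(p, key) for p in partitions]
    # heapq.merge merges already-sorted iterables lazily.
    return list(heapq.merge(*sorted_streams, key=key))


if __name__ == "__main__":
    partitions = [
        [{"id": 3, "score": 70}, {"id": 1, "score": 95}],
        [{"id": 2, "score": 88}],
        [{"id": 4, "score": 60}, {"id": 5, "score": 99}],
    ]
    rows = distributed_order_by(partitions, key=lambda r: r["score"])
    print([r["id"] for r in rows])  # [4, 3, 2, 1, 5]
```

Because each partition is sorted independently, the storage nodes can work in parallel. The compute layer only merges the results, and it can be scaled separately from storage.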

Features of distributed systems:

  1. Concurrency: access to shared resources, governed by either ACID or BASE principles; see the CAP theorem.
  2. Distributed system design follows the CAP theorem. CAP stands for Consistency, Availability, and Partition tolerance (tolerance of network partitions). The CAP theorem states that a system can satisfy at most two of the three at the same time.
  3. Scalability: being scalable is an important feature; scaling out yields high performance, high throughput, and low latency.
  4. Reliability/availability: fault detection, recovery, and fault tolerance. Availability is the proportion of time during which the system is functioning. If the proportion of users who cannot access the system grows, the system is considered unavailable. Availability formula:
     Availability = uptime / (uptime + downtime)
  5. Fault tolerance: when an error occurs, the system fails over and everything keeps operating normally. It measures how tolerant the system is of errors.
  6. Messaging: concrete products include RabbitMQ, ZeroMQ, Netty, and so on.
  7. Heterogeneity: different operating systems, hardware, programming languages, and developers; middleware is one solution.
  8. Security: authentication, authorization, SSO (single sign-on), OAuth, and so on.
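The availability formula from the list can be computed directly. This is a small sketch; the observation window and the sample numbers are illustrative assumptions.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation window must be positive")
    return uptime_hours / total


if __name__ == "__main__":
    # Illustrative numbers: a 30-day month (720 h) with 1 hour of downtime.
    a = availability(uptime_hours=719, downtime_hours=1)
    print(f"{a:.4%}")  # 99.8611%
```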

Naming and location:

  1. URLs identify resources
  2. Naming services
  3. Lookup (locating resources)
  4. In SOA this is mainly service lookup; ZooKeeper, for example, is used to implement service discovery.
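To illustrate the naming idea, here is a toy in-memory naming service. Clients look up a logical service name instead of hard-coding a network location. The `ServiceRegistry` class and its methods are hypothetical and are not the ZooKeeper API. In a real ZooKeeper setup, instances typically register as ephemeral znodes, which disappear automatically when the instance's session ends.

```python
import random
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory naming service (hypothetical; not the ZooKeeper API)."""

    def __init__(self):
        # Logical service name -> set of "host:port" addresses.
        self._entries = defaultdict(set)

    def register(self, name: str, address: str) -> None:
        self._entries[name].add(address)

    def deregister(self, name: str, address: str) -> None:
        self._entries[name].discard(address)

    def lookup(self, name: str) -> str:
        addresses = sorted(self._entries.get(name, ()))
        if not addresses:
            raise LookupError(f"no instance registered for {name!r}")
        # Naive client-side load balancing: pick any registered instance.
        return random.choice(addresses)


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("order-service", "10.0.0.5:8080")
    registry.register("order-service", "10.0.0.6:8080")
    print(registry.lookup("order-service"))
```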

Transparency:

  1. Access transparency: local and remote resources are accessed with the same operations
  2. Location transparency: resources are accessed without knowing their physical or network location
  3. Concurrency transparency: multiple processes can use shared resources concurrently without interfering with one another, even when some of them block
  4. Replication transparency: multiple copies of a resource can be used to improve reliability and performance, without users or application programmers having to know about the copies
  5. Failure transparency: when hardware or software fails, users and applications can still complete their tasks unaffected
  6. Mobility transparency: resources and clients can move within the system
  7. Performance transparency: the system can be reconfigured to improve performance as the load varies
  8. Scaling transparency: the system can expand in scale to increase throughput and processing capacity without changing the application structure
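Replication and failure transparency can be illustrated with a client wrapper. The caller sees a single `get()`, while the wrapper hides which replica answered and fails over when a copy is down. `Replica`, `ReplicaUnavailable`, and `TransparentClient` are hypothetical names for this sketch. It also assumes every replica holds the same data, so it ignores consistency.

```python
class ReplicaUnavailable(Exception):
    """Raised by a (hypothetical) replica that cannot serve a request."""


class Replica:
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up

    def get(self, key):
        if not self.up:
            raise ReplicaUnavailable(self.name)
        return self.data[key]


class TransparentClient:
    """Callers see one get(); replica choice and failover stay hidden."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def get(self, key):
        errors = []
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ReplicaUnavailable as exc:
                errors.append(exc)  # this copy is down; try the next one
        raise RuntimeError(f"all {len(self.replicas)} replicas failed: {errors}")


if __name__ == "__main__":
    primary = Replica("primary", {"user:1": "alice"}, up=False)
    secondary = Replica("secondary", {"user:1": "alice"})
    print(TransparentClient([primary, secondary]).get("user:1"))  # alice
```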

Challenges of Distributed Systems

Distributed systems are difficult to understand, design, build, and manage. Compared with a single machine, they introduce exponentially more variables into the design, which makes the root causes of application problems harder to find. An SLA (Service Level Agreement) is a standard for measuring downtime and/or performance degradation. Most modern applications have an expected resilience level in their SLA, usually expressed as a number of "nines" (e.g., 99.9% or 99.99% availability per month). Each additional nine is increasingly difficult to achieve.
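Why each additional nine is harder becomes clear from the downtime it allows. The following sketch converts an availability target into a downtime budget, assuming a 30-day month.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Maximum downtime allowed in a period for a given availability target."""
    period_minutes = period_days * 24 * 60  # 43,200 minutes for 30 days
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/month")
    # 99.0% -> 432.00, 99.9% -> 43.20, 99.99% -> 4.32, 99.999% -> 0.43
```

Each extra nine cuts the budget by a factor of ten. At 99.99% a single incident lasting more than about four minutes already breaks the monthly SLA.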

To make things even more complicated, an increasingly common failure mode in distributed systems is intermittent errors or performance degradation (commonly called brownouts). These failures take more time to diagnose. For example, Joyent operates a number of distributed systems as part of its cloud computing infrastructure. One of these is a highly available, distributed key/value store, which recently experienced transient application timeouts. For most users the system operated normally and responded within the SLA's latency bounds. However, 5 to 10 percent of requests exceeded a predefined timeout. Such failures could not be reproduced in development or test environments, and they often "disappeared" for minutes to hours at a time. Finding the root cause required extensive analysis of the data storage system.

These systems include a data storage API (Node.js), an RDBMS (relational database management system, PostgreSQL) used internally by the system, and the key/value system that end-user applications and the operating system depend on. In the end, the root cause turned out to be excessive locking in the application's semantics. Finding it, however, required considerable data collection and correlation work, and engineers spent a lot of time acquiring expertise in different fields.
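A brownout like this is easy to miss when looking only at typical latency. The following is a minimal sketch of tail-aware monitoring. The function names and the 1% alert threshold in `is_brownout` are hypothetical choices for illustration, not Joyent's method.

```python
import math


def timeout_rate(latencies_ms, timeout_ms):
    """Fraction of requests whose latency exceeded the timeout."""
    if not latencies_ms:
        return 0.0
    return sum(1 for x in latencies_ms if x > timeout_ms) / len(latencies_ms)


def percentile(latencies_ms, pct):
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def is_brownout(latencies_ms, timeout_ms, max_timeout_rate=0.01):
    """Hypothetical rule: alert when more than 1% of requests time out."""
    return timeout_rate(latencies_ms, timeout_ms) > max_timeout_rate


if __name__ == "__main__":
    latencies = [20] * 92 + [5000] * 8  # most requests fast, 8% time out
    print(percentile(latencies, 50))      # 20
    print(timeout_rate(latencies, 1000))  # 0.08
    print(is_brownout(latencies, 1000))   # True
```

In this sample the median looks healthy, yet the timeout rate and the p99 expose the same 5 to 10 percent pattern described above.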

A distributed system is constrained by two physical factors:

  • The number of nodes (which grows with the required storage and computing capacity)
  • The distance between nodes (information travels, at best, at the speed of light)

These constraints lead to the following challenges:

  • As the number of independent nodes grows, so does the probability that some failure occurs (reducing availability and increasing administrative cost)
  • As the number of independent nodes grows, so may the communication overhead between nodes (reducing performance as scale increases)
  • As the geographic distance between nodes grows, so does the minimum communication latency between distant nodes (reducing the performance of certain operations)
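Both physical limits can be quantified with simple formulas. The following sketch assumes that node failures are independent, and it uses the speed of light in a vacuum as the lower bound. Real networks are slower: light in fiber travels at roughly two thirds of that speed, and routes are not straight lines.

```python
C_KM_PER_S = 299_792.458  # speed of light in a vacuum


def p_any_failure(p_node: float, n_nodes: int) -> float:
    """P(at least one of n independent nodes fails) = 1 - (1 - p)^n."""
    return 1 - (1 - p_node) ** n_nodes


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time: the signal goes there and back."""
    return 2 * distance_km / C_KM_PER_S * 1000


if __name__ == "__main__":
    # Illustrative: each node fails on a given day with probability 0.1%.
    for n in (10, 100, 1000):
        print(n, round(p_any_failure(0.001, n), 3))  # 0.01, 0.095, 0.632
    print(round(min_rtt_ms(10_000), 1))  # 66.7 ms for nodes 10,000 km apart
```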

How to structure a distributed system

The most common term for distributed system architecture is SOA (Service-Oriented Architecture). SOA may evoke unpleasant memories of CORBA (Common Object Request Broker Architecture) and the WS-* standards. Its core idea, however, is the foundation of a resilient distributed system: a set of loosely coupled services, each performing a small function independently of the others. By analogy with the previous generation, services are the new processes: they are the right level of abstraction for the discrete functions of a system.

The first step in building a service-oriented architecture is to identify each function that contributes to the overall business goal. These functions are then mapped to discrete services, each with its own fault boundary, scaling, and data load. For each service, you must consider the following:

  • Geography. Does the system run globally, or separately in each region?
  • Data isolation. Does the system provide a single-tenant or a multi-tenant model?
  • SLAs. Availability, latency, throughput, consistency, and redundancy requirements must be defined.
  • Security. IAAA (identity, authentication, authorization, and auditing), as well as data confidentiality and privacy, must be considered.
  • Usage tracking. Understanding how the system is used is part of its daily operation and feeds capacity planning. Usage data may also be used for billing and/or for system administration (quotas/rate limits).
  • Deployment and configuration management. How will the system be deployed and updated?
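For the quotas/rate limits mentioned under usage tracking, a token bucket is one common technique. The article does not prescribe an algorithm, so this is a swapped-in sketch. The injectable `clock` parameter is an assumption made to keep the example testable.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens/s, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, allowing an initial burst
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=10)  # 5 req/s, bursts of 10
    print(sum(bucket.allow() for _ in range(15)))  # roughly 10 pass immediately
```

For per-tenant quotas in a multi-tenant service, one bucket per tenant (e.g., a dict keyed by tenant ID) keeps a noisy tenant from consuming everyone's capacity.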

Abstract model of distributed systems

  • System model (asynchronous/synchronous)
  • Failure model (crashes, partitions)
  • Consistency model (strong, eventual)

In general, the models we are most familiar with (for example, implementing a shared-memory abstraction on top of a distributed system) are too expensive. The weaker the guarantees a distributed system makes, the more freedom of action its components have, and so the greater its potential performance. Such a system can also become harder to manage and reason about. Good judgment is therefore needed to keep the system manageable without sacrificing performance. Trying to treat a distributed system as a single, unified system will hinder its ability to scale.

Distributed systems are governed by the CAP theorem: among consistency, availability, and partition tolerance, you can pick two:

  • CA (high consistency + high availability): guaranteed with two-phase commit (2PC). The drawback is that partition tolerance cannot be achieved: once an operation fails, the whole system fails and the error cannot be tolerated (too strict to be practical, as in the proverb "water that is too clear holds no fish").
  • CP (high consistency + partition tolerance): guaranteed with Paxos, at the cost of reduced availability.
  • AP (high availability + partition tolerance): eventual consistency is achieved with protocols such as Gossip, as in Dynamo.
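The CA option relies on two-phase commit. Below is a minimal in-process sketch of the 2PC decision rule. `Participant` is a hypothetical stand-in for a resource manager, and the sketch leaves out networking, durable logging, and timeouts. The missing timeout handling is exactly where 2PC breaks down under partitions: a prepared participant that cannot reach the coordinator must block until it learns the decision.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical resource manager taking part in 2PC."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self, txn):
        # Phase 1: stage the change and vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"


def two_phase_commit(txn, participants):
    """Coordinator: commit only if every participant votes YES."""
    votes = [p.prepare(txn) for p in participants]  # phase 1: voting
    decision = all(v is Vote.YES for v in votes)
    for p in participants:                          # phase 2: completion
        if decision:
            p.commit(txn)
        else:
            p.abort(txn)
    return decision


if __name__ == "__main__":
    nodes = [Participant("db1"), Participant("db2", can_commit=False)]
    print(two_phase_commit("txn-1", nodes), [p.state for p in nodes])
    # False ['aborted', 'aborted']
```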

Distributed system design techniques: partitioning and replication

There are two ways to distribute a data set:

  1. Partitioning: the data set is split across multiple nodes to allow more parallel processing. This improves performance but offers low fault tolerance.
  2. Replication: data can also be copied or cached on different nodes. This reduces the distance between client and server and increases fault tolerance, but replication costs performance. The key issue is keeping the replicas consistent; weak consistency offers lower latency and higher availability.
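Partitioning and replication are often combined. The following sketch hashes each key to a primary node and places replicas on the next nodes in ring order. The function names and the replication factor of 3 are illustrative assumptions. Simple modulo hashing remaps most keys whenever the node count changes. Systems such as Dynamo use consistent hashing instead to limit that movement.

```python
import hashlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash partitioning (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replica_nodes(key: str, nodes: List[str], replication_factor: int = 3) -> List[str]:
    """Primary = node owning the key's partition; replicas = the next nodes in order."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    start = partition_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]
    for k in ("user:1", "user:2", "order:42"):
        print(k, replica_nodes(k, cluster))
```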

Distributed system design techniques: clocks and ordering

Distributed systems use different strategies for computation and storage. For data storage the main techniques are partitioning and replication. For computation the main concern is guaranteeing the order of events, because distributed computing tasks (in Storm, for example) are event-driven. The order of events reflects the order of the business logic. Events are sometimes nested into an event tree, and reliability requires that every event in the tree be executed, so the whole tree behaves as one atomic transaction.
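One classic way to order events without synchronized physical clocks is a Lamport logical clock. The article does not name a specific mechanism, so this is a swapped-in illustration. It guarantees that if event a causally precedes event b, then C(a) < C(b). The converse does not hold: a smaller timestamp does not prove causality. Ties can be broken by node ID to obtain a total order.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Lamport logical clock: timestamps consistent with causal order."""
    time: int = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """On receipt, jump past both the local clock and the message's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    t_send = a.send()           # 1
    b.tick()                    # 1 (concurrent local event on b)
    t_recv = b.receive(t_send)  # 2
    print(t_send, t_recv)       # 1 2
```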

Origin: blog.csdn.net/qwe123147369/article/details/92185647