Summary of Knowledge System (9): Design Principles, Design Patterns, Distributed Systems, High Performance, High Availability

Architecture Design

Why Design a Technical Architecture

  • Modularity: make the program modular, with high cohesion within each module and low coupling between modules.
  • Improve development efficiency: each developer only needs to focus on one concern (view display, business logic, or data processing).
  • Improve testing efficiency: during later testing, problems can be located quickly from the error feedback.

Six Design Principles

The six design principles are the theory behind design patterns; design patterns are the practice of the design principles.

1. Single Responsibility Principle

A class should be responsible for only one responsibility; put formally, there should be only one reason for it to change. A class should be an encapsulation of a set of highly related functions and data.

2. Open-Closed Principle

A software entity should be open for extension and closed for modification.
Once a class is complete, new functionality should not be added by modifying the class itself, but by adding new classes through inheritance or interface implementation.

3. Dependency Inversion Principle

Abstractions should not depend on details; details should depend on abstractions. In other words, program to interfaces, not implementations.

That is, communication between two modules should go through an interface.
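
A minimal sketch of this in Java (Notifier/MessageSender/EmailSender are illustrative names, not from the original): the high-level class depends only on the abstraction, so implementations can be swapped without touching the caller.

interface MessageSender {
    void send(String message);
}

class EmailSender implements MessageSender {
    @Override
    public void send(String message) {
        System.out.println("email: " + message);
    }
}

class Notifier {
    private final MessageSender sender;  // depends on the abstraction only

    Notifier(MessageSender sender) {
        this.sender = sender;
    }

    void notifyUser(String message) {
        sender.send(message);
    }
}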


4. Interface Segregation Principle

Use multiple dedicated interfaces instead of a single general-purpose interface; that is, a client should not be forced to depend on interfaces it does not need. Keep the interface each caller depends on as small as possible. Interface segregation is similar in spirit to the Single Responsibility Principle.

5. Law of Demeter (also known as the Principle of Least Knowledge)

A software entity should interact with other entities as little as possible. In other words, a class should know as little as possible about the classes it calls, and its internals should have nothing to do with the callee. This is also known as Demeter isolation.

For example, consider the run method of the Thread class. Following the Law of Demeter, run can be separated out into a Runnable interface for the user class to implement, so that the interaction between the caller (userClass) and Thread is minimal.
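
A minimal sketch of that separation, using the standard Runnable interface (UserClass is an illustrative name):

class UserClass {
    void startWork() {
        // UserClass depends only on the small Runnable interface,
        // not on the much larger Thread class
        Runnable task = () -> System.out.println("working");
        new Thread(task).start();
    }
}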

6. Liskov Substitution Principle

Anywhere a reference to the base (parent) class appears, an object of its subclass must be usable transparently. That is, if every use of a class in a software system is replaced with one of its subclasses, the system should still work correctly. This principle relies on object-oriented inheritance and polymorphism.
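
A minimal illustration (Bird/Sparrow/Feeder are illustrative names): code written against the parent type must keep working when handed a subclass instance.

class Bird {
    void eat() { System.out.println("bird eating"); }
}

class Sparrow extends Bird {
    @Override
    void eat() { System.out.println("sparrow eating"); }
}

class Feeder {
    // works for Bird and, transparently, for every subclass of Bird
    static void feed(Bird bird) {
        bird.eat();
    }
}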

Case Interpretation

  1. Construct a class or interface according to the Single Responsibility Principle.
  2. Following the Open-Closed Principle, build new classes by inheriting existing classes or implementing interfaces.
  3. Based on the Liskov Substitution Principle, anywhere the parent class is used can be replaced by a subclass.
  4. When interacting with other classes, follow the Dependency Inversion Principle and communicate through interfaces.
  5. When designing an interface, follow the Interface Segregation Principle and design multiple interfaces, each implementing one specific function.
  6. Based on the Law of Demeter, if a teacher class wants to call the roll of the students, it should do so in layers, through a monitor class that interacts with the students.


Common Design Patterns

Creational

Singleton Pattern

Eager ("hungry") singleton: the static instance is created directly at class initialization, which is inherently thread-safe.
Lazy singleton: the instance is created only when it is actually needed, which requires a thread synchronization mechanism. There are three common ways to write it:

  1. Synchronized block (double-checked locking): privatize the constructor, make the instance member static, and expose a public static method that returns the singleton. If the instance has not been created, enter a synchronized block; because the method is static, the class's bytecode object (ClassName.class) is used as the lock.

class SingleInstance {

    // volatile prevents instruction reordering, so other threads can never
    // observe a half-constructed instance (required for double-checked locking)
    private static volatile SingleInstance singleInstance;

    // private constructor: callers cannot instantiate this class directly
    private SingleInstance() {
    }

    public static SingleInstance getInstance() {
        if (singleInstance == null) {                 // first check, without locking
            synchronized (SingleInstance.class) {     // lock on the class object
                if (singleInstance == null) {         // second check, under the lock
                    singleInstance = new SingleInstance();
                }
            }
        }
        return singleInstance;
    }
}
  2. Synchronized method: simply lock the entire static method that returns the singleton.

class Single {

    private static Single single;

    private Single() {
    }

    // synchronized on a static method locks on Single.class
    public static synchronized Single getInstance() {
        if (single == null) {
            single = new Single();
        }
        return single;
    }
}

  3. Static inner class + final member

The members of a static inner class are initialized only when it is first used, which fits the lazy-singleton approach well.

class SingleByInner {

    private SingleByInner() {
    }

    // Inner is loaded only on the first call to getInstance(), and the JVM's
    // class-initialization lock makes this inherently thread-safe
    private static class Inner {
        private static final SingleByInner INSTANCE = new SingleByInner();
    }

    public static SingleByInner getInstance() {
        return Inner.INSTANCE;
    }
}


Factory Pattern

Simple Factory

  • The simple factory pattern exists so that, when creating an object, internal details are not exposed to clients; only a common interface for creating objects is provided. (This follows the Law of Demeter / Principle of Least Knowledge.)
  • The simple factory moves the instantiation logic into a separate class, the simple factory class, which decides which concrete subclass to instantiate.
  • This decouples client classes from the concrete subclasses. A business system often has many client classes; if every client class had to know the details of all subclasses, every client class would have to be revised whenever a subclass changed. With the factory pattern, only the factory class needs to change, plus the interface parameters at the call sites that need the new subclass. A minimal sketch follows this list.
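
A minimal simple-factory sketch (Shape/Circle/Square are illustrative names, not from the original): the client asks the factory for a product by key and never references the concrete subclasses directly.

interface Shape {
    void draw();
}

class Circle implements Shape {
    @Override
    public void draw() { System.out.println("circle"); }
}

class Square implements Shape {
    @Override
    public void draw() { System.out.println("square"); }
}

class ShapeFactory {
    // the factory alone decides which concrete subclass to instantiate
    static Shape create(String type) {
        switch (type) {
            case "circle": return new Circle();
            case "square": return new Square();
            default: throw new IllegalArgumentException("unknown type: " + type);
        }
    }
}

When a new subclass is added, only ShapeFactory changes (plus the type key at the call sites that need it); existing client classes stay untouched.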

Factory Method

  • Defines an interface for creating an object but lets subclasses decide which class to instantiate; the factory method defers instantiation to subclasses.
  • In a simple factory, the factory class itself creates the object; in a factory method, a subclass of the factory creates the object.
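
A sketch of that difference, reusing the illustrative Shape/Circle/Square types from the simple-factory example above: the abstract factory declares the creation method, and each concrete factory subclass decides which product to instantiate.

abstract class AbstractShapeFactory {
    // instantiation is deferred to subclasses
    abstract Shape createShape();
}

class CircleFactory extends AbstractShapeFactory {
    @Override
    Shape createShape() { return new Circle(); }
}

class SquareFactory extends AbstractShapeFactory {
    @Override
    Shape createShape() { return new Square(); }
}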

Builder Pattern

Encapsulates an object's construction process and allows it to be built step by step.
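
A minimal builder sketch (HttpConfig and its fields are illustrative assumptions, not from the original): the object is assembled step by step and constructed only when build() is called.

class HttpConfig {
    private final String host;
    private final int port;
    private final int timeoutMillis;

    private HttpConfig(Builder builder) {
        this.host = builder.host;
        this.port = builder.port;
        this.timeoutMillis = builder.timeoutMillis;
    }

    static class Builder {
        private String host = "localhost";
        private int port = 80;
        private int timeoutMillis = 1000;

        Builder host(String host) { this.host = host; return this; }
        Builder port(int port) { this.port = port; return this; }
        Builder timeout(int millis) { this.timeoutMillis = millis; return this; }

        // each step returns the builder, so construction reads as a chain
        HttpConfig build() { return new HttpConfig(this); }
    }
}

Usage: HttpConfig config = new HttpConfig.Builder().host("example.com").port(8080).build();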

Behavioral

Listener (Observer) Pattern

A listener monitors the events it is interested in and performs custom operations when such an event is received.

The listener pattern is used widely in Android, so most Android developers will find it familiar. In Android development, the click event of the Button control is the most common example of the listener pattern.

When the Button is clicked, OnClickListener.onClick is executed. The Activity sets its own OnClickListener implementation on the Button and overrides the onClick method to perform custom operations.
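
A plain-Java sketch of the same mechanism (ClickListener and EventButton are illustrative names, not the real Android API): listeners register for an event and are called back when it fires.

import java.util.ArrayList;
import java.util.List;

interface ClickListener {
    void onClick();
}

class EventButton {
    private final List<ClickListener> listeners = new ArrayList<>();

    void setOnClickListener(ClickListener listener) {
        listeners.add(listener);
    }

    // called by the "framework" when the user clicks
    void performClick() {
        for (ClickListener listener : listeners) {
            listener.onClick();
        }
    }
}

public class ListenerDemo {
    public static void main(String[] args) {
        EventButton button = new EventButton();
        button.setOnClickListener(() -> System.out.println("button clicked"));
        button.performClick(); // prints "button clicked"
    }
}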

Mediator Pattern

Uses a mediator object to encapsulate a set of object interactions, so that the objects do not need to refer to each other explicitly, achieving loose coupling.

It consists of the following roles:

  • Mediator: abstract mediator role, used to coordinate the interaction between the colleague objects.
  • ConcreteMediator: concrete mediator role; it depends on each colleague class.
  • Colleague: colleague role (the objects being coordinated).

Each colleague knows the mediator but does not interact with other colleagues directly; interactions are routed through the mediator.

  • Advantage: reduces dependencies between classes, turning the original one-to-many dependencies into one-to-one dependencies and lowering coupling.
  • Disadvantage: the mediator tends to grow very large and its logic complex.
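
A minimal sketch of these roles (ChatRoom/User are illustrative names): colleagues talk only to the mediator and never to each other directly.

import java.util.ArrayList;
import java.util.List;

class ChatRoom {                        // concrete mediator
    private final List<User> users = new ArrayList<>();

    void register(User user) { users.add(user); }

    // coordinates the interaction: relays a message to every other colleague
    void relay(String message, User from) {
        for (User user : users) {
            if (user != from) user.receive(message);
        }
    }
}

class User {                            // colleague role
    private final String name;
    private final ChatRoom room;

    User(String name, ChatRoom room) {
        this.name = name;
        this.room = room;
        room.register(this);
    }

    void send(String message) { room.relay(name + ": " + message, this); }

    void receive(String message) { System.out.println(message); }
}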

Proxy Pattern

Provides a proxy for another object to control access to it. Also called the delegation pattern.
There are three main roles:

  • Subject: abstract subject role. It can be an abstract class or an interface; it is an ordinary business type definition with no special requirements.
  • RealSubject: concrete subject role, also called the delegated or proxied role. It is the executor of the concrete business logic.
  • Proxy: the proxy role, also called the proxy class. It is responsible for invoking the real subject and can add custom operations, such as pre-processing and aftermath handling.

Advantages:

  • Clear responsibilities: the real subject only implements the actual business logic and is not concerned with other matters; the proxy can take on the extra work.
  • High extensibility: as long as the interface stays the same, the concrete subject's implementation can change freely without changing the proxy.
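
A static-proxy sketch (the OrderService names are illustrative): the proxy implements the same subject interface and wraps the real subject with pre- and post-processing.

interface OrderService {                          // Subject: abstract subject role
    void placeOrder(String item);
}

class OrderServiceImpl implements OrderService {  // RealSubject: executes the business
    @Override
    public void placeOrder(String item) {
        System.out.println("order placed: " + item);
    }
}

class OrderServiceProxy implements OrderService { // Proxy: controls access
    private final OrderService target = new OrderServiceImpl();

    @Override
    public void placeOrder(String item) {
        System.out.println("pre-check: " + item);  // pre-operation
        target.placeOrder(item);
        System.out.println("audit log written");   // aftermath processing
    }
}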

Chain of Responsibility Pattern

The Chain of Responsibility pattern is a behavioral design pattern that connects multiple request-handler objects into a chain and lets a request travel along the chain until some handler processes it, enabling efficient and flexible request handling and dispatch.

There are three main roles involved:

  • Abstract handler (Handler): defines an interface for processing requests and holds a reference to the successor handler.
  • Concrete handler (ConcreteHandler): implements the processing interface and decides whether it can handle the request; if not, it forwards the request to its successor.
  • Client: creates the handler objects and assembles them into the chain of responsibility (see the sketch after this list).
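
A minimal sketch of these roles (the logging handlers are illustrative): each handler either processes the request itself or forwards it to its successor.

abstract class Handler {                 // abstract handler
    protected Handler next;              // successor on the chain

    Handler setNext(Handler next) {
        this.next = next;
        return next;
    }

    abstract void handle(int level, String message);
}

class InfoHandler extends Handler {      // concrete handler
    @Override
    void handle(int level, String message) {
        if (level <= 1) {
            System.out.println("INFO: " + message);
        } else if (next != null) {
            next.handle(level, message); // pass the request along the chain
        }
    }
}

class ErrorHandler extends Handler {     // concrete handler
    @Override
    void handle(int level, String message) {
        if (level >= 2) {
            System.out.println("ERROR: " + message);
        } else if (next != null) {
            next.handle(level, message);
        }
    }
}

The client assembles the chain, for example: Handler chain = new InfoHandler(); chain.setNext(new ErrorHandler()); then chain.handle(2, "disk full") prints "ERROR: disk full".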

Advantages:

  • Reduced coupling: the pattern decouples the sender and receiver of a request. The request is passed from the head of the chain toward the tail, and each node only needs to focus on its own processing logic, which lowers the coupling between nodes.
  • Flexibility and extensibility: node objects can be added to, removed from, or reordered within the chain dynamically, so the system's processing flow can be modified or extended flexibly.

Disadvantages:

  • When the chain is too long, system performance and efficiency suffer.
  • A request may go unprocessed: it may reach the end of the chain without finding a suitable handler, so a special fallback mechanism is needed for this case.
  • Debugging is harder: if one node in the chain misbehaves, requests through the whole chain may fail, and the nodes have to be checked one by one.

Structural

Adapter (Wrapper) Pattern

Converts the interface of a class into another interface that the client expects, so that two classes that otherwise could not work together because of mismatched interfaces can collaborate.

Advantages:

  • Allows unrelated classes to run together
  • Increased class transparency
  • Improved class reusability
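
A minimal object-adapter sketch (Target/Adaptee are illustrative names): the adapter wraps the class with the mismatched interface and exposes the interface the client expects.

interface Target {                     // the interface the client expects
    String request();
}

class Adaptee {                        // existing class with an incompatible interface
    String specificRequest() { return "raw data"; }
}

class Adapter implements Target {      // wrapper: converts one interface to the other
    private final Adaptee adaptee = new Adaptee();

    @Override
    public String request() {
        return "adapted(" + adaptee.specificRequest() + ")";
    }
}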

Distributed Theory

CAP

CAP combines the first letters of Consistency, Availability, and Partition Tolerance.

The CAP theorem states that when designing read and write operations for a distributed system, only two of the following three properties can be satisfied at the same time:

  • C (Consistency): all nodes access the same, latest copy of the data.
  • A (Availability): every non-faulty node returns a reasonable response (not an error or a timeout) within a reasonable time.
  • P (Partition tolerance): the system continues to provide service even when a network partition occurs.

What is a network partition?

In a distributed system, the network between the nodes is normally fully connected, but failures (for example, network problems on some nodes) can leave some nodes unreachable from others, splitting the network into several areas. This is called a network partition.

Most explanations of the theorem reduce it to: "of consistency, availability, and partition tolerance, only two can be achieved at the same time, never all three." This phrasing is misleading, and twelve years after CAP was born, its author revisited and refined the original paper in 2012.

When a network partition occurs, if we want to keep providing service, strong consistency and availability become a choice of one out of two. That is, P is the premise: only once P is established is there a choice between C and A. Partition tolerance must be provided.
In short: in CAP theory, partition tolerance P must be satisfied; on that basis, the system can additionally guarantee either availability A or consistency C.

Therefore, a distributed system theoretically cannot choose a CA architecture, only CP or AP. For example, ZooKeeper and HBase are CP architectures, Cassandra and Eureka are AP architectures, and Nacos supports both CP and AP modes.

Why is a CA architecture impossible?
For example: if a partition occurs while some node is performing a write, then to guarantee C the reads and writes of the other nodes must be blocked, which conflicts with A; if, to guarantee A, the other nodes keep reading and writing normally, that conflicts with C.

Whether to choose CP or AP depends on the business scenario; there is no universal answer. Scenarios that require strong consistency, such as banking, generally choose CP.

One more point worth adding: when the network has no partition (the state the system is in most of the time), P does not come into play, and C and A can be guaranteed at the same time.

Consensus Protocols

Two-Phase Commit

Three-Phase Commit

High Performance

Load Balancing

Load balancing means distributing user requests across different servers to improve the system's overall concurrency and reliability. It can be implemented by dedicated software or hardware; generally, hardware performs better while software is cheaper.

The most common are Layer 4 and Layer 7 load balancing:

  • Layer 4 load balancing works at the fourth layer of the OSI model, the transport layer, whose main protocols are TCP and UDP. At this layer the load balancer can see the source and destination addresses and ports in the packets, and it forwards each packet to a backend real server according to some load balancing algorithm. In other words, the core of Layer 4 load balancing is balancing at the IP+port level; it does not look at packet contents.
  • Layer 7 load balancing works at the seventh layer of the OSI model, the application layer, whose main protocol is HTTP. Routing at this layer is more complex than at Layer 4: the balancer reads the data portion of the message (for example, the HTTP part) and makes its decision based on that content (such as the URL or a Cookie). That is, the core of Layer 7 load balancing is balancing at the message-content level, and a device performing Layer 7 load balancing is usually called a reverse proxy server.

Layer 7 load balancing costs more performance than Layer 4. In exchange, it is more flexible and can route network requests more intelligently; for example, it can apply optimizations such as caching, compression, and encryption based on the content of the request.

Simply put, Layer 4 load balancing offers stronger performance, while Layer 7 offers stronger functionality. For most business scenarios, however, the performance difference between the two is basically negligible.

In practice, Nginx is usually used for Layer 7 load balancing, and LVS (Linux Virtual Server, the Linux kernel's Layer 4 load balancer) for Layer 4.

Common Load Balancing Algorithms

  • Random: if no weights are configured, all servers are equally likely to be chosen; with weights, servers with higher weights are chosen more often.
  • Round robin: suitable for clusters of servers with similar performance, where each server carries the same load. Weighted round robin suits clusters with uneven server performance, where weights make request allocation more reasonable.
  • Least connections: when a new request arrives, traverse the server list and pick the server with the fewest active connections (requests currently being processed). This method is the most complex to implement, since the connection count of every server must be monitored.
  • Consistent hashing: imagine the output space of the hash function as a ring. Data objects are mapped onto the ring by their hash values, and the same hash function is applied to each server's unique identifier to map the machines onto the same ring. Because the objects and the machines share one hash space, each object is stored on the first machine found by moving clockwise: m1 is stored on t3, m3 and m4 on t2, and m2 on t1.

When a machine t4 needs to be added, only the mapping m4 -> t2 changes to m4 -> t4; data movement occurs only between t2 and t4, and the data on the other machines is unaffected. When machine t1 is removed, m2 -> t1 changes to m2 -> t3, and data migration occurs only between t1 and t3.

Existing Problems

  • Problem 1: When the number of machine nodes is small, data distribution is unbalanced.
    With only a few nodes in the cluster, the nodes may be distributed unevenly in the hash space, skewing the ring. For example, if nodes A, B, and C happen to sit close together on the ring, data 1, 2, 3, 4, and 6 may all land on node A, only data 5 on node B, and nothing on node C, leaving the load on the three machines extremely unbalanced.

  • Problem 2: Load imbalance caused by data migration.
    In the extreme case, if node A fails, all the data stored on A must be migrated to B. That surge of data may crash node B, after which the data of both A and B migrates to node C, crashing C as well, until the whole cluster is down. This is known as the avalanche effect.

Both of these problems can be solved with virtual nodes.

Virtual Nodes

Each actual machine (node) is assigned a large number of virtual nodes; the sheer number of virtual nodes keeps the hash ring balanced, and only the mapping between actual nodes and virtual nodes needs to be recorded.
When machines are added or removed, their many virtual nodes are added or removed with them, and data migration is carried out at the granularity of virtual nodes.
At the same time, different numbers of virtual nodes can be assigned according to each machine's capacity (assuming each virtual node carries a similar load), which doubles as a load-management mechanism.
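
A minimal consistent-hash ring with virtual nodes, as a sketch (the hash function and the replica count are illustrative choices, not from the original): a TreeMap keeps the virtual nodes sorted, so the clockwise walk becomes a ceiling lookup.

import java.util.Map;
import java.util.TreeMap;

class ConsistentHashRing {
    // sorted map from virtual-node hash position to the real machine name
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodesPerMachine;

    ConsistentHashRing(int virtualNodesPerMachine) {
        this.virtualNodesPerMachine = virtualNodesPerMachine;
    }

    void addMachine(String machine) {
        for (int i = 0; i < virtualNodesPerMachine; i++) {
            ring.put(hash(machine + "#VN" + i), machine); // many virtual nodes per machine
        }
    }

    void removeMachine(String machine) {
        for (int i = 0; i < virtualNodesPerMachine; i++) {
            ring.remove(hash(machine + "#VN" + i));
        }
    }

    // walk clockwise: the first virtual node at or after the key's hash owns the key
    String machineFor(String key) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    private int hash(String s) {
        // simple FNV-1a style hash for the sketch; real systems often use MD5 or MurmurHash
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) {
            h = (h ^ s.charAt(i)) * 0x01000193;
        }
        return h & 0x7fffffff; // keep values non-negative so the ring orders cleanly
    }
}

With, say, new ConsistentHashRing(200), adding a machine t4 inserts only t4's virtual nodes, so only the keys whose positions now fall on those virtual nodes migrate; data on the other machines is untouched.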


Origin blog.csdn.net/baiduwaimai/article/details/132323350