[Architecture] Common Technical Points in Architecture Design

Guide: Anyone leading a project needs a working knowledge of common architectural techniques so they can apply the right one to a given scenario. Technology must serve the business; only when the two are combined does technology deliver its value.

1. High Concurrency

With the rise of distributed systems, high concurrency usually refers to designing a system so that it can process many requests in parallel. In practice, it means many users accessing the same API or URL at the same point in time.

The essence of the high-concurrency problem is limited resources: bandwidth, CPU, memory, I/O, and so on.

It typically arises in businesses with a large, highly concentrated active user base. High-concurrency design should still follow the three principles of architecture design: simplicity, appropriateness, and evolution. As the saying goes, "premature optimization is the root of all evil." A design cannot be divorced from the actual business, and over-engineering must be avoided: the most suitable solution is the best one.
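The core idea of protecting a limited resource under concurrent load can be sketched with a semaphore; the limit, worker count, and request handler below are all illustrative:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap: allow at most 4 requests to touch the scarce
# resource (e.g. a small DB connection pool) at once; excess callers wait.
MAX_CONCURRENT = 4
semaphore = threading.BoundedSemaphore(MAX_CONCURRENT)
peak = 0
active = 0
lock = threading.Lock()

def handle_request(req_id: int) -> str:
    global peak, active
    with semaphore:                      # gate entry to the limited resource
        with lock:
            active += 1
            peak = max(peak, active)     # record the highest concurrency seen
        # ... the real work would happen here ...
        with lock:
            active -= 1
    return f"req-{req_id} done"

with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(handle_request, range(100)))

print(peak <= MAX_CONCURRENT)  # the cap was never exceeded
```

All 100 requests complete, but no more than four ever run the guarded section simultaneously; everything beyond the cap simply queues instead of overwhelming the resource.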


2. High Availability

High availability (HA) is one of the factors that must be considered in distributed system architecture design. It usually means a system is deliberately designed to reduce downtime and keep its services continuously available.

High availability practice plan:

  1. Failover between peer nodes: both Nginx and service-governance frameworks support routing to another node after one node fails.
  2. Failover between non-peer nodes: heartbeat detection plus active/standby switchover (e.g. Redis Sentinel or Cluster mode, MySQL master-slave switchover).
  3. Timeouts, retry strategies, and idempotent design at the interface level.
  4. Degradation: protect core services by sacrificing non-core ones, tripping circuit breakers when necessary; if a core link fails, fall back to an alternative link.
  5. Rate limiting: directly reject, or return an error code for, requests that exceed the system's processing capacity.
  6. Message reliability in MQ scenarios: producer-side retries, broker-side persistence, and consumer-side acknowledgements (ack).
  7. Grayscale (canary) release: deploy to a small slice of machines first, observe system logs and business metrics, and roll out fully once stable.
  8. Monitoring and alerting: a comprehensive monitoring system covering the basics (CPU, memory, disk, network) as well as web servers, the JVM, databases, middleware, and business metrics.
  9. Disaster-recovery drills: much like today's "chaos engineering", deliberately apply destructive measures to the system and observe whether local failures cause availability problems.
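Point 3 in the list above (timeouts, bounded retries, idempotency) can be sketched as follows. `flaky_charge`, the in-memory dedup store, and the backoff values are illustrative stand-ins for a real downstream service:

```python
import time
import uuid

processed = {}      # idempotency-key -> result (server-side dedup store)
calls = {"n": 0}    # how many raw calls the "service" received

def flaky_charge(idem_key: str, amount: int) -> str:
    """Simulated downstream service: times out twice, then succeeds."""
    calls["n"] += 1
    if idem_key in processed:            # duplicate retry: replay stored result
        return processed[idem_key]
    if calls["n"] < 3:                   # first two attempts "time out"
        raise TimeoutError("simulated timeout")
    processed[idem_key] = f"charged {amount}"
    return processed[idem_key]

def call_with_retry(amount: int, retries: int = 5, backoff: float = 0.01) -> str:
    key = str(uuid.uuid4())              # ONE key shared by all attempts
    for attempt in range(retries):
        try:
            return flaky_charge(key, amount)
        except TimeoutError:
            time.sleep(backoff * (2 ** attempt))   # exponential backoff
    raise RuntimeError("all retries exhausted")

result = call_with_retry(100)
print(result)                  # charged 100
print(len(processed) == 1)     # applied exactly once despite retries
```

Because every attempt reuses the same idempotency key, the charge is applied exactly once even though the caller retried after timeouts.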

The high availability of a layered Internet architecture is achieved end to end through redundancy plus automatic failover at every layer, specifically:

  • (1) From the [client layer] to the [reverse proxy layer]: achieved through redundancy at the reverse-proxy layer; the common practice is keepalived + a virtual IP with automatic failover
  • (2) From the [reverse proxy layer] to the [site layer]: achieved through redundancy at the site layer; the common practice is liveness detection between Nginx and the web servers with automatic failover
  • (3) From the [site layer] to the [service layer]: achieved through redundancy at the service layer; a common practice is automatic failover via the service connection pool
  • (4) From the [service layer] to the [cache layer]: achieved through redundancy of cached data; common practices are double-read/double-write in the cache client, or master-slave replication in the cache cluster with Sentinel keepalive and automatic failover. In many business scenarios the cache has no hard availability requirement, and a cache service can shield callers from the underlying complexity
  • (5) From the [service layer] to the [database "read"] path: achieved through redundancy of read replicas; a common practice is automatic failover via the db connection pool
  • (6) From the [service layer] to the [database "write"] path: achieved through redundancy of the write database; the common practice is keepalived + a virtual IP with automatic failover

3. Read and write separation

To ensure database stability, many databases offer a dual-machine hot-backup feature: the first database server is the production server handling external writes (inserts, deletes, updates), while the second mainly serves read operations.

Data is synchronized via master-slave replication, and read-write separation then raises the database's concurrent load capacity; this addresses both availability and database performance.
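A minimal sketch of the routing side of read-write separation: writes go to the primary, reads round-robin across replicas. The connection names and the crude SQL classification are illustrative only:

```python
import itertools

class ReadWriteRouter:
    """Route SQL statements: reads to replicas, writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)   # round-robin over replicas

    def route(self, sql: str):
        # Crude classification: SELECTs read from a replica,
        # everything else (INSERT/UPDATE/DELETE/DDL) hits the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("INSERT INTO orders VALUES (1)"))  # primary-db
print(router.route("SELECT * FROM orders"))           # replica-1
print(router.route("SELECT * FROM orders"))           # replica-2
```

A production router (or a proxy like MySQL Router/ProxySQL) also has to consider replication lag, read-your-own-writes, and transactions, which this sketch deliberately ignores.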


4. Cold standby/hot standby

Cold standby: two servers, one running and one idle as a backup, so that if the running server goes down the backup can take over. Cold standby is relatively easy to implement, but its drawback is that the backup machine does not take over automatically when the primary fails; services must be switched over manually.

Hot standby: the classic active/standby setup. Data, including database data, is written to two or more servers at the same time. When the active server fails, the standby is activated via software diagnosis (usually heartbeat detection), so the application can fully resume normal operation within a short time; when one server goes down, traffic automatically switches to the standby machine.
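The heartbeat-driven promotion described above can be sketched as a toy monitor. The node names, the `alive` flag, and the three-missed-beats threshold are simulated stand-ins for real health checks:

```python
class FailoverMonitor:
    """Promote the standby after `max_missed` consecutive missed heartbeats."""

    def __init__(self, active: str, standby: str, max_missed: int = 3):
        self.active, self.standby = active, standby
        self.max_missed = max_missed
        self.missed = 0

    def heartbeat(self, alive: bool) -> str:
        if alive:
            self.missed = 0              # healthy beat resets the counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # promote the standby, demote the failed node
                self.active, self.standby = self.standby, self.active
                self.missed = 0
        return self.active

mon = FailoverMonitor("server-A", "server-B")
print(mon.heartbeat(True))    # server-A
print(mon.heartbeat(False))   # server-A (1 missed)
print(mon.heartbeat(False))   # server-A (2 missed)
print(mon.heartbeat(False))   # server-B (failover after 3 missed)
```

Requiring several consecutive missed beats before promoting avoids flapping on a single dropped packet; real systems (keepalived, Redis Sentinel) add quorum and fencing on top of this idea.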


5. Multi-site active-active

Multi-site active-active generally means building independent data centers in different cities. "Active" is relative to cold backup: a cold backup holds a full copy of the data but does not normally serve the business, and is used only when the primary data center fails and traffic is switched over. "Multi-active" means those data centers also carry traffic in day-to-day business and actively support it.


6. Load Balance

Load balancing is a service that distributes traffic across multiple servers. It automatically spreads an application's external service capacity over multiple instances, improves the system's availability by eliminating single points of failure, and gives the application a higher level of fault tolerance, thereby providing efficient, stable, and secure service.

When load balancing is running, one or more front-end load balancers typically distribute client requests to a group of back-end servers, achieving high performance and high availability for the system as a whole.

Several ways to implement it:

(1) HTTP redirect load balancing.

        Advantage: relatively simple. Disadvantage: the browser must make two requests to the server for each visit, so performance is poor.

(2) DNS resolution load balancing.

        Advantage: the load-balancing work is delegated to DNS, sparing the operations team the trouble. Disadvantage: DNS may cache A records, which is outside the website's control.

(3) Reverse-proxy load balancing.

        Advantage: easy to deploy. Disadvantage: the reverse proxy relays every request and response, so its performance can become a bottleneck.

(4) IP load balancing.

        Advantage: distribution happens in the kernel, so it performs better than reverse-proxy balancing. Disadvantage: the load balancer's NIC bandwidth becomes the system bottleneck.

(5) Data-link-layer load balancing.

        It avoids the balancer's NIC bandwidth becoming a bottleneck and is currently the most widely used load-balancing method for large websites.
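Whatever the delivery mechanism, the balancer still needs a selection policy. Here is a minimal smooth weighted round-robin picker (the variant Nginx's upstream module is known to use); the server names and weights are made up:

```python
class WeightedRoundRobin:
    """Smooth weighted round-robin: spreads a heavy server's picks evenly."""

    def __init__(self, servers: dict):
        self.servers = servers                          # name -> weight
        self.current = {name: 0 for name in servers}    # running scores

    def pick(self) -> str:
        # Each round: raise every score by its weight, pick the max,
        # then subtract the total weight from the winner.
        total = sum(self.servers.values())
        for name, weight in self.servers.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen

lb = WeightedRoundRobin({"a": 3, "b": 1})
picks = [lb.pick() for _ in range(4)]
print(picks.count("a"), picks.count("b"))  # 3 1
```

Over every cycle of four picks, "a" (weight 3) is chosen three times and "b" (weight 1) once, and the "smooth" variant interleaves them rather than sending three requests to "a" back to back.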


7. Static/dynamic separation

Static/dynamic separation is an architectural approach in which static pages (or static content interfaces) are served separately from dynamic pages (or dynamic content interfaces) in the web tier, improving the access performance and maintainability of the whole service.
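An illustrative Nginx configuration for this split (the paths, ports, and upstream name are assumptions, not from the original text): static assets are served straight from disk with long cache lifetimes, while everything else is proxied to the application tier.

```nginx
upstream app_servers {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;

    # Static content: serve from the local filesystem, cache aggressively
    location ~* \.(css|js|png|jpg|gif|ico)$ {
        root /var/www/static;
        expires 30d;
    }

    # Everything else is dynamic: forward to the app tier
    location / {
        proxy_pass http://app_servers;
    }
}
```

In larger deployments the static `location` block is typically replaced by a CDN, which pushes the same separation out to the network edge.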


8. Cluster

A single server's concurrency capacity is always limited. When one server reaches its performance ceiling, multiple servers are combined to provide the service. This combination is called a cluster, and each server in it is called a "node" of the cluster. Every node provides the same service, so the system's overall concurrent processing capacity is multiplied.


9. Distributed

A distributed system splits a complete system into many independent subsystems along business-function lines; each subsystem is called a "service". The system sorts incoming requests and dispatches them to the appropriate subsystems, letting different services handle different requests. Subsystems run independently and communicate over the network to share data and compose services.

Advantages of distributed systems:

  • 1. All nodes in a distributed system are interconnected. So nodes can easily share data with other nodes.
  • 2. More nodes can be easily added to the distributed system, i.e. it can be expanded as needed.
  • 3. The failure of one node will not lead to the failure of the entire distributed system. Other nodes can still communicate with each other.
  • 4. Hardware resources can be shared with multiple nodes instead of being limited to one node.

Disadvantages of distributed systems:

  • 1. It is difficult to provide sufficient security in a distributed system, because both nodes and connections need to be secure.
  • 2. Some messages and data may be lost in the network while being transferred from one node to another.
  • 3. Connecting to a database in a distributed system is considerably more complex and unwieldy than in a standalone system.
  • 4. If all nodes in a distributed system try to send data at the same time, there may be an overload in the network.

10. Elastic scaling

Elastic scaling means dynamically expanding a deployed cluster online. An elastically scalable system can, following a defined policy and the actual business load, automatically add nodes (storage, compute, or network nodes) to increase system capacity, improve performance, or enhance reliability, or to achieve all three at once.


11. Horizontal scaling/vertical scaling

Horizontal scaling (Scale Out) spreads the load by adding more servers or program instances, thereby increasing storage and compute capacity. Vertical scaling (Scale Up) improves the processing capacity of a single machine.

There are two ways to scale vertically:

  • (1) Enhance single-machine hardware performance, for example: more CPU cores (e.g. 32 cores), a faster NIC (e.g. 10 GbE), better disks (e.g. SSDs), larger disks (e.g. 2 TB), and more memory (e.g. 128 GB);

  • (2) Improve single-machine software or architecture performance, for example: use a cache to reduce I/O, use asynchrony to raise single-service throughput, and use lock-free data structures to reduce response time.


12. Parallel expansion

Similar to horizontal scaling. The nodes in the cluster are parallel peers; when expansion is needed, more nodes can be added to raise the cluster's service capacity. In general, the critical paths in a server (login, payment, core business logic, and so on) should support dynamic parallel scaling at runtime.


13. CAP Theory

The CAP theorem states that in a distributed system, Consistency, Availability, and Partition tolerance cannot all hold at the same time.

  • Consistency: at the same point in time, all replicas of the data in the distributed system are identical or in the same state.

  • Availability: After some nodes of the system cluster go down, the system can still correctly respond to user requests.

  • Partition tolerance: The system is able to tolerate failures in network communication between nodes.

Simply put, a distributed system can support at most two of these three properties. But since the system is distributed, partitioning is inevitable, and partition failures can never be 100% avoided; therefore we can only choose between consistency and availability.
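The forced choice can be made concrete with a toy simulation of two replicas under a network partition: a CP system rejects the write (staying consistent, losing availability), while an AP system accepts it (staying available, letting replicas diverge until the partition heals). All names and behavior below are illustrative:

```python
class Replica:
    def __init__(self):
        self.value = "v0"

def write(replicas, partitioned: bool, mode: str, new_value: str) -> str:
    if partitioned and mode == "CP":
        return "rejected"                # refuse rather than risk divergence
    replicas[0].value = new_value        # local replica is always reachable
    if not partitioned:
        replicas[1].value = new_value    # remote replica only if no partition
    return "accepted"

a, b = Replica(), Replica()
print(write([a, b], partitioned=True, mode="CP", new_value="v1"))  # rejected
print(a.value == b.value)   # True: replicas still consistent

print(write([a, b], partitioned=True, mode="AP", new_value="v1"))  # accepted
print(a.value == b.value)   # False: replicas have diverged
```

The AP branch is exactly the situation the BASE theory addresses: accept temporary divergence, then reconcile toward eventual consistency once the partition heals.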

In distributed systems we usually prioritize availability over consistency. As for how high availability is then achieved, there is a further theory: BASE, which extends the CAP theorem.


14. BASE theory

BASE is short for:

  • Basically Available

  • Soft state

  • Eventually consistent

The BASE theory is the result of trading off consistency against availability in CAP. Its core idea: even if strong consistency cannot be achieved, each application can, in a way appropriate to its own business characteristics, bring the system to eventual consistency.

Source: blog.csdn.net/weixin_43800786/article/details/130049771