A detailed explanation of "high availability" in Internet architecture

1. What is high availability?

High availability (HA) is one of the factors that must be considered when designing a distributed system architecture. It usually means reducing, through design, the time during which the system cannot provide service.

If the system can always provide service, we say its availability is 100%.

If the system is unable to provide service for 1 time unit out of every 100 time units it runs, we say its availability is 99%.

The high availability goal of many companies is four nines, that is 99.99%, which means annual downtime of about 52.6 minutes (0.876 hours). By comparison, three nines (99.9%) allows 8.76 hours of downtime per year.
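The relationship between an availability target and yearly downtime is plain arithmetic. The following is a minimal sketch, assuming a 365-day year; the function name `annual_downtime_hours` is chosen here for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the maximum yearly downtime allowed by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    hours = annual_downtime_hours(target)
    print(f"{target}% -> {hours:.3f} hours ({hours * 60:.1f} minutes) per year")
```

Each additional nine cuts the permitted downtime by a factor of ten, which is why moving from three nines to four nines is a much harder engineering target than the small numeric difference suggests.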

Baidu's search homepage is recognized in the industry as a system with excellent availability. People even judge network connectivity by whether www.baidu.com can be reached: "the network is fine, Baidu loads" and "Baidu won't open, so the network must be down". This is really the highest praise for Baidu's HA.

2. How to ensure high availability of the system

We all know that single points of failure are the biggest risk to, and the enemy of, system high availability, so we should try to avoid them during system design. Methodologically, the principle behind high availability is "clustering", or "redundancy": with only a single instance, the service is affected as soon as it goes down; with redundant backups, another instance can take over.

To ensure high system availability, the core principle of architecture design is: redundancy.

Redundancy alone is not enough. If every fault requires manual intervention to recover, the time during which the system is out of service inevitably increases. High availability is therefore usually achieved through "automatic failover".

Next, let's look at how a typical Internet architecture ensures high availability through redundancy + automatic failover.

3. Common Internet layered architecture

 

A common Internet distributed architecture is divided into the following layers:

(1) Client layer: the typical caller is a browser or a mobile app

(2) Reverse proxy layer: the system entry point, a reverse proxy

(3) Site application layer: implements the core application logic and returns HTML or JSON

(4) Service layer: present if the system has been split into services

(5) Data-cache layer: the cache accelerates access to storage

(6) Data-database layer: the database persists the data

The high availability of the entire system is comprehensively achieved through redundancy + automatic failover at each layer.

4. Layered High Availability Architecture Practice

4.1 High availability of [Client layer->Reverse proxy layer]


High availability from [client layer] to [reverse proxy layer] is achieved through redundancy in the reverse proxy layer. Take nginx as an example: there are two nginx instances, one serving traffic online and the other kept as a redundant standby. A common practice is to use keepalived for liveness detection, with both instances serving behind the same virtual IP.

Automatic failover: when nginx goes down, keepalived detects it, automatically performs failover, and automatically migrates traffic to shadow-nginx. Since the same virtual IP is used, the switch is transparent to the caller.
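The failover decision can be illustrated with a toy model. This is only a sketch of the idea, not how keepalived is implemented (keepalived relies on the VRRP protocol); the `Node` class, the priority values and `elect_vip_holder` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int          # the healthy node with the highest priority holds the virtual IP
    healthy: bool = True   # result of the latest liveness check


def elect_vip_holder(nodes):
    """Return the healthy node with the highest priority, or None if all are down."""
    alive = [n for n in nodes if n.healthy]
    return max(alive, key=lambda n: n.priority) if alive else None


nginx = Node("nginx", priority=100)
shadow = Node("shadow-nginx", priority=90)
cluster = [nginx, shadow]

print(elect_vip_holder(cluster).name)   # the primary serves while it is healthy
nginx.healthy = False                   # the liveness check on the primary fails
print(elect_vip_holder(cluster).name)   # the virtual IP moves to the standby
```

Because callers only ever address the virtual IP, they do not need to know which of the two nodes currently holds it.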

4.2 High availability of [reverse proxy layer -> site layer]

High availability from [reverse proxy layer] to [site layer] is achieved through redundancy in the site layer. Assuming the reverse proxy layer is nginx, multiple web backends can be configured in nginx.conf, and nginx can detect whether each backend is alive.

Automatic failover: when a web-server goes down, nginx detects it, automatically performs failover, and automatically migrates the traffic to the other web-servers. The entire process is completed automatically by nginx and is transparent to the caller.
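The behaviour described above can be sketched as round-robin dispatch with passive failure detection. This is a simplified illustration in Python, not nginx's actual code; the `Upstream` class and the `send` callback are hypothetical, and `max_fails` is only loosely modelled on the nginx server parameter of the same name. Re-probing a failed backend after a timeout is deliberately omitted.

```python
import itertools


class Upstream:
    """Round-robin over backends, skipping ones that have failed too often."""

    def __init__(self, backends, max_fails=1):
        self.backends = list(backends)
        self.max_fails = max_fails
        self.fails = {b: 0 for b in self.backends}
        self._cycle = itertools.cycle(self.backends)

    def handle(self, request, send):
        """Try each backend at most once; send(backend, request) raises on failure."""
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.fails[backend] >= self.max_fails:
                continue  # considered down, skip it
            try:
                response = send(backend, request)
            except ConnectionError:
                self.fails[backend] += 1  # passive failure detection
                continue
            self.fails[backend] = 0
            return response
        raise RuntimeError("no healthy backend available")


down = {"web-1"}  # pretend this web-server has crashed


def send(backend, request):
    if backend in down:
        raise ConnectionError(backend)
    return f"{backend} handled {request}"


upstream = Upstream(["web-1", "web-2", "web-3"])
print(upstream.handle("GET /", send))
```

The caller of `handle` sees a normal response even though the first backend tried was down, which is the transparency the text refers to.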

4.3 High availability of [site layer -> service layer]

High availability from [site layer] to [service layer] is achieved through redundancy in the service layer. The "service connection pool" establishes multiple connections to the downstream services, and each request "randomly" selects a connection to reach a downstream service.

Automatic failover: when a service goes down, the service-connection-pool detects it, automatically performs failover, and automatically migrates traffic to the other services. The entire process is completed automatically by the connection pool and is transparent to the caller (which is why the service connection pool in the RPC-client is a very important basic component).
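A minimal sketch of such a pool follows, assuming each connection can be modelled as a callable that raises `ConnectionError` when its service is down. `ServiceConnectionPool` and the `connect` factory are hypothetical names for illustration, not the API of any particular RPC framework.

```python
import random


class ServiceConnectionPool:
    def __init__(self, addresses, connect):
        # connect(address) is a hypothetical factory returning a callable connection
        self.connections = {addr: connect(addr) for addr in addresses}

    def call(self, request):
        candidates = list(self.connections)
        random.shuffle(candidates)          # random selection order
        for addr in candidates:
            try:
                return self.connections[addr](request)
            except ConnectionError:
                del self.connections[addr]  # drop the dead connection, try the next one
        raise RuntimeError("all downstream services are unavailable")


def connect(addr):
    def conn(request):
        if addr == "svc-2":                 # pretend this service instance is down
            raise ConnectionError(addr)
        return f"{addr}: {request}"
    return conn


pool = ServiceConnectionPool(["svc-1", "svc-2", "svc-3"], connect)
for _ in range(5):
    print(pool.call("get_user(42)"))
```

A production pool would also reconnect to a recovered service in the background; that part is left out to keep the sketch short.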

4.4 High availability of [service layer -> cache layer]

High availability from [service layer] to [cache layer] is achieved through redundancy of the cached data.

There are several ways to implement data redundancy in the cache layer. The first is to encapsulate the logic in the client, so that the service double-reads or double-writes the cache.
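The double-write and double-read idea can be sketched as a thin client wrapper. This is an illustrative sketch only: `DictCache` is an in-memory stand-in for a real cache node, and `DualCacheClient` is a hypothetical name, not part of any cache library.

```python
class DictCache:
    """In-memory stand-in for one cache node (hypothetical, for illustration)."""

    def __init__(self):
        self.data = {}
        self.up = True

    def _check(self):
        if not self.up:
            raise ConnectionError("cache node is down")

    def set(self, key, value):
        self._check()
        self.data[key] = value

    def get(self, key):
        self._check()
        return self.data.get(key)


class DualCacheClient:
    """Writes go to both caches; reads fall back to the second cache."""

    def __init__(self, primary, secondary):
        self.caches = [primary, secondary]

    def set(self, key, value):
        written = 0
        for cache in self.caches:
            try:
                cache.set(key, value)
                written += 1
            except ConnectionError:
                pass  # tolerate one cache node being down
        return written > 0

    def get(self, key):
        for cache in self.caches:
            try:
                value = cache.get(key)
            except ConnectionError:
                continue  # this node is down, try the other one
            if value is not None:
                return value
        return None


cache_a, cache_b = DictCache(), DictCache()
client = DualCacheClient(cache_a, cache_b)
client.set("user:1", "alice")   # written to both caches
cache_a.up = False              # one cache node fails
print(client.get("user:1"))     # still served from the surviving cache
```

The trade-off of this approach is that the two caches can drift apart when a write reaches only one of them, so it suits data that tolerates brief staleness.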

The second is to use a cache cluster that supports master-slave synchronization.

Take Redis as an example. Redis natively supports master-slave synchronization, and it also officially provides the sentinel mechanism for Redis liveness detection.

Having covered cache high availability, one more point is worth making: the business does not necessarily require the cache to be "highly available". In more scenarios the cache is used to "accelerate data access" by keeping part of the data in the cache. In that case, if the cache goes down or a lookup misses, the data can still be fetched from the back-end database.

For this type of business scenario that allows "cache miss", the recommendations for the cache architecture are:

Encapsulate the kv cache as a service cluster with a proxy in front of it (the proxy itself can use cluster redundancy to ensure high availability). Behind the proxy, the cache is horizontally sharded into several instances according to the key being accessed. Access to each individual instance is not made highly available.
 


Masking a failed cache instance: when one of the horizontally sharded instances goes down, the proxy layer directly returns a cache miss, so the cache failure is also transparent to the caller. When the number of key-sharded instances shrinks in this way, re-hashing is not recommended, because it can easily cause inconsistent cached data.
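The proxy behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions: `Shard` is an in-memory stand-in for one cache instance, `CacheProxy` is a hypothetical name, and CRC32 modulo the shard count is just one simple way to map keys to shards.

```python
import zlib


class Shard:
    """In-memory stand-in for one cache instance (hypothetical, for illustration)."""

    def __init__(self):
        self.data = {}
        self.up = True

    def _check(self):
        if not self.up:
            raise ConnectionError("cache instance is down")

    def set(self, key, value):
        self._check()
        self.data[key] = value

    def get(self, key):
        self._check()
        return self.data.get(key)


class CacheProxy:
    def __init__(self, shards):
        self.shards = shards  # fixed list; keys are never re-hashed when a shard fails

    def _shard_for(self, key):
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def get(self, key):
        try:
            return self._shard_for(key).get(key)
        except ConnectionError:
            return None  # report a miss so the caller falls back to the database

    def set(self, key, value):
        try:
            self._shard_for(key).set(key, value)
        except ConnectionError:
            pass  # the owning shard is down; do not write the key anywhere else


shards = [Shard() for _ in range(3)]
proxy = CacheProxy(shards)
proxy.set("user:1", "alice")
print(proxy.get("user:1"))            # hit on the owning shard
proxy._shard_for("user:1").up = False
print(proxy.get("user:1"))            # the failure is reported as a plain cache miss
```

Because the key-to-shard mapping never changes, a key cannot end up with different values on two instances after the failed one comes back.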

4.5 High availability of [service layer -> database layer]

In most Internet architectures, the database layer uses a "master-slave synchronization, read-write separation" design, so database high availability falls into two categories: "read database high availability" and "write database high availability".

High availability of [service layer -> database layer "read"]

High availability from [service layer] to [database read] is achieved through redundancy of the read database.

Because the read database is redundant, there are generally at least two slave databases. The "database connection pool" establishes multiple connections to the read databases, and each request is routed to one of them.


Automatic failover: when a read database goes down, the db-connection-pool detects it, automatically performs failover, and automatically migrates the traffic to the other read databases. The entire process is completed automatically by the connection pool and is transparent to the caller (which is why the database connection pool in the DAO is a very important basic component).

High availability of [service layer -> database layer "write"]


High availability from [service layer] to [database write] is achieved through redundancy of the write database.

Taking MySQL as an example, you can set up two MySQL instances with dual-master synchronization: one serves traffic online, and the other is kept as a redundant standby. A common practice is to use keepalived for liveness detection, with both instances serving behind the same virtual IP.

Automatic failover: when the write database goes down, keepalived detects it, automatically performs failover, and automatically migrates the traffic to shadow-db-master. Since the same virtual IP is used, the switch is transparent to the caller.

Origin blog.csdn.net/m0_68949064/article/details/128946795