Internet High-Availability Architecture in Practice (repost)

Original source: http://www.iteye.com/news/32723

1. What is high availability

High availability (HA) is one of the factors that must be considered in distributed system architecture design. It usually means reducing, through design, the time during which the system cannot provide service.

Assume the system has always been able to provide service; we then say its availability is 100%. If, for every 100 units of time the system runs, there is 1 unit of time in which it cannot provide service, we say its availability is 99%. The high-availability target of many companies is four nines, i.e. 99.99%, which means the system's annual downtime is at most about 52.6 minutes.
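As a quick sanity check on these numbers, here is a back-of-the-envelope calculation (a minimal sketch; it only assumes a 365-day year):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours

for availability in (0.99, 0.999, 0.9999):
    downtime_hours = HOURS_PER_YEAR * (1 - availability)
    print(f"{availability:.2%} availability -> "
          f"{downtime_hours:.2f} h/year ({downtime_hours * 60:.1f} minutes)")

# 99.00% availability -> 87.60 h/year (5256.0 minutes)
# 99.90% availability -> 8.76 h/year (525.6 minutes)
# 99.99% availability -> 0.88 h/year (52.6 minutes)
```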

Baidu's search homepage is widely recognized in the industry as a system with excellent high-availability guarantees. People even judge "network connectivity" by whether www.baidu.com can be reached: "the network is fine, Baidu loads", or "Baidu won't open, the network must be down". This is actually the highest praise for Baidu's HA.

2. How to ensure high availability of the system

A single point is the enemy of high availability: a single point of failure is often the biggest risk to a system, and we should try to avoid single points during system design. Methodologically, the principle for guaranteeing high availability is "clustering", or "redundancy": if there is only one instance and it fails, the service is affected; if there is a redundant backup, another instance can take over.

To ensure high availability, the core principle of architecture design is redundancy. But redundancy alone is not enough: if every failure requires manual intervention to restore the system, the time the system is unavailable will inevitably grow. Therefore, high availability is usually achieved through "automatic failover". Next, let's look at how a typical Internet architecture ensures high availability through redundancy + automatic failover.

3. Common Internet Layered Architecture


The common Internet distributed architecture is as above, divided into:

  • (1) Client layer: the typical caller is a browser or a mobile APP
  • (2) Reverse proxy layer: the system entry point, a reverse proxy
  • (3) Site application layer: implements the core application logic and returns html or json
  • (4) Service layer: present if the architecture has been service-oriented
  • (5) Data layer - cache: the cache accelerates access to storage
  • (6) Data layer - database: the database provides persistent data storage

The high availability of the entire system is achieved comprehensively through redundancy + automatic failover at each layer.

4. Layered high-availability architecture in practice
1. High availability from the client layer to the reverse proxy layer


The high availability from the client layer to the reverse proxy layer is achieved through redundancy at the reverse proxy layer. Take nginx as an example: deploy two nginx instances, one serving traffic online and the other a redundant standby that guarantees high availability. The common practice is keepalived liveness detection, with both instances sharing the same virtual IP to provide service.


Automatic failover: when the online nginx hangs, keepalived detects it, automatically fails over, and migrates traffic to the shadow nginx. Since the same virtual IP is used, this switchover is transparent to the caller.
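Keepalived implements this detection with the VRRP protocol and health-check scripts; the toy watchdog below is only a sketch of the idea from the standby's point of view (the health URL, virtual IP and network interface are assumptions, and binding an IP requires root on Linux):

```python
import subprocess
import time

HEALTH_URL = "http://10.0.0.11/"   # hypothetical address of the primary nginx
VIRTUAL_IP = "10.0.0.100/24"       # hypothetical virtual IP shared by both nginx boxes
INTERFACE = "eth0"
FAIL_THRESHOLD = 3                 # consecutive failed probes before failing over

def primary_alive() -> bool:
    """Probe the primary; keepalived does this via VRRP advertisements and check scripts."""
    result = subprocess.run(["curl", "-sf", "--max-time", "2", HEALTH_URL],
                            capture_output=True)
    return result.returncode == 0

def claim_virtual_ip() -> None:
    """Bind the virtual IP on the standby so client traffic lands on the shadow nginx."""
    subprocess.run(["ip", "addr", "add", VIRTUAL_IP, "dev", INTERFACE], check=False)

failures = 0
while failures < FAIL_THRESHOLD:
    failures = 0 if primary_alive() else failures + 1
    time.sleep(1)

claim_virtual_ip()                 # automatic failover: the standby takes over the same VIP
```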

2. High availability from the reverse proxy layer to the site layer


The high availability from the reverse proxy layer to the site layer is achieved through redundancy at the site layer. Assuming the reverse proxy layer is nginx, multiple web backends can be configured in nginx.conf, and nginx can detect the liveness of these backends.


Automatic failover: when a web-server hangs, nginx detects it, automatically fails over, and migrates traffic to the remaining web-servers. The whole process is completed automatically by nginx and is transparent to the caller.
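In nginx this is just an upstream block with several web-servers plus its built-in health detection; the minimal sketch below imitates that behavior with the standard library only (the backend addresses are assumptions):

```python
import urllib.error
import urllib.request

# Hypothetical web-server backends; in nginx these would be entries in an upstream block.
WEB_SERVERS = ["http://10.0.0.21:8080", "http://10.0.0.22:8080"]

def proxy(path: str) -> bytes:
    """Forward a request to the first live web-server, skipping any backend that fails."""
    last_error = None
    for base in WEB_SERVERS:
        try:
            with urllib.request.urlopen(base + path, timeout=2) as response:
                return response.read()
        except (urllib.error.URLError, OSError) as error:
            last_error = error      # this backend looks dead; fail over to the next one
    raise RuntimeError("all web-servers are unreachable") from last_error
```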

3. High availability from the site layer to the service layer


The high availability from the site layer to the service layer is achieved through redundancy at the service layer. The "service connection pool" establishes multiple connections to downstream services, and each request "randomly" selects one connection to access the downstream service.


Automatic failover: when a service instance hangs, the service-connection-pool detects it, automatically fails over, and migrates traffic to the other service instances. The whole process is completed automatically by the connection pool and is transparent to the caller (which is why the service connection pool in the RPC-client is such an important foundational component).
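A minimal sketch of such a connection pool, assuming a caller-supplied send(endpoint, request) transport function; all names are hypothetical and there is no health-check recovery of a dead endpoint:

```python
import random

class ServiceConnectionPool:
    """Keeps connections to several redundant downstream service instances."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.dead = set()                   # endpoints the pool has stopped routing to

    def call(self, request, send):
        """Pick a random live connection; on error, mark it dead and retry another one."""
        candidates = [e for e in self.endpoints if e not in self.dead]
        while candidates:
            endpoint = random.choice(candidates)
            try:
                return send(endpoint, request)
            except (ConnectionError, TimeoutError):
                self.dead.add(endpoint)     # automatic failover, transparent to the caller
                candidates.remove(endpoint)
        raise RuntimeError("no live downstream service instance")
```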

4. High availability from the service layer to the cache layer


The high availability from the service layer to the cache layer is achieved through redundancy of the cached data. There are several ways to make cached data redundant. The first is client-side encapsulation: the service performs double-reads or double-writes against two caches.
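A minimal sketch of that client-side encapsulation, assuming two independent cache client objects that expose get/set and raise ConnectionError when their instance is unreachable (all names are hypothetical):

```python
class RedundantCache:
    """Client-side cache redundancy: double-write every value, fall back on reads."""

    def __init__(self, primary, backup):
        self.nodes = (primary, backup)      # two independent cache instances

    def set(self, key, value):
        for node in self.nodes:             # double-write: both copies are updated
            try:
                node.set(key, value)
            except ConnectionError:
                pass                        # a single failed replica is tolerated

    def get(self, key):
        for node in self.nodes:             # double-read: try the other copy on failure
            try:
                value = node.get(key)
                if value is not None:
                    return value
            except ConnectionError:
                continue
        return None                         # both unreachable or not cached: a miss
```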


The high availability of the cache layer can also be solved with a cache cluster that supports master-slave synchronization.

Taking redis as an example, redis natively supports master-slave synchronization, and redis officially provides the sentinel mechanism for liveness detection of redis instances.


Automatic failover: when the redis master hangs, sentinel detects it and notifies callers to access the new redis master. The whole process is completed through the cooperation of sentinel and the redis cluster and is transparent to the caller.
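With the redis-py client, for example, the caller asks sentinel for the current master instead of hard-coding an address, so a newly promoted master is picked up automatically (the sentinel addresses and the service name "mymaster" below are assumptions):

```python
from redis.sentinel import Sentinel  # pip install redis

# Hypothetical sentinel endpoints; "mymaster" is the service name configured in sentinel.conf.
sentinel = Sentinel([("10.0.0.31", 26379), ("10.0.0.32", 26379)], socket_timeout=0.5)

master = sentinel.master_for("mymaster", socket_timeout=0.5)   # writes follow the current master
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)   # reads can hit a replica

master.set("greeting", "hello")
print(replica.get("greeting"))
```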

Having covered cache high availability, one more point is worth making: the business does not necessarily require "high availability" from the cache. The more common use of a cache is to "accelerate data access": part of the data is placed in the cache, and if the cache is down or a lookup misses, the data can simply be fetched again from the back-end database.

For business scenarios that can tolerate "cache misses", the recommended cache architecture is as follows:


The kv cache is encapsulated as a service cluster, with a proxy placed upstream (the proxy itself can be made highly available as a cluster). Behind the proxy, the cache is horizontally split into several instances by key, and no high availability is provided for any individual instance.


Shielding a failed cache instance: when a horizontally split instance hangs, the proxy layer simply returns a cache miss, so the failure is again transparent to the caller. Although the number of live key-split instances has shrunk, re-hashing is not recommended, since it can easily cause inconsistency of the cached data.
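A minimal sketch of the proxy's routing rule, assuming one cache client per horizontally split instance: the key-to-shard mapping stays fixed, and a dead shard simply answers with a miss instead of triggering a re-hash:

```python
import hashlib

class ShardedCacheProxy:
    """Route each key to a fixed shard; a failed shard degrades to cache misses."""

    def __init__(self, shards):
        self.shards = list(shards)          # one cache client per split instance

    def _shard_for(self, key: str):
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.shards[digest % len(self.shards)]   # fixed mapping: no re-hash on failure

    def get(self, key: str):
        try:
            return self._shard_for(key).get(key)
        except ConnectionError:
            return None                     # dead instance: answer with a cache miss

    def set(self, key: str, value):
        try:
            self._shard_for(key).set(key, value)
        except ConnectionError:
            pass                            # writes to a dead instance are dropped
```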

5. High availability from the service layer to the database layer

In most Internet systems, the database layer uses a "master-slave synchronization, read-write separation" architecture, so the high availability of the database layer falls into two categories: "read high availability" and "write high availability".

"Read" high availability from the service layer to the database layer


The "read" high availability from the service layer to the database is achieved through redundancy of the read libraries.

Since the read library is redundant, there are generally at least two slave libraries. The "database connection pool" establishes multiple connections to the read libraries, and each read request is routed to one of them.


Automatic failover: when a read library hangs, the db-connection-pool detects it, automatically fails over, and migrates traffic to the other read libraries. The whole process is completed automatically by the connection pool and is transparent to the caller (which is why the database connection pool in the DAO layer is an important foundational component).
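This is the same idea as the service connection pool, applied inside the DAO; a minimal sketch assuming connection objects for one master and several slaves (no automatic recovery of a slave that comes back):

```python
import itertools

class ReadWriteSplitPool:
    """Writes go to the master; reads rotate over redundant slaves with failover."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.dead = set()
        self._round_robin = itertools.cycle(self.slaves)

    def write_connection(self):
        return self.master

    def read_connection(self):
        for _ in range(len(self.slaves)):    # skip any slave already marked dead
            slave = next(self._round_robin)
            if id(slave) not in self.dead:
                return slave
        return self.master                   # last resort: serve the read from the master

    def mark_dead(self, connection):
        self.dead.add(id(connection))        # automatic failover: stop routing reads to it
```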

"Write" high availability from the service layer to the database layer


The "write" high availability from the service layer to the database is achieved through redundancy of the write library.

Taking mysql as an example, two mysql instances can be set up with dual-master synchronization: one serves traffic online, and the other is a redundant standby that guarantees high availability. The common practice is keepalived liveness detection, with both sharing the same virtual IP to provide service.


Automatic failover: when the write library hangs, keepalived detects it, automatically fails over, and migrates traffic to the shadow db-master. Since the same virtual IP is used, this switchover is transparent to the caller.

5. Summary

High availability (HA) is one of the factors that must be considered in distributed system architecture design. It usually means reducing, through design, the time during which the system cannot provide service.

Methodologically, high availability is achieved through redundancy + automatic failover.

The high availability of the entire Internet layered system architecture is comprehensively realized through redundancy + automatic failover of each layer. Specifically:
(1) The high availability from the client layer to the reverse proxy layer is achieved through redundancy of the reverse proxy layer; the common practice is keepalived + virtual IP with automatic failover.
(2) The high availability from the reverse proxy layer to the site layer is achieved through redundancy at the site layer; the common practice is liveness detection and automatic failover between nginx and the web-servers.
(3) The high availability from the site layer to the service layer is achieved through redundancy at the service layer; the common practice is automatic failover guaranteed by the service-connection-pool.
(4) The high availability from the service layer to the cache layer is achieved through redundancy of the cached data; common practices are client-side double-read/double-write of the cache, or master-slave synchronization plus sentinel keep-alive and automatic failover within a cache cluster. In most business scenarios, however, the cache has no high-availability requirement, and a cache service can be used to shield the underlying complexity from the caller.
(5) The "read" high availability from the service layer to the database is achieved through redundancy of the read libraries; the common practice is automatic failover guaranteed by the db-connection-pool.
(6) The "write" high availability from the service layer to the database is achieved through redundancy of the write library; the common practice is keepalived + virtual IP with automatic failover.

Finally, I hope the ideas in this article are clear, and that everyone now has a systematic understanding of both the concept and the practice of high availability.
