High Availability Design

Load Balancing and Reverse Proxy

External DNS should be used for traffic scheduling with GSLB (Global Server Load Balancing), for example directing users to the data center closest to them to improve the experience.

Internal DNS can provide simple round-robin load balancing. At the application layer, consider Nginx.

To improve overall throughput, an access layer is introduced between DNS and Nginx: for example, LVS (a software load balancer) or F5 (a hardware load balancer) can provide Layer 4 load balancing.

What is Layer 4 vs. Layer 7 load balancing? Layer 4 load balancing forwards traffic at the transport layer based on IP address and port (e.g., LVS, F5), while Layer 7 load balancing routes requests based on application-layer content such as the URL, headers, or cookies (e.g., Nginx).

Nginx usage points

  • upstream configuration
  • load balancing algorithm
  • retry on failure
  • health checks
  • demo
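
The sketch below is not nginx configuration; it is a minimal Java illustration of the ideas the list above names: round-robin selection over an upstream group, retry on failure, and passive health checking via a failure threshold plus a cool-down window. The names UpstreamSelector, MAX_FAILS, and FAIL_TIMEOUT_MS are illustrative, loosely mirroring nginx's max_fails/fail_timeout.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the upstream ideas above: round-robin selection, retry on
// failure, and passive health checks via a failure threshold + cool-down.
public class UpstreamSelector {

    public static class Backend {
        final String address;
        volatile int fails = 0;        // consecutive failures, like nginx max_fails
        volatile long downUntil = 0L;  // skip this node until the timestamp, like fail_timeout
        public Backend(String address) { this.address = address; }
        boolean isAvailable() { return System.currentTimeMillis() >= downUntil; }
    }

    private static final int MAX_FAILS = 3;
    private static final long FAIL_TIMEOUT_MS = 10_000;

    private final List<Backend> backends;
    private final AtomicInteger cursor = new AtomicInteger();

    public UpstreamSelector(List<Backend> backends) { this.backends = backends; }

    // Round-robin over the group, skipping nodes currently marked down.
    public Backend next() {
        for (int i = 0; i < backends.size(); i++) {
            Backend b = backends.get(Math.floorMod(cursor.getAndIncrement(), backends.size()));
            if (b.isAvailable()) return b;
        }
        throw new IllegalStateException("no available backend");
    }

    // Called by the client after a failed request; after MAX_FAILS consecutive
    // failures the node is taken out of rotation for FAIL_TIMEOUT_MS, and the
    // caller retries the request on the next backend returned by next().
    public void reportFailure(Backend b) {
        if (++b.fails >= MAX_FAILS) {
            b.downUntil = System.currentTimeMillis() + FAIL_TIMEOUT_MS;
            b.fails = 0;
        }
    }

    public void reportSuccess(Backend b) { b.fails = 0; }
}
```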

Isolation

Thread isolation

This mainly refers to thread pool isolation: each dependency or type of request gets its own bounded thread pool, so a slow or failing dependency can exhaust only its own threads rather than dragging down the whole service.
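
A minimal Java sketch of the idea, assuming two downstream dependencies and illustrative pool sizes and timeouts; it is not any particular framework's implementation (Hystrix and Sentinel provide this kind of bulkhead isolation in production).

```java
import java.util.concurrent.*;

// Each dependency gets its own bounded pool, so a slow dependency can only
// exhaust its own threads; pool sizes and the timeout are illustrative.
public class ThreadPoolIsolationDemo {

    private static final ExecutorService ORDER_POOL = Executors.newFixedThreadPool(10);
    private static final ExecutorService COMMENT_POOL = Executors.newFixedThreadPool(5);

    static String callWithIsolation(ExecutorService pool, Callable<String> task, String fallback) {
        Future<String> future = pool.submit(task);
        try {
            return future.get(200, TimeUnit.MILLISECONDS); // per-call timeout
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            future.cancel(true);
            return fallback; // degrade instead of blocking the caller
        }
    }

    public static void main(String[] args) {
        String order = callWithIsolation(ORDER_POOL, () -> "order data", "order fallback");
        String comments = callWithIsolation(COMMENT_POOL, () -> "comment data", "comments unavailable");
        System.out.println(order + " / " + comments);
        ORDER_POOL.shutdown();
        COMMENT_POOL.shutdown();
    }
}
```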

Process isolation

Split the business into multiple subsystems to achieve physical isolation. With process isolation, a problem in one subsystem does not affect the others.

Cluster isolation

Service grouping: instances in different groups serve different services or callers, so a problem in one group does not affect the others.

Data center isolation

As availability requirements grow, the system is deployed across multiple data centers. Each data center has its own service groups, and services should only call services within the same data center rather than across data centers.

Read and write isolation

Read and write traffic is separated using primary/replica (master/slave) replication; the read path only queries the replicas.
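
A minimal Java sketch of the routing decision, assuming the primary and replica DataSource instances are configured elsewhere; class and method names are illustrative.

```java
import javax.sql.DataSource;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Writes go to the primary; reads are round-robined over the replicas.
public class ReadWriteRouter {

    private final DataSource primary;
    private final List<DataSource> replicas;
    private final AtomicInteger cursor = new AtomicInteger();

    public ReadWriteRouter(DataSource primary, List<DataSource> replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    // All writes (and anything inside a transaction) use the primary.
    public DataSource forWrite() { return primary; }

    // Reads are spread over the replicas; fall back to the primary if none are configured.
    public DataSource forRead() {
        if (replicas.isEmpty()) return primary;
        return replicas.get(Math.floorMod(cursor.getAndIncrement(), replicas.size()));
    }
}
```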

Fast and slow isolation

Static and dynamic isolation

Static resources are separated from dynamic ones; for example, static assets such as JS/CSS are served from a CDN.

Crawler isolation

At the load-balancing layer, crawlers (or malicious IPs) are routed to a separate cluster. With OpenResty, an IP + cookie approach can be used: a unique cookie identifying the user is planted in the browser before the service is accessed, and the cookie is verified on each request to the server. Requests with a missing or invalid cookie can be diverted to a dedicated group, or allowed through only after the user passes a verification code.

Hotspot isolation

  • 1. Hotspots that can be predicted in advance, such as flash sales and snap-up promotions, are built as independent systems or services so that the main flow is not affected.
  • 2. For unexpected hotspots, use the cache + queue pattern to shave peaks and fill valleys, backed by a persistent message queue (see the sketch after this list).
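
A minimal Java sketch of the cache + queue idea using an in-process bounded queue; a real system would put the buffer in a persistent message queue (Kafka, RocketMQ, etc.), and the queue size and worker count here are illustrative.

```java
import java.util.concurrent.*;

// Requests are buffered in a bounded queue and drained by a small worker pool
// at a rate the backend can sustain; sizes are illustrative, and a real system
// would use a persistent message queue instead of an in-process buffer.
public class PeakShavingDemo {

    private static final BlockingQueue<String> QUEUE = new ArrayBlockingQueue<>(10_000);

    // Request path: enqueue, or reject immediately when the buffer is full.
    static boolean accept(String request) {
        return QUEUE.offer(request); // non-blocking; false means "please retry later"
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        String req = QUEUE.take();
                        System.out.println("processing " + req); // the database write would happen here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
        System.out.println("accepted=" + accept("order-1"));
        Thread.sleep(100);
        workers.shutdownNow();
    }
}
```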

Resource isolation

Environmental isolation

Test environment, pre-release/gray-release (canary) environment, and production environment.

Stress test isolation

Real data and stress-test data are kept isolated.

A/B testing

Cache isolation

Query isolation

System protection

Caching

Caching improves access speed and increases processing capacity; it is as close to a silver bullet as there is for handling high concurrent traffic.

Degradation

When a service has problems, or it affects the performance of the core flow, it should be temporarily switched off (degraded) and re-enabled after the peak has passed or the problem has been fixed.

Rate limiting

The system is protected by limiting concurrent access/requests, or the number of requests within a time window. Once the limit is reached, requests can be rejected, queued or made to wait, or degraded.

Common rate limiting strategies:

  • 1. Limit total concurrency (e.g., database connection pools, thread pools)
  • 2. Limit instantaneous concurrent connections (e.g., Nginx's limit_conn module)
  • 3. Limit the average rate within a time window (e.g., Nginx's limit_req module)
  • 4. Limit the call rate to remote interfaces
  • 5. Limit the consumption rate of MQ consumers
  • 6. Limit traffic based on the number of network connections, network bandwidth, CPU or memory load, etc.

Implementation of rate limiting

Rate limiting algorithms

The two common algorithms are the token bucket and the leaky bucket: a token bucket allows bursts up to its capacity while enforcing an average rate, whereas a leaky bucket drains requests at a constant rate and so smooths traffic.
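
A minimal Java token-bucket sketch; the capacity and refill rate are illustrative. Production systems typically use existing implementations (e.g., Guava's RateLimiter, or Nginx's limit_req for the leaky-bucket variant).

```java
// Tokens accumulate at a fixed rate up to the bucket capacity; each request
// consumes one token or is rejected. Rate and capacity below are illustrative.
public class TokenBucket {

    private final long capacity;
    private final double refillPerMillis;
    private double tokens;
    private long lastRefillTime;

    public TokenBucket(long capacity, double permitsPerSecond) {
        this.capacity = capacity;
        this.refillPerMillis = permitsPerSecond / 1000.0;
        this.tokens = capacity;
        this.lastRefillTime = System.currentTimeMillis();
    }

    // Returns true if a token was available; callers reject or queue the request otherwise.
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        tokens = Math.min(capacity, tokens + (now - lastRefillTime) * refillPerMillis);
        lastRefillTime = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(10, 5); // burst of 10, refill 5 tokens/second
        for (int i = 0; i < 15; i++) {
            System.out.println("request " + i + " allowed=" + bucket.tryAcquire());
        }
    }
}
```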

Application-level rate limiting

  • Limit the total number of concurrent connections/requests
  • Limit the total amount of a resource (e.g., the size of connection or object pools)
  • Limit the total concurrency/requests of a specific interface
  • Limit the number of requests to an interface within a time window
  • Smoothly limit the request rate of an interface (see the sketch after this list)
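
A minimal Java sketch of two of the items above, using only the JDK: a Semaphore caps the total concurrency of an interface, and a fixed one-second window counter caps requests per window. The limits are illustrative, and the window reset is only approximate at boundaries.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// A Semaphore caps total concurrency of one interface; a fixed one-second
// window counter caps requests per window. Limits are illustrative.
public class AppLevelLimiter {

    private static final Semaphore CONCURRENCY = new Semaphore(50);

    private static final int WINDOW_LIMIT = 100;
    private static final AtomicLong WINDOW_START = new AtomicLong(System.currentTimeMillis());
    private static final AtomicLong WINDOW_COUNT = new AtomicLong();

    static boolean tryEnterConcurrent() { return CONCURRENCY.tryAcquire(); }
    static void exitConcurrent() { CONCURRENCY.release(); }

    // Fixed-window counting; the reset is only approximate at window boundaries.
    static boolean tryEnterWindow() {
        long now = System.currentTimeMillis();
        long start = WINDOW_START.get();
        if (now - start >= 1000 && WINDOW_START.compareAndSet(start, now)) {
            WINDOW_COUNT.set(0);
        }
        return WINDOW_COUNT.incrementAndGet() <= WINDOW_LIMIT;
    }

    public static void main(String[] args) {
        if (!tryEnterConcurrent()) {
            System.out.println("rejected: too many concurrent requests");
            return;
        }
        try {
            if (!tryEnterWindow()) {
                System.out.println("rejected: window limit reached");
                return;
            }
            System.out.println("request handled"); // business logic goes here
        } finally {
            exitConcurrent();
        }
    }
}
```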

Distributed rate limiting

  • Redis+Lua: limit the number of requests to an interface within a time window; once this is in place, the same approach can be adapted to limit total concurrency/requests or total resource usage (see the sketch after this list).
  • Nginx+Lua
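
A minimal Java sketch of the Redis+Lua approach, assuming the Jedis client and a Redis instance on localhost; key names and limits are illustrative. The Lua script increments a per-second counter and sets its expiry atomically on the Redis server, which is what keeps the count consistent across application instances.

```java
import redis.clients.jedis.Jedis;
import java.util.Collections;

// The Lua script runs atomically on the Redis server: it increments a
// per-second counter and sets its expiry, so the count stays consistent
// across all application instances sharing the same Redis.
public class RedisWindowLimiter {

    private static final String SCRIPT =
            "local current = redis.call('INCR', KEYS[1]) " +
            "if current == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end " +
            "return current";

    public static boolean allow(Jedis jedis, String api, int limitPerSecond) {
        // One key per interface per second, e.g. limit:order:create:1718000000
        String key = "limit:" + api + ":" + (System.currentTimeMillis() / 1000);
        // Expire the key after 2 seconds so old windows clean themselves up.
        Object count = jedis.eval(SCRIPT, Collections.singletonList(key), Collections.singletonList("2"));
        return ((Long) count) <= limitPerSecond;
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            System.out.println("allowed=" + allow(jedis, "order:create", 100));
        }
    }
}
```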

Access-layer rate limiting

The access layer is usually the entry point for request traffic. Its main responsibilities include load balancing, filtering illegal requests, request aggregation, caching, degradation, rate limiting, A/B testing, and service quality monitoring.

Nginx's built-in modules

  • ngx_http_limit_conn_module
  • ngx_http_limit_req_module

The Lua rate-limiting library provided by OpenResty

  • lua_resty_limit_traffic

Throttling

Degradation

Degradation plan

  • General: for example, some services occasionally time out because of network jitter or an ongoing release; they can be degraded automatically.
  • Warning: some services' success rate fluctuates over a period of time; they can be degraded automatically or manually.
  • Error: for example, availability drops below 90%, the database connection pool is exhausted, or traffic suddenly surges to the maximum the system can withstand; degrade automatically or manually depending on the situation.
  • Critical error: manual degradation is required.

Timeout and retry mechanism
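
A minimal Java sketch of combining a per-attempt timeout with a bounded number of retries and exponential backoff; the values are illustrative, and retries should only be applied to idempotent calls.

```java
import java.util.concurrent.*;

// Wrap a call with a per-attempt timeout, a bounded retry count, and
// exponential backoff; values are illustrative and retries should only
// be applied to idempotent operations.
public class TimeoutRetryDemo {

    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    static <T> T callWithTimeoutAndRetry(Callable<T> call, long timeoutMs, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Future<T> future = POOL.submit(call);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                future.cancel(true);
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep((long) (100 * Math.pow(2, attempt))); // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithTimeoutAndRetry(() -> "ok", 200, 2));
        POOL.shutdown();
    }
}
```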

Rollback mechanism

Stress testing and contingency plans
