In the era of large traffic, how to plan system traffic to improve reliability

Abstract: This article is mainly an interpretation of "Phoenix Architecture", and describes several ways to plan system traffic.

This article is shared from the HUAWEI CLOUD community article "In the Era of Large Traffic, How to Plan System Traffic to Improve Reliability", author: breakDawn.

Transparent multi-level traffic diversion system

When planning system traffic, keep the following two principles in mind:

  1. Minimize the number of single-point components, or at least reduce the traffic and actions that reach them
  2. Follow Occam's razor: introduce a diversion component only when it is truly necessary, and avoid over-design

1 Client Cache

For some resources, caching is done on the client side, so the client does not have to repeat the request.

1.1 Forced Caching

HTTP provides two headers for forced caching; both are controlled by the server and are time-based:

  1. Expires
    The server returns an absolute expiration time before which the data will not change.
    Disadvantages: it depends on the client's local clock; it cannot express "do not cache" except by forcing an already-expired timestamp; and it cannot mark a resource as private (to prevent private resources from being cached by other nodes)
  2. Cache-Control
    This header uses directives such as max-age, private, and no-cache to fix the three shortcomings of Expires (a server-side sketch follows this list).
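As an illustration, here is a minimal sketch of a server emitting a forced-caching header, using the JDK's built-in com.sun.net.httpserver; the /logo.png path and its body are made up for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class ForcedCacheDemo {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/logo.png", exchange -> {
            byte[] body = "fake image bytes".getBytes(StandardCharsets.UTF_8);
            // Forced caching: the client may reuse this response for one hour
            // without asking again; "private" keeps shared proxies from storing it.
            exchange.getResponseHeaders().set("Cache-Control", "max-age=3600, private");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }
}
```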

1.2 Negotiated Caching

Negotiated caching checks whether the resource has actually changed. Negotiated and forced caching can coexist: negotiation takes over once the forced cache has expired.
Negotiated caching applies not only when a URL is typed or followed via a link, but also on an F5 refresh (a Ctrl+F5 forced refresh, however, bypasses the cache).

  1. Last-Modified
    The server tells the client the last modification time of the resource, and the client sends that time back (as If-Modified-Since) on later requests.
    If the server finds that the resource has not changed since that time, it returns 304 Not Modified.
    If it has changed, it returns 200 OK with the complete resource
  2. ETag
    The server computes a hash of the resource, and the client carries its stored ETag (as If-None-Match) on each request. The server compares the hash values; if they differ, the new resource is returned (a sketch follows this list).
    ETag is the most accurate local caching mechanism, but also the most expensive in terms of performance.
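For illustration, a minimal sketch of server-side ETag negotiation, again on the JDK's built-in server; loadResource() is a placeholder for real resource loading, and the handler would be registered with server.createContext(...).

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Negotiated caching with ETag: answer 304 when the client's copy is still valid. */
public class EtagHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange exchange) throws IOException {
        byte[] resource = loadResource();   // placeholder for loading the real resource
        String etag = "\"" + Integer.toHexString(Arrays.hashCode(resource)) + "\"";
        String clientEtag = exchange.getRequestHeaders().getFirst("If-None-Match");

        exchange.getResponseHeaders().set("ETag", etag);
        if (etag.equals(clientEtag)) {
            exchange.sendResponseHeaders(304, -1);              // Not Modified, no body
        } else {
            exchange.sendResponseHeaders(200, resource.length); // changed: send the full resource
            exchange.getResponseBody().write(resource);
        }
        exchange.close();
    }

    private byte[] loadResource() {
        return "hello".getBytes(StandardCharsets.UTF_8);
    }
}
```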

2 Transmission Channel Optimization

This chapter mainly uses the well-known HTTP protocol as the transmission channel to explain the optimizations.

2.1 Connection Count Optimization

HTTP runs on TCP, and originally a new TCP connection was established for every request. Front-end developers therefore invented many small optimizations to reduce the number of requests, such as sprite images, segmented documents, and merged Ajax requests.

Why can't HTTP/1.0's persistent connection (keep-alive connection reuse) solve this problem?
Because of head-of-line blocking: the reused connection is essentially FIFO, so if one request gets stuck, the nine requests behind it are all blocked. And if responses were allowed to come back concurrently, out-of-order responses could not be matched to their requests.

HTTP/2's multiplexing solves this problem (a client-side illustration follows the list):

  • The frame is the smallest unit of transmission, and each frame carries a stream ID identifying which stream it belongs to
  • The client can therefore easily reassemble the HTTP request and response messages of different streams
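A quick illustration with the JDK 11+ java.net.http client (example.com is a placeholder origin): when the server speaks HTTP/2, the three requests below can share a single connection, each travelling as its own stream, so a slow response does not block the others.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class Http2MultiplexDemo {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)   // prefer HTTP/2; falls back to 1.1 if unsupported
                .build();

        // Fire several requests to the same origin without waiting for each other;
        // over HTTP/2 they become independent streams on one connection.
        List<CompletableFuture<HttpResponse<String>>> inFlight = new ArrayList<>();
        for (String path : List.of("/a.css", "/b.js", "/c.png")) {
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com" + path)).build();
            inFlight.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString()));
        }
        inFlight.forEach(f -> System.out.println(f.join().statusCode()));
    }
}
```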

2.2 Transmission Compression

HTTP has long supported gzip compression to reduce the transfer size of large resources.

In HTTP/1.0, persistent connections and compression could not be used together, because the receiver had no way to tell whether a compressed resource had been fully transmitted.

HTTP/1.1 introduced "chunked transfer encoding" to mark the end of a resource.
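A minimal sketch of the two techniques together, again on the JDK's built-in server: the response is gzip-compressed and, since its final length is unknown, sent with chunked transfer encoding (passing a length of 0 asks the JDK server for chunked mode).

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Gzip-compressed response whose final size is unknown, so chunked encoding is used. */
public class GzipHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange exchange) throws IOException {
        byte[] payload = "a large text resource ...".getBytes(StandardCharsets.UTF_8);

        exchange.getResponseHeaders().set("Content-Encoding", "gzip");
        // Length 0 tells the JDK server to use chunked transfer encoding,
        // which is how HTTP/1.1 marks the end of a body of unknown length.
        exchange.sendResponseHeaders(200, 0);
        try (GZIPOutputStream gzip = new GZIPOutputStream(exchange.getResponseBody())) {
            gzip.write(payload);   // the compressed size is not known in advance
        }
        exchange.close();
    }
}
```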

2.3 Using UDP to Speed Up Network Transmission

HTTP/3 aims to remove HTTP's dependence on TCP. To that end, Google launched QUIC (Quick UDP Internet Connections):

  • QUIC is based on UDP and implements reliable transmission itself
  • QUIC is designed with mobile devices in mind: their IP addresses change frequently, so using the IP as the connection's identity is inappropriate; instead, a connection ID is used to keep the connection alive
  • Where QUIC is not supported, it can fall back to a TCP connection for compatibility

3 Content Delivery Network (CDN)

A CDN solves the latency caused by cross-carrier and cross-region physical distance on the Internet, and also diverts traffic to reduce the bandwidth burden on the website.
Its work mainly includes the following parts:

3.1 Route Resolution

A user's request for a static resource reaches the CDN through DNS resolution. A single website may return CDN resolution addresses for different regions; based on the routing configuration, the IP address matching the user's region is selected automatically.

3.2 Content Distribution

There are two ways to distribute content:

  1. Active distribution: push your resources through the interface provided by the CDN service provider, which requires writing extra code for the push. Large-scale events such as Double 11 prefer to actively distribute pre-prepared resources.
  2. Passive back-to-origin: triggered by user access; when the CDN cannot find the resource, it fetches it from the origin site and returns it. No new code is needed as long as the CDN supports back-to-origin for your origin site. Small sites almost always use this method.

There are two ways to update resources:

  1. Passive invalidation on timeout: CDN resources have a validity period and are re-fetched from the origin once it expires
  2. Manual active invalidation: the CDN service provider exposes a cache-invalidation interface; invalidation is triggered actively and the update then happens via passive back-to-origin.
    In practice, methods 1 and 2 are usually combined; they do not conflict.

4 Load Balancing

There are two types of load balancing:

  • Layer-4 load balancing
    refers collectively to balancing strategies at or below layer 4 of the OSI seven-layer model,
    i.e., balancing at the data link layer and the network layer
  • Layer-7 load balancing
    refers to balancing implemented in actual code at the application layer

4.1 Data Link Layer Load Balancing (Layer-4 Load Balancing)

  • The balancer works at the link layer by rewriting the destination MAC address of the frame
  • Every backend node holds the same IP (the same virtual IP)
  • Responses do not need to pass back through the balancer; they return directly, because the source and destination IPs are essentially unchanged

Disadvantages:
The balancer and backend nodes must be on the same subnet; it cannot span VLANs, so it can only serve as the balancer closest to the data center.

4.2 Network Layer Load Balancing (Layer-4 Load Balancing)

There are two ways:

IP Tunnel Mode

The balancer wraps the original IP packet in a new outer header whose destination is the real IP of the target machine (or of its subnet). The receiving machine must be able to unwrap this outer header, and it must also hold the same virtual IP so that it can reply to the client directly without passing back through the balancer.
Disadvantages:

  1. All servers must support tunnel unpacking (modern Linux systems do)
  2. The virtual IP still imposes significant constraints, and managing many machines requires manual intervention

NAT Mode

In NAT mode, the balancer translates the destination to the real server's IP, and the response also passes back through the NAT for the reverse translation, so only the NAT itself needs manual management.
The disadvantage is that the NAT easily becomes a performance bottleneck.

SNAT additionally rewrites the source IP to the NAT's own IP, which makes it completely transparent to the backend service; the price is that anything that needs to restrict or distinguish client source IPs breaks, because every request appears to come from the same address.

4.3 Application Layer Load Balancing (Layer 7 Load Balancing)

It is also called a layer-7 proxy (application-layer proxy), because this kind of load balancing is a reverse proxy, i.e. a proxy deployed on the server side that the client is not aware of.

It is not suitable for bandwidth-heavy applications such as download sites or video sites. If the bottleneck lies in the service's computing capacity instead, application-layer balancing is worth considering.

Other functions of the layer-7 proxy besides load balancing:

  • Caching capabilities similar to a CDN
  • Intelligent routing, providing special handling based on the URL or on specific users
  • Security protection, filtering attack packets before they reach the backend
  • Link (connection) management

4.4 Load Balancing Strategies

  • Round-robin balancing
    Requests are assigned in turn, from server 1 through N and back to 1.
    Suitable when all servers have identical hardware and requests should be spread evenly
  • Weighted round-robin
    The number of turns each server gets per cycle is allocated according to its weight
  • Random balancing
    Suitable when the request volume is large enough for the distribution to even out
  • Weighted random balancing
    Servers with higher weight receive a proportionally higher selection probability
  • Consistent hash balancing
    Suitable when servers may frequently leave or join; it avoids remapping all hash keys (a minimal sketch follows this list)
  • Response-time balancing
    Periodically probe each server's response time and assign weights according to the speed
  • Least-connections balancing
    Assign requests according to the current number of connections; suitable for long-lived services such as FTP
  • Software balancers include the kernel-based LVS and the application-level Nginx, Keepalived, and HAProxy
  • Hardware balancers include load-balancing appliances from vendors such as F5 and A10
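As referenced in the consistent hash item above, here is a minimal consistent-hash ring sketch (MD5-based, with virtual nodes). It is illustrative only, not how any particular balancer implements it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Minimal consistent-hash ring: only the keys near a joining/leaving node are remapped. */
public class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100;   // virtual nodes smooth the distribution
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public ConsistentHashRing(List<String> servers) {
        for (String server : servers) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    /** Pick the first server clockwise from the request key's position on the ring. */
    public String pick(String requestKey) {
        SortedMap<Long, String> tail = ring.tailMap(hash(requestKey));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            // Use 4 bytes of the digest as an unsigned 32-bit position on the ring.
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16) | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```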

5 Server-Side Cache

Reasons for introducing caching:

  • Relieve CPU computation pressure
  • Relieve I/O pressure
    Both of these only relieve pressure at peak times; if normal responses are already very slow, a cache will not help.

5.1 Several Attributes of a Cache

A cache must be selected to match the scenario: when choosing a type, weigh the following attributes against the actual use case.

Throughput

ConcurrentHashMap, as improved in JDK 8, is the cache container with the highest throughput in concurrent scenarios, but its capabilities beyond raw throughput are very limited.

Cache state update approaches:

  • Guava Cache: a synchronous mechanism; bookkeeping is done inline with data access, and segmented locking reduces contention
  • Caffeine: an asynchronous log-submission mechanism modeled on database logs, with a ring buffer that tolerates lossy state changes. Read performance is very fast, suiting read-heavy, write-light workloads (see the sketch below).
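For illustration, a minimal Caffeine usage sketch, assuming the Caffeine dependency is on the classpath; loadFromDatabase is a stand-in for a real query.

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.time.Duration;

public class CaffeineDemo {
    // Bounded in-process cache; entries are loaded on a miss and expire over time.
    private static final LoadingCache<String, String> USER_CACHE = Caffeine.newBuilder()
            .maximumSize(10_000)                       // size-bounded, evicted by W-TinyLFU
            .expireAfterWrite(Duration.ofMinutes(10))  // passive time-based invalidation
            .build(CaffeineDemo::loadFromDatabase);    // called once per missing key

    private static String loadFromDatabase(String userId) {
        return "user-" + userId;   // placeholder for a real database query
    }

    public static void main(String[] args) {
        System.out.println(USER_CACHE.get("42"));   // miss -> loads, then caches
        System.out.println(USER_CACHE.get("42"));   // hit
    }
}
```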

Hit Rate and Eviction Strategies

Three basic eviction options:

  • FIFO: first in, first out. Simple to implement, but the hit rate for frequently accessed data is low, because the more often an entry is used, the earlier it tends to have entered the queue and the sooner it is evicted
  • LRU: evict the data that has gone longest without being accessed, typically implemented with a HashMap plus a linked list. Each entry must record its access time, and high-value data that merely has not been accessed for a short while may be evicted (a minimal LRU sketch follows below)
  • LFU: evict the least frequently used data first, based on use counts, which fixes LRU's shortcoming.
    Its own disadvantages:
  1. Each entry maintains a dedicated counter that must be updated on every access; the maintenance cost is high and requires locking (LRU's access-time update needs no locking, since the newest value simply overwrites the old one)
  2. If an entry is accessed heavily in one period, an order of magnitude more than other entries, but is never used afterwards, it is very hard to evict

To address these two shortcomings, there are two newer strategies:

  • TinyLFU: reduces the counter-update overhead by using a sketch structure to approximate access statistics, estimating the characteristics of all data from a small amount of data, combined with sliding time windows, heat decay, and so on
  • W-TinyLFU: combines the characteristics of LRU and LFU, considering both frequency (heat) and recency (time).
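As a concrete picture of LRU, here is the classic minimal sketch built on LinkedHashMap in access order; it is illustrative only, and production caches such as Caffeine do far more.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache: LinkedHashMap in access order evicts the eldest (least recently used) entry. */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder=true moves an entry to the tail on every get()
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;   // drop the least recently used entry when over capacity
    }
}
```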

Distributed Capability

For distributed caching, two representative designs are JBossCache, a replication-style cache, and Memcached, a centralized cache.

JBossCache's weakness is poor write performance: when network synchronization cannot keep up with the write rate, too many pending objects pile up in memory and cause an OOM.

Memcached is implemented in C. Its advantage is high read and write performance; its disadvantage is that its data model is too coarse and relies heavily on serialization for cross-language access: if only one of 100 fields changes, all 100 fields must be serialized and sent to update the entry.

Redis has largely displaced the other distributed caches and become the default choice.

Distributed caches such as Redis do not pursue consistency (the C in CAP).
If consistency is mandatory, use a distributed coordination framework such as ZooKeeper or etcd instead (but these are generally not used as caches, because their throughput under high concurrency is too low and availability suffers).

In-process caching and distributed caching are usually combined, but data inconsistency between the two arises easily, and the write-maintenance strategy can make the cache hard for developers to reason about.
One guiding principle: the distributed cache is authoritative for changes, while the in-process cache has priority for reads.
The usual approach is to push a notification when data changes in the distributed cache, so that the first-level (in-process) cache is invalidated.
For access, a wrapped interface queries the first and second levels jointly, so developers are unaware of the two cache levels (see the sketch below).
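A minimal sketch of such a wrapped two-level lookup: Caffeine serves as the in-process L1, while the L2 getter and the database loader are passed in as placeholder functions (the L2 getter would typically be a Redis GET).

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import java.util.function.Function;

/** Sketch of a combined L1 (in-process) + L2 (distributed) cache lookup. */
public class TwoLevelCache {
    private final Cache<String, String> l1 = Caffeine.newBuilder()
            .maximumSize(1_000)
            .expireAfterWrite(Duration.ofSeconds(30))
            .build();
    private final Function<String, String> l2Get;   // e.g. a Redis GET, returns null on miss
    private final Function<String, String> dbLoad;  // source of truth

    public TwoLevelCache(Function<String, String> l2Get, Function<String, String> dbLoad) {
        this.l2Get = l2Get;
        this.dbLoad = dbLoad;
    }

    /** Callers only see this method; they never touch L1 or L2 directly. */
    public String get(String key) {
        return l1.get(key, k -> {
            String value = l2Get.apply(k);          // try the distributed cache first
            return value != null ? value : dbLoad.apply(k);
        });
    }

    /** Invoked by the change-notification channel to drop the stale local copy. */
    public void onRemoteChange(String key) {
        l1.invalidate(key);
    }
}
```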

5.2 Caching Risks

Cache Penetration

A large number of requests arrive for keys that do not exist at all, so every one of them misses the cache and hits the database. The countermeasures are either to cache null values for non-existent keys, or to introduce a Bloom filter (a sketch follows).
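A minimal Bloom filter sketch using Guava (assuming Guava is on the classpath; the sizing numbers are illustrative): a "no" answer from the filter is always correct, so requests for keys that cannot exist never reach the database.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class PenetrationGuard {
    // Sized for ~1M keys with ~1% false positives.
    private static final BloomFilter<String> KNOWN_IDS =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    public static void register(String id) {
        KNOWN_IDS.put(id);                 // called whenever a record is created
    }

    public static boolean mightExist(String id) {
        return KNOWN_IDS.mightContain(id); // false -> reject without querying the cache or DB
    }
}
```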

Cache Breakdown

Many requests arrive at the same moment for data that exists in the database but is not yet in the cache (the cache takes effect with a delay), so they can all hit the database directly. Use locks or queues to serialize the rebuild, and for known hot keys, warm the cache or configure a strategy in advance (a sketch follows).
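A minimal sketch of the locking approach: one lock per key, so only a single caller rebuilds an expired hot entry while the rest wait and then read the freshly rebuilt value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Sketch: only one thread per key rebuilds a missing hot entry; the others wait. */
public class BreakdownGuard {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public String get(String key, Function<String, String> dbLoad) {
        String value = cache.get(key);
        if (value != null) {
            return value;                   // fast path: already cached
        }
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            // Re-check after acquiring the lock: another thread may have rebuilt it already.
            value = cache.get(key);
            if (value == null) {
                value = dbLoad.apply(key);  // only one caller actually hits the database
                cache.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```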

 

Click to follow and learn about Huawei Cloud's fresh technologies for the first time~



Reprinted from: my.oschina.net/u/4526289/blog/8696158