【Distributed】Basic Theory of Distributed Systems

1. CAP

  • Consistency : all nodes see the same data at the same time
  • Availability : every request receives a response, whether success or failure, within a normal response time; that is, the service is always usable
  • Partition tolerance : when nodes fail or the network partitions, the system can still provide external service that satisfies consistency or availability

A distributed system can satisfy at most two of consistency, availability, and partition tolerance.
Suppose the network between two nodes of a distributed service is cut: to preserve availability the system can only return stale data, and to preserve consistency it can only block and wait for the network to recover.

Consistency

From the client's point of view, consistency is mainly about how concurrent accesses observe updated data.
From the server's point of view, it is about how updates propagate through the system so that the data eventually becomes consistent.
Strong consistency: as in relational databases, every access after an update must see the updated data.
Weak consistency: subsequent accesses are allowed to see none or only part of the updated data.
Eventual consistency: after a period of time, accesses are guaranteed to see the updated data.

2. BASE

BASE theory is the result of trading off consistency against availability in CAP. Its core idea: even when strong consistency cannot be achieved, the application can reach eventual consistency in a way that suits it.
Basically Available : when a distributed system fails, it is allowed to lose part of its availability while keeping the core functions available. For example, some users are redirected to a degraded page, where only a degraded service is provided.
Soft State : the system may pass through intermediate states that do not affect its overall availability. Distributed storage typically keeps three replicas of each piece of data; the replication lag tolerated between nodes is an embodiment of soft state.
Eventual Consistency : after a certain period of time, all replicas of the data in the system finally reach a consistent state.

Differences and connections between ACID and BASE

ACID is the traditional database design philosophy and pursues a strong-consistency model. BASE targets large-scale distributed systems and achieves high availability by sacrificing strong consistency.
ACID and BASE represent two opposite design philosophies. In a concrete distributed system, different parts of the system have different consistency requirements, so ACID and BASE are often used in combination.

3. Load Balancing Algorithms

  1. Round robin : distribute requests to the back-end servers in turn, treating every server equally regardless of its actual connection count or current load.
  2. Random : pick one server at random from the back-end server list. With a large number of requests, the effect approaches that of round robin.
  3. Source-address hashing : hash the client's IP address and take the result modulo the size of the server list. As long as the server list does not change, requests from the same IP are always mapped to the same back-end server.
  4. Weighted round robin : back-end servers may differ in configuration and current load, and therefore in capacity. Machines with high configuration and low load get a higher weight and handle more requests; requests are distributed in order according to the weights.
  5. Weighted random : same as weighted round robin, except that servers are chosen at random according to weight rather than in order.
  6. Least connections : the most flexible and adaptive algorithm. Based on the back end's current connection counts, it dynamically picks the server with the fewest outstanding connections to handle the request, making the best possible use of the back end and spreading load sensibly.
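As a rough single-process illustration, three of the strategies above might be sketched like this (the class and method names are invented for the example; a real load balancer would also track health and live connection counts):

```python
import itertools
import random
import zlib

class LoadBalancer:
    """Sketch of round robin, source-address hashing, and weighted random
    over a static back-end server list."""

    def __init__(self, servers, weights=None):
        self.servers = servers
        self.weights = weights or [1] * len(servers)
        self._rr = itertools.cycle(servers)

    def round_robin(self):
        # Hand out the next server in turn, ignoring load.
        return next(self._rr)

    def source_hash(self, client_ip):
        # Same client IP -> same server, as long as the list is unchanged.
        return self.servers[zlib.crc32(client_ip.encode()) % len(self.servers)]

    def weighted_random(self):
        # Higher weight -> proportionally more requests.
        return random.choices(self.servers, weights=self.weights, k=1)[0]
```

Note the drawback mentioned in point 3: if the server list changes, `len(self.servers)` changes and most IPs get remapped, which is what consistent hashing is designed to avoid.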

4. Rate Limiting Algorithms

There are three common rate limiting algorithms: the counter algorithm, the leaky bucket algorithm, and the token bucket algorithm.

1. Counter (fixed window)

Within a time window, at most a fixed number of requests are processed and the excess is rejected. Bounding the thread pool size, the database connection pool size, or the number of nginx connections are all implementations of the counter idea.
Algorithm : keep a counter; each incoming request increments it by 1, and access is denied once it exceeds the configured value. If more than the window length (say one minute) has passed, the counter is reset.

Critical (window boundary) problem : suppose a user sends 100 requests instantaneously at 0:59 and another 100 instantaneously at 1:00. The user has then sent 200 requests within one second, which may overwhelm the service.
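A minimal single-process sketch of the fixed-window counter (names are made up for the example; a distributed version would keep the counter in something like Redis):

```python
import time

class FixedWindowCounter:
    """Allow at most `limit` requests per `window_seconds`; reject the rest."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # A full window has passed: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over the limit for this window
```

The boundary problem described above is visible here: the reset is abrupt, so two bursts on either side of `window_start` can both pass.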

2. Sliding window

The sliding window divides the time window into N small sub-windows, records the number of requests in each sub-window, and expires old sub-windows as time slides forward.
The smaller the sub-window, the smoother the sliding window and the more accurate the rate limiting.
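One way to sketch the sliding-window counter, with an optional `now` parameter so the clock can be controlled in the example (the class name and bucket layout are assumptions, not a standard API):

```python
import time

class SlidingWindowCounter:
    """Split the window into `buckets` sub-windows; a request is allowed only
    if the total count over the sub-windows still inside the window < limit."""

    def __init__(self, limit, window_seconds, buckets=10):
        self.limit = limit
        self.buckets = buckets
        self.bucket_len = window_seconds / buckets
        self.counts = {}  # sub-window index -> request count

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        current = int(now / self.bucket_len)
        # Expire sub-windows that have slid out of the window.
        for b in list(self.counts):
            if b <= current - self.buckets:
                del self.counts[b]
        if sum(self.counts.values()) >= self.limit:
            return False
        self.counts[current] = self.counts.get(current, 0) + 1
        return True
```

With 10 sub-windows per second, two bursts 0.5 s apart fall inside the same sliding window and are counted together, which is exactly what the fixed window above gets wrong.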

3. Leaky bucket

Water enters the leaky bucket at the inlet and leaks out at a constant rate. When water flows in too fast, the total amount in the bucket exceeds the bucket's capacity, the bucket overflows, and the request is rejected.
The algorithm is often used to throttle calls to external interfaces, protecting other people's systems, such as a bank's interface.
Drawbacks of the algorithm : under a sudden burst, most requests are discarded; and because the outflow rate is fixed, it cannot adapt when back-end capacity improves. For example, after dynamic scale-out the back end grows from 1000 QPS to 10,000 QPS, but the leaky bucket has no way to take advantage of it.

4. Token Bucket

Tokens are generated at a set rate and put into the bucket. Each request must take a token; if there are not enough tokens, the request is rejected.
The number of tokens depends on the issuance rate: if tokens are issued faster than they are consumed, the bucket keeps filling until it holds its full capacity of tokens.
In the face of an instantaneous burst, the client can therefore obtain many tokens at once. The algorithm is usually used to limit incoming traffic and protect the system itself.
Its advantages: it copes easily with back-end scale-out by changing the token issuance rate, and it can absorb burst traffic at the entry point.

5. Distributed transaction solutions

Two-phase commit (2PC): a two-phase protocol in which a transaction manager coordinates multiple resource managers: first every participant prepares and votes, then all commit or all roll back. MySQL itself commits transactions in two phases through its logging system.

Three-phase commit (3PC): adds one more pre-prepare phase: first ask every participant whether it can commit, then execute the operation, and then commit.
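The 2PC control flow can be sketched as below; the `Participant` class and its methods are hypothetical stand-ins for real resource managers, and a real coordinator would also need durable logging and timeout handling:

```python
class Participant:
    """Hypothetical resource manager: does local work, then votes."""

    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: do the local work, write a log, and vote yes/no.
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes in phase 1."""
    if all(p.prepare() for p in participants):   # phase 1: prepare / vote
        for p in participants:                   # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                       # phase 2: roll back everywhere
        p.rollback()
    return False
```

A single "no" vote (or a timeout, in a real system) forces a global rollback, which is why 2PC blocks when the coordinator fails between the two phases.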

6. Distributed Session Solution

  1. Session replication : every server keeps a copy of every session, which wastes server resources.
  2. Client-side storage : the session information is saved in a cookie, which accompanies every HTTP request. Cookies have a size limit and pose security risks.
  3. Consistent hashing : requests from the same IP are routed to the same server. Drawbacks: scaling out triggers re-hashing, and if a server goes down its sessions are lost and users must log in again.
  4. Unified storage : sessions are stored centrally in Redis or a database, at the cost of an extra network call.

7. The difference between Session and Token

When the user first accesses the server through a browser with a username and password, the server verifies the credentials, writes the session data on the server side, and returns a sessionId to the client, which the browser saves in a cookie. On later visits the browser carries the sessionId, and the server uses it to load the session data and look up the user information, maintaining the login state.
Disadvantages:

  1. Server pressure increases : sessions are usually stored in memory, so the server's load grows with the number of users.
  2. CSRF cross-site request forgery : sessions identify the user via a cookie, and the browser attaches that cookie to requests automatically, so a malicious site can forge requests on the user's behalf.
  3. Low scalability : if multiple servers are later added to share the load, each server has the same logic but session data is not shared, so a different server cannot recognize the user's login.

The main difference between token and session is:

  1. After authentication succeeds, the server encrypts or signs the current user data into a token string and returns it to the client; the server stores nothing.
  2. The browser stores the received token value, e.g. in Local Storage; it is not carried automatically the way a cookie is.
  3. On later requests, the server verifies and decodes the transmitted token; if that succeeds, it looks up the user data, achieving state maintenance without server-side session storage.
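A minimal sketch of this stateless-token idea, here using an HMAC signature over the payload rather than encryption (similar in spirit to a JWT); `SECRET` and the function names are invented for the example:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # hypothetical key, known only to the server

def issue_token(user_data):
    """Sign the user data and hand it to the client; nothing is stored server-side."""
    payload = base64.urlsafe_b64encode(json.dumps(user_data).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify_token(token):
    """Recompute the signature; return the user data only if it matches."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or forged token
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because the server only needs `SECRET` to verify a token, any server behind the load balancer can authenticate the request, which removes the session-sharing problem from point 3 above.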

8. High-concurrency system design

Deploy the service independently on its own machines.
Pre-warm the inventory: load data such as stock counts into Redis in advance, and use a semaphore instead of querying the database.
Separate static and dynamic content.
Intercept invalid requests at the gateway.
Rate limiting, circuit breaking, and degradation.
Clip peaks with queues: instead of contending on the database, contend only for the semaphore in Redis.


Origin: blog.csdn.net/weixin_44179010/article/details/124105390