Distributed system: high concurrency

Table of contents

1. High concurrency

1.1. Definition     

1.2. Terminology      

2. Goals of high concurrency system design

2.1. Macro goals

2.2. Micro goals

3. Evolution of high concurrency solutions

3.1. Initial concurrency solutions

3.2. Application-level solutions

3.3. The ultimate solution - distributed

4. System architecture

4.1. Solutions

4.1.1. Page layer

4.1.2. Network layer, application layer

4.1.3. Service layer

4.1.4. Storage layer DB

5. Best practices for high concurrency

5.1. High-performance practices

5.2. High-availability practices

5.3. High-scalability practices

Reference


1. High concurrency

1.1. Definition     

        High concurrency refers to a system's ability to handle a large number of requests at the same time. To cope with high concurrency, software development usually requires optimizing the system architecture, the code, the caching strategy, and so on.

1.2. Terminology      

 Commonly used indicators related to high concurrency include response time (Response Time), throughput (Throughput), queries per second (QPS, Query Per Second), and the number of concurrent users.

Response time: the time it takes the system to respond to a request. For example, if the system takes 200 ms to process an HTTP request, those 200 ms are the system's response time.

Throughput: the number of requests processed per unit time. TPS and QPS are the common quantitative measures of throughput.

QPS: the number of queries a server can respond to per second. In the Internet field the distinction between QPS and throughput is not very sharp; QPS is only a simple query count, so it is not recommended to describe the overall performance of a system with QPS alone.

TPS: the number of transactions per second. A transaction is the whole process of a client sending a request to the server and the server responding.

Taking a call to a single interface as one transaction, each transaction includes the following three steps:

  • (1) Send a request to the server
  • (2) Server's own internal processing (including application server, database server, etc.)
  • (3) The server returns the result to the client

If N such three-step round trips can be completed per second, the TPS is N.

Number of concurrent users: the number of users simultaneously using the system's normal functions. This value can be obtained by counting, in the access logs, how many users hit the machine within one second.

The difference between QPS and TPS:

(1) If you are load-testing a single query interface that does not call any other interface internally, then TPS = QPS; otherwise TPS ≠ QPS.

(2) In a capacity scenario, assuming one transaction involves N interfaces, all of them query interfaces that do not call other interfaces internally, then QPS = N * TPS.

2. Goals of high concurrency system design

2.1. Macro goals

① High performance: Performance reflects the parallel processing capability of the system. With limited hardware investment, improving performance means saving costs.

② High availability: the proportion of time the system can serve normally. One system runs all year round without interruption or failure; another has online incidents and downtime every now and then; users will obviously choose the former. Moreover, a system that is only 90% available will drag the business down badly.

③ High scalability: the system's ability to scale out, i.e. whether capacity can be expanded in a short time during traffic peaks so that spikes are absorbed smoothly, e.g. Double 11 promotions or sudden hot events such as a celebrity divorce.

2.2. Micro goals

Performance indicators: Performance indicators can be used to measure existing performance problems and serve as the evaluation basis for performance optimization. Generally speaking, the interface response time within a period of time is used as an indicator.

        ① Average response time: the most commonly used, but its defect is obvious: it is insensitive to slow requests. For example, with 10,000 requests of which 9,900 take 1 ms and 100 take 100 ms, the average response time is 1.99 ms. Although the average rises by only 0.99 ms, the response time of 1% of the requests has increased 100-fold.

        ② Percentile values such as TP90 and TP99: sort the response times from small to large; TP90 is the response time at the 90th percentile. The larger the percentile, the more sensitive the metric is to slow requests (a small computation sketch follows this list).

        ③ Throughput: inversely proportional to response time. For example, if the response time is 1 ms, the throughput is 1,000 requests per second.
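To make these indicators concrete, here is a minimal sketch (in Java, with hypothetical sample data) that derives the average, TP90 and TP99 from a list of measured response times:

```java
import java.util.Arrays;

/** Minimal sketch: computing average, TP90 and TP99 from sampled
 *  response times in milliseconds. The sample data is hypothetical. */
public class LatencyStats {
    public static void main(String[] args) {
        long[] responseTimesMs = {1, 2, 1, 3, 2, 200, 1, 2, 150, 2}; // hypothetical samples

        double avg = Arrays.stream(responseTimesMs).average().orElse(0);

        long[] sorted = responseTimesMs.clone();
        Arrays.sort(sorted); // percentile values require sorting from small to large

        System.out.printf("avg=%.1fms, TP90=%dms, TP99=%dms%n",
                avg, percentile(sorted, 90), percentile(sorted, 99));
    }

    /** Returns the value below which roughly p percent of the sorted samples fall. */
    static long percentile(long[] sorted, int p) {
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(index, sorted.length - 1))];
    }
}
```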

Availability indicator: high availability means the system has a strong ability to run without failure; availability = time the system serves normally / total running time. For example, 99.99% availability allows only about 53 minutes of downtime per year.

Scalability indicator: in the face of burst traffic it is impossible to rework the architecture on the spot, so the fastest way is to add machines and expand the system's processing capacity linearly.

Therefore, high scalability has to consider: service clusters; middleware such as databases, caches and message queues; load balancing; bandwidth; dependent third parties; and so on. Once concurrency reaches a certain level, each of these can become a scalability bottleneck.

3. Evolution of high concurrency solutions

There are two main approaches by which a distributed architecture improves system concurrency: vertical scaling (Scale Up) and horizontal scaling (Scale Out).

Horizontal scaling (Scale Out): increase the number of servers to expand system capacity linearly. Horizontal scaling must be designed into the system architecture.

Vertical scaling (Scale Up): improve performance and capacity by adding hardware resources to a single server, such as processing power, memory and storage.

3.1. Initial concurrency solutions

Vertical scaling: improve the processing capability of a single machine.

There are two ways to scale vertically:

(1) Enhance single-machine hardware: increase the number of CPU cores (e.g. to 32), upgrade the network card to 10 Gigabit, replace mechanical disks with SSDs, expand disk capacity (e.g. to 2 TB), expand memory, move the CPU from 32-bit to 64-bit, replace the free Tomcat with commercial WebLogic, tune the Linux kernel, buy higher-performance servers, and so on.

(2) Improve single-machine architecture: use caching to reduce the number of IO operations, use asynchrony to increase single-service throughput, and use lock-free data structures to reduce response time.

As the business keeps growing, single-server performance soon hits its ceiling.

3.2. Application-level solutions

  1) Static HTML pages (requires CMS support)

  2) Image server separation (common solution)

  3) Caching (common solution); distributed caching is the best option

  4) Mirror sites (common when downloads are heavy) 

Because single-machine performance always has a limit, horizontal scaling eventually has to be introduced.

3.3. The ultimate solution - distributed

        As the business volume, traffic and data flowing through each core part of the network grow rapidly, the required processing capacity and computation grow correspondingly until a single server can no longer bear the load. In this situation, throwing away the existing equipment for a large hardware upgrade wastes the existing resources, and the next jump in business volume forces yet another round of costly upgrades; even the best-performing single machine eventually cannot keep up with the growth of the business.

        To solve this problem, a distributed system architecture is adopted, and concurrent processing capability is further improved through cluster deployment. This architecture connects different servers so that they jointly carry the growing business load, while load balancing spreads the load across the servers to improve the performance and reliability of the whole system.

        Cloud computing technology can also be used to deploy services on the cloud platform and dynamically expand computing resources according to changes in actual business volume, so as to better meet the challenges of business growth.

4. System architecture
 

As shown in the figure, the system's core layers can be divided simply into: page layer, network layer, application layer, service layer, and persistence layer.

If the load-balancing layer uses a high-performance Nginx, its maximum concurrency can be estimated at roughly 100,000 or more, i.e. on the order of hundreds of thousands.

Assume the application layer uses Tomcat; its maximum concurrency can be estimated at about 800, i.e. on the order of hundreds.

Assume the persistence layer uses Redis as the cache and MySQL as the database. MySQL's maximum concurrency can be estimated at about 1,000, on the order of thousands; Redis's at about 50,000, on the order of tens of thousands.

Therefore the concurrency of the load-balancing layer, the application layer and the persistence layer differs greatly. So what solutions can we usually adopt to raise the overall concurrency and caching capability of the system?

(1) System expansion

        System scaling includes vertical scaling and horizontal scaling: adding machines and upgrading machine configurations. This is effective in most scenarios.

(2) Cache

        Use a local cache or a centralized cache to reduce network IO and read data from memory. Effective in most scenarios.
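For illustration only, a toy local cache with per-entry expiry might look like the sketch below; in practice a mature library (e.g. Caffeine) or a distributed cache such as Redis would normally be used:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy local cache with per-entry TTL, used to avoid repeated reads of slow
 *  back-end data. Not production code: no size limit, no eviction thread. */
public class LocalTtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expireAtMillis;
        Entry(T value, long expireAtMillis) { this.value = value; this.expireAtMillis = expireAtMillis; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public LocalTtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    /** Returns the cached value, or null if absent or expired. */
    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expireAtMillis) {
            map.remove(key); // lazily drop expired entries
            return null;
        }
        return e.value;
    }
}
```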

(3) Read and write separation

        Use read-write separation to divide and conquer and increase the machines' parallel processing capability.
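A minimal sketch of read-write separation at the application side is shown below: writes go to the primary database and reads go to a replica. The JDBC URLs and credentials are hypothetical placeholders, and a real system would usually rely on a connection pool or routing middleware instead:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Sketch of read/write splitting: writes go to the primary, reads go to a replica.
 *  URLs and credentials are hypothetical; a MySQL JDBC driver is assumed on the classpath. */
public class ReadWriteRouter {
    private static final String PRIMARY_URL = "jdbc:mysql://primary-host:3306/app"; // hypothetical
    private static final String REPLICA_URL = "jdbc:mysql://replica-host:3306/app"; // hypothetical

    public Connection getConnection(boolean readOnly) throws SQLException {
        String url = readOnly ? REPLICA_URL : PRIMARY_URL; // route by read/write intent
        return DriverManager.getConnection(url, "app_user", "app_password");
    }
}
```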

4.1. Solutions

        Combining with the actual business, to cope with high concurrency you can design a stable and reliable system along both the vertical and horizontal directions, supplemented by corresponding business strategies.

Reduce the user's requests layer by layer, from the client all the way down to the DB layer.

4.1.1. Page layer

  • Separate light and heavy logic: taking a seckill (flash sale) as an example, separate grabbing the deal from crediting it;
    • Grabbing is a relatively light operation: once the inventory deduction succeeds, the grab can be regarded as successful
    • Crediting is a relatively heavy operation that involves transactional work
  • User diversion: take an on-the-hour flash sale as an example. Within one minute, open the entrance to users batch by batch, spreading all user requests over 60 seconds; this alone can cut the peak request rate by an order of magnitude. The same idea applies to 12306 ticket sales and Double Eleven flash sales: smooth the instantaneous traffic out over a longer period, which is what "peak shaving" means
  • Page simplification: when the flash sale starts, simplify the page and keep only the functions related to the flash sale at that moment. For example, when the seckill starts the page may stop showing recommended products.
  • Limit the rate of user requests so that peak-period traffic is spread smoothly over the whole time span, reducing the pressure on the servers and improving service quality
  • Retry strategy: if a user's seckill attempt fails, frequent retries will aggravate a back-end avalanche. How should retries work? By convention of the back-end return codes, there are two cases:
    • Non-retryable errors: both the UI and the copywriting must say so clearly, and retries must not be allowed
    • Retryable errors: retry with a strategy such as binary exponential backoff (see the sketch after this list), again with clear UI and copywriting hints
  • UI and copywriting: before and after the seckill starts, every user-facing exception needs well-designed UI and copy, for example: "The current event is too popular, please try again later", "Your goods are stuck on the road, please check back later", etc.
  • Randomly discarding requests on the front end can serve as a degradation plan: when user traffic far exceeds system capacity, a discard flag is issued manually and users' local clients start to drop a random fraction of requests
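For the retry strategy item above, a minimal client-side sketch of strategic retries with binary exponential backoff (plus jitter) could look like this; sendRequest is a hypothetical placeholder for the real call:

```java
import java.util.concurrent.ThreadLocalRandom;

/** Sketch of client-side retry with binary exponential backoff plus jitter.
 *  sendRequest is a hypothetical placeholder for the real HTTP/RPC call. */
public class BackoffRetry {
    public static boolean callWithRetry(int maxAttempts) throws InterruptedException {
        long baseDelayMs = 100;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (sendRequest()) {
                return true; // success, stop retrying
            }
            // double the delay on each failed attempt, plus random jitter
            long delay = (baseDelayMs << attempt) + ThreadLocalRandom.current().nextLong(50);
            Thread.sleep(delay);
        }
        return false; // give up; the UI should show a friendly message
    }

    private static boolean sendRequest() {
        // placeholder: issue the real call and map the return code
        // to "retryable" vs "non-retryable"
        return ThreadLocalRandom.current().nextInt(10) > 7;
    }
}
```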

4.1.2. Network layer, application layer

  • Every request must be authenticated to verify a legal identity
    • For long-lived connection services, authentication granularity can be at the session level; for short-lived connection services, such high-concurrency traffic needs extra handling, for example caching the authentication result
  • According to the capacity of the back-end system, a global rate limit is required. There are usually two methods (a counter sketch follows after this list):
    • Set a global limit N, dynamically obtain the number of deployed machines M, and push a per-machine limit of N/M to each machine. This requires traffic to be evenly distributed and machines to be uniformly deployed.
    • Maintain a global key, for example a counter key built from the current timestamp. If that key becomes a hot key, split it into finer-grained keys or rotate the key periodically.
  • Frequency control is needed per user / per IP, mainly against hackers and malicious users. If the seckill is conditional, for example the user needs to complete the xxx task to unlock the qualification, then the qualification step is a good place to run a security scan and identify black-market and malicious users.
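As one possible shape of the "global key with a timestamp" approach mentioned above, the sketch below counts requests in a per-second Redis key and rejects traffic beyond a global threshold. It assumes the Jedis client and a reachable Redis instance; key names and the limit are illustrative:

```java
import redis.clients.jedis.Jedis;

/** Sketch of a global per-second counter limit built on Redis.
 *  Assumes the Jedis client; the key prefix and limit are illustrative. */
public class GlobalRateLimiter {
    private final Jedis jedis;
    private final long limitPerSecond;

    public GlobalRateLimiter(Jedis jedis, long limitPerSecond) {
        this.jedis = jedis;
        this.limitPerSecond = limitPerSecond;
    }

    public boolean tryAcquire() {
        long second = System.currentTimeMillis() / 1000;
        String key = "rate:seckill:" + second; // one counter key per second
        long count = jedis.incr(key);          // atomic increment shared by all machines
        if (count == 1) {
            jedis.expire(key, 2);              // old buckets expire on their own
        }
        return count <= limitPerSecond;        // reject traffic above the global threshold
    }
}
```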

4.1.3. Service layer

  • The logic layer should run validation first, such as parameter validity and whether the user is qualified; if validation fails, return quickly to prevent the request from penetrating down to the DB.
  • Asynchronous order replenishment: for users who have already had their flash-sale qualification deducted, if delivery then fails, the two usual options are:
    • Roll back the transaction, undo this action, and prompt the user to try again. The cost is particularly high, and combined with the retry strategy above the user experience is not smooth.
    • Redo asynchronously: record a log of this user's attempt, show the user "please check back later, your item is being delivered", and replay the failed deliveries in the background after the peak has passed. This requires the delivery service to be idempotent.
  • For the inventory that is actually shipped, hot keys must be handled. The usual practice is to split the stock across several keys and have each user deduct from one of them (see the sketch below); for scenarios such as massive red-envelope grabs, the amounts can even be allocated in advance.
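For the hot-key handling mentioned in the last item, a minimal sketch of splitting one inventory counter across several Redis keys (assuming Jedis; key names are illustrative) might be:

```java
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

/** Sketch of splitting one hot inventory key into N sub-keys so that
 *  concurrent deductions spread across them. Assumes Jedis; names are illustrative. */
public class ShardedInventory {
    private final Jedis jedis;
    private final int shards;

    public ShardedInventory(Jedis jedis, int shards) {
        this.jedis = jedis;
        this.shards = shards;
    }

    /** Pre-allocate the total stock evenly across the sub-keys (done before the event). */
    public void init(String itemId, long totalStock) {
        for (int i = 0; i < shards; i++) {
            jedis.set("stock:" + itemId + ":" + i, String.valueOf(totalStock / shards));
        }
    }

    /** Try to deduct one unit from a randomly chosen sub-key. */
    public boolean tryDeduct(String itemId) {
        int shard = ThreadLocalRandom.current().nextInt(shards);
        long left = jedis.decr("stock:" + itemId + ":" + shard);
        if (left < 0) {
            jedis.incr("stock:" + itemId + ":" + shard); // roll back the over-deduction on this shard
            return false; // this shard is empty; a real system might retry another shard
        }
        return true;
    }
}
```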

4.1.4. Storage layer DB

For this business model, the DB needs to guarantee several principles:

  • Reliability
    • Active/standby: the active and standby instances can switch over to each other, and are generally required to span data centers within the same city
    • Remote disaster recovery: when one site fails, the data can be restored and a new master elected at another site
    • Data must be persisted to disk, or to even colder storage
  • Consistency
    • For a seckill, strict consistency is required; the master and the standby are generally required to be strictly consistent.

5. Best practices for high concurrency

5.1. High-performance practices

Cluster deployment: reduce the pressure on a single machine through load balancing.

Multi-level caching: use CDN for static data plus local caches and distributed caches, and handle hot keys, cache penetration, cache concurrency, and data consistency in caching scenarios.
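As an illustration of one of these points, the sketch below guards against cache penetration by caching an explicit empty marker when the database has no data. It assumes the Jedis client; loadFromDb is a hypothetical placeholder, and the local-cache level is omitted for brevity:

```java
import redis.clients.jedis.Jedis;

/** Sketch of a cache read path that protects against cache penetration by
 *  briefly caching an empty marker. Assumes Jedis; loadFromDb is a placeholder. */
public class ProductCache {
    private static final String EMPTY_MARKER = "__NULL__";
    private final Jedis jedis;

    public ProductCache(Jedis jedis) { this.jedis = jedis; }

    public String getProduct(String id) {
        String key = "product:" + id;
        String cached = jedis.get(key);
        if (cached != null) {
            return EMPTY_MARKER.equals(cached) ? null : cached;
        }
        String fromDb = loadFromDb(id);
        if (fromDb == null) {
            jedis.setex(key, 60, EMPTY_MARKER); // cache the miss briefly to stop repeated DB hits
        } else {
            jedis.setex(key, 600, fromDb);      // normal entry with a longer TTL
        }
        return fromDb;
    }

    private String loadFromDb(String id) {
        return null; // placeholder: query the database here
    }
}
```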

Sub-database and sub-table (sharding) plus index optimization, with search engines to handle complex queries. Consider NoSQL databases such as HBase or TiDB, but only if the team is familiar with these components and has strong operations capability.

Asynchrony: handle secondary flows or time-consuming operations asynchronously through multi-threading, MQ, or even delayed tasks.

Asynchronous processing: move time-consuming operations, such as sending emails or generating reports, off the main thread so that they do not block it, improving the system's concurrency.
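A minimal sketch of this idea, using an in-process thread pool rather than MQ, might look like the following; the order-saving and email-sending steps are placeholders:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: the main flow returns immediately while a secondary,
 *  time-consuming step runs on a background thread pool. */
public class AsyncOffloadDemo {
    private static final ExecutorService BACKGROUND = Executors.newFixedThreadPool(4);

    public static void placeOrder(String orderId) {
        // 1. core, synchronous work: persist the order (placeholder)
        System.out.println("order saved: " + orderId);

        // 2. secondary work is offloaded so it cannot block the main thread
        BACKGROUND.submit(() -> sendConfirmationEmail(orderId));
    }

    private static void sendConfirmationEmail(String orderId) {
        // placeholder for a slow operation (email, report generation, ...)
        System.out.println("email sent for " + orderId);
    }

    public static void main(String[] args) {
        placeOrder("A1001");
        BACKGROUND.shutdown();
    }
}
```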

Rate limiting: first consider whether the business allows it (a seckill scenario does), then apply it at the front end, at the Nginx access layer, and on the server side. Peak shaving and valley filling: absorb bursts of traffic through MQ.
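On a single server, the simplest server-side current limiting is a token bucket. The sketch below uses Guava's RateLimiter (assuming the Guava dependency is available; the 1000 QPS figure is illustrative); a cluster-wide limit still needs a shared counter, such as the Redis sketch in section 4.1.2:

```java
import com.google.common.util.concurrent.RateLimiter;

/** Sketch of single-machine current limiting with a token bucket.
 *  Assumes the Guava library; the 1000-QPS figure is illustrative. */
public class ServerSideLimiter {
    private static final RateLimiter LIMITER = RateLimiter.create(1000.0); // permits per second

    public static String handleRequest() {
        if (!LIMITER.tryAcquire()) {
            return "429 Too Many Requests"; // reject excess traffic immediately
        }
        return "200 OK"; // normal processing
    }
}
```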

Concurrent processing, parallelizing serial logic through multithreading.

Pre-calculation: for example, in a red-envelope-grabbing scenario, the envelope amounts can be calculated in advance and cached, then used directly when the envelopes are handed out.

Cache pre-warming: warm data into the local or distributed cache in advance via asynchronous tasks.

Reduce the number of IO operations: batch reads and writes for databases and caches, batch RPC interfaces, or eliminate RPC calls altogether through data redundancy.

Reduce the packet size during IO, including using lightweight communication protocols, appropriate data structures, removing redundant fields in the interface, reducing the size of the cache Key, compressing the cache Value, etc.

Program logic optimization: for example, move forward the checks that are most likely to terminate the execution flow early, optimize the computation inside for loops, or adopt more efficient algorithms.

Use pooling of all kinds and size the pools carefully: HTTP request pools, thread pools (set the core parameters according to whether the work is CPU-intensive or IO-intensive), database and Redis connection pools, etc.
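A sketch of the usual sizing heuristic (CPU-bound pools around the number of cores, IO-bound pools several times larger; the exact multipliers and queue size below are illustrative starting points, not fixed rules):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Sketch of pool sizing for CPU-bound vs IO-bound work.
 *  Multipliers and queue capacity are illustrative, not fixed rules. */
public class PoolConfig {
    private static final int CORES = Runtime.getRuntime().availableProcessors();

    // CPU-bound: roughly one thread per core; extra threads mostly add context switching
    public static ThreadPoolExecutor cpuBoundPool() {
        return new ThreadPoolExecutor(CORES, CORES,
                60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1000));
    }

    // IO-bound: threads spend most of their time waiting, so use more of them
    public static ThreadPoolExecutor ioBoundPool() {
        return new ThreadPoolExecutor(CORES * 2, CORES * 4,
                60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1000));
    }
}
```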

JVM optimization, including the size of the new generation and the old generation, the selection of GC algorithms, etc., to reduce the GC frequency and time consumption as much as possible.

Lock selection: use optimistic locks in read-heavy, write-light scenarios, or reduce lock contention through segmented locks.
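One common way to implement an optimistic lock over a database row is a version column checked in the UPDATE statement; a minimal JDBC sketch (table and column names are illustrative) follows:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch of optimistic locking: the UPDATE only succeeds if the row still
 *  carries the version read earlier. Table/column names are illustrative. */
public class OptimisticLockDao {
    /** Returns true if this caller won the race; false means someone else updated the row first. */
    public boolean deductStock(Connection conn, long itemId, long expectedVersion) throws SQLException {
        String sql = "UPDATE item SET stock = stock - 1, version = version + 1 "
                   + "WHERE id = ? AND version = ? AND stock > 0";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, itemId);
            ps.setLong(2, expectedVersion);
            return ps.executeUpdate() == 1; // 0 rows updated => lost the optimistic race, retry or fail
        }
    }
}
```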

5.2. High-availability practices

Failover between peer nodes: both Nginx and service-governance frameworks support switching to another node after one node fails.

Failover between non-peer nodes: implement master/standby switchover through heartbeat detection (e.g. Redis sentinel or cluster mode, MySQL master-slave switchover).

Timeout settings, retry strategies, and idempotent design at the interface level.

Degradation: guarantee core services and sacrifice non-core ones, applying circuit breaking when necessary; or provide an alternative path for when the core link has problems.

Rate limiting: directly reject, or return an error code for, requests that exceed the system's processing capacity. Message reliability in MQ scenarios: the retry mechanism on the producer side, persistence on the broker side, and the ack mechanism on the consumer side.

Grayscale release: support small-traffic deployment by machine, observe system logs and business metrics, and roll out fully once things run stably.

Monitoring and alarming: a comprehensive monitoring system, including the most basic monitoring of CPU, memory, disk, and network, as well as monitoring of Web servers, JVM, databases, various middleware, and business indicators.

Disaster-recovery drills: similar to today's "chaos engineering", apply destructive measures to the system and observe whether local failures cause availability problems.

5.3. High-scalability practices

A reasonable layered architecture: for example, the common Internet layered architecture mentioned above; within microservices, a finer split into a data-access layer and a business-logic layer is also possible (but the performance cost must be evaluated, since calls take one more network hop).

Splitting of the storage layer: split vertically by business dimension, then split horizontally by data characteristics (sub-database and sub-table).
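The routing rule for such horizontal splitting is usually simple; a sketch of hash-based routing to a hypothetical 4-database x 8-table layout:

```java
/** Sketch of shard routing for sub-database / sub-table: the same user always
 *  lands on the same database and table. The 4-database x 8-table layout is illustrative. */
public class ShardRouter {
    private static final int DB_COUNT = 4;
    private static final int TABLES_PER_DB = 8;

    public static String route(long userId) {
        long slot = Math.floorMod(userId, (long) DB_COUNT * TABLES_PER_DB); // 0..31
        long dbIndex = slot % DB_COUNT;    // which database
        long tableIndex = slot / DB_COUNT; // which table within it
        return "order_db_" + dbIndex + ".order_" + tableIndex;
    }

    public static void main(String[] args) {
        System.out.println(route(123456789L)); // prints order_db_1.order_5 for this example id
    }
}
```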

Splitting of the business layer: most commonly by business dimension (such as commodity services and order services in e-commerce), or by core versus non-core interfaces, or by request source (such as To C and To B, APP and H5). Distributed tracing, full-link stress testing, and flexible transactions are all technical points to consider.

Reference

https://mp.weixin.qq.com/s/uex9zkf2uPeTp56cfv4dHA
https://mp.weixin.qq.com/s/fDn4iHWuBEfzvNnVrWud2w
https://www.cnblogs.com/hanease/p/15863393.html
https://www.cnblogs.com/sy270321/p/12503504.html
https://zhuanlan.zhihu.com/p/109742840?from_voters_page=true&utm_id=0
https://juejin.cn/post/6865202367672320014
