High-Concurrency Systems: High Performance

Article from: Alibaba's billion-level concurrent system design (2021 version)

Link: https://pan.baidu.com/s/1lbqQhDWjdZe1CBU-6U4jhA Extraction code: 8888 

Table of Contents

Three goals of high concurrency system design: high performance, high availability, and scalability

Performance optimization principles

Performance metrics

Performance optimization under high concurrency

Course summary


When it comes to Internet system design, the phrase you probably hear most often is the "three highs": high concurrency, high performance, and high availability. They are the eternal themes of Internet architecture design. In the first two lessons, I walked you through the meaning and significance of high-concurrency system design and the principle of layered design. Next, I want to give you an overall view of the goals of high-concurrency system design, and then, on that basis, move on to today's topic: how do we improve the performance of a system?

Three goals of high concurrency system design: high performance, high availability, and scalability

High concurrency means using design techniques to let a system handle more concurrent user requests, that is, absorb more traffic. It is the background and prerequisite of everything that follows; talking about performance and availability without it is meaningless. Achieving millisecond response times and five-nines (99.999%) availability at one request per second and at ten thousand requests per second are two entirely different problems: neither the design difficulty nor the complexity of the solution is on the same level.

Performance and availability are factors we must consider when designing a high-concurrency system. Performance reflects the user experience of the system. Imagine two systems that both handle 10,000 requests per second: one responds in milliseconds, the other in seconds. The experience they give users is clearly different.

Availability is the proportion of time the system can serve users normally. Compare again two systems that each handle 10,000 requests per second: one runs all year without downtime or failures, while the other goes down for maintenance every few days. If you were a user, which would you choose? The answer goes without saying.

Another familiar term is "scalability", which also has to be considered when designing high-concurrency systems. Why? Let me give a concrete example. Traffic comes in two kinds: normal traffic and peak traffic. Peak traffic may be several or even dozens of times normal traffic, and handling it usually requires extra preparation in architecture and planning. This is why Taobao spends more than half a year preparing for Double Eleven, and why the seemingly impeccable Weibo can still see services become unavailable during hot events such as a celebrity divorce. A system that scales easily can complete an expansion quickly and absorb peak traffic more smoothly.

High performance, high availability, and scalability are the three goals we pursue when designing high-concurrency systems. I will spend three lessons showing you how to design systems that are high-performance, highly available, and easily scalable under high concurrency and heavy traffic.

With that understood, let's formally enter today's topic: how to improve the performance of a system.

Performance optimization principles

"UI". Performance is the key to the success of system design, and achieving high performance is also a challenge to the programmer's personal ability. But before understanding the methods to achieve high performance, let's clarify the principles of performance optimization.

  1. First, performance optimization must not be blind; it must be problem-oriented. Optimizing blindly and prematurely, detached from any real problem, adds complexity to the system and wastes developers' time; and because some optimizations involve trade-offs against business needs, it can hurt the business itself.
  2. Second, performance optimization follows the 80/20 rule: you can solve 80% of performance problems with 20% of the effort. So in the optimization process, grasp the main contradiction and prioritize the main performance bottlenecks.
  3. Third, performance optimization needs data support. Throughout the process, you should always know how much your optimization reduces response time and how much it improves throughput.
  4. Finally, performance optimization is a continuous process. High-concurrency systems usually have fairly complex business logic, and their performance problems usually have multiple causes. So set a clear goal for the optimization, for example: at a throughput of 10,000 requests per second, response time should be 10ms. Then keep hunting for bottlenecks and formulating fixes until that goal is reached.

Guided by these four principles, and armed with methods for diagnosing and fixing common performance problems, you will be far more at ease when designing high-concurrency systems.

Performance metrics

The third principle above said that performance optimization needs metrics. Only with data can we see the current performance problems clearly, and only with data can we evaluate the effect of an optimization, so defining performance metrics is very important. Generally, the metric for performance is the response time of a system interface, but a single response time figure is meaningless: you need to know how performance behaves over a period of time. So we collect the response time data for that period and compute statistics from it; these statistics represent the performance during the period. The common statistics fall into the following categories.

Average

As the name implies, the average is the sum of the response times of all requests in the period divided by the number of requests. The average reflects performance over the period to some degree, but its sensitivity is poor: if there are a small number of slow requests in the period, the average will not faithfully show them.

For example, suppose we have 10,000 requests within 30s and each takes 1ms; then the average response time for the period is 1ms. Now suppose the response time of 100 of those requests becomes 100ms: the overall average becomes (100 * 100 + 9,900 * 1) / 10,000 = 1.99ms. The average rose by less than 1ms, yet in reality the response time of 1% of requests (100/10,000) increased a hundredfold. So the average can only serve as a reference when measuring performance.
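To make this concrete, here is a minimal Python sketch (my own illustration, reproducing the arithmetic above) showing how little the slow requests move the average:

```python
# 9,900 fast requests at 1 ms and 100 slow requests at 100 ms:
# the average barely moves even though 1% of requests got 100x slower.
times_ms = [1] * 9900 + [100] * 100
average = sum(times_ms) / len(times_ms)
print(f"average = {average} ms")  # -> 1.99 ms
```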

Max

This one is easier to understand: it is the longest response time among all requests in the period. Its problem is that it is too sensitive. Using the example above, if just one of the 10,000 requests takes 100ms, the maximum response time for the period is 100ms, which would suggest performance dropped to a hundredth of what it was. That is obviously inaccurate.

Quantile

There are many quantile values, such as the 90th, 95th, and 75th percentiles. Take the 90th percentile as an example: sort the response times of the period's requests from smallest to largest; if there are 100 requests in total, the response time in 90th place is the 90th percentile value. Quantile values exclude the influence of occasional extremely slow requests and reflect the period's performance well. The higher the percentile, the more sensitive it is to slow requests.
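As a sketch, here is a nearest-rank percentile calculation in Python (one common definition of the percentile; real monitoring systems may use interpolation or histogram buckets instead). Run against the earlier example data, it shows that the tail only becomes visible at very high percentiles:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

times_ms = [1] * 9900 + [100] * 100  # same data as the average example
for p in (75, 90, 95, 99, 99.9):
    print(f"p{p}: {percentile(times_ms, p)} ms")
# p75 through p99 are all 1 ms; only p99.9 exposes the 100 ms tail.
```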

In my opinion, quantile values are the best statistic for response time over a period, and they are also the most used in practice. The average can serve as a secondary reference.

As I said above, talking about performance without concurrency is meaningless. We usually measure concurrency and traffic with throughput or the number of simultaneously online users, with throughput being the more common. You should know that throughput and response time are, roughly, inversely related: when the response time is 1s, the throughput is 1 request per second; shorten the response time to 10ms and the throughput rises to 100 requests per second. So when measuring performance we generally consider throughput and response time together. For example, when setting a performance optimization goal we usually phrase it like this: at 10,000 requests per second, the 99th-percentile response time is below 10ms.
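A goal phrased this way is easy to check mechanically. Below is a hedged sketch (the function name and thresholds are illustrative, not from the original) that encodes "10,000 requests per second with p99 below 10ms" as a pass/fail test over measured samples:

```python
def meets_target(latencies_ms, measured_rps, target_rps=10_000, p99_limit_ms=10):
    """True only if both halves of the goal hold: enough throughput AND a low p99."""
    ordered = sorted(latencies_ms)
    p99 = ordered[max(0, int(len(ordered) * 0.99) - 1)]  # nearest-rank p99
    return measured_rps >= target_rps and p99 < p99_limit_ms

print(meets_target([5] * 990 + [8] * 10, measured_rps=12_000))  # True
```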

So, how long is the response time more appropriate? This cannot be generalized.

From the perspective of user experience, 200ms is the first dividing line: when an interface responds within 200ms, the user feels no delay, as if it happened instantaneously. And 1s is another dividing line: within 1s the user perceives some delay but finds it acceptable; beyond 1s the user clearly feels the wait, and the longer the wait, the worse the experience. So a healthy system usually needs to keep its 99th-percentile response time within 200ms, and the proportion of requests exceeding 1s below 0.01%. Now that you understand the performance metrics, let's look at how to maintain high performance as concurrency grows.

Performance optimization under high concurrency

Suppose you now have a system with a single processing core, each task it executes has a response time of 10ms, and its throughput is 100 requests per second. How do we optimize performance to improve the system's concurrency? There are two main ideas: increase the system's number of processing cores, or reduce the response time of a single task.

1. Increase the number of processing cores of the system

Increasing the number of processing cores increases the system's capacity for parallel processing, and this is the easiest optimization to reason about. Continuing the example above: raise the number of cores to two and add a second process, so that the two processes run on different cores; in theory the system's throughput doubles. In this case throughput and response time are no longer simply inverse; instead: throughput = number of parallel processes / response time. Amdahl's law, proposed by Gene Amdahl in 1967, describes the relationship between the number of parallel processes and the achievable speedup. Under a fixed load, the speedup of parallel computing, that is, the efficiency gain from parallelization, is: (Ws + Wp) / (Ws + Wp/s), where Ws is the serial portion of the task's work, Wp is the parallelizable portion, and s is the number of parallel processes.

From this we can derive another form: speedup = 1 / (1 - p + p/s), where s is again the number of parallel processes and p is the fraction of the task that is parallelizable. When p is 1, that is, fully parallel, the speedup equals the number of parallel processes; when p is 0, fully serial, the speedup is 1, meaning no acceleration at all. And when s approaches infinity, the speedup approaches 1/(1 - p): the serial fraction caps the speedup, and the larger p is, the higher the cap. In the special case p = 1, the speedup grows without bound.
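Here is a small Python sketch of the formula, showing how firmly the serial fraction caps the speedup even with an enormous number of processes:

```python
def amdahl_speedup(p, s):
    """Amdahl's law: speedup with parallelizable fraction p on s parallel processes."""
    return 1.0 / ((1 - p) + p / s)

for p in (0.5, 0.9, 0.99):
    for s in (2, 8, 64, 1_000_000):
        print(f"p={p:.2f}  s={s:>7}  speedup={amdahl_speedup(p, s):7.2f}")
# Even with p=0.99, a million processes yield at most ~100x: 1/(1-p) is the ceiling.
```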

The derivation of this formula is a bit involved; you only need to remember the conclusion.

It seems we have found a silver bullet: can we keep adding processing cores and improve performance without limit, and with it the system's capacity for high concurrency? Unfortunately not. As the number of parallel processes grows, their competition for system resources intensifies; past a certain critical point, adding more parallel processes actually degrades performance. This is the inflection point model of performance testing.

In the inflection point model, when the number of concurrent users is in the light-pressure zone, response time is stable and throughput grows linearly with concurrency. When concurrency enters the heavy-pressure zone, system resource utilization hits its limit, throughput starts to trend downward, and response time rises slightly. Push harder and the system enters the inflection zone: it is overloaded, throughput falls, and response time rises sharply. This is why we normally stress-test a system when evaluating its performance: the goal is to find the system's "inflection point", and with it the system's carrying capacity and its bottlenecks, so we can keep optimizing.
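To see what a stress test measures, here is a toy closed-loop load generator in Python (a sketch under simplifying assumptions: the 10ms sleep stands in for a real request, and because sleeping releases the GIL this toy scales almost linearly; a real resource-bound system would show the knee described above):

```python
import time
import threading

def simulated_task():
    time.sleep(0.01)  # stand-in for a 10 ms request; replace with a real call

def run_load(concurrency, duration_s=3.0):
    """Closed loop: each worker fires requests back-to-back; returns throughput."""
    count, lock = 0, threading.Lock()
    deadline = time.time() + duration_s

    def worker():
        nonlocal count
        while time.time() < deadline:
            simulated_task()
            with lock:
                count += 1

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count / duration_s

for c in (1, 2, 4, 8, 16):
    print(f"concurrency={c:2d}  throughput={run_load(c):8.1f} req/s")
```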

Having covered improving parallelism, let's look at the other way to optimize performance: reducing the response time of a single task.

2. Reduce the response time of a single task

To reduce a task's response time, first determine whether your system is CPU-intensive or IO-intensive, because the optimization methods for the two types differ.

A CPU-intensive system spends its time on large amounts of computation, so choosing a more efficient algorithm or reducing the amount of computation is the key optimization for such systems. For example, if the system's main job is computing hash values, switching to a higher-performance hash algorithm can greatly improve performance. The main way to find such problems is to use profiling tools to locate the methods or modules that consume the most CPU time, such as Linux's perf or eBPF-based tools.
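As a sketch of the "pick a faster algorithm" point: Python's standard hashlib makes it easy to compare hash algorithms on your own workload (the 4 KiB payload and iteration count here are arbitrary choices for illustration):

```python
import hashlib
import timeit

payload = b"x" * 4096  # an illustrative 4 KiB message

for name in ("md5", "sha1", "sha256", "blake2b"):
    algo = getattr(hashlib, name)
    secs = timeit.timeit(lambda: algo(payload).digest(), number=100_000)
    print(f"{name:8s} {secs:6.2f}s for 100k hashes")
```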

An IO-intensive system spends most of its time waiting for IO to complete, where IO means disk IO and network IO. Most of the systems we know are IO-intensive: database systems, cache systems, web systems. The performance bottleneck of such a system may lie inside the system itself or in a dependency on another system, and there are two main ways to find it.

The first is to use tools. Linux offers a rich toolset that can fully cover your optimization needs, across the network protocol stack, NICs, disks, file systems, memory, and so on. These tools have many uses, and you can accumulate experience with them gradually while troubleshooting. In addition, some languages ship analysis tools tailored to their own characteristics; Java, for example, has its own memory analysis tools.

The other is to find performance problems through monitoring. In monitoring we can time each step of a task separately, so as to find which step consumes the most time. This will be covered in detail in the evolution chapters, so I won't expand on it here.
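A minimal sketch of per-step timing (the step names and sleeps are placeholders; in production you would report these durations to your metrics system rather than a local dict):

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(step):
    """Accumulate wall-clock time spent in one named step of a task."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + time.perf_counter() - start

def handle_request():
    with timed("parse"):
        time.sleep(0.001)   # placeholder work
    with timed("db_query"):
        time.sleep(0.015)
    with timed("render"):
        time.sleep(0.002)

handle_request()
for step, secs in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{step:10s} {secs * 1000:6.2f} ms")  # db_query dominates
```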

Once we have found the system's bottleneck, how do we optimize? The scheme varies with the problem. If database access is slow, check whether tables are being locked, whether there are full table scans, whether indexes are added appropriately, whether there are JOIN operations, whether a cache should be added, and so on. If the problem is the network, check whether network parameters can be tuned, whether there are large numbers of timeout retransmissions, whether the NIC is dropping many packets, and so on.

In short, as the saying goes, "meet soldiers with generals, meet floods with earthworks": we need to craft different optimization schemes for different performance problems.

Course summary

Today I walked you through the principles of performance optimization, performance metrics, and the basic ideas for optimizing performance under high concurrency. Performance optimization is a big topic that one short lesson cannot cover, so later courses will go into several aspects in detail, such as using caches to optimize a system's read performance and using message queues to optimize its write performance. When you feel at a loss facing a performance problem, today's lesson can give you some inspiration. Let me summarize a few points for you:

  1. Data first: a performance monitoring system must be in place before a new system goes live;
  2. Master some performance optimization tools and methods; this takes continuous accumulation in your work;
  3. Computer science fundamentals matter: networking knowledge, operating system knowledge, and so on. Only with the fundamentals can you grasp the crux of a performance problem and move with ease through the optimization process.

Origin: blog.csdn.net/sanmi8276/article/details/113086111