[Performance optimization] Interview questions: what is performance, and how do we optimize it?

What is performance?

Performance = 1/response time
Performance is the reciprocal of time. To measure a computer's performance we need a standard, and that standard has two main indicators.
1. Response time, or execution time. Improving this indicator means making the computer "run faster".
2. Throughput (Throughput) or bandwidth (Bandwidth). Improving this indicator means making the computer "move more".
So response time is how long it takes to execute a program: the less time it takes, the better the performance.

Throughput refers to how much work we can handle within a given time frame; the "work" here is the data processed or the program instructions executed by the computer.
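To make the definition concrete, here is a minimal Python sketch (the execution times are invented for illustration) that treats performance as the reciprocal of execution time and compares two machines:

```python
# Performance defined as the reciprocal of execution time.
def performance(execution_time_s: float) -> float:
    return 1.0 / execution_time_s

# Hypothetical numbers: the same program takes 10s on machine A and 15s on machine B.
time_a, time_b = 10.0, 15.0

perf_a = performance(time_a)  # 0.1
perf_b = performance(time_b)  # ~0.067

# Machine A delivers 1.5x the performance of machine B.
print(f"A is {perf_a / perf_b:.2f}x the performance of B")
```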

How to improve performance?

Compare it to moving goods: if our response time is short and we run fast, we can make more trips back and forth and move more, so shortening a program's response time generally improves throughput as well. Besides reducing response time, is there anything else we can do? Of course: we can find a few more people to move things together, which is what a modern server with 8 or 16 cores does. With more people and more muscle processing data at the same time, more data gets processed per unit time, and throughput naturally rises.
There are many ways to increase throughput; most of the time we can simply add more machines and pile on more hardware. Improving response time, however, is not nearly so easy.
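As a rough illustration of "finding more people to move things", here is a small Python sketch (the per-item work and worker counts are invented for the example): each item still takes the same time to handle, so response time is unchanged, but more workers handle more items per second, i.e. higher throughput.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def handle(item: int) -> int:
    # Pretend each item takes a fixed amount of work (its response time stays the same).
    time.sleep(0.1)
    return item * item

if __name__ == "__main__":
    items = list(range(40))
    for workers in (1, 4, 8):
        start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=workers) as pool:
            list(pool.map(handle, items))
        elapsed = time.perf_counter() - start
        # More workers -> more items handled per second, i.e. higher throughput.
        print(f"{workers} worker(s): {len(items) / elapsed:.1f} items/s")
```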

CPU clock

But when performance is measured by time, there are two problems.
The first is that the time is not "accurate": the same program might take 45ms on one run and 53ms on the next. We usually time a program as if "pinching a stopwatch", taking the moment the program finishes and subtracting the moment it started. This is called Wall Clock Time or Elapsed Time: the time that passes on the clock hanging on the wall while the program runs. But there are process switches, and while running, a program may need to read data from the network or the hard disk and wait for that data to reach memory and the CPU. So to accurately measure how long a program itself runs, and then compare the real performance of two programs, we have to strip these waits out.

Linux has a command called time that can tell us, within the same Wall Clock Time, how much time the program actually spent on the CPU.

Let's simply run the time command. It returns three values: the first is real time, the Wall Clock Time we just described, i.e. the time elapsed over the entire run of the program; the second is user time, the time the CPU spent running your program's instructions in user mode; the third is sys time, the time the CPU spent running your program's instructions inside the operating system kernel. The CPU time actually consumed by the program (CPU Time) is user time plus sys time.

(base) [root@localhost shapan_alg]# time seq 1000000 | wc -l

1000000

real    0m0.037s
user    0m0.014s
sys     0m0.004s
(base) [root@localhost shapan_alg]#

In the example above, the program's wall clock time was 0.037s, but the CPU time was only 0.014s + 0.004s = 0.018s. Less than half of the elapsed time was actually spent running the program on the CPU.
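The same distinction can be observed from inside a program. This small Python sketch measures wall clock time with time.perf_counter() and CPU time with time.process_time(); because part of the run is spent sleeping (waiting), the CPU time stays far below the elapsed time:

```python
import time

start_wall = time.perf_counter()   # wall clock ("real")
start_cpu = time.process_time()    # user + sys CPU time of this process

# Some CPU work...
total = sum(i * i for i in range(1_000_000))
# ...followed by a wait that burns wall clock time but almost no CPU time.
time.sleep(0.5)

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"real ~{wall:.3f}s, CPU ~{cpu:.3f}s")  # CPU time is much smaller than real time
```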

Second, even with CPU time in hand, we can't necessarily "compare" the performance of two programs directly. Even on the same computer, the CPU may be running at full speed or may be downclocked, and a downclocked CPU naturally takes more time.
Besides the CPU, the time we measure is also affected by other hardware such as the motherboard and memory. So we need to decompose the "time" we can perceive, turning a program's CPU execution time into the product of the number of CPU clock cycles (CPU Cycles) and the clock cycle time (Clock Cycle Time).
Program CPU execution time = CPU clock cycles × clock cycle time
The simplest way to improve performance is to shorten the clock cycle time, that is, to raise the clock frequency; in other words, just switch to a better CPU. But that is not something we software engineers can control, so we turn our attention to the other factor in the product: the number of CPU clock cycles. If you can reduce the number of clock cycles your program needs, you also improve its performance.
The number of CPU clock cycles can be decomposed further into "instruction count × average clock cycles per instruction (Cycles Per Instruction, CPI)". Different instructions take different numbers of cycles: addition and multiplication each correspond to a single CPU instruction, but multiplication needs more cycles than addition, so it is naturally slower. After this split, a program's CPU execution time becomes the product of three parts.

Program CPU execution time = number of instructions × CPI × Clock Cycle Time
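Plugging hypothetical numbers into the formula (the instruction count, CPI, and clock frequency below are made up for illustration) shows how the three factors combine:

```python
# Program CPU execution time = instruction count x CPI x clock cycle time
instructions = 1_000_000_000      # hypothetical: 10^9 instructions executed
cpi = 1.5                         # hypothetical: 1.5 cycles per instruction on average
clock_hz = 2.8e9                  # hypothetical 2.8 GHz CPU
clock_cycle_time = 1 / clock_hz   # seconds per cycle

cpu_time = instructions * cpi * clock_cycle_time
print(f"CPU execution time ~ {cpu_time:.3f} s")   # ~0.536 s

# Halving CPI (or the instruction count) halves the CPU execution time.
print(f"With CPI = 0.75: ~ {instructions * 0.75 * clock_cycle_time:.3f} s")
```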

So if we want to solve the performance problem, it really comes down to optimizing these three factors.

Clock cycle time is set by the computer's clock frequency, which depends on the hardware. The well-known Moore's Law has kept pushing frequencies up: the 80386 I used at the beginning ran at only 33MHz, while the laptop in my hands now runs at 2.8GHz, nearly a 100-fold increase.
CPI, the average number of clock cycles per instruction, is how many CPU cycles a single instruction requires. When we look at CPU structure later, we will see that modern CPUs use pipelining (Pipeline) to make each instruction need as few CPU cycles as possible, so optimizing CPI is an important topic in computer organization and architecture.
The instruction count is how many instructions, and which instructions, are needed to execute our program. This challenge usually falls to the compiler: the same code, compiled into machine instructions, can take many different forms.
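Python bytecode is not machine code, but it can illustrate the idea: the same computation can be written in ways that need very different numbers of instructions (and for compiled languages the compiler's choices matter just as much). A quick sketch using the standard dis module:

```python
import dis

# Two equivalent ways to sum 0..999.
def sum_loop():
    total = 0
    for i in range(1000):
        total += i
    return total

def sum_builtin():
    return sum(range(1000))

# dis lists each function's bytecode. The loop version already contains more
# instructions statically, and its loop body is executed 1000 times at runtime,
# so far more instructions actually run to produce the same result.
print("sum_loop   :", len(list(dis.get_instructions(sum_loop))), "bytecode instructions")
print("sum_builtin:", len(list(dis.get_instructions(sum_builtin))), "bytecode instructions")
```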
