Ten Tips for Kunpeng Performance Optimization: An Introduction to Kunpeng Processor NUMA and the Five-Step Performance Tuning Method

1.1 Introduction to NUMA on Kunpeng Processors

As the modern information society becomes increasingly intelligent, more and more devices are connected to the Internet, the Internet of Things, and the Internet of Vehicles, which creates enormous demand for computing power. However, power consumption and heat dissipation are two major constraints that have severely limited the growth of single-core computing power. To meet the fast-growing demand for computing power in an intelligent world, multi-core architecture has become the most important direction of evolution.

The conventional multi-core approach is SMP (Symmetric Multi-Processing), that is, a symmetric multiprocessor architecture, shown in Figure 1-1. In an SMP architecture every processor has equal status and the same rights to use memory. Any program, process, or thread can be assigned to any processor, and with operating system support this achieves very good load balancing, which greatly improves overall system performance and throughput. However, because all cores access memory over the same bus, the bus becomes a performance bottleneck as the number of cores grows, and this limits the scalability of the system.

Figure 1-1 Symmetric multiprocessor (SMP) architecture


Kunpeng processors support the NUMA (Non-Uniform Memory Access) architecture, which addresses the limit that SMP places on the number of CPU cores. In a NUMA architecture, multiple cores are grouped into a node (Node), and each node is equivalent to a symmetric multiprocessor (SMP). Nodes within one CPU communicate over the On-chip Network, and different CPUs use the Hydra Interface for high-bandwidth, low-latency inter-chip communication, as shown in Figure 1-2. Under NUMA the memory space is physically distributed, and the collection of all this memory forms the global memory of the whole system. The time a core needs to access memory depends on where that memory is located relative to the processor: access to local memory (within the same node) is faster. The Linux kernel has supported NUMA since version 2.5, and current operating systems provide a rich set of tools and interfaces that help optimize and configure access to the nearest memory. As a result, a computer system built on Kunpeng processors can, with appropriate tuning, reach good performance while also avoiding the bus bottleneck of the SMP architecture, which gives it stronger multi-core scalability and better, more flexible computing power.
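
To make the node layout concrete, here is a minimal Python sketch that lists which CPUs belong to which NUMA node. It assumes a Linux system that exposes the topology under /sys/devices/system/node, with one cpulist file per node. The function names parse_cpulist, read_numa_topology, and node_of_cpu are hypothetical helpers invented for this illustration and do not come from any Kunpeng tool.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8,10-11" into CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list, e.g. a node that has memory but no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_topology(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]}; empty if the sysfs directory is absent."""
    topology = {}
    for node_dir in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(re.search(r"node(\d+)$", node_dir).group(1))
        with open(os.path.join(node_dir, "cpulist")) as f:
            topology[node_id] = parse_cpulist(f.read())
    return topology


def node_of_cpu(topology, cpu):
    """Return the node that owns a CPU id, or None if it is not listed."""
    for node_id, cpus in topology.items():
        if cpu in cpus:
            return node_id
    return None


if __name__ == "__main__":
    for node_id, cpus in sorted(read_numa_topology().items()):
        print(f"node{node_id}: {len(cpus)} CPUs -> {cpus}")
```

Once the node of a CPU is known, a process can be kept next to its memory, for example with numactl --cpunodebind=0 --membind=0 followed by the command to run. Whether this helps depends on the workload and should be checked by measurement.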

Figure 1-2 NUMA architecture


1.2 The Five-Step Performance Tuning Method

Performance optimization usually follows the five steps listed in Table 1-1.

Table 1-1 General procedure for performance optimization

| No. | Step | Explanation |
|-----|------|-------------|
| 1 | Establish a baseline | Before optimizing or starting to monitor, first establish baseline data and an optimization target. The baseline covers the hardware configuration, network topology, test model, and system operating data (CPU, memory, IO, network throughput, response latency, and so on). The system needs to be evaluated and monitored comprehensively so that bottlenecks can be analyzed properly and the performance change after each optimization measure can be seen. The optimization target is the performance the system is expected to reach on the current hardware and software architecture. Performance tuning is a long process. Early on, bottlenecks are easy to identify, effective measures are easy to apply, and the gains are often significant. Later, optimization becomes harder, measures become more difficult to find, and the effect becomes weaker. We therefore recommend settling on a reasonable balance point. |
| 2 | Stress test and monitor for bottlenecks | Stress test the system with a peak workload or a specialized stress testing tool, and use performance monitoring tools to observe the system state. During the stress test, record the running state of the system and its processes in detail. An accurate history makes it easier to analyze bottlenecks and to confirm whether an optimization measure is effective. |
| 3 | Identify bottlenecks | The purpose of stress testing and monitoring is to identify bottlenecks. Bottlenecks usually show up as an overly busy CPU, IO wait, network wait, and similar symptoms. Note that identifying a bottleneck means analyzing the entire test setup, including the test tool, the network between the test tool and the system under test, and the network bandwidth. Many "performance crisis" projects turn out to be caused by easily overlooked parts such as the test tool or the test network, so when optimizing performance, first spend some time checking these parts. |
| 4 | Implement optimizations | Once a bottleneck has been identified, optimize it. This article summarizes the common system bottlenecks and optimization measures that the author's team has encountered in projects. Tuning does not advance in a straight line: not every measure has a positive effect, and negative optimization is common. So when preparing an optimization measure, also prepare the instructions for rolling it back. This avoids wasting a lot of time and effort rebuilding the environment after applying a measure that cannot be undone. |
| 5 | Confirm the optimization results | After applying an optimization measure, rerun the stress test with the monitoring tools in place and confirm the effect. Roll back any measure that has a negative effect and adjust the optimization plan. If the effect is positive but the target has not been reached, repeat from step 2, "Stress test and monitor for bottlenecks". If the target has been reached, summarize and archive all effective measures and parameters, and move on to subsequent work such as release preparation for the production system. |
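
As an illustration of steps 2 and 3, the following Python sketch estimates how much CPU time was spent busy and how much was spent waiting for IO between two monitoring samples. It assumes Linux and the aggregate "cpu" line of /proc/stat, with its first eight counters in the order user, nice, system, idle, iowait, irq, softirq, steal. The helper names and the thresholds in suspect (90% busy, 20% IO wait) are assumptions chosen only for this example and are not recommendations from the article.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def read_cpu_sample(path="/proc/stat"):
    """Take one sample from the running system (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def cpu_shares(before, after):
    """Return the busy, iowait, and idle percentages between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    iowait = 100.0 * delta["iowait"] / total
    idle = 100.0 * delta["idle"] / total
    return {"busy": 100.0 - idle - iowait, "iowait": iowait, "idle": idle}


def suspect(shares, busy_limit=90.0, iowait_limit=20.0):
    """Give a rough first guess at the bottleneck (thresholds are illustrative)."""
    if shares["busy"] >= busy_limit:
        return "cpu"
    if shares["iowait"] >= iowait_limit:
        return "io-wait"
    return "unclear"
```

In practice the two samples would come from read_cpu_sample() calls taken a few seconds apart during the stress test. A result of "unclear" is a hint to look elsewhere, for example at the network or at the test tool itself, as step 3 recommends.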

 

If you have little tuning experience, or are not yet familiar with the hardware and software of the system, you can use the five-step method as a reference model and work through performance tuning step by step. Engineers with extensive tuning experience, or experts who already have deep insight into the system's performance bottlenecks, may use other methods or processes instead.
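
The five-step method can be read as a loop: measure a baseline, apply one measure, measure again, then either keep the measure or roll it back. The Python sketch below shows that control flow under simplifying assumptions: a single benchmark score where higher is better, and measures that are tried one at a time. Measure, tune, and the toy system in the demo are hypothetical names made up for this illustration. A real project would replace run_benchmark with an actual stress test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Measure:
    """One optimization measure together with its rollback procedure."""
    name: str
    apply: Callable[[], None]
    rollback: Callable[[], None]


def tune(measures, run_benchmark, target):
    """Try measures one at a time, keeping only those that improve the score."""
    baseline = run_benchmark()      # steps 1 and 2: establish the baseline
    best = baseline
    kept = []
    for measure in measures:
        if best >= target:          # target reached: stop and archive
            break
        measure.apply()             # step 4: apply the measure
        score = run_benchmark()     # step 5: rerun the test and confirm
        if score > best:
            best = score
            kept.append(measure.name)
        else:
            measure.rollback()      # negative or neutral effect: roll back
    return {"baseline": baseline, "best": best, "kept": kept,
            "target_met": best >= target}


# Toy system for demonstration only: each switch changes a made-up score.
state = {"a": False, "b": False, "c": False}
gains = {"a": 20, "b": -10, "c": 15}


def fake_benchmark():
    return 100 + sum(gains[k] for k, on in state.items() if on)


def toggle(key, value):
    return lambda: state.__setitem__(key, value)


demo_measures = [Measure(k, toggle(k, True), toggle(k, False)) for k in "abc"]
print(tune(demo_measures, fake_benchmark, target=130))
```

Trying one measure per benchmark run keeps each result attributable to a single change, at the cost of more test runs. The list of kept measures is what would be summarized and archived once the target is met.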

Origin www.cnblogs.com/huaweicloud/p/12166354.html