Linux performance optimization (12): CPU performance tuning

1. Application optimization

(1) Compiler optimization. Enabling appropriate compiler optimization options improves performance at the compilation stage; for example, gcc's -On options (such as -O2) automatically optimize the application's code.
(2) Algorithm optimization. Using algorithms with lower complexity can significantly speed up processing. For relatively large data sets, O(n log n) sorting algorithms (such as quicksort and merge sort) can be used instead of O(n^2) sorting algorithms (such as bubble sort and insertion sort).
(3) Asynchronous processing. Asynchronous processing prevents the program from blocking while it waits for a resource, which improves its ability to handle concurrency. Replacing polling with event notifications also avoids the CPU cost of busy polling (see the epoll sketch after this list).
(4) Use threads instead of processes. Compared with a process context switch, a thread context switch does not switch the process address space, so the cost of each switch is lower.
(5) Make good use of caching. Frequently accessed data, or intermediate results of a computation, can be cached in memory and read directly from memory the next time they are needed, which speeds up processing.
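
As a concrete illustration of point (3), here is a minimal sketch (not from the original article) that waits for input with epoll-based event notification instead of busy polling, so the process sleeps until data is actually available. It watches standard input purely for demonstration and can be built with gcc's optimization options, e.g. gcc -O2 epoll_demo.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
    /* Create an epoll instance instead of polling the fd in a busy loop. */
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        perror("epoll_create1");
        return EXIT_FAILURE;
    }

    /* Watch standard input for readability. */
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = STDIN_FILENO;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &ev) < 0) {
        perror("epoll_ctl");
        return EXIT_FAILURE;
    }

    /* The process sleeps here; no CPU is burned while waiting. */
    struct epoll_event events[1];
    int n = epoll_wait(epfd, events, 1, -1);
    if (n > 0 && (events[0].events & EPOLLIN)) {
        char buf[256];
        ssize_t len = read(STDIN_FILENO, buf, sizeof(buf) - 1);
        if (len > 0) {
            buf[len] = '\0';
            printf("got input: %s", buf);
        }
    }

    close(epfd);
    return 0;
}
```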

2. System optimization

1. CPU binding

(1) CPU binding: Binding a process to one or more CPUs improves the CPU cache hit rate and reduces the context switching caused by scheduling the process across CPUs (see the sched_setaffinity sketch after this list).
(2) CPU isolation: Divide the CPUs into groups and assign processes to them through the CPU affinity mechanism, so that a CPU is used exclusively by specific processes and is not available to others.
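
A minimal sketch of CPU binding from inside a program, using the Linux sched_setaffinity() call; the choice of CPU 0 is arbitrary. The same binding can be done from the command line with taskset, and CPU isolation is typically set up with the isolcpus kernel parameter or cpusets rather than in application code.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);   /* bind to CPU 0; adjust for your machine */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    printf("pid %d is now restricted to CPU 0\n", getpid());
    /* ... CPU-bound work done here benefits from a warm CPU cache ... */
    return 0;
}
```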

2. Process CPU resource limit

Using Linux cgroups to set an upper limit on a process's CPU usage prevents system resources from being exhausted by a misbehaving application.
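
A minimal sketch of applying such a limit, assuming cgroup v2 is mounted at /sys/fs/cgroup and the program runs with sufficient privileges; the group name "demo" is made up for illustration. It creates a cgroup, caps it at 50% of one CPU via cpu.max, and moves the current process into it. The same steps are commonly done from the shell or through systemd.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a string into a file, e.g. a cgroup control file. */
static int write_file(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fputs(value, f);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Create the cgroup directory (ignore the error if it already exists). */
    mkdir("/sys/fs/cgroup/demo", 0755);

    /* cpu.max = "<quota> <period>": 50000us of CPU time per 100000us period,
       i.e. at most 50% of one CPU for all processes in the group. */
    if (write_file("/sys/fs/cgroup/demo/cpu.max", "50000 100000") != 0)
        return EXIT_FAILURE;

    /* Move the current process into the group. */
    char pid[32];
    snprintf(pid, sizeof(pid), "%d", getpid());
    if (write_file("/sys/fs/cgroup/demo/cgroup.procs", pid) != 0)
        return EXIT_FAILURE;

    printf("pid %s is now limited to 50%% of one CPU\n", pid);
    return 0;
}
```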

3. Process priority adjustment

Use nice to adjust process priority: a positive value lowers the priority and a negative value raises it. Appropriately lowering the priority of non-core applications and raising the priority of core applications ensures that the core applications get CPU time first.
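
A minimal sketch of lowering a process's own priority with the setpriority() call, which is what the nice command uses underneath; the value 10 is just an example, and raising priority (negative values) normally requires root.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
    /* Lower the priority of the current process by setting its nice value
       to 10 (equivalent to launching it with: nice -n 10 ./program). */
    if (setpriority(PRIO_PROCESS, 0, 10) != 0) {
        perror("setpriority");
        return EXIT_FAILURE;
    }

    printf("pid %d now runs at nice %d\n", getpid(),
           getpriority(PRIO_PROCESS, 0));
    /* ... non-core background work runs here at reduced priority ... */
    return 0;
}
```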

4. Interrupt load balancing

Whether they are soft interrupts or hard interrupts, interrupt handlers can consume a lot of CPU. Enable the irqbalance service, or configure smp_affinity, to spread interrupt handling across multiple CPUs.
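
A minimal sketch of the manual smp_affinity approach. The IRQ number 24 is a made-up example; real IRQ numbers come from /proc/interrupts, and writing to /proc/irq/<n>/smp_affinity requires root. In most cases simply enabling irqbalance (for example with systemctl start irqbalance) is sufficient.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* IRQ 24 is only an example; check /proc/interrupts for real numbers. */
    const char *path = "/proc/irq/24/smp_affinity";

    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);   /* requires root and a valid IRQ number */
        return EXIT_FAILURE;
    }

    /* Bitmask "3" = binary 11 = CPU0 and CPU1 may handle this interrupt. */
    fputs("3\n", f);
    fclose(f);

    printf("IRQ 24 is now restricted to CPU0/CPU1\n");
    return 0;
}
```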

5. NUMA optimization

Under the NUMA architecture, the processor is divided into multiple nodes, and each node has its own local memory. NUMA optimization keeps each CPU accessing its local memory as much as possible.
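
A minimal sketch using libnuma (from the numactl package; link with -lnuma) that keeps both execution and memory allocation on NUMA node 0, so that memory accesses stay local. Node 0 is an arbitrary example; the same policy can be applied without code changes using numactl --cpunodebind=0 --membind=0.

```c
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    /* Run the process only on the CPUs of node 0 ... */
    if (numa_run_on_node(0) != 0) {
        perror("numa_run_on_node");
        return EXIT_FAILURE;
    }

    /* ... and allocate memory from node 0's local memory. */
    size_t size = 64 * 1024 * 1024;
    char *buf = numa_alloc_onnode(size, 0);
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return EXIT_FAILURE;
    }

    memset(buf, 0, size);   /* touches local memory only */
    numa_free(buf, size);
    return 0;
}
```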

Origin: blog.51cto.com/9291927/2594265