Linux performance optimization (15)-CPU binding

One, isolated CPU

1. Introduction to isolated CPU

For CPU-intensive tasks with a high CPU load, it is recommended to set CPU Affinity. Doing so improves task execution efficiency, reduces context switching and migration between CPUs, and increases the CPU Cache hit rate.
By default, the Linux kernel scheduler can use any CPU core. If a specific task (process/thread) needs to monopolize a CPU core and you do not want other tasks (processes/threads) to use it, you can isolate that CPU so that the scheduler does not place other processes on it.

2. The characteristics of isolated CPU

Isolating a CPU can effectively improve the real-time performance of the tasks running on it. The cost is that, while the tasks on the isolated CPU are guaranteed to run, fewer CPU resources remain for all other tasks. The CPU resources of the machine therefore need to be planned in advance.

3. Isolate CPU settings

The isolcpus boot parameter of the Linux Kernel removes one or more CPUs from the SMP load-balancing scheduler. A chosen process is then placed on an isolated CPU through its CPU Affinity setting.
isolcpus= cpu_number [, cpu_number ,...]
(1) Modify the grub configuration file.
The default grub configuration file is /etc/default/grub. Add an isolcpus entry, for example isolcpus=11,12,13,14,15, to the GRUB_CMDLINE_LINUX value. CPU cores are separated by commas; the kernel's CPU-list syntax also accepts ranges such as 11-15. The following line isolates CPUs 1 and 2:
GRUB_CMDLINE_LINUX="isolcpus=1,2 crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
(2) Update grub.
Regenerate the grub boot file /boot/grub/grub.cfg with whichever of the following commands your distribution provides, then restart the system for the change to take effect.

update-grub
update-grub2
grub-mkconfig -o /boot/grub/grub.cfg

Once the Linux Kernel is started with the isolcpus parameter, its load-balancing scheduler no longer schedules processes onto the specified CPU cores. Users usually need the taskset or cset command to bind a process to those cores.

Two, Introduction to CPU Binding

1. Introduction to CPU core

Hyper-Threading technology (Hyper-Threading) uses additional hardware in the processor to present one physical core as two logical cores (logical CPUs), so that a single core can execute two threads in parallel. It is compatible with multi-threaded operating systems and software, reduces the idle time of the CPU, and improves its operating efficiency.
The physical CPU is the CPU installed on the computer motherboard.
A physical CPU usually has multiple physical cores (CPU Cores). Without Hyper-Threading, each physical core is one logical CPU; if Intel Hyper-Threading Technology (HT) is supported and enabled, the number of logical CPUs is twice the number of CPU Cores.
cat /proc/cpuinfo|grep "physical id"|sort -u|wc -l
View the number of physical CPUs
cat /proc/cpuinfo|grep "cpu cores"|uniq
View the number of cores in each physical CPU
cat /proc/cpuinfo|grep "processor"|wc -l
View the number of logical CPUs
cat /proc/cpuinfo|grep "name"|cut -f2 -d:|uniq
View the name and model of the CPU
ps -eo pid,args,psr
View the logical CPU on which each process is running

2. Introduction to CPU Binding

CPU binding sets the CPU affinity of a process or thread so that it runs only on the CPUs whose bits are set in its affinity mask, which improves how efficiently the application uses the CPU. If an application is allowed to run on multiple CPUs, the operating system may move it between them; each move leaves the cache of the previous CPU useless to the application, lowers the cache hit rate, and reduces CPU efficiency. CPU binding avoids this CPU Cache invalidation to a certain extent and improves system performance.
CPU affinity is a scheduling property (scheduler property) that binds a process to one CPU or a group of CPUs.
Under the SMP (Symmetric Multi-Processing) architecture, the Linux scheduler (scheduler) runs a process only on the CPUs allowed by its CPU affinity setting, never on other CPUs.
The Linux scheduler also supports natural (soft) CPU affinity: it tries to keep a process running on the same CPU, so processes usually do not migrate frequently between processors, and a low migration frequency means a low migration overhead.
Because the author of a program knows it better than the scheduler does, the author can assign CPU cores manually, for example to keep the program off a heavily used CPU0 or to stop a key process from being squeezed together with a bunch of other processes. Setting CPU Affinity in this way can improve the performance of certain programs.
View the CPU allocation of all processes
ps -eo pid,cmd,psr
View the CPU allocation of all threads of a process
ps -To 'pid,lwp,psr,cmd' -p [PID]

3. Features of CPU binding

Binding processes/threads to a CPU can significantly increase the CPU Cache hit rate, thereby reducing memory access cost and improving application performance. I think that under the NUMA architecture this operation matters more for system speed, while under the SMP architecture the improvement may be relatively small. The main reason is the different way the two allocate and use cache and bus resources: under NUMA, each CPU has its own set of resources; under SMP, the cores still need to share these resources.
When each CPU core runs a separate process, the resources of the processes are independent, so nothing has to be shared between cores. When each CPU core runs a thread of the same process, the threads sometimes share resources, and the shared data must then be copied from one core to another, causing additional overhead.

4. Taskset binding process

yum install util-linux
Install the taskset tool (it is part of the util-linux package)
taskset [options] -p [mask] pid
View the CPU affinity of a process. Use the -p option to specify the PID. By default the affinity is printed as a hexadecimal mask; with the -cp option it is printed as a CPU core list. For example, the mask 3 is 0011 in binary, so -cp prints cores 0 and 1, indicating that the process can only run on CPU cores 0 and 1.
taskset -c -p pid
View the CPU affinity of the specified process as a core list

taskset -p mask pid
taskset -c [CPU NUMBER] -p PID

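Set the CPU Affinity of the specified process. When the list contains several isolated CPUs, only the first CPU is effectively used, because the scheduler does not balance load across isolated CPUs.
Use CPUs 11, 12, 13, 14 and 15 to run the process: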

taskset -c 11,12,13,14,15 python xx.py
taskset -c 11-15 python xx.py

Isolated CPUs can also be used from Docker containers. When creating a container, the --cpuset-cpus parameter specifies which CPUs the container may use, so passing the isolated CPUs there dedicates them to that container.

5. Cset binding process

cset set --cpu CPU CPUSET NAME
Define a set of CPU cores. For isolated CPUs, only the first CPU core is valid.
cset proc --move --pid=PID,...,PID --toset=CPUSET NAME
Move multiple processes to the specified CPU set

Three, process binding CPU

1. System call API

#define _GNU_SOURCE        
#include <sched.h>
int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);
int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask);

Parameters:
pid: process ID (strictly, a thread ID); if pid is 0, the calling thread is used.
cpusetsize: the size in bytes of the data that mask points to, usually sizeof(cpu_set_t).
mask: CPU affinity mask; the bit of each CPU the task may run on is set
Both functions return 0 on success and -1 on failure, with errno set.

2. Programming realization

#define _GNU_SOURCE   // must come before any #include to expose cpu_set_t and the CPU_* macros
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <pthread.h>

#define THREAD_MAX_NUM 10  // number of threads to create
int CPU_NUM = 0;  // number of CPU cores in the system
int CPU = 3; // number of the CPU to bind to (it must exist on the machine)

void* threadFun(void* arg)
{
    cpu_set_t mask;  // set of CPU cores

    CPU_ZERO(&mask);
    // set CPU MASK
    CPU_SET(CPU, &mask);
    // set the CPU Affinity of the calling thread (pid 0)
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
    {
        printf("warning: could not set CPU affinity, continuing...\n");
    }
    cpu_set_t affinity;   // receives the CPUs in the affinity set
    CPU_ZERO(&affinity);
    // get the CPU Affinity of the calling thread
    if (sched_getaffinity(0, sizeof(affinity), &affinity) == -1)
    {
        printf("warning: could not get CPU affinity, continuing...\n");
    }
    int i = 0;
    for (i = 0; i < CPU_NUM; i++)
    {
        if (CPU_ISSET(i, &affinity))// check which CPUs the thread has affinity with
        {
            printf("this thread %d is running processor : %d\n", *((int*)arg), i);
        }
    }

    return NULL;
}

int main(int argc, char* argv[])
{
    int tid[THREAD_MAX_NUM];
    pthread_t thread[THREAD_MAX_NUM];
    // get the number of CPU cores
    CPU_NUM = sysconf(_SC_NPROCESSORS_CONF);
    printf("System has %i processor(s). \n", CPU_NUM);
    int i = 0;
    for(i=0;i<THREAD_MAX_NUM;i++)
    {
        tid[i] = i;
        pthread_create(&thread[i],NULL,threadFun, &tid[i]);
    }
    for(i=0; i< THREAD_MAX_NUM; i++)
    {
        pthread_join(thread[i],NULL);
    }
    return 0;
}

Compile:
gcc -o test test.c -pthread
Run results:

System has 4 processor(s). 
this thread 1 is running processor : 3
this thread 0 is running processor : 3
this thread 4 is running processor : 3
this thread 9 is running processor : 3
this thread 7 is running processor : 3
this thread 5 is running processor : 3
this thread 6 is running processor : 3
this thread 8 is running processor : 3
this thread 3 is running processor : 3
this thread 2 is running processor : 3

3. Taskset binds the process to the CPU

(1) Bind a running process to the specified CPU

taskset -pc CPU_NUMBER  PID
taskset -p PID

The first command binds process PID to core CPU_NUMBER; the second checks the CPU Affinity of the process.
(2) Bind the process to a CPU when it starts
taskset -c CPU_NUMBER PROGRAM &
Start PROGRAM in the background with the process bound to core CPU_NUMBER
taskset -p PID
Check the CPU Affinity of the process

Four, thread binding CPU

1. System call API

#define _GNU_SOURCE            
#include <pthread.h>
int pthread_setaffinity_np(pthread_t thread, size_t cpusetsize, const cpu_set_t *cpuset);
int pthread_getaffinity_np(pthread_t thread, size_t cpusetsize, cpu_set_t *cpuset);

Parameters:
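thread: thread object (pthread_t)
cpusetsize: the size in bytes of the data that cpuset points to, usually sizeof(cpu_set_t).
cpuset: CPU affinity mask
Both functions return 0 on success and an error number on failure; they do not set errno.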

2. Programming realization

#define _GNU_SOURCE   // must come before any #include to expose cpu_set_t and the CPU_* macros
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <pthread.h>

#define THREAD_MAX_NUM 10  // number of threads to create
int CPU_NUM = 0;  // number of CPU cores in the system
int CPU = 3; // number of the CPU to bind to (it must exist on the machine)

void* threadFun(void* arg)
{
    cpu_set_t mask;  // set of CPU cores

    CPU_ZERO(&mask);
    // set CPU MASK
    CPU_SET(CPU, &mask);
    pthread_t thread = pthread_self();
    // set the CPU Affinity of the calling thread; doing it here, instead of
    // from main after pthread_create, ensures it is in place before the query below
    // (the pthread_*affinity_np functions return an error number, not -1)
    if (pthread_setaffinity_np(thread, sizeof(mask), &mask) != 0)
    {
        printf("warning: could not set CPU affinity, continuing...\n");
    }
    cpu_set_t affinity;   // receives the CPUs in the affinity set
    CPU_ZERO(&affinity);
    // get the CPU Affinity of the calling thread
    if (pthread_getaffinity_np(thread, sizeof(affinity), &affinity) != 0)
    {
        printf("warning: could not get CPU affinity, continuing...\n");
    }
    int i = 0;
    for (i = 0; i < CPU_NUM; i++)
    {
        if (CPU_ISSET(i, &affinity))// check which CPUs the thread has affinity with
        {
            printf("this thread %d is running processor : %d\n", *((int*)arg), i);
        }
    }

    return NULL;
}

int main(int argc, char* argv[])
{
    int tid[THREAD_MAX_NUM];
    pthread_t thread[THREAD_MAX_NUM];
    // get the number of CPU cores
    CPU_NUM = sysconf(_SC_NPROCESSORS_CONF);
    printf("System has %i processor(s). \n", CPU_NUM);

    int i = 0;
    for(i=0;i<THREAD_MAX_NUM;i++)
    {
        tid[i] = i;
        pthread_create(&thread[i],NULL,threadFun, &tid[i]);
    }
    for(i=0; i< THREAD_MAX_NUM; i++)
    {
        pthread_join(thread[i],NULL);
    }
    return 0;
}

Compile:
gcc -o test test.c -pthread
Run results:

System has 4 processor(s). 
this thread 0 is running processor : 3
this thread 1 is running processor : 3
this thread 2 is running processor : 3
this thread 3 is running processor : 3
this thread 5 is running processor : 3
this thread 4 is running processor : 3
this thread 6 is running processor : 3
this thread 9 is running processor : 3
this thread 7 is running processor : 3
this thread 8 is running processor : 3

Origin blog.51cto.com/9291927/2594336