[Reprint] How to bind a process to a CPU core on Linux to improve performance


https://www.jianshu.com/p/f59d7df06432

 

On Linux, process scheduling is handled automatically by the kernel. On a multi-core CPU, a process may be switched back and forth between different cores, which is bad for the CPU cache. Why? Consider this simplified diagram of an Intel i5 CPU's cache:

 
[Figure: simplified schematic of the CPU cache]

In a multi-core CPU, each core has its own L1 and L2 caches, while the L3 cache is shared. If a process keeps switching between cores, the cache hit rate on each core suffers. Conversely, if the process always runs on the same core no matter how it is scheduled, the L1 and L2 hit rate for its data can improve significantly.
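To see whether a process really does move between cores, one option is to sample sched_getcpu() repeatedly and count how often the reported core changes. The following is a minimal sketch under that assumption; count_core_switches() is a hypothetical helper name, not something from the original article.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Sample sched_getcpu() `samples` times and count how often the reported
 * core differs from the previous sample.
 * Returns the number of observed switches, or -1 if sched_getcpu() fails.
 * Hypothetical helper for illustration only. */
int count_core_switches(int samples)
{
    int switches = 0;
    int prev = sched_getcpu();
    if (prev < 0)
        return -1;

    for (int i = 1; i < samples; i++) {
        int cur = sched_getcpu();
        if (cur < 0)
            return -1;
        if (cur != prev) {
            switches++;
            prev = cur;
        }
    }
    return switches;
}
```

On an idle machine a short sampling loop may report no switches at all; the count tends to grow only when the process runs long enough, or the system is busy enough, for the scheduler to move it.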

1. How to bind a process to a CPU core

On Linux, you can bind a process to CPU cores with the CPU_* family of macros and sched_setaffinity(). Follow these steps:

  1. To use the CPU_* family of macros, define the _GNU_SOURCE macro before including any headers, so that the headers expose them:
#define _GNU_SOURCE
  2. Declare a cpu_set_t, then clear all of its bits with CPU_ZERO():
cpu_set_t mask;
CPU_ZERO(&mask);

cpu_set_t is actually a bitmask; each bit indicates whether the process may be bound to the corresponding CPU core.

  3. Next, choose the CPU cores to bind the process to by using CPU_SET() to set the corresponding bits in the cpu_set_t. For example, to let the process run only on core 1 or core 5:
CPU_SET(1, &mask);
CPU_SET(5, &mask);
  4. Finally, call sched_setaffinity() to perform the actual binding:
sched_setaffinity(0, sizeof(cpu_set_t), &mask);
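The four steps above can be put together with basic error checking, since sched_setaffinity() returns -1 when, for example, the requested core does not exist or is not available to the process. This is a sketch rather than the article's own code: pin_to_core() is a hypothetical helper name, and it binds to a single core for brevity (additional CPU_SET() calls would add more cores to the mask).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Bind the calling process to one core.
 * Returns 0 on success, -1 on failure.
 * Hypothetical helper for illustration only. */
int pin_to_core(int core)
{
    /* Reject core numbers that cannot be represented in a cpu_set_t. */
    if (core < 0 || core >= CPU_SETSIZE)
        return -1;

    cpu_set_t mask;
    CPU_ZERO(&mask);        /* step 2: clear every bit */
    CPU_SET(core, &mask);   /* step 3: allow exactly one core */

    /* step 4: pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```

Checking the return value matters in practice: on a machine with fewer cores than expected, or inside a container restricted to a subset of cores, an unchecked call can fail silently and leave the process unbound.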

The setup is not difficult. But how do we verify that the binding really works? Let's run an experiment:

Assume a dual-core machine. The program starts 20 processes and numbers them from 0 in the order they are started (note that this is our own start-order number, not the process pid). Matching the code, which passes i % 2 as the core, even-numbered processes are bound to core 0 and odd-numbered processes are bound to core 1.

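We use a for loop to keep each process running and call sched_getcpu() to find out which CPU core the current process is running on. Every iteration checks whether the process is really executing on its assigned core.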

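#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* c is our own process number, n is the core to bind to. */
void run(int c, int n) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(n, &mask);
    sched_setaffinity(0, sizeof(cpu_set_t), &mask);

    int i;
    for (i = 0; i != 10000; i++) {
        /* Print "<process number>-<core currently running on>". */
        printf("%d-%d\n", c, sched_getcpu());
    }
}

int main() {
    int i;
    for (i = 0; i != 20; i++) {
        int pid = fork();
        if (pid == 0) {
            run(i, i % 2);
            exit(0);
        }
    }
    /* Wait for all children so the parent does not exit before they finish. */
    while (wait(NULL) > 0) {
    }
    return 0;
}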

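Run the program above and it prints the CPU core number that each process is bound to; the pairing of process and core number never changes. If you comment out sched_setaffinity(), the processes are no longer bound to a core.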
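Besides watching the output of sched_getcpu(), the binding can also be checked by reading the affinity mask back with sched_getaffinity() and inspecting it with CPU_COUNT() and CPU_ISSET(). The sketch below assumes that approach; allowed_core_count() and is_allowed_on() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Number of cores the calling process is allowed to run on,
 * or -1 if the mask cannot be read. Hypothetical helper. */
int allowed_core_count(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Returns 1 if the calling process may run on `core`, 0 if it may not,
 * and -1 if the mask cannot be read. Hypothetical helper. */
int is_allowed_on(int core)
{
    if (core < 0 || core >= CPU_SETSIZE)
        return 0;

    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) != 0)
        return -1;
    return CPU_ISSET(core, &mask) ? 1 : 0;
}
```

After a successful bind to a single core, allowed_core_count() would be expected to return 1, and is_allowed_on() to return 1 only for that core.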

2. Performance test after setting affinity

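With the process bound to a CPU core, let's see whether this really improves performance. Modify the run() function above so that each process creates an array and then sums its values. The array is there to make sure the process uses the core's L1 and L2 caches: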

#include <sys/time.h> /* for gettimeofday() */

#define N 100000 /* array length; the original does not give N, so this value is an assumption */

void run(int c, int n) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(n, &mask);
    sched_setaffinity(0, sizeof(cpu_set_t), &mask);

    struct timeval tv;
    gettimeofday(&tv, NULL);
    long begin = tv.tv_sec * 1000 + tv.tv_usec / 1000; /* start time in ms */

    int i;
    int arr[N];
    for (i = 0; i != N; i++) {
        arr[i] = i;
    }
    long sum = 0;
    for (i = 0; i != N; i++) {
        sum += arr[i];
    }

    gettimeofday(&tv, NULL);
    long end = tv.tv_sec * 1000 + tv.tv_usec / 1000; /* end time in ms */
    printf("%ld\n", end - begin);
}
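The function above takes its timestamps with gettimeofday(), which reads the wall clock and can jump if the system time is adjusted during a run. As an alternative, and not what the original article uses, a monotonic clock can serve as the time source. The sketch below assumes clock_gettime() with CLOCK_MONOTONIC; monotonic_ms() and elapsed_since() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current reading of the monotonic clock in milliseconds,
 * or -1 if the clock cannot be read. Hypothetical helper. */
long long monotonic_ms(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Milliseconds elapsed since an earlier monotonic_ms() reading.
 * Hypothetical helper. */
long long elapsed_since(long long start_ms)
{
    return monotonic_ms() - start_ms;
}
```

With these helpers, the timing in run() would take a reading from monotonic_ms() before the loops and print elapsed_since() of that reading afterwards.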

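Then run the program 20 times, 10 times without CPU binding and 10 times with CPU binding, and record the elapsed time of each process in milliseconds. The results are as follows: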

 
[Figure: CPU binding test]

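P1~P20 are the process numbers. Columns A1~A10 are the runs without CPU binding, and columns B1~B10 are the runs with CPU binding; the longer the elapsed time, the redder the cell. With CPU binding, performance improves by nearly 10%.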
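The article does not say how the figure of nearly 10% was computed. One plausible reading is the relative reduction in mean elapsed time between the unbound and the bound runs. The sketch below works under that assumption; mean_ms() and improvement_percent() are hypothetical helper names.

```c
#include <stddef.h>

/* Arithmetic mean of n elapsed times in milliseconds; 0.0 if n is 0.
 * Hypothetical helper. */
double mean_ms(const long *times, size_t n)
{
    if (n == 0)
        return 0.0;

    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += (double)times[i];
    return total / (double)n;
}

/* Relative reduction in mean elapsed time, as a percentage:
 * 100 * (unbound - bound) / unbound.
 * Returns 0.0 if unbound_mean is not positive. Hypothetical helper. */
double improvement_percent(double unbound_mean, double bound_mean)
{
    if (unbound_mean <= 0.0)
        return 0.0;
    return 100.0 * (unbound_mean - bound_mean) / unbound_mean;
}
```

For example, with purely hypothetical means of 100 ms unbound and 90 ms bound (not the article's measurements), improvement_percent(100.0, 90.0) evaluates to 10.0.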


Origin www.cnblogs.com/jinanxiaolaohu/p/12166220.html