NUMA Trade-offs

  Today's machines have multiple CPUs and multiple memory blocks. In the past, we treated memory as one large shared pool, and every CPU paid the same cost to access it. This is the SMP model that used to be common. However, as the number of processors grows, shared memory causes more and more access contention, and once memory access becomes the bottleneck, performance stops scaling with the processor count. NUMA (Non-Uniform Memory Access) was introduced for this situation. For example, take a machine with 2 processors and 4 memory blocks. We group 1 processor with 2 memory blocks and call the group a NUMA node, so the machine has two NUMA nodes. Physically, the processor and the memory blocks inside a NUMA node are closer together, so access between them is faster. Picture the machine split into a left half and a right half with one processor each (cpu1, cpu2) and two memory blocks beside each processor (memory1.1 and memory1.2 next to cpu1, memory2.1 and memory2.2 next to cpu2). In NUMA node1, cpu1 reaches memory1.1 and memory1.2 faster than it reaches memory2.1 and memory2.2. So if NUMA placement can ensure that the CPUs in a node only access the memory blocks in that same node, efficiency is highest.
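To see how a real Linux machine is divided into nodes, you can read the layout the kernel exposes under /sys/devices/system/node/. The following is only a minimal sketch: the function name list_numa_nodes and its optional base-directory argument are made up for illustration (the argument exists only so the function can be pointed at a fake directory tree).

```shell
#!/bin/sh
# Print every NUMA node together with the CPUs that belong to it.
# $1 (optional, hypothetical parameter): directory holding the nodeN entries;
# defaults to the real sysfs location on Linux.
list_numa_nodes() {
    base="${1:-/sys/devices/system/node}"
    for dir in "$base"/node[0-9]*; do
        # Skip anything without a readable cpulist (also covers "no match").
        [ -r "$dir/cpulist" ] || continue
        printf '%s: cpus %s\n' "${dir##*/}" "$(cat "$dir/cpulist")"
    done
}
```

Calling list_numa_nodes with no argument on the two-node machine described above would print one line per node, each with that node's CPU range.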

When running a program, use numactl -m (--membind) and --physcpubind to specify which memory node and which CPUs it runs on. The article "Playing with cpu-topology" gives a table comparing a program that uses the resources of only one node with one that uses the resources of multiple nodes (a gap of roughly 38s versus 28s). So it makes sense to confine a program to a single NUMA node.
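A binding like the one described above could be wrapped as follows. This is a sketch under assumptions: the function name run_on_node, the NUMACTL override variable, the CPU list 0-3 and the program ./my_program are all hypothetical; only the numactl options --physcpubind and --membind (the long form of -m) come from numactl itself.

```shell
#!/bin/sh
# Run a command pinned to a list of CPUs, with memory taken from one node.
# $1: CPU list for --physcpubind (e.g. 0-3)
# $2: memory node for --membind  (e.g. 0)
# remaining arguments: the command to run
# NUMACTL is a hypothetical override; set NUMACTL=echo for a dry run.
run_on_node() {
    cpus="$1"
    node="$2"
    shift 2
    ${NUMACTL:-numactl} --physcpubind="$cpus" --membind="$node" "$@"
}

# Example (hypothetical program and CPU numbering):
#   run_on_node 0-3 0 ./my_program
```

The CPU list should name CPUs that actually belong to the chosen memory node, otherwise the program still pays for remote access.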

But is binding to a NUMA node always a good idea? Not necessarily: there is the NUMA trap. The article "SWAP's Crime and Punishment" describes it. The symptom is that the server still has free memory, yet it has already started to use swap, sometimes to the point where the machine stalls. This can be caused by the NUMA restriction. If a process is restricted to the memory of its own NUMA node, then once that node's memory is used up it will not fall back to the memory of other nodes and starts swapping instead. Even worse, if the machine has no swap configured, the process may crash outright. In that case you can use numactl --interleave=all to lift the per-node restriction.
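One way to notice the trap early is to compare free memory per node rather than for the machine as a whole. The sketch below reads the per-node meminfo files under /sys/devices/system/node/; the function name node_free_kb and its optional base-directory argument are hypothetical, and ./my_program is a placeholder.

```shell
#!/bin/sh
# Print the free memory of each NUMA node.
# $1 (optional, hypothetical parameter): directory holding the nodeN entries.
node_free_kb() {
    base="${1:-/sys/devices/system/node}"
    for dir in "$base"/node[0-9]*; do
        [ -r "$dir/meminfo" ] || continue
        # Lines look like: "Node 0 MemFree:  123456 kB"
        awk '$3 == "MemFree:" { print "node" $2 ": " $4 " kB free" }' "$dir/meminfo"
    done
}

# If one node is nearly empty while the others still have room, a process
# bound to that node is a candidate for the trap. Starting it with memory
# interleaved across all nodes avoids the per-node limit:
#   numactl --interleave=all ./my_program
```

Interleaving gives up some locality in exchange for being able to use all of the machine's memory.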



In summary, whether to bind to NUMA nodes depends on the specific workload.

If your program is going to use a large amount of memory, you should in most cases lift the NUMA node restriction (or disable NUMA at the hardware level), because the program is very likely to run into the NUMA trap.

If your program does not use much memory but needs to run faster, you should in most cases restrict it to a single NUMA node.



-----------------
  When NUMA is disabled at the OS level but enabled at the BIOS level, performance suffers: QPS drops by 15-30%.

  When NUMA is disabled at the BIOS level, performance is unaffected whether or not NUMA is enabled at the OS level.

      Install numactl:
      #yum install numactl -y

      #numastat
      Shows the same counters as cat /sys/devices/system/node/node0/numastat. Details about all memory nodes in the system are recorded under the /sys/devices/system/node/ folder.

      #numactl --hardware
      List the NUMA nodes on the system.

      #numactl --show
      View binding information.

For details, please refer to:

https://www.cnblogs.com/wjoyxt/p/4804081.html

https://jingyan.baidu.com/article/17bd8e525461ba85ab2bb8ec.html

http://www.cnblogs.com/zhoujinyi/p/3479801.html
