[Performance] Interrupt binding and viewing | irqbalance interrupt load balancing | CPU bottleneck

Common commands 


```
# Check the current service status
service irqbalance status

# Stop the service
service irqbalance stop

# Disable autostart at boot
chkconfig irqbalance off

# Show command-line help
irqbalance -h
```

```
# /proc/interrupts shows the interrupt counts on each CPU.
# /proc/irq/<IRQ>/smp_affinity_list shows which CPUs an interrupt is currently bound to.
# Overview of the CPU binding of every eth0 queue interrupt:
cat /proc/interrupts | grep eth0- | cut -d: -f1 | while read i; do echo -ne irq":$i\t bind_cpu: "; cat /proc/irq/$i/smp_affinity_list; done | sort -n -t' ' -k3
```

Linux Tuning: Virtualization Tuning (irqbalance NIC Interrupt Binding), Part 2 - Charlie Wang's CSDN blog

irqbalance - adaptiver's CSDN blog

irqbalance is not balancing interrupts? / The irqbalance settings have no effect?

https://access.redhat.com/solutions/677073

1. Make sure the kernel is kernel-2.6.32-358.2.1.el6 or later.

2. Make sure the version of irqbalance is new enough.

3. Root cause:

  1. Previously, the irqbalance daemon did not consider the NUMA node assignment for an IRQ (interrupt request) for the banned CPU set. Consequently, irqbalance set the affinity incorrectly when the IRQBALANCE_BANNED_IRQS variable was set to a single CPU. In addition, IRQs could not be assigned to a node that had no eligible CPUs. Node assignment has been restricted to nodes that have eligible CPUs as defined by the unbanned_cpus bitmask, thus fixing the bug. As a result, irqbalance now sets affinity properly, and IRQs are assigned to the respective nodes correctly. (BZ#1054590, BZ#1054591)

  2. Prior to this update, the dependency of the irqbalance daemon was set incorrectly referring to a wrong kernel version. As a consequence, irqbalance could not balance IRQs on NUMA systems. With this update, the dependency has been fixed, and IRQs are now balanced correctly on NUMA systems. Note that users of irqbalance packages have to update kernel to 2.6.32-358.2.1 or later in order to use the irqbalance daemon in correct manner. (BZ#1055572, BZ#1055574)

  3. Prior to its latest version, irqbalance could not accurately determine the NUMA node it was local to or the device to which an IRQ was sent. The kernel affinity_hint values were created to work around this issue. With this update, irqbalance is now capable of parsing all information about an IRQ provided by sysfs. IRQ balancing now works correctly, and the affinity_hint values are now ignored by default so that they do not distort the irqbalance functionality. (BZ#1093441, BZ#1093440)

Diagnostic steps

  • Interrupts are not balancing:

               CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       
     59: 1292013110          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-0
     60:  851840288          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-1
     61:  843207989          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-2
     62:  753317489          0          0          0          0          0   PCI-MSI-edge      eth0-rxtx-3
    
    $ grep eth /proc/interrupts 
     71:    2073421    5816340          ...lots of zeroes...   PCI-MSI-edge      eth11-q0
     72:     294863     114392          ...lots of zeroes...   PCI-MSI-edge      eth11-q1
     73:      63206     234005          ...lots of zeroes...   PCI-MSI-edge      eth11-q2
     74:     238342      72189          ...lots of zeroes...   PCI-MSI-edge      eth11-q3
     79:    1491483        699          ...lots of zeroes...   PCI-MSI-edge      eth9-q0
     80:          1     525546          ...lots of zeroes...   PCI-MSI-edge      eth9-q1
     81:    1524075          5          ...lots of zeroes...   PCI-MSI-edge      eth9-q2
     82:          9    1869645          ...lots of zeroes...   PCI-MSI-edge      eth9-q3
    
  • The irqbalance service is turned on and running:

    $ chkconfig | grep irqb
    irqbalance      0:off   1:off   2:off   3:on    4:on    5:on    6:off
    $ ps aux | grep irqbalance
    root      1480  0.0  0.0  10948   668 ?        Ss   Oct31   4:27 irqbalance
    
  • There's no additional irqbalance config:

    $ egrep -v "^#" /etc/sysconfig/irqbalance 
    grep: /etc/sysconfig/irqbalance: No such file or directory
    
  • Interrupts are allowed to land on other/all CPU cores:

    (If you want irqbalance to assign interrupts to cpus, you need to set affinity to allow interrupts to execute on those cpus)

    $ for i in {59..62}; do echo -n "Interrupt $i is allowed on CPUs "; cat /proc/irq/$i/smp_affinity_list; done
    Interrupt 59 is allowed on CPUs 0-5
    Interrupt 60 is allowed on CPUs 0-5
    Interrupt 61 is allowed on CPUs 0-5
    Interrupt 62 is allowed on CPUs 0-5
    
    $ for i in {71..82}; do echo -n " IRQ $i: "; cat /proc/irq/$i/smp_affinity_list; done
    IRQ 71: 1,3,5,7,9,11,13,15,17,19,21,23
    IRQ 72: 0,2,4,6,8,10,12,14,16,18,20,22
    IRQ 73: 1,3,5,7,9,11,13,15,17,19,21,23
    IRQ 74: 0,2,4,6,8,10,12,14,16,18,20,22
    IRQ 79: 0,2,4,6,8,10,12,14,16,18,20,22
    IRQ 80: 1,3,5,7,9,11,13,15,17,19,21,23
    IRQ 81: 0,2,4,6,8,10,12,14,16,18,20,22
    IRQ 82: 1,3,5,7,9,11,13,15,17,19,21,23
    
  • Processors do not share cache locality, which stops irqbalance from working by design:

    (irqbalance balances interrupts across CPUs that share a cache domain; in the output below every shared_cpu_map contains only the CPU itself, so there is no cache domain to balance within)

    $ for i in {0..3}; do for j in {0..7}; do echo -n "cpu$j, index $i: "; cat /sys/devices/system/cpu/cpu$j/cache/index$i/shared_cpu_map; done; done
    cpu0, index 0: 00000001
    cpu1, index 0: 00000002
    cpu2, index 0: 00000004
    cpu3, index 0: 00000008
    cpu4, index 0: 00000010
    cpu5, index 0: 00000020
    cpu6, index 0: 00000040
    cpu7, index 0: 00000080
    cpu0, index 1: 00000001
    cpu1, index 1: 00000002
    cpu2, index 1: 00000004
    cpu3, index 1: 00000008
    cpu4, index 1: 00000010
    cpu5, index 1: 00000020
    cpu6, index 1: 00000040
    cpu7, index 1: 00000080
    cpu0, index 2: 00000001
    cpu1, index 2: 00000002
    cpu2, index 2: 00000004
    cpu3, index 2: 00000008
    cpu4, index 2: 00000010
    cpu5, index 2: 00000020
    cpu6, index 2: 00000040
    cpu7, index 2: 00000080
    cpu0, index 3: 00000001
    cpu1, index 3: 00000002
    cpu2, index 3: 00000004
    cpu3, index 3: 00000008
    cpu4, index 3: 00000010
    cpu5, index 3: 00000020
    cpu6, index 3: 00000040
    cpu7, index 3: 00000080
    
  • Top users of CPU & MEM


    USER    %CPU    %MEM   RSS 
    oracle  204.9%  12.2%  5.34 GiB
    
  • Oracle instance in uninterruptible sleep, plus defunct processes:


    USER      PID    %CPU  %MEM  VSZ-MiB  RSS-MiB  TTY    STAT  START  TIME    COMMAND  
    oracle    17631  10.5  0.2   260      57       ?      Ds    22:10  3:17    ora_j000_INDWS
    
  • The irqbalance service reports "Resource temporarily unavailable" error in lsof -b:


    lsof | grep -i irqbalance
    lsof: avoiding stat(/usr/sbin/irqbalance): -b was specified.
    irqbalanc  1480        0  txt   unknown                           /usr/sbin/irqbalance (stat: Resource temporarily unavailable)
    irqbalanc  1480        0  mem       REG    8,2             423023 /usr/sbin/irqbalance (stat: Resource temporarily unavailable)
    

However, the -b option (or the presence of an NFS mount) prevents lsof from calling stat() directly; it calls statsafely() instead, so this message is expected:


statsafely(path, buf)
        char *path;                     /* file path */
        struct stat *buf;               /* stat buffer address */
{
        if (Fblock) {
            if (!Fwarn)
                (void) fprintf(stderr,
                    "%s: avoiding stat(%s): -b was specified.\n",
                    Pn, path);
            errno = EWOULDBLOCK;
            return(1);
        }
        return(doinchild(dostat, path, (char *)buf, sizeof(struct stat)));
}

When the -b option is specified, Fblock is set to true, and EWOULDBLOCK (= EAGAIN) is returned.

Interrupt Description

Since a hard-interrupt handler cannot itself be interrupted, a handler that runs too long prevents the CPU from responding to other hardware interrupts. The kernel therefore introduced soft interrupts (softirqs), moving the time-consuming part of a hard-interrupt handler into a softirq handler.

The ksoftirqd kernel threads are responsible for processing softirqs.

When ksoftirqd receives a softirq, it calls the handler registered for that softirq type; for the softirq raised by the NIC driver module, ksoftirqd calls the network module's net_rx_action function.

From: "The soft interrupt ksoftirqd in the Linux kernel network", Embedded Technology - Electronics Enthusiasts Network

Packet ---> NIC --DMA--> memory --hardware interrupt--> ksoftirqd (the softirq system can be pictured as a set of kernel threads, one per CPU)

Softirqs actually run in kernel threads; each CPU has one such thread, named ksoftirqd/<CPU number>.

$ ps aux | grep softirq

root         7  0.0  0.0      0     0 ?        S    Oct10   0:01 [ksoftirqd/0]

root        16  0.0  0.0      0     0 ?        S    Oct10   0:01 [ksoftirqd/1]

The names enclosed in square brackets are generally kernel threads.

In top you will also notice the kernel thread ksoftirqd/0, which means this softirq thread runs on CPU 0.

When viewing CPU utilization in top, the si field shows the time spent in softirqs.

RSS (Receive Side Scaling) is a hardware feature of the NIC: it implements multiple queues and can distribute the interrupts of different queues to different CPUs.

A single CPU core cannot keep up with a fast NIC. With a multi-queue NIC driver, each queue's interrupt can be bound to a different core, spreading the load and relieving CPU0.

Check whether the NIC supports multiple queues, and how many:

```
awk '$NF~/ens3f0np0/{print $1,$NF}' /proc/interrupts

awk '$NF~/eth0/{print $1,$NF}' /proc/interrupts
37: eth0-TxRx-0
38: eth0-TxRx-1
39: eth0-TxRx-2
40: eth0-TxRx-3
41: eth0-TxRx-4
42: eth0-TxRx-5
43: eth0-TxRx-6
44: eth0-TxRx-7
```

In the example above, eth0 is a multi-queue NIC: 37-44 are the interrupt numbers, one per queue (eight in this case). Mainstream NIC drivers generally support multiple queues.

Check which CPU handles each interrupt number
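As a minimal sketch of that lookup, the extraction step can be exercised on a fabricated /proc/interrupts excerpt (the sample lines below are made up for illustration; on a real system feed in `grep eth0- /proc/interrupts` instead, then read /proc/irq/<irq>/smp_affinity_list for each number):

```shell
# Fabricated /proc/interrupts lines (format only; the counts are invented)
sample='  37:  123  456  PCI-MSI-edge  eth0-TxRx-0
  38:  789   12  PCI-MSI-edge  eth0-TxRx-1'

# Field 1 is the IRQ number (strip the trailing colon); the last field is the queue name
echo "$sample" | awk '{gsub(":","",$1); print $1, $NF}'

# On a real system, the bound CPUs for each IRQ are then read with:
#   cat /proc/irq/<irq>/smp_affinity_list
```

This prints `37 eth0-TxRx-0` and `38 eth0-TxRx-1`, i.e. the IRQ-to-queue mapping that the binding scripts later in this page build on.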

Interrupt binding

Because the NIC supports multiple queues, we can bind each interrupt number to a chosen CPU.

By default, Linux uses the irqbalance service to optimize interrupt placement. It collects statistics and schedules interrupt requests automatically, but its distribution can be quite uneven, so enabling it is not always recommended. To understand interrupt binding, we stop the irqbalance service and set the bindings manually.

## Related configuration files:

The CPU affinity of an interrupt has two equivalent configuration files:

/proc/irq/<IRQ number>/smp_affinity - hexadecimal bitmask

/proc/irq/<IRQ number>/smp_affinity_list - decimal CPU list

smp_affinity and smp_affinity_list are equivalent; modifying one updates the other accordingly. smp_affinity_list is usually the more convenient of the two.

*) smp_affinity is a hexadecimal bitmask, comma-separated into 32-bit words. The mask can be up to 64 bits; if there are more than 32 cores, use two 32-bit words separated by a comma, e.g. 00000001,0000ff00.

For example, 0000,00000020 assigns the IRQ to CPU5 (0x20 = binary 00100000, i.e. bit 5).

echo 4 > /proc/irq/50/smp_affinity    # mask 0x4 binds IRQ 50 to CPU2
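The mask arithmetic above can be checked directly in bash; a small sketch using only shell arithmetic (no /proc access):

```shell
# Bit N of smp_affinity corresponds to CPU N, so a single-CPU mask is 1 << N.
cpu=5
mask=$(printf '%x' $((1 << cpu)))
echo "mask for CPU$cpu: $mask"   # prints: mask for CPU5: 20

# Decoding goes the other way: list the set bits of a mask.
mask_val=0x20
for i in $(seq 0 31); do
  if (( (mask_val >> i) & 1 )); then echo "CPU$i"; fi
done                             # prints: CPU5
```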

/proc/softirqs shows the accumulated counts for each softirq type:

```
[root@node33 ~]# cat /proc/softirqs | awk '{print $1,$2,$95}' | column -t
          CPU0       CPU1      CPU94
HI:       0          0
TIMER:    2109       14097     72722
NET_TX:   1          168
NET_RX:   14         25408
BLOCK:    0          0
IRQ_POLL: 0          0
TASKLET:  10         4
SCHED:    0          0
HRTIMER:  0          0
RCU:      148145894  25453537
```

Note the following two points:

First, the softirq type, in the first column.

Second, the distribution of the same softirq across different CPUs, i.e. the numbers within one row.

(In /proc/interrupts, by contrast, the first column is the interrupt number, each CPU N column is the count of that interrupt serviced on core N, the second-to-last column is the interrupt type, and the last column is the description.)
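As a toy illustration of that same-row comparison, the snippet below sums one softirq type across CPUs; the counts are fabricated:

```shell
# Fabricated /proc/softirqs rows for a two-CPU machine
sample='TIMER: 2109 14097
NET_RX: 14 25408'

# Sum the NET_RX row across all CPU columns
echo "$sample" | awk '$1=="NET_RX:" {
  sum = 0
  for (i = 2; i <= NF; i++) sum += $i   # columns 2..NF are per-CPU counts
  print "NET_RX total:", sum
}'
```

Here the total is 25422, almost all of it on the second CPU, which is exactly the kind of imbalance interrupt binding is meant to address.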

To modify the CPU affinity of an interrupt, write a CPU mask into /proc/irq/<IRQ number>/smp_affinity with echo, for example:

echo 4 > /proc/irq/50/smp_affinity    # mask 0x4 = CPU2

2.2 Configuring irqbalance to keep interrupts off specific cores

The irqbalance daemon periodically distributes interrupts evenly across CPU cores and is enabled by default. There are two approaches: either turn irqbalance off entirely, so interrupts are no longer assigned automatically, or customize its balancing policy to exclude latency-sensitive cores so they receive no interrupts. Both are described below.

Turning irqbalance off

With the daemon stopped, by default all interrupt handling falls to core CPU0.

Check the daemon's status: systemctl status irqbalance

Stop the daemon: systemctl stop irqbalance

Disable it so it does not start at boot: systemctl disable irqbalance

Excluding specific CPUs from irqbalance

We can remove specific CPU cores from irqbalance's consideration by editing /etc/sysconfig/irqbalance; the daemon will then no longer assign interrupts to those cores.

Open /etc/sysconfig/irqbalance, find the line "#IRQBALANCE_BANNED_CPUS=", uncomment it, and put a hexadecimal CPU mask after the equals sign, for example:

IRQBALANCE_BANNED_CPUS=0000ff00

The mask can be up to 64 bits. If the system has more than 32 cores, use two 32-bit words separated by a comma, for example:

IRQBALANCE_BANNED_CPUS=00000001,0000ff00

This bans 9 cores: CPUs 8-15 (low word 0000ff00) and CPU 32 (high word 00000001).
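The decoding of such a mask can be verified with a small helper (a sketch; decode_banned is a hypothetical function written for this illustration, not part of irqbalance):

```shell
# Decode an IRQBALANCE_BANNED_CPUS mask into a CPU list. Each comma-separated
# group is a 32-bit hex word; the rightmost word covers CPUs 0-31, the next
# one CPUs 32-63, and so on.
decode_banned() {
  local mask=$1 words w bit=0 out="" idx i
  IFS=',' read -ra words <<< "$mask"
  for (( idx=${#words[@]}-1; idx>=0; idx-- )); do   # rightmost word first
    w=$((16#${words[idx]}))
    for i in $(seq 0 31); do
      if (( (w >> i) & 1 )); then out="$out $((bit + i))"; fi
    done
    bit=$((bit + 32))
  done
  echo "${out# }"
}

decode_banned "00000001,0000ff00"   # prints: 8 9 10 11 12 13 14 15 32
```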

What interrupts exist in the system

View hard interrupts:

cat /proc/interrupts

View softirqs:

cat /proc/softirqs

Taking the author's single-core system as an example, the softirq counts look as above; the interrupt types in the first column are:

NET_RX - network receive softirq

NET_TX - network transmit softirq

TIMER - timer softirq

RCU - RCU callback softirq

SCHED - kernel scheduling softirq

We can also watch the rate of change of the softirq counters in real time:

watch -d cat /proc/softirqs

As noted above, softirq processing is ultimately handed to kernel threads, which we can list with:

ps aux | grep softirq

Since the author's server is single-core there is only one such kernel thread, named simply [ksoftirqd/<CPU number>].

Original link: https://blog.csdn.net/shark_chili3007/article/details/114441820

Interrupt statistics method

```
# Define common variables
INTERFACE=ens3f1np1
PCI_ADDR=$(ethtool -i ${INTERFACE}|grep "bus-info"|awk '{print $2}')
SIZE=1M_all_cpu
RW=write

# 1. Record the interrupt counts before and after the I/O
# before:
cat /proc/interrupts|grep -E "$PCI_ADDR"|awk '{print $1,$(31+2),$(63+2),$(95+2),$(127+2)}'|column -t > cpu31-63-95-127irq-${SIZE}-${RW}-0.txt
# after:
cat /proc/interrupts|grep -E "$PCI_ADDR"|awk '{print $1,$(31+2),$(63+2),$(95+2),$(127+2)}'|column -t > cpu31-63-95-127irq-${SIZE}-${RW}-1.txt
```

# * Explanation: PCI_ADDR holds the NIC's PCI address. awk '{print $1,$(31+2),$(63+2),$(95+2),$(127+2)}' prints the first column (the interrupt number) plus the columns for CPU31, CPU63, CPU95 and CPU127 (in /proc/interrupts the IRQ number is $1 and CPU0 is $2, so CPU N is column $(N+2)).

```
# 2. Calculations
paste cpu31-63-95-127irq-${SIZE}-${RW}-0.txt cpu31-63-95-127irq-${SIZE}-${RW}-1.txt > uinte.txt  # merge the before and after files side by side
cat uinte.txt|awk '{print $1,$7-$2,$8-$3,$9-$4,$10-$5}' |column -t|tr ':' ' ' > cpu31-63-95-127irq-${SIZE}-${RW}-result.txt  # column-wise subtraction (the after counts minus the before counts)

# 3. Filter the results
# Show the interrupts that fired on CPU31:
cat cpu31-63-95-127irq-${SIZE}-${RW}-result.txt |awk '$2 > 0 {print $0}'
```

Other:

Remove the colons: cat cpu31-63-95-127irq-1M-write-result.txt | tr ':' ' ' > uinte.txt

Open the result file in Sublime Text; hold the middle mouse button to select by column, then paste the selection into Excel.
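The paste-and-subtract pipeline above can be sanity-checked on fabricated data (two IRQs, two CPU columns; the file names and counts are invented for the demo):

```shell
# Fabricated "before" and "after" snapshots: IRQ, then two per-CPU counts
cat > before.txt << 'EOT'
37: 100 200
38: 10 20
EOT
cat > after.txt << 'EOT'
37: 150 230
38: 15 45
EOT

# After pasting, fields 1-3 come from "before" and 4-6 from "after",
# so the per-CPU delta is $5-$2 and $6-$3.
result=$(paste before.txt after.txt | awk '{print $1, $5-$2, $6-$3}' | tr ':' ' ')
echo "$result"
rm -f before.txt after.txt
```

The deltas come out as 50/30 for IRQ 37 and 5/25 for IRQ 38, i.e. exactly the interrupts generated during the I/O window.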

View the number of CPU interrupts

View the interrupt counts on each core of a multi-core CPU:

```
# mpstat -I SUM -P ALL 1 3
Linux 5.4.0-40-generic (verify-new-511kernel)     08/28/2021     _x86_64_    (72 CPU)

09:09:30 AM  CPU    intr/s
09:09:31 AM  all  18762.00
09:09:31 AM    0    253.00
09:09:31 AM    1    256.00
09:09:31 AM    2    253.00
09:09:31 AM    3    253.00
09:09:31 AM    4    254.00
09:09:31 AM    5    260.00
```

Usage:

mpstat [-I {SUM|CPU|SCPU}] [-P {cpu|ALL}] [interval [count]]

Parameter explanation:

-I view interrupts (SUM: total interrupt count per CPU; SCPU: individual softirq counts)

-P {cpu|ALL} which CPU(s) to monitor; cpu ranges over [0, number of CPUs - 1]

interval - the gap in seconds between two adjacent samples

count - the number of samples; count can only be used together with interval

When interrupts concentrate on a single CPU, that CPU saturates; it is advisable to spread interrupts evenly across the CPUs.

-----------------------------------

CPU interrupt number view and network card interrupt binding core

https://blog.51cto.com/u_15080020/4188117

test draft

```
INTERFACE=ens3f1np1
PCI_ADDR=$(ethtool -i ${INTERFACE}|grep "bus-info"|awk '{print $2}')
SIZE=1M_all_cpu
RW=write

# 1. Record the interrupt counts before and after the I/O
# before:
cat /proc/interrupts|grep -E "$PCI_ADDR"| awk '{for(i=65; i<=NF; i++){ $i="" }; print $0 }'| column -t > cpu-irq-${SIZE}-${RW}-0.txt
# after:
cat /proc/interrupts|grep -E "$PCI_ADDR"| awk '{for(i=65; i<=NF; i++){ $i="" }; print $0 }'| column -t > cpu-irq-${SIZE}-${RW}-1.txt
```

```
# 2. Calculations
paste cpu-irq-${SIZE}-${RW}-0.txt cpu-irq-${SIZE}-${RW}-1.txt > uinte.txt  # merge the two files side by side
cat uinte.txt|awk '{print $1,$66-$2,$67-$3,$68-$4,$69-$5,$70-$6,$71-$7,$72-$8,$73-$9,$74-$10,$75-$11,$76-$12,$77-$13,$78-$14,$79-$15,$80-$16,$81-$17,$82-$18,$83-$19,$84-$20,$85-$21,$86-$22,$87-$23,$88-$24,$89-$25,$90-$26,$91-$27,$92-$28,$93-$29,$94-$30,$95-$31,$96-$32,$97-$33,$98-$34,$99-$35,$100-$36,$101-$37,$102-$38,$103-$39,$104-$40,$105-$41,$106-$42,$107-$43,$108-$44,$109-$45,$110-$46,$111-$47,$112-$48,$113-$49,$114-$50,$115-$51,$116-$52,$117-$53,$118-$54,$119-$55,$120-$56,$121-$57,$122-$58,$123-$59,$124-$60,$125-$61,$126-$62,$127-$63,$128-$64}' |column -t|tr ':' ' ' > cpu-irq-${SIZE}-${RW}-result.txt  # column-wise subtraction (the after counts minus the before counts)

# Show selected columns of the result:
cat cpu-irq-1M_all_cpu-write-result.txt|awk '{ for(i=36;i<=96;i++){$i=""} {print $0}}'|column -t
cat cpu-irq-1M_all_cpu-write-result.txt|awk '{ for(i=1;i<=35;i++){$i=""} {print $0}}'|column -t
cat cpu-irq-1M_all_cpu-write-result.txt|awk '{ for(i=1; i<=2; i++){ $i="" }; for(i=7; i<=NF;i++){ $i="" }; print $0 }'|column -t
```


A helper to generate the long awk expression above:

```
LCXSTR="cat uinte.txt|awk '{print \$1";
for i in {2..64};do after=$(($i+64));LCXSTR=${LCXSTR}",\$${after}-\$$i";done
LCXSTR=$LCXSTR"}'"
echo $LCXSTR

LCXSTR="";
for i in {1..96};do LCXSTR=$LCXSTR",${i}";done
echo $LCXSTR
```

Test - interrupt viewing, binding, and setting

===================================================================== irqbalance settings

View the running status of the daemon process: systemctl status irqbalance

Close the daemon: systemctl stop irqbalance

Cancel the process so that it will not restart after booting: systemctl disable irqbalance

===================================================================== NIC parameter settings

View and set the number of queues used by the NIC

```
ethtool -l ens3f1np1          # view the number of NIC queues
ethtool -L ens3f1np1          # set the number of NIC queues
ethtool -L ens3f1np1 rx 8     # receive queues only
ethtool -L ens3f1np1 tx 8
ethtool -L ens3f1np1 combined 8
```

There are several queue types: RX, TX, Combined, etc. Some NICs only support Combined queues (send and receive share the queue; this is called a combined queue).

*Lowercase l views, uppercase L sets.

```
[root@node32 2023-3-14-test]# ethtool -L ens3f1np1 combined 8
[root@node32 2023-3-14-test]# ethtool -l ens3f1np1
Channel parameters for ens3f1np1:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       63
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       8
[root@node32 2023-3-14-test]#
```

View and set the NIC queue (ring buffer) length

```
ethtool -g ens3f1np1          # view queue lengths; "Pre-set maximums" = supported maximum, "Current hardware settings" = current value
ethtool -G ens3f1np1 rx 4096  # set the receive queue length
```

*Lowercase g views, uppercase G sets.

Notice:

①: Not all NICs support viewing and modifying the queue length through ethtool.

②: This operation also restarts the NIC, so connections over it will be briefly interrupted.

View and set the NIC's RSS hash (the hash mapping of packets to queues) / set queue weights

View:

ethtool -x ens3f1np1

Set weights

Distribute all flows evenly across the first N RX queues:

ethtool -X ethx equal N

Set custom weights with ethtool -X:

sudo ethtool -X eth0 weight 6 2

*The command above gives rx queue 0 and rx queue 1 different weights, 6 and 2, so queue 0 receives more traffic. Note that a queue is generally bound to a CPU, so this also means the corresponding CPU spends more time slices receiving packets.

sudo ethtool -X ens1f0np0 weight 8 4 2 1         # weights 8 4 2 1 for the first 4 queues

sudo ethtool -X ens1f0np0 weight 2 1 2 2 3 3 2 4 # weights 2 1 2 2 3 3 2 4 for the first 8 queues

Set the hash fields

Adjust which fields of a network flow feed the RX hash.

View:

1). RSS hash function

Get the tcp4 hash fields (5-tuple hashing is enabled by default):

```
# ethtool -n ens3f1np1 rx-flow-hash tcp4
TCP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]   # bytes 0-1 of the L4 header: the uint16 source port
L4 bytes 2 & 3 [TCP/UDP dst port]   # bytes 2-3 of the L4 header: the uint16 destination port
```

Set:

ethtool -N ens3f1np1 rx-flow-hash tcp4 sdfn

s - hash on the src address

d - hash on the dst address

f - hash on bytes 0 and 1 of the Layer 4 header of the rx packet

n - hash on bytes 2 and 3 of the Layer 4 header of the rx packet

Enable 5-tuple hashing for udp4 (off by default):

# ethtool -N eth2 rx-flow-hash udp4 sdfn

===================================================================== Interrupt queue view

==== View the interrupts and queues of one NIC

```
INTERFACE=ens3f1np1
PCI_ADDR=$(ethtool -i ${INTERFACE}|grep "bus-info"|awk '{print $2}')
cat /proc/interrupts|grep -E "${INTERFACE}|$PCI_ADDR"|awk '{print $1,$NF}'
```

```
[root@node33 ~]#
1335: ens3f0np0-0
1336: ens3f0np0-1
1337: ens3f0np0-2
1338: ens3f0np0-3
11383: ens3f0np0-48
……
1393: ens3f0np0-58
1394: ens3f0np0-59
1395: ens3f0np0-60
1396: ens3f0np0-61
1397: ens3f0np0-62
```

==== View the interrupts and queues of all NICs

Check the CPU binding of every NIC's interrupts (some NICs appear in the interrupts file by PCI address rather than by name):

```
for if in $(ifconfig|grep -E "eno|ens|enp|bond" -A 1|grep -w "inet" -B 1|awk -F ":" '{print $1}'|grep -vE "\--|inet"); \
do \
echo "eth:${if}=============================";\
PCI_ADDR=$(ethtool -i ${if}|grep "bus-info"|awk '{print $2}');\
cat /proc/interrupts | grep -E "${if}|${PCI_ADDR}" | cut -d: -f1 | while read i; do echo -ne irq":$i\t bind_cpu: "; \
cat /proc/irq/$i/smp_affinity_list; done | sort -n -t' ' -k3;\
done
```

==== View the interrupt number - queue - bound CPU of all a NIC's interrupts

**Copy the following into a terminal to generate the getEthIrqBind.sh script, then run it with a NIC name: ./getEthIrqBind.sh ens3f1np1

```
cat > getEthIrqBind.sh << "EOF"
INTERFACE=$1
if [[ "$1" == "" ]];then
echo "please input interface.
example:./$(basename $0) eth0,
sort by col 1:./$(basename $0) eth0 1
sort by col 2:./$(basename $0) eth0 2"
exit 1
fi
if [[ "$2" != "" ]];then
sort_col=$2
else
sort_col=1
fi
oldIFS=$IFS;IFS=$'\n'
# The interrupts file may list the NIC by name or by PCI address:
PCI_ADDR=$(ethtool -i ${INTERFACE}|grep "bus-info"|awk '{print $2}')
IRQ_LIST=($(cat /proc/interrupts|grep -E "${INTERFACE}|${PCI_ADDR}"|awk '{print $1,$NF}'))
echo "irq:      queue:     bind_cpu:"
for line in ${IRQ_LIST[@]}
do
  irq=$(echo "$line"|awk -F":" '{print $1}')
  queue=$(echo "$line"|awk -F":" '{print $2}')
  queue=$(echo ${queue}|awk -F "@" '{print $1}')
  echo "$irq     $queue     $(cat /proc/irq/$irq/smp_affinity_list)"
done| sort -n -k${sort_col}|column -t;
IFS=$oldIFS
EOF

chmod +x getEthIrqBind.sh
./getEthIrqBind.sh ens3f1np1
```

===================================================================== Interrupt binding

Evenly bind the interrupt numbers of the given network port ETH to the CPUs listed in CPU_LIST:

```
cat > ./balance.sh << 'EOF'
if [[ "$1" == "" ]];then
echo "please input interface.
example:./$(basename $0) eth0"
exit 1
fi
ETH=$1
PCI_ADDR=$(ethtool -i ${ETH}|grep "bus-info"|awk '{print $2}')
IRQ_LIST=($(cat /proc/interrupts|grep -E "${ETH}|${PCI_ADDR}"|awk '{print $1,$NF}'|awk -F ':' '{print $1}'))
if [[ ${#IRQ_LIST[@]} -eq 0 ]] ;then
echo "I can't find irq number list."
exit 1
fi
CPU_LIST=(40 41 62 63)   #<----------------------------------- CPU_LIST
index=0
cpu_num=${#CPU_LIST[@]}
for it in ${IRQ_LIST[@]}
do
((index++))
cpu_list_index=$((${index}%${cpu_num}))
#echo "irq:$it --bind-to--> cpu:${CPU_LIST[${cpu_list_index}]}"   # show the result without actually setting it
echo ${CPU_LIST[${cpu_list_index}]} > /proc/irq/${it}/smp_affinity_list
done
EOF

chmod +x ./balance.sh
```

After generating the script, run ./balance.sh <network port name>, e.g.: ./balance.sh eth0
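The round-robin logic in balance.sh can be previewed without root and without writing to /proc; a dry-run sketch over a fabricated IRQ list:

```shell
# Same modulo arithmetic as balance.sh, but it only prints the plan.
IRQ_LIST=(101 102 103 104 105)   # fabricated IRQ numbers
CPU_LIST=(40 41 62 63)
cpu_num=${#CPU_LIST[@]}
index=0
for irq in "${IRQ_LIST[@]}"; do
  index=$((index + 1))
  cpu=${CPU_LIST[$((index % cpu_num))]}
  echo "irq:$irq --bind-to--> cpu:$cpu"
  # the real script instead runs: echo "$cpu" > /proc/irq/$irq/smp_affinity_list
done
```

Note that because index starts at 1, the first IRQ lands on CPU_LIST[1] (41) and CPU_LIST[0] (40) is used only on the fourth IRQ; the original script behaves the same way.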

===================================================================== Data analysis

============ 64-CPU interrupt data capture and statistics

```
INTERFACE=ens3f1np1
PCI_ADDR=$(ethtool -i ${INTERFACE}|grep "bus-info"|awk '{print $2}')
SIZE=1M_4_cpu
RW=write-47

# 1. Record the interrupt counts before and after the I/O
# before:
cat /proc/interrupts|grep -E "$PCI_ADDR"| awk '{for(i=65; i<=NF; i++){ $i="" }; print $0 }'| column -t > cpu-irq-${SIZE}-${RW}-0.txt
# after:
cat /proc/interrupts|grep -E "$PCI_ADDR"| awk '{for(i=65; i<=NF; i++){ $i="" }; print $0 }'| column -t > cpu-irq-${SIZE}-${RW}-1.txt
```

*Explanation: awk '{for(i=65; i<=NF; i++){ $i="" }; print $0 }' blanks every field from column 65 onward, i.e. only the first 64 columns are kept.
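The column-blanking trick is easy to check on a toy line (six fields, keep the first three):

```shell
# awk rebuilds $0 with single-space separators; the blanked fields leave
# trailing separators behind the kept fields.
echo "a b c d e f" | awk '{for(i=4;i<=NF;i++){$i=""}; print $0}'
```

The output is `a b c` followed by the leftover separators of the blanked fields, which is why the real pipeline pipes through column -t to tidy the spacing.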

```
# 2. Calculations
paste cpu-irq-${SIZE}-${RW}-0.txt cpu-irq-${SIZE}-${RW}-1.txt > uinte.txt  # merge the two files side by side
cat uinte.txt|awk '{print $1,$66-$2,$67-$3,$68-$4,$69-$5,$70-$6,$71-$7,$72-$8,$73-$9,$74-$10,$75-$11,$76-$12,$77-$13,$78-$14,$79-$15,$80-$16,$81-$17,$82-$18,$83-$19,$84-$20,$85-$21,$86-$22,$87-$23,$88-$24,$89-$25,$90-$26,$91-$27,$92-$28,$93-$29,$94-$30,$95-$31,$96-$32,$97-$33,$98-$34,$99-$35,$100-$36,$101-$37,$102-$38,$103-$39,$104-$40,$105-$41,$106-$42,$107-$43,$108-$44,$109-$45,$110-$46,$111-$47,$112-$48,$113-$49,$114-$50,$115-$51,$116-$52,$117-$53,$118-$54,$119-$55,$120-$56,$121-$57,$122-$58,$123-$59,$124-$60,$125-$61,$126-$62,$127-$63,$128-$64}' |column -t|tr ':' ' ' > cpu-irq-${SIZE}-${RW}-result.txt  # column-wise subtraction (the after counts minus the before counts)
```

View:

```
# display the first 33 columns
cat cpu-irq-1M_all_cpu-write-result.txt|awk '{ for(i=34;i<=95;i++){$i=""} {print $0}}'|column -t
# display from column 34 onward
cat cpu-irq-1M_all_cpu-write-result.txt|awk '{ for(i=1;i<=33;i++){$i=""} {print $0}}'|column -t
# display columns 3 to 6
cat cpu-irq-1M_all_cpu-write-result.txt|awk '{ for(i=1; i<=2; i++){ $i="" }; for(i=7; i<=NF;i++){ $i="" }; print $0 }'|column -t
```

============ 4-CPU interrupt data capture and statistics

```
INTERFACE=ens3f1np1
#PCI_ADDR=$(ethtool -i ${INTERFACE}|grep "bus-info"|awk '{print $2}')
PCI_ADDR=ens3f1np1
SIZE=1M_4_cpu
RW=write-54-8K-hash-sdnf

# before:
cat /proc/interrupts|grep -E "$PCI_ADDR"|awk '{print $1,$(40+2),$(41+2),$(62+2),$(63+2)}'|column -t > cpu40-41-62-63irq-${SIZE}-${RW}-0.txt
# after:
cat /proc/interrupts|grep -E "$PCI_ADDR"|awk '{print $1,$(40+2),$(41+2),$(62+2),$(63+2)}'|column -t > cpu40-41-62-63irq-${SIZE}-${RW}-1.txt
```

#* Explanation: PCI_ADDR holds the NIC's PCI address. The awk prints the first column (the interrupt number) plus the columns for CPU40, CPU41, CPU62 and CPU63 (CPUs count from 0, so CPU40 is column $(40+2)).

```
# 2. Calculations
paste cpu40-41-62-63irq-${SIZE}-${RW}-0.txt cpu40-41-62-63irq-${SIZE}-${RW}-1.txt > uinte.txt  # merge the before and after files side by side
cat uinte.txt|awk '{print $1,$7-$2,$8-$3,$9-$4,$10-$5}' |column -t|tr ':' ' ' > cpu40-41-62-63irq-${SIZE}-${RW}-result.txt  # column-wise subtraction (the after counts minus the before counts)
cat cpu40-41-62-63irq-${SIZE}-${RW}-result.txt
```

===================================================================== CPU statistics

mpstat -P ALL 2 30   # view the usage of all CPUs, sampling every 2 seconds, 30 samples in total

 


Origin blog.csdn.net/bandaoyu/article/details/129234911