ksoftirqd/n using 100% CPU

1. Background

When a device raises an interrupt, the operating system suspends whatever it is doing and services that interrupt.

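In some cases, IRQs arrive in such quick succession that the operating system cannot finish servicing one before the next arrives. This happens, for example, when a high-speed network card receives a large number of packets in a short time.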

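Because the operating system cannot finish all of the work as the IRQs arrive (they come too fast, one after another), it queues the remaining work to be handled later by a special per-CPU kernel thread named ksoftirqd/n, where n is the CPU number.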

If ksoftirqd takes up more than a small fraction of CPU time, the machine is under heavy interrupt load.
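
To see which kind of deferred work is keeping ksoftirqd busy, the per-type counters in /proc/softirqs can be summed across CPUs. The following is only a minimal sketch: the function name softirq_totals is made up for illustration, and it assumes the usual layout of one header row of CPU names followed by one row per softirq type.

```shell
# Sum each softirq type across all CPUs and list the largest totals first.
# Usage: softirq_totals [file]   (file defaults to /proc/softirqs)
# NOTE: softirq_totals is a hypothetical helper, not a standard tool.
softirq_totals() {
    awk 'NR > 1 {
             name = $1
             sub(/:$/, "", name)                 # "NET_RX:" -> "NET_RX"
             total = 0
             for (i = 2; i <= NF; i++) total += $i
             printf "%.0f\t%s\n", total, name    # %.0f avoids exponent notation
         }' "${1:-/proc/softirqs}" | sort -k1,1nr
}
```

If NET_RX is near the top and still climbing, that would be consistent with the network-card scenario described above. The counters are cumulative since boot, so a single reading only shows the long-run picture.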

2. Solution

Run cat /proc/interrupts to see how many interrupts each device has generated.

For example, this is what mine looked like one day:

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:         44          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          1          1          0          0          0          0          1   IO-APIC   1-edge      i8042
  8:          1          0          0          0          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 12:          0          0          0          0          2          1          1          0   IO-APIC  12-edge      i8042
 14:         16      35619    2171878      33951    2172032      35710    2172366      35859   IO-APIC  14-edge      ata_piix
 15:          0          0          0          0          0          0          0          0   IO-APIC  15-edge      ata_piix
 16:       1492       1492       1492   38096398       1489       1492       1496       1491   IO-APIC  16-fasteoi   ioc0
 19:         38         38         37         38         37         39         38         38   IO-APIC  19-fasteoi   radeon
 20:          3          3          4          4          5          3          4          4   IO-APIC  20-fasteoi   uhci_hcd:usb3, uhci_hcd:usb5
 21:          4          3          4          3          5          4          4          4   IO-APIC  21-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb4
 24:          0          0          0          0          0          0          0          0   PCI-MSI 32768-edge      PCIe PME
 25:          0          0          0          0          0          0          0          0   PCI-MSI 49152-edge      PCIe PME
 26:          0          0          0          0          0          0          0          0   PCI-MSI 65536-edge      PCIe PME
 27:          0          0          0          0          0          0          0          0   PCI-MSI 81920-edge      PCIe PME
 28:          0          0          0          0          0          0          0          0   PCI-MSI 98304-edge      PCIe PME
 29:          0          0          0          0          0          0          0          0   PCI-MSI 114688-edge      PCIe PME
 30:          0          0          0          0          0          0          0          0   PCI-MSI 458752-edge      PCIe PME
 31:       1380       1346       1353       1339       1339       1357       1329       1360   PCI-MSI 3670016-edge      eno2
 32:   85511967   85513711   85083641   83994544   85084225   85501922   85093691   85515746   PCI-MSI 1572864-edge      eno1
NMI:     234694     241328     233682     235747     227371     232679     239850     230941   Non-maskable interrupts
LOC:  809615451  844980307  827224481  872760389  798336595  856256586  824716208  858800687   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:     234694     241328     233682     235747     227371     232679     239850     230941   Performance monitoring interrupts
IWI:     234691     241323     233680     235741     227368     232677     239846     230941   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:   50636771   36959166   32328971   32915351   29242823   29889193   29928230   31502838   Rescheduling interrupts
CAL:    4443659    7479033    5592654    1507128    4465404    8053047    5214542    1501877   Function call interrupts
TLB:    6476601    6460251    6517362    6225272    6466684    6345076    6491077    6193616   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:      22724      22724      22724      22724      22724      22724      22724      22724   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event
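
With eight CPU columns, a listing like this is tedious to compare by eye. As a rough sketch, the numbered (device) IRQ lines can be totalled across CPUs and ranked. The name top_irqs is a hypothetical helper, and the script assumes the first line is the CPU header and that rows such as NMI, LOC and RES should be ignored because they are not device interrupts.

```shell
# Rank numbered (device) IRQ lines by their total across all CPUs.
# Usage: top_irqs [file] [count]   (defaults: /proc/interrupts, 5)
# NOTE: top_irqs is a hypothetical helper, not a standard tool.
top_irqs() {
    awk 'NR == 1 { ncpu = NF; next }             # header: one column per CPU
         $1 ~ /^[0-9]+:$/ {                      # skip NMI, LOC, RES, ...
             irq = $1
             sub(/:$/, "", irq)
             total = 0
             for (i = 2; i <= ncpu + 1 && i <= NF; i++) total += $i
             desc = ""
             for (i = ncpu + 2; i <= NF; i++)
                 desc = desc (desc == "" ? "" : " ") $i
             printf "%.0f\t%s\t%s\n", total, irq, desc
         }' "${1:-/proc/interrupts}" | sort -k1,1nr | head -n "${2:-5}"
}
```

Calling top_irqs with no arguments reads the live /proc/interrupts and prints the five largest totals as tab-separated total, IRQ number and description.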

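Note the enormous interrupt count on IRQ 32 (eno1). From this I concluded that the network card had locked up. Restarting the interface with ifdown eno1 && ifup eno1 was enough to fix it.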
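
Because the counters in /proc/interrupts accumulate from boot, it can be worth confirming that the suspect IRQ is still climbing before and after restarting the interface. The sketch below compares two saved snapshots. The name irq_delta and the snapshot paths in the usage comment are assumptions for illustration only.

```shell
# Print how much one IRQ's total (summed over all CPUs) grew between two
# saved copies of /proc/interrupts.
# Usage: irq_delta IRQ BEFORE_FILE AFTER_FILE
# NOTE: irq_delta is a hypothetical helper, not a standard tool.
#
# Example (paths are arbitrary):
#   cat /proc/interrupts > /tmp/irq.before; sleep 5
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta 32 /tmp/irq.before /tmp/irq.after
irq_delta() {
    awk -v irq="$1:" '
        FNR == 1 { nfile++; ncpu = NF; next }    # header line of each snapshot
        $1 == irq {
            total = 0
            for (i = 2; i <= ncpu + 1 && i <= NF; i++) total += $i
            if (nfile == 1) before = total; else after = total
        }
        END { printf "%.0f\n", after - before }' "$2" "$3"
}
```

Dividing the result by the number of seconds between the snapshots gives an approximate interrupt rate for that IRQ.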

Reposted from blog.csdn.net/m0_37313888/article/details/101000640