QPI: Rx detected CRC error - successful LLR without Phy re-init

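This happened just before the May Day holiday: monitoring suddenly reported that a database host had rebooted. By the time I logged in, the reboot had already completed. The messages log showed the following errors: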

Apr 28 13:56:33 hydb1 kernel: mce: [Hardware Error]: Machine check events logged
Apr 28 13:56:33 hydb1 kernel: mce: [Hardware Error]: Machine check events logged
Apr 28 13:56:33 hydb1 mcelog: Hardware event. This is not a software error.
Apr 28 13:56:33 hydb1 mcelog: MCE 0
Apr 28 13:56:33 hydb1 mcelog: CPU 20 BANK 21
Apr 28 13:56:33 hydb1 mcelog: MISC 1df87b000d9eff
Apr 28 13:56:33 hydb1 mcelog: TIME 1682661382 Fri Apr 28 13:56:22 2023
Apr 28 13:56:33 hydb1 mcelog: MCG status:
Apr 28 13:56:33 hydb1 mcelog: MCi status:
Apr 28 13:56:33 hydb1 mcelog: Error overflow
Apr 28 13:56:33 hydb1 mcelog: Corrected error
Apr 28 13:56:33 hydb1 mcelog: MCi_MISC register valid
Apr 28 13:56:33 hydb1 mcelog: MCA: BUS error: 2 20 Level-3 Generic Generic Other-transaction Request-did-not-timeout
Apr 28 13:56:33 hydb1 mcelog: QPI: Rx detected CRC error - successful LLR without Phy re-init
Apr 28 13:56:33 hydb1 mcelog: STATUS c80001c000310e0f MCGSTATUS 0
Apr 28 13:56:33 hydb1 mcelog: MCGCAP 7000c16 APICID 40 SOCKETID 2
Apr 28 13:56:33 hydb1 mcelog: CPUID Vendor Intel Family 6 Model 79
Apr 28 13:56:33 hydb1 mcelog: Hardware event. This is not a software error.
Apr 28 13:56:33 hydb1 mcelog: MCE 1
Apr 28 13:56:33 hydb1 mcelog: CPU 20 BANK 21
Apr 28 13:56:33 hydb1 mcelog: MISC 1df87b000d9eff
Apr 28 13:56:33 hydb1 mcelog: TIME 1682661382 Fri Apr 28 13:56:22 2023
Apr 28 13:56:33 hydb1 mcelog: MCG status:
Apr 28 13:56:33 hydb1 mcelog: MCi status:
Apr 28 13:56:33 hydb1 mcelog: Error overflow
Apr 28 13:56:33 hydb1 mcelog: Corrected error
Apr 28 13:56:33 hydb1 mcelog: MCi_MISC register valid
Apr 28 13:56:33 hydb1 mcelog: MCA: BUS error: 2 20 Level-3 Generic Generic Other-transaction Request-did-not-timeout
Apr 28 13:56:33 hydb1 mcelog: QPI: Rx detected CRC error - successful LLR without Phy re-init
Apr 28 13:56:33 hydb1 mcelog: STATUS c800008000310e0f MCGSTATUS 0
Apr 28 13:56:33 hydb1 mcelog: MCGCAP 7000c16 APICID 40 SOCKETID 2
Apr 28 13:56:33 hydb1 mcelog: CPUID Vendor Intel Family 6 Model 79
Apr 28 13:56:33 hydb1 mcelog: Hardware event. This is not a software error.
Apr 28 13:56:33 hydb1 mcelog: MCE 2
Apr 28 13:56:33 hydb1 mcelog: CPU 20 BANK 21
Apr 28 13:56:33 hydb1 mcelog: MISC 1ff87b000d9eff
Apr 28 13:56:33 hydb1 mcelog: TIME 1682661382 Fri Apr 28 13:56:22 2023
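
The STATUS values in the log can be cross-checked against the text mcelog prints. Below is a minimal Python sketch that splits a 64-bit MCi_STATUS value into its fields. The function name `decode_mci_status` and the dictionary keys are hypothetical, chosen only for this illustration. The bit positions follow the architectural IA32_MCi_STATUS layout as I understand it from the Intel SDM. The corrected error count field is only meaningful when the CPU reports it (assumed here). This is a rough aid for reading the log, not a replacement for mcelog.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its main fields.

    Bit positions are an assumption based on the architectural
    IA32_MCi_STATUS layout; model-specific bits are not interpreted.
    """
    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),          # VAL: the register holds a logged error
        "overflow": bit(62),       # OVER: an earlier error was overwritten
        "uncorrected": bit(61),    # UC: 0 means the error was corrected
        "enabled": bit(60),        # EN: error reporting enabled for this bank
        "misc_valid": bit(59),     # MISCV: MCi_MISC holds extra information
        "addr_valid": bit(58),     # ADDRV: MCi_ADDR holds an address
        "context_corrupt": bit(57),  # PCC: processor context corrupted
        # Bits 52:38, corrected error count (assumed to be supported here)
        "corrected_count": (status >> 38) & 0x7FFF,
        # Bits 31:16, model-specific error code
        "mscod": (status >> 16) & 0xFFFF,
        # Bits 15:0, architectural MCA error code
        "mcacod": status & 0xFFFF,
    }


if __name__ == "__main__":
    # The two STATUS values taken from the log above
    for raw in (0xC80001C000310E0F, 0xC800008000310E0F):
        fields = decode_mci_status(raw)
        print(hex(raw))
        for name, value in fields.items():
            shown = hex(value) if name in ("mscod", "mcacod") else value
            print(f"  {name}: {shown}")
```

For both STATUS values, the overflow flag is set, the uncorrected flag is clear, and the MISC-valid flag is set. That agrees with the "Error overflow", "Corrected error" and "MCi_MISC register valid" lines printed by mcelog. The model-specific code is 0x31 in both records, which is the value mcelog reports here as the QPI CRC error with a successful LLR.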

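Because the hardware was still under its maintenance contract, I immediately called the vendor's 400 support hotline. They only replied after the holiday, saying that a firmware upgrade was needed. We scheduled downtime for the upgrade. The host has now been observed for 4 days since the upgrade and is running normally.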


Reposted from blog.csdn.net/kevinyu998/article/details/130631895