Failure case: ESXi 6.7 EP13 purple screen (PSOD) analysis

Product version information:
Huawei RH2288H V3 | BIOS: 3.87 | Date (ISO-8601): 2018-02-02
VMware ESXi 6.5.0 build-5969303
Release: ESXi 6.5 U1 | Release date: 7/27/2017 | Build: 5969303 | Installer build: N/A

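Below is the stack trace captured when the purple screen (PSOD) occurred. The panic was raised by a LINT1/NMI (motherboard non-maskable interrupt), and the vmkernel message itself says this may be a hardware problem, so a hardware fault is the most likely cause.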
2020-07-22T19:47:32.067Z cpu0:66825)@BlueScreen: LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed. This may be a hardware problem; please contact your hardware vendor.
2020-07-22T19:47:32.068Z cpu0:66825)Code start: 0x41802ca00000 VMK uptime: 127:07:45:14.433
2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002c60:[0x41802caed451]PanicvPanicInt@vmkernel#nover+0x545 stack: 0x41802caed451
2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002d00:[0x41802caed4dd]Panic_NoSave@vmkernel#nover+0x4d stack: 0x4380c0002d60
2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002d60:[0x41802caea7ae]NMICheckLint1@vmkernel#nover+0x19a stack: 0x0
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002e20:[0x41802caea844]NMI_Interrupt@vmkernel#nover+0x94 stack: 0x0
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002ea0:[0x41802cb2c531]IDTNMIWork@vmkernel#nover+0x99 stack: 0x0
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002f20:[0x41802cb2d9c1]Int2_NMI@vmkernel#nover+0x19 stack: 0x418040000000
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002f40:[0x41802cb3d044]gate_entry_@vmkernel#nover+0x0 stack: 0x0
2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bcf0:[0x41802ca8b9c2]Power_ArchSetCState@vmkernel#nover+0x106 stack: 0x7fffffffffffffff
2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bd20:[0x41802ccc49d3]CpuSchedIdleLoopInt@vmkernel#nover+0x39b stack: 0x1
2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bd90:[0x41802ccc728a]CpuSchedDispatch@vmkernel#nover+0x114a stack: 0x410000000001
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bec0:[0x41802ccc8502]CpuSchedWait@vmkernel#nover+0x27a stack: 0x100000000000000
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bf40:[0x41802ccc85d5]CpuSched_NoEvqWait@vmkernel#nover+0x19 stack: 0x0
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bf50:[0x41802d5cc345]TcpipDispatch@(tcpip4)#+0x345 stack: 0x6
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bfe0:[0x41802ccc91b5]CpuSched_StartWorld@vmkernel#nover+0x99 stack: 0x0
2020-07-22T19:47:32.075Z cpu0:66825)base fs=0x0 gs=0x418040000000 Kgs=0x0
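To review a long PSOD backtrace quickly, the frames can be extracted mechanically. The following is a minimal Python sketch, assuming the frame format shown in the trace (`[return address]function@module#...+offset`). The regex and the helper names (`parse_frames`, `is_lint1_nmi_panic`, `split_at_gate`) are hypothetical and are not part of any VMware tool. The sketch also assumes that frames listed after `gate_entry_` belong to the code that was interrupted by the NMI. In this trace that would be the scheduler idle loop (`CpuSchedIdleLoopInt`, `Power_ArchSetCState`).

```python
import re

# Hypothetical regex for one vmkernel PSOD backtrace frame, e.g.
#   "...)0x4380c0002d60:[0x41802caea7ae]NMICheckLint1@vmkernel#nover+0x19a stack: 0x0"
# Groups: return address, function name, module, offset within the function.
FRAME_RE = re.compile(
    r"\[(0x[0-9a-f]+)\]([A-Za-z_]\w*)@([^#+\s]+)[^+]*\+(0x[0-9a-f]+)"
)


def parse_frames(lines):
    """Return (function, module, offset) tuples in the order they appear in the log."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            _addr, func, module, offset = m.groups()
            frames.append((func, module, offset))
    return frames


def is_lint1_nmi_panic(lines):
    """Heuristic: the banner names LINT1/NMI and an NMICheckLint1 frame is present."""
    banner = any("LINT1/NMI" in line for line in lines)
    funcs = {func for func, _, _ in parse_frames(lines)}
    return banner and "NMICheckLint1" in funcs


def split_at_gate(frames):
    """Split frames into (NMI handling path, interrupted context) at gate_entry_.

    ASSUMPTION: frames after gate_entry_ show what the CPU was running
    when the NMI arrived.
    """
    names = [func for func, _, _ in frames]
    if "gate_entry_" not in names:
        return frames, []
    i = names.index("gate_entry_")
    return frames[: i + 1], frames[i + 1 :]


# Shortened excerpt of the backtrace above (not every frame).
SAMPLE = [
    "2020-07-22T19:47:32.067Z cpu0:66825)@BlueScreen: LINT1/NMI (motherboard "
    "nonmaskable interrupt), undiagnosed.",
    "2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002c60:[0x41802caed451]"
    "PanicvPanicInt@vmkernel#nover+0x545 stack: 0x41802caed451",
    "2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002d60:[0x41802caea7ae]"
    "NMICheckLint1@vmkernel#nover+0x19a stack: 0x0",
    "2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002f40:[0x41802cb3d044]"
    "gate_entry_@vmkernel#nover+0x0 stack: 0x0",
    "2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bcf0:[0x41802ca8b9c2]"
    "Power_ArchSetCState@vmkernel#nover+0x106 stack: 0x7fffffffffffffff",
    "2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bf50:[0x41802d5cc345]"
    "TcpipDispatch@(tcpip4)#+0x345 stack: 0x6",
]

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    handler, interrupted = split_at_gate(frames)
    print("LINT1/NMI panic:", is_lint1_nmi_panic(SAMPLE))
    print("NMI path:   ", " -> ".join(f for f, _, _ in handler))
    print("Interrupted:", " -> ".join(f for f, _, _ in interrupted))
```

On this excerpt the NMI path runs from the panic frames down to `gate_entry_`, and the remaining frames show cpu0 sitting in the idle/C-state code rather than in a driver. That fits the hardware-side explanation, although it does not prove it.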

The IPMI log records the following event at about the same time:
162 2020-07-22T19:47:38 2 111 (Unknown) 2 (System Event) 83 Assert + Slot/Connector Fault Status
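Whether the SEL entry really lines up with the PSOD depends on how the two clocks relate. The following is a minimal Python sketch that assumes the vmkernel timestamps are UTC (the trailing `Z`). Because the SEL entry carries no time zone, it also makes the unverified assumption that the BMC clock is UTC. The function names and the 60-second window are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


# vmkernel log timestamps are ISO 8601 in UTC with a trailing "Z",
# e.g. "2020-07-22T19:47:32.067Z".
def parse_vmkernel_time(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


# The SEL timestamp "2020-07-22T19:47:38" has no time zone.
# ASSUMPTION: the BMC clock is set to UTC; pass another tz if it is not.
def parse_sel_time(ts, tz=timezone.utc):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=tz)


def correlate(psod_ts, sel_ts, window=timedelta(seconds=60), sel_tz=timezone.utc):
    """Return (delta, related): SEL time minus PSOD time, and whether |delta| <= window.

    The 60-second default window is an arbitrary, hypothetical choice.
    """
    delta = parse_sel_time(sel_ts, sel_tz) - parse_vmkernel_time(psod_ts)
    return delta, abs(delta) <= window


if __name__ == "__main__":
    delta, related = correlate("2020-07-22T19:47:32.067Z", "2020-07-22T19:47:38")
    print(f"SEL event is {delta.total_seconds():.3f} s after the PSOD banner; related: {related}")
```

Under the UTC assumption, the SEL entry lands about 5.9 seconds after the PSOD banner, which fits the reading that the slot/connector fault and the NMI are related. If the BMC runs on local time, pass the matching `sel_tz`. Otherwise the comparison is meaningless.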

Next step:
The server hardware vendor (Huawei) needs to investigate further.
