Inspur hardware monitoring (ipmitool, MegaCli)

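Why monitor with ipmitool and MegaCli?

ipmitool exposes the Inspur server's information one-to-one with what the management card (BMC) shows. For example, you can check the fan status values reported by the management card.

Use ipmitool to read the system fan information: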

ipmitool sdr list | grep 'FAN[0-6]'
FAN0_F_Speed     | 4224 RPM          | ok
FAN0_R_Speed     | 3744 RPM          | ok
FAN1_F_Speed     | 4224 RPM          | ok
FAN1_R_Speed     | 3744 RPM          | ok
FAN2_F_Speed     | 4320 RPM          | ok
FAN2_R_Speed     | 3840 RPM          | ok
FAN3_F_Speed     | 4224 RPM          | ok
FAN3_R_Speed     | 3840 RPM          | ok

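From this output you can read the fan status and start monitoring it.
One difference from Dell: Dell's fan monitoring reports an abnormal state when a fan's reading is abnormal, whereas Inspur only monitors whether the fan is online. For example, if a fan reads 200 RPM but is still online, the reading is abnormal: Dell flags an abnormal state and raises an alarm, but Inspur does not. (This rarely happens in practice; I have not seen it myself.)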
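
Because Inspur only reports whether a fan is online, a monitoring script may want to check the RPM reading itself. The following is a minimal sketch of that idea, not part of the original article. The function names and the `MIN_RPM` threshold are assumptions chosen for illustration; pick a threshold that suits your hardware.

```python
import subprocess

# Hypothetical threshold: the article's example treats ~200 RPM as abnormal.
MIN_RPM = 1000


def read_sdr_list():
    """Run `ipmitool sdr list` and return its text output (not called on import)."""
    result = subprocess.run(
        ["ipmitool", "sdr", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout


def check_fans(sdr_list_output, min_rpm=MIN_RPM):
    """Return (sensor, rpm, status) for fan speed sensors that look abnormal.

    rpm is None when the reading is not a number followed by 'RPM'.
    """
    problems = []
    for line in sdr_list_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 3 or not fields[0].startswith("FAN"):
            continue
        name, reading, status = fields
        if not name.endswith("_Speed"):
            continue
        parts = reading.split()
        rpm = None
        if len(parts) == 2 and parts[1] == "RPM":
            try:
                rpm = float(parts[0])
            except ValueError:
                rpm = None
        # Flag the fan when the status is not ok, or the reading is missing or too low.
        if status != "ok" or rpm is None or rpm < min_rpm:
            problems.append((name, rpm, status))
    return problems


# Sample in the same format as the output above; the 200 RPM line is made up.
SAMPLE = """\
FAN0_F_Speed     | 4224 RPM          | ok
FAN0_R_Speed     | 200 RPM           | ok
FAN1_F_Speed     | 4224 RPM          | ok
"""
```

On a real host, `check_fans(read_sdr_list())` would return an empty list when every fan speed sensor reads `ok` and is at or above the threshold.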

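ipmitool monitoring

  • Monitor processor status
    Do not judge the status by the ok column here; check that the last column reports presence instead.
    ipmitool sdr elist | grep -i 'CPU[0-2]_status'
    The output matches what the management card shows: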
CPU0_Status      | 7Dh | ok  |  3.0 | Presence detected
CPU1_Status      | 7Eh | ok  |  3.0 | Presence detected
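  • Check memory status
    Note: Inspur is more awkward here. The memory sensors are named after the CPU, which is not intuitive, so you have to grep for those sensor names. The sensor names are also inconsistent between server models, so the monitoring has to be written to handle each model.
    Here the status value to check is Presence Detected.

ipmitool sdr elist | grep -i 'CPU[0-1]_C[0-1]D[0-1]'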

CPU0_C0D0        | 83h | ok  | 32.0 | Presence Detected
CPU0_C0D1        | 84h | ok  | 32.1 |
CPU0_C1D0        | 85h | ok  | 32.2 | Presence Detected
CPU0_C1D1        | 86h | ok  | 32.3 |
CPU1_C0D0        | 8Fh | ok  | 32.12 | Presence Detected
CPU1_C0D1        | 90h | ok  | 32.13 |
CPU1_C1D0        | 91h | ok  | 32.14 | Presence Detected
CPU1_C1D1        | 92h | ok  | 32.15 |
  • Check hard drive slot status
    ipmitool sdr elist | grep -i disk
Disk slots:
DISK0_Status     | B4h | ok  |  4.0 | Drive Present
DISK1_Status     | B5h | ok  |  4.1 | Drive Present
DISK2_Status     | B6h | ok  |  4.2 | Drive Present
DISK3_Status     | B7h | ok  |  4.3 | Drive Present
DISK4_Status     | B8h | ok  |  4.4 | Drive Present
DISK5_Status     | B9h | ok  |  4.5 | Drive Present
DISK6_Status     | BAh | ok  |  4.6 | Drive Present
DISK7_Status     | BBh | ok  |  4.7 | Drive Present
DISK8_Status     | BCh | ok  |  4.8 | Drive Present
DISK9_Status     | BDh | ok  |  4.9 | Drive Present
DISK10_Status    | BEh | ok  |  4.10 | Drive Present
DISK11_Status    | BFh | ok  |  4.11 | Drive Present
DISK12_Status    | C0h | ok  |  4.12 |
DISK13_Status    | C1h | ok  |  4.13 |
DISK14_Status    | C2h | ok  |  4.14 |
DISK15_Status    | C3h | ok  |  4.15 |
DISK16_Status    | C4h | ok  |  4.16 |
DISK17_Status    | C5h | ok  |  4.17 |
DISK18_Status    | C6h | ok  |  4.18 |
DISK19_Status    | C7h | ok  |  4.19 |
DISK20_Status    | C8h | ok  |  4.20 |
DISK21_Status    | C9h | ok  |  4.21 |
DISK22_Status    | CAh | ok  |  4.22 |
DISK23_Status    | CBh | ok  |  4.23 |
DISK24_Status    | D4h | ok  |  4.24 |
Disk backplane slots:
DISK0_R_Status   | CCh | ok  |  4.0 | Drive Present
DISK1_R_Status   | CDh | ok  |  4.1 | Drive Present
DISK2_R_Status   | CEh | ok  |  4.2 |
DISK3_R_Status   | CFh | ok  |  4.3 |
DISK4_R_Status   | D0h | ok  |  4.4 |
DISK5_R_Status   | D1h | ok  |  4.5 |
DISK6_R_Status   | D2h | ok  |  4.6 |
DISK7_R_Status   | D3h | ok  |  4.7 |
  • Power supply information
    ipmitool sdr elist | grep -i 'PSU[0-1]_status'
PSU0_Status      | 74h | ok  | 10.0 | Presence detected
PSU1_Status      | 75h | ok  | 10.0 | Presence detected
  • Fan status information
    ipmitool sdr elist | grep -i 'fan[0-9]_Present'
FAN0_Present     | 60h | ok  | 29.0 | Device Present
FAN1_Present     | 61h | ok  | 29.1 | Device Present
FAN2_Present     | 62h | ok  | 29.2 | Device Present
FAN3_Present     | 63h | ok  | 29.3 | Device Present
  • Temperature monitoring
    ipmitool sdr elist| grep -i temp
Inlet_Temp       | 00h | ok  | 12.0 | 22 degrees C
Outlet_Temp      | 01h | ok  | 55.1 | 32 degrees C
CPU0_Temp        | 06h | ok  |  3.0 | 28 degrees C
CPU1_Temp        | 07h | ok  |  3.0 | 26 degrees C
CPU0_DIMM_Temp   | 0Eh | ok  | 32.0 | 34 degrees C
CPU1_DIMM_Temp   | 0Fh | ok  | 32.0 | 32 degrees C
CPU0_VR_Temp     | 02h | ok  |  3.0 | 31 degrees C
CPU1_VR_Temp     | 03h | ok  |  3.1 | 30 degrees C
PCH_Temp         | 16h | ok  |  3.0 | 44 degrees C
OCP_Temp         | 29h | ns  | 11.0 | No Reading
NVME_Temp        | 28h | ns  | 11.1 | No Reading
PSU0_Temp        | 1Ch | ok  | 32.0 | 28 degrees C
PSU1_Temp        | 1Dh | ok  | 32.0 | 27 degrees C
RAID0_Temp       | 17h | ok  | 11.0 | 58 degrees C
RAID1_Temp       | 18h | ns  | 11.1 | No Reading
RAID2_Temp       | 19h | ns  | 11.2 | No Reading
RAID3_Temp       | 1Ah | ns  | 11.3 | No Reading
GPU0_Temp        | 20h | ns  | 11.0 | No Reading
GPU1_Temp        | 21h | ns  | 11.1 | No Reading
GPU2_Temp        | 22h | ns  | 11.2 | No Reading
GPU3_Temp        | 23h | ns  | 11.3 | No Reading
GPU4_Temp        | 24h | ns  | 11.4 | No Reading
GPU5_Temp        | 25h | ns  | 11.5 | No Reading
GPU6_Temp        | 26h | ns  | 11.6 | No Reading
GPU7_Temp        | 27h | ns  | 11.7 | No Reading
PCIE_SSD0_Temp   | A7h | ns  | 11.0 | No Reading
PCIE_SSD1_Temp   | A8h | ns  | 11.1 | No Reading
PCIE_SSD2_Temp   | A9h | ns  | 11.2 | No Reading
PCIE_SSD3_Temp   | AAh | ns  | 11.3 | No Reading
PCIE_SSD4_Temp   | ABh | ns  | 11.4 | No Reading
PCIE_SSD5_Temp   | ACh | ns  | 11.5 | No Reading
PCIE_SSD6_Temp   | ADh | ns  | 11.6 | No Reading
PCIE_SSD7_Temp   | AEh | ns  | 11.7 | No Reading
M.2_Inlet_Temp   | 05h | ok  | 55.0 | 28 degrees C
Rear_HDDBP_Temp  | 2Ah | ns  | 11.0 | No Reading
SWITCH0_Temp     | 4Ah | ns  | 11.0 | No Reading
SWITCH1_Temp     | 4Bh | ns  | 11.1 | No Reading
HDD_Max_Temp     | 2Bh | ok  | 11.0 | 32 degrees C
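
The `sdr elist` checks above all follow the same pattern: select sensors by name, then look at the last column rather than the `ok` column. A minimal sketch of how that could be scripted follows. The function names and the 50 degree limit are assumptions for illustration, and the name patterns are the ones used for this server model; other models may need different patterns.

```python
import re


def parse_elist(text):
    """Parse `ipmitool sdr elist` output into dicts; lines without 5 fields are skipped."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue
        rows.append(dict(zip(("name", "id", "status", "entity", "reading"), fields)))
    return rows


# Matches "Presence detected", "Drive Present" and "Device Present".
PRESENCE = re.compile(r"\bpresen(t|ce)\b", re.IGNORECASE)


def present(rows, name_pattern):
    """Names of sensors fully matching name_pattern whose reading reports presence."""
    rx = re.compile(name_pattern, re.IGNORECASE)
    return [
        r["name"]
        for r in rows
        if rx.fullmatch(r["name"]) and PRESENCE.search(r["reading"])
    ]


TEMP = re.compile(r"(-?\d+(?:\.\d+)?) degrees C")


def hot_sensors(rows, limit_c):
    """(name, temperature) for sensors reading above limit_c; 'No Reading' rows are ignored."""
    hot = []
    for r in rows:
        m = TEMP.fullmatch(r["reading"])
        if m and float(m.group(1)) > limit_c:
            hot.append((r["name"], float(m.group(1))))
    return hot


# Lines copied from the outputs above.
SAMPLE = """\
CPU0_Status      | 7Dh | ok  |  3.0 | Presence detected
CPU0_C0D0        | 83h | ok  | 32.0 | Presence Detected
CPU0_C0D1        | 84h | ok  | 32.1 |
DISK0_Status     | B4h | ok  |  4.0 | Drive Present
DISK12_Status    | C0h | ok  |  4.12 |
Inlet_Temp       | 00h | ok  | 12.0 | 22 degrees C
RAID0_Temp       | 17h | ok  | 11.0 | 58 degrees C
GPU0_Temp        | 20h | ns  | 11.0 | No Reading
"""

rows = parse_elist(SAMPLE)
dimms = present(rows, r"CPU\d+_C\d+D\d+")   # memory sensors are named after the CPU
disks = present(rows, r"DISK\d+_Status")
hot = hot_sensors(rows, 50)                 # 50 is a hypothetical limit
```

To turn this into an alert, one option is to keep a list of the sensors that are expected to be populated on each host and report any that are missing from the result of `present`.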

RAID array monitoring

For other MegaCli64 usage, search online.

  • Physical disk information
    sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll -NoLog| egrep -iv "exit|Adapter"
Enclosure Device ID: 8 # enclosure ID
Slot Number: 13 # disk slot
Enclosure position: 0
Device Id: 14
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]  # device size
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Online, Spun Up # disk state; disk monitoring checks this value
SAS Address(0): 0x56c92bf001fa0bcd
Connected Port Number: 0(path0)
Inquiry Data: V6J3J9SS            HGST HUS726T4TALA6L4                    VLGAW41G
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: Unknown
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :27C (80.60 F) # temperature
  • Get virtual drive information
    There may be many arrays; only one of them is shown here.
    sudo /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aAll -NoLog| egrep -iv "exit|Adapter"
Virtual Drive: 9 (Target Id: 9)
Name                :
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0  # this is RAID 0
Size                : 3.637 TB
State               : Optimal # state of this array; array monitoring checks this value
Strip Size          : 64 KB  # strip size of the array
Number Of Drives    : 1
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
Access Policy       : Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Bad Blocks Exist: No
Number of Spans: 1
Span: 0 - Number of PDs: 1

# Below is the information for the disks in this array. Note that a disk appears in this list only when it is online or a hot spare.
PD: 0 Information
Enclosure Device ID: 8
Slot Number: 10
Enclosure position: 0
Device Id: 17
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x56c92bf001fa0bca
Connected Port Number: 0(path0)
Inquiry Data: V6J3J1BS            HGST HUS726T4TALA6L4                    VLGAW41G
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: Unknown
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :28C (82.40 F)
  • View RAID card battery (BBU) details
    sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aAll
BBU status for Adapter: 0

BatteryType: CVPM02
Voltage: 9431 mV
Current: 0 mA
Temperature: 25 C

BBU Firmware Status:

 Charging Status              : None
 Voltage                                 : OK
 Temperature                             : OK
 Learn Cycle Requested                   : No
 Learn Cycle Active                      : No
 Learn Cycle Status                      : OK
 Learn Cycle Timeout                     : No
 I2c Errors Detected                     : No
 Battery Pack Missing                    : No
 Battery Replacement required            : No
 Remaining Capacity Low                  : No
 Periodic Learn Required                 : No
 Transparent Learn                       : No
 No space to cache offload               : No
 Pack is about to fail & should be replaced : No
 Cache Offload premium feature required  : No
 Module microcode update required        : No

Battery state:

GasGuageStatus:
 Fully Discharged        : Yes
 Fully Charged           : Yes
 Discharging             : Yes
 Initialized             : Yes
 Remaining Time Alarm    : No
 Remaining Capacity Alarm: Yes
 Discharge Terminated    : Yes
 Over Temperature        : No
 Charging Terminated     : Yes
 Over Charged            : No

 Pack energy             : 247 J
 Capacitance             : 110
 Remaining reserve space : 0


BBU Design Info for Adapter: 0

Date of Manufacture: 08/06, 2019
Design Capacity: 288 J
Design Voltage: 9500 mV
Serial Number: 1550
Manufacture Name: LSI
Device Name: CVPM02
Device Chemistry: EDLC
Battery FRU: N/A
TMM FRU: N/A
Module Version: 6635-02A


BBU Properties for Adapter: 0

Auto Learn Period: 2412000 Sec
Next Learn time: 634778466 Sec
Learn Delay Interval:0 Hours
Auto-Learn Mode: Enabled
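
The three MegaCli outputs above each contain one value worth alerting on: `Firmware state` per disk, `State` per virtual drive, and a few Yes/No flags in the BBU status. The sketch below shows one way to extract them. It is an illustration under assumptions, not the author's script: the function names, the choice of `ok_prefixes`, and the set of BBU flags to watch are all hypothetical, and the parser expects raw MegaCli output without the `#` annotations added in this article.

```python
def _key_values(text):
    """Yield (key, value) for every 'key : value' line."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            yield key.strip(), value.strip()


def pd_states(pdlist_output):
    """Map (enclosure id, slot number) to the disk's firmware state."""
    states = {}
    enclosure = slot = None
    for key, value in _key_values(pdlist_output):
        if key == "Enclosure Device ID":
            enclosure = value
        elif key == "Slot Number":
            slot = value
        elif key == "Firmware state":
            states[(enclosure, slot)] = value
    return states


def bad_pds(states, ok_prefixes=("Online",)):
    """Disks whose state does not start with an accepted prefix (assumed list)."""
    return [k for k, v in states.items() if not v.startswith(ok_prefixes)]


def vd_states(ldpdinfo_output):
    """Map virtual drive number to its State value."""
    states = {}
    vd = None
    for key, value in _key_values(ldpdinfo_output):
        if key == "Virtual Drive":
            vd = value.split()[0]
        elif key == "State" and vd is not None:
            states[vd] = value
    return states


# Flags taken from the BBU output above; which ones to alert on is a judgment call.
BBU_WATCH = (
    "Battery Pack Missing",
    "Battery Replacement required",
    "Pack is about to fail & should be replaced",
)


def bbu_alerts(adpbbucmd_output):
    """Watched BBU flags that are set to Yes."""
    return [k for k, v in _key_values(adpbbucmd_output) if k in BBU_WATCH and v == "Yes"]


# Shortened samples; the slot 10 "Failed" state is made up for illustration.
PD_SAMPLE = """\
Enclosure Device ID: 8
Slot Number: 13
Firmware state: Online, Spun Up
Foreign State: None
Enclosure Device ID: 8
Slot Number: 10
Firmware state: Failed
"""
VD_SAMPLE = """\
Virtual Drive: 9 (Target Id: 9)
State               : Optimal
"""
BBU_SAMPLE = """\
 Battery Pack Missing                    : No
 Battery Replacement required            : No
 Pack is about to fail & should be replaced : No
"""
```

With these helpers, a check could alert when `bad_pds(...)` is non-empty, when any value in `vd_states(...)` is not `Optimal`, or when `bbu_alerts(...)` returns anything. If hot spares should count as healthy, add the state string your controller prints for them to `ok_prefixes`.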

Reposted from: www.cnblogs.com/yanghehe/p/12301417.html