Using MegaCLI to check disk status and replace disks

https://my.oschina.net/adailinux/blog/2231519

An earlier article described the procedure for replacing disks on a production server. In that case every disk in the machine was replaced. Recently some of the disks failed on another machine whose array is RAID 10, and testing showed that only the failed disks need to be replaced. This document supplements the earlier one with that case.

Installing MegaCLI

Download the installation package.

Installation process

# First download the installation package
# Unpack it
$ tar -zxf MegaCli8.07.10.tar.gz
$ cd MegaCli8.07.10/Linux/
$ rpm -ivh Lib_Utils-1.00-09.noarch.rpm MegaCli-8.02.21-1.noarch.rpm

# Make the command available on the system PATH
$ ln -s /opt/MegaRAID/MegaCli/MegaCli64 /usr/local/bin/MegaCli
$ MegaCli -v

MegaCLI SAS RAID Management Tool Ver 8.02.21 Oct 21, 2011

(c)Copyright 2011, LSI Corporation, All Rights Reserved.

Exit Code: 0x00
# Installation complete!

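Scripts that call MegaCli may run on hosts where the symlink was never created. The following is a minimal sketch of a lookup helper; the function name `find_megacli` is made up for this example, and the only paths it knows about are the ones passed to it.

```shell
# Print the first candidate path that is an executable regular file.
# Returns non-zero when none of the candidates qualifies.
find_megacli() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

With the paths used in the installation steps, a call would look like `find_megacli /usr/local/bin/MegaCli /opt/MegaRAID/MegaCli/MegaCli64`.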
  • Handling a package conflict:

    $ rpm -ivh Lib_Utils-1.00-09.noarch.rpm MegaCli-8.02.21-1.noarch.rpm 
    Preparing...                          ################################# [100%]
    	file /opt/lsi/3rdpartylibs/x86_64/libsysfs.so.2.0.2 from install of Lib_Utils-1.00-09.noarch conflicts with file from package srvadmin-storelib-sysfs-9.1.0-2757.12163.el7.x86_64
    
  • Cause: Lib_Utils conflicts with the srvadmin package that ships with Dell servers. Uninstall the conflicting package, then run the installation again.

    rpm -e srvadmin-storelib-sysfs-9.1.0-2757.12163.el7.x86_64 --nodeps
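The package to remove can be read straight from the rpm error text instead of being copied by hand. This is a sketch only: `conflicting_packages` is a hypothetical helper, and it assumes the conflict message is printed in English exactly as in the output above.

```shell
# Read rpm output on stdin and print each package named in a
# "conflicts with file from package ..." message, once per package.
conflicting_packages() {
    sed -n 's/.*conflicts with file from package \(.*\)$/\1/p' | sort -u
}
```

It would typically be fed with `rpm -ivh ... 2>&1 | conflicting_packages`, and the printed names reviewed before passing any of them to `rpm -e`.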
    

Usage guide

Basic Usage

# Show the RAID level
$ MegaCli -LDInfo -Lall -aALL

# Show RAID controller information
$ MegaCli -AdpAllInfo -aALL

# Show hard disk information
$ MegaCli -PDList -aALL

# Show battery information
$ MegaCli -AdpBbuCmd -aAll

# Show the RAID controller log
$ MegaCli -FwTermLog -Dsply -aALL

# Show the number of adapters
$ MegaCli -adpCount

# Show the adapter time
$ MegaCli -AdpGetTime -aALL

# Show all adapter information
$ MegaCli -AdpAllInfo -aAll

# Show information for all logical drive groups
$ MegaCli -LDInfo -LALL -aAll

# Show information for all physical drives
$ MegaCli -PDList -aAll

# Show the charging status
$ MegaCli -AdpBbuCmd -GetBbuStatus -aALL | grep 'Charger Status'

# Show BBU status
$ MegaCli -AdpBbuCmd -GetBbuStatus -aALL

# Show BBU capacity information
$ MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL

# Show BBU design parameters
$ MegaCli -AdpBbuCmd -GetBbuDesignInfo -aALL

# Show current BBU properties
$ MegaCli -AdpBbuCmd -GetBbuProperties -aALL

# Show the RAID controller model, RAID settings, and disk information
$ MegaCli -cfgdsply -aALL

## How the states change from pulling a disk to inserting its replacement
Device           |Normal |Damage             |Rebuild |Normal
Virtual Drive    |Optimal|Degraded           |Degraded|Optimal
Physical Drive   |Online |Failed Unconfigured|Rebuild |Online

# Show the state of a physical disk:
$ MegaCli -PDRbld -ShowProg -PhysDrv [Enclosure Device ID:Slot Number] -a0
## While a disk is rebuilding, its state shows: "Firmware state: Rebuild"

# Query rebuild progress:
$ MegaCli -pdrbld -showprog -physdrv[E:S] -aALL
## The output looks like this:
Rebuild Progress on Device at Enclosure 32, Slot 5 Completed 77% in 101 Minutes.

# Show rebuild progress as a text progress bar:
$ MegaCli -pdrbld -progdsply -physdrv[E:S] -aALL
## The screen shows something like this:
Rebuild progress of physical drives...
Enclosure:Slot               Percent Complete                       Time Elps
      032 :05   #######################87 %################*******  01:59:07
Press key to quit...

# Show the RAID controller's rebuild parameters:
$ MegaCli -AdpAllinfo -aALL | grep -i rebuild
## The output looks like this:
Rebuild Rate                     : 30%
Auto Rebuild                     : Enabled
Rebuild Rate                     : Yes
Force Rebuild                    : Yes

# Set the RAID controller's rebuild rate to 60%:
$ MegaCli -AdpSetProp { RebuildRate -60} -aALL
## On success it returns:
Adapter 0: Set rebuild rate to 60% success.
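The one-line rebuild progress report is regular enough to pick apart in a monitoring script. The sketch below assumes the line has exactly the wording of the sample output in the command list; `parse_rebuild_progress` is a name invented for this example.

```shell
# Read MegaCli rebuild-progress output on stdin and print
# "enclosure slot percent minutes" for each progress line found.
parse_rebuild_progress() {
    sed -n 's/.*Enclosure \([0-9][0-9]*\), Slot \([0-9][0-9]*\) Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) Minutes.*/\1 \2 \3 \4/p'
}

# Demonstration on the sample line from the command list.
echo 'Rebuild Progress on Device at Enclosure 32, Slot 5 Completed 77% in 101 Minutes.' \
    | parse_rebuild_progress
```

For the sample line this prints `32 5 77 101`. Lines that do not match the pattern are ignored, so the function prints nothing when no rebuild is running.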

MegaCLI Usage: http://blog.51cto.com/daixuan/1863567

Important parameters

Parameter                                   | Meaning
Firmware state                              | Disk status
Firmware state: Online, Spun Up             | Disk is normal
Firmware state: Unconfigured(good), Spun Up | Disk is installed but not enabled
Firmware state: Unconfigured(bad)           | Failure; corresponds to Non-Critical in hwcheck
Firmware state: Failed                      | Failure; corresponds to Critical in hwcheck
Firmware state: Rebuild                     | Rebuilding; usually shown after a disk is replaced
Enclosure Device ID: 32                     | Enclosure device ID
Slot Number: 1                              | Disk slot on the server
Adapter #0                                  | Adapter number; corresponds to the -a parameter
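The state-to-meaning mapping above can be expressed as a small lookup for use in health checks. This is a sketch under assumptions: the labels `ok`, `unused`, `warning`, `critical`, and `rebuilding` are chosen for this example and are not MegaCli or hwcheck output.

```shell
# Map the value of a "Firmware state" field to a coarse health label.
classify_state() {
    case "$1" in
        Online*)               echo ok ;;
        Rebuild*)              echo rebuilding ;;
        "Unconfigured(good)"*) echo unused ;;
        "Unconfigured(bad)"*)  echo warning ;;
        Failed*)               echo critical ;;
        *)                     echo unknown ;;
    esac
}

classify_state "Unconfigured(bad)"
```

The final call prints `warning`. Any state not listed in the table falls through to `unknown` rather than being guessed at.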

Hands-on: replacing a hard disk in a RAID 10 environment

Replacing a hard disk in a RAID 10 environment is simple. The disks are hot-swappable, so you only need to pull the failed disk and insert a replacement. The steps are below.

Environment

Server: R720

System: CentOS7

raid type: raid10

View hard drive information

To show the operation clearly, the output below is left unabridged.

$ MegaCli -PDList -aAll -NoLog
                                     
Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Drive's postion: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: 0
Device Id: 0
WWN: 5000C50076CD09B4
Sequence Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 28
Last Predictive Failure Event Seq Number: 4378
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: ES66
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50076cd09b5
SAS Address(1): 0x0
Connected Port Number: 5(path0) 
Inquiry Data: SEAGATE ST3600057SS     ES666SL8SASQ            
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: Foreign 
Foreign Secure: Drive is not secured by a foreign lock key
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :40C (104.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : Yes


Enclosure Device ID: 32
Slot Number: 2
Enclosure position: 0
Device Id: 2
WWN: 5000C50076CD05BC
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 0 KB [0x0 Sectors]
Non Coerced Size: 0 KB [0x0 Sectors]
Coerced Size: 0 KB [0x0 Sectors]
Firmware state: Unconfigured(bad)
Device Firmware Level: ES66
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50076cd05bd
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: SEAGATE ST3600057SS     ES666SL8SAVC            
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device
Drive:  Not Supported
Drive Temperature :0C (32.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: Unknown
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No


Enclosure Device ID: 32
Slot Number: 1
Drive's postion: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: 0
Device Id: 1
WWN: 5000C500983873BC
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: VT31
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c500983873bd
SAS Address(1): 0x0
Connected Port Number: 3(path0)
Inquiry Data: SEAGATE ST600MP0005     VT31S7M1CSLT
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: Unknown
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :41C (105.80 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No


Enclosure Device ID: 32
Slot Number: 3
Drive's postion: DiskGroup: 0, Span: 1, Arm: 1
Enclosure position: 0
Device Id: 3
WWN: 5000C50076CE2F30
Sequence Number: 2
Media Error Count: 5
Other Error Count: 71
Predictive Failure Count: 15
Last Predictive Failure Event Seq Number: 4379
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: ES66
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50076ce2f31
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: SEAGATE ST3600057SS     ES666SL8SAKA
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :48C (118.40 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : Yes


Enclosure Device ID: 32
Slot Number: 4
Drive's postion: DiskGroup: 1, Span: 0, Arm: 0
Enclosure position: 0
Device Id: 4
WWN: 5000C5007E70F0F8
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: ES66
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5007e70f0f9
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS     ES666SL9F1JB
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :46C (114.80 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No


Enclosure Device ID: 32
Slot Number: 5
Drive's postion: DiskGroup: 1, Span: 0, Arm: 1
Enclosure position: 0
Device Id: 5
WWN: 5000C5007E708E3C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: ES66
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5007e708e3d
SAS Address(1): 0x0
Connected Port Number: 4(path0)
Inquiry Data: SEAGATE ST3600057SS     ES666SL9F2RB
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :45C (113.00 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No



Exit Code: 0x00

The output above shows that the server has six disks (Device Id 0 through 5).
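The full listing is long, and only a few fields matter when looking for a bad disk. As a rough sketch, the awk filter below reduces `-PDList` output to one line per drive. It assumes the field names and field order shown in the listing, and `pd_summary` is a hypothetical name.

```shell
# Read "MegaCli -PDList" output on stdin and print one summary line per drive:
# slot number, predictive failure count, and firmware state.
pd_summary() {
    awk -F': *' '
        /^Slot Number:/              { slot = $2 }
        /^Predictive Failure Count:/ { pfc = $2 }
        /^Firmware state:/           {
            printf "slot=%s predictive_failures=%s state=%s\n", slot, pfc, $2
        }
    '
}
```

A typical call would be `MegaCli -PDList -aAll -NoLog | pd_summary`. For the first drive in the listing this yields `slot=0 predictive_failures=28 state=Unconfigured(good), Spun Up`.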

Take the failed disks offline

$ MegaCli -PDOffline -PhysDrv[32:2] -a0
$ MegaCli -PDOffline -PhysDrv[32:0] -a0

In the first command above, 32, 2, and -a0 correspond to the following fields:

Adapter #0
Enclosure Device ID: 32
Slot Number: 2
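The mapping from those three fields to the command line can be written as a tiny helper. This sketch only prints the command so it can be reviewed before anything is taken offline; `pd_offline_cmd` is a name invented here.

```shell
# Print (without running) the MegaCli command that takes one drive offline.
# Arguments: adapter number, enclosure device ID, slot number.
pd_offline_cmd() {
    printf 'MegaCli -PDOffline -PhysDrv[%s:%s] -a%s\n' "$2" "$3" "$1"
}

pd_offline_cmd 0 32 2
```

The call prints `MegaCli -PDOffline -PhysDrv[32:2] -a0`. When running such a command from a script, quoting the `-PhysDrv[32:2]` argument keeps the shell from treating the brackets as a filename pattern.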

Replace a failed hard drive

At this point the failed disk is already OFFLINE. On the server itself, the indicator of a failed disk flashes yellow, while a healthy disk shows green. Pull out the failed disk and insert a good one. The indicator of the new disk then flashes green rapidly, which means the disk is in the Rebuild state. Check the status as follows:

$ MegaCli -PDList -aAll -NoLog
...
Enclosure Device ID: 32
Slot Number: 3
...
Firmware state: Rebuild
...

Check rebuild progress

$ MegaCli -PDRbld -ShowProg -PhysDrv[32:2] -aAll

Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 16% in 94 Minutes.
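The percentage and elapsed time give a rough idea of how long the rebuild still needs. The sketch below assumes the rebuild continues at a constant rate, which real arrays under load may not do; `rebuild_eta_minutes` is a hypothetical helper.

```shell
# Estimate the remaining rebuild time in whole minutes.
# Arguments: percent complete, minutes elapsed so far.
rebuild_eta_minutes() {
    percent=$1
    elapsed=$2
    if [ "$percent" -le 0 ]; then
        echo unknown    # no progress yet, so no rate to extrapolate from
        return 1
    fi
    # remaining = elapsed * (100 - percent) / percent, truncated to an integer
    echo $(( elapsed * (100 - percent) / percent ))
}

rebuild_eta_minutes 16 94
```

For the progress line above (16% in 94 minutes) this is 94 * 84 / 16 = 493.5, printed as `493`, so a little over eight hours remain.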

Disk replacement completed

$ MegaCli -PDList -aAll -NoLog | grep 'Firmware state'
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
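The same check can be automated so a script can tell when the replacement has finished. This is a minimal sketch, assuming every healthy drive reports a state beginning with `Online` as in the output above; `all_drives_online` is a name made up for the example.

```shell
# Read "MegaCli -PDList" output on stdin.
# Succeed only if at least one "Firmware state" line exists and all are Online.
all_drives_online() {
    awk -F': *' '
        /^Firmware state:/ { total++; if ($2 !~ /^Online/) bad++ }
        END { if (total > 0 && bad == 0) exit 0; exit 1 }
    '
}
```

Usage would look like `MegaCli -PDList -aAll -NoLog | all_drives_online && echo "all disks online"`. Empty input counts as a failure, so a MegaCli error is not mistaken for a healthy array.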

 

Origin www.cnblogs.com/xiaodoujiaohome/p/11729197.html