Using MegaCLI to Check Disk Status and Replace Disks - 阿dai的个人空间

2019-03-06 10:10

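I previously wrote an article on the procedure for replacing disks in a production server; in that case every disk in the machine was replaced. Recently some of the disks in another machine failed. That machine runs RAID 10, and testing showed that only the failed disks needed to be replaced. This post documents that case as a supplement.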

Installing MegaCLI

Installation package: download link

Installation steps

    # Download the installation package first
    # Extract it
    $ tar -zxf MegaCli8.07.10.tar.gz
    $ cd MegaCli8.07.10/Linux/
    $ rpm -ivh Lib_Utils-1.00-09.noarch.rpm MegaCli-8.02.21-1.noarch.rpm

    # Make the binary available on the PATH
    $ ln -s /opt/MegaRAID/MegaCli/MegaCli64 /usr/local/bin/MegaCli
    $ MegaCli -v

    MegaCLI SAS RAID Management Tool  Ver 8.02.21 Oct 21, 2011

    (c)Copyright 2011, LSI Corporation, All Rights Reserved.

    Exit Code: 0x00
    # Installation complete!
  • Conflict handling:

    $ rpm -ivh Lib_Utils-1.00-09.noarch.rpm MegaCli-8.02.21-1.noarch.rpm
    Preparing...                          ################################# [100%]
    	file /opt/lsi/3rdpartylibs/x86_64/libsysfs.so.2.0.2 from install of Lib_Utils-1.00-09.noarch conflicts with file from package srvadmin-storelib-sysfs-9.1.0-2757.12163.el7.x86_64
  • Cause: Lib_Utils conflicts with the srvadmin package that ships with Dell servers. Uninstall that package, then run the installation again.

    rpm -e srvadmin-storelib-sysfs-9.1.0-2757.12163.el7.x86_64 --nodeps   

Usage guide

Basic usage

    # Show the RAID level
    $ MegaCli -LDInfo -Lall -aALL

    # Show RAID controller information
    $ MegaCli -AdpAllInfo -aALL

    # Show disk information
    $ MegaCli -PDList -aALL

    # Show battery information
    $ MegaCli -AdpBbuCmd -aAll

    # Show the RAID controller log
    $ MegaCli -FwTermLog -Dsply -aALL

    # Show the number of adapters
    $ MegaCli -adpCount

    # Show the adapter time
    $ MegaCli -AdpGetTime -aALL

    # Show all adapter information
    $ MegaCli -AdpAllInfo -aAll

    # Show information for all logical drive groups
    $ MegaCli -LDInfo -LALL -aAll

    # Show information for all physical drives
    $ MegaCli -PDList -aAll

    # Show the charger status
    $ MegaCli -AdpBbuCmd -GetBbuStatus -aALL |grep 'Charger Status'

    # Show BBU status
    $ MegaCli -AdpBbuCmd -GetBbuStatus -aALL

    # Show BBU capacity information
    $ MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL

    # Show BBU design parameters
    $ MegaCli -AdpBbuCmd -GetBbuDesignInfo -aALL

    # Show the current BBU properties
    $ MegaCli -AdpBbuCmd -GetBbuProperties -aALL

    # Show the RAID controller model, RAID settings, and disk information
    $ MegaCli -cfgdsply -aALL

    ## How the disk states change from pulling a disk to inserting its replacement
    Device           |Normal |Damage                |Rebuild |Normal
    Virtual Drive    |Optimal|Degraded              |Degraded|Optimal
    Physical Drive   |Online |Failed -> Unconfigured|Rebuild |Online

    # Check a physical disk during a rebuild (E = Enclosure Device ID, S = Slot Number):
    $ MegaCli -PDRbld -ShowProg -PhysDrv[E:S] -a0
    ## While a disk is rebuilding, its entry in the -PDList output shows: "Firmware state: Rebuild"

    # Query rebuild progress:
    $ MegaCli -pdrbld -showprog -physdrv[E:S] -aALL
    ## The output looks like this:
    Rebuild Progress on Device at Enclosure 32, Slot 5 Completed 77% in 101 Minutes.

    # Show rebuild progress as a text progress bar:
    $ MegaCli -pdrbld -progdsply -physdrv[E:S] -aALL
    ## The screen shows something like this:
    Rebuild progress of physical drives...
    Enclosure:Slot               Percent Complete                       Time Elps
          032 :05   #######################87 %################*******  01:59:07
    Press key to quit...

    # Show the RAID controller's rebuild settings:
    $ MegaCli -AdpAllinfo -aALL | grep -i rebuild
    ## The output looks like this:
    Rebuild Rate                     : 30%
    Auto Rebuild                     : Enabled
    Rebuild Rate                     : Yes
    Force Rebuild                    : Yes

    # Set the RAID controller's rebuild rate to 60%:
    $ MegaCli -AdpSetProp { RebuildRate -60} -aALL
    ## On success it returns:
    Adapter 0: Set rebuild rate to 60% success.
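Before changing the rebuild rate it can help to record the current value. The following is a minimal sketch, not part of MegaCLI: `current_rebuild_rate` is a hypothetical helper name, and it assumes the `Rebuild Rate : 30%` line format shown in the sample output above.

```shell
# current_rebuild_rate (hypothetical helper): read `MegaCli -AdpAllInfo` output
# on stdin and print only the numeric rebuild rate.
# The "Rebuild Rate : Yes" capability line is skipped because it has no "%".
current_rebuild_rate() {
  sed -n 's/^Rebuild Rate *: *\([0-9][0-9]*\)%.*/\1/p'
}

# Intended use on a host with MegaCli installed (not executed here):
#   MegaCli -AdpAllInfo -aALL | current_rebuild_rate
```

The helper only reads text, so it can be tried against saved output without touching the controller.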

MegaCLI usage reference: http://blog.51cto.com/daixuan/1863567

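Important parameters

Parameter | Meaning
Firmware state | Disk state
Firmware state: Online, Spun Up | Disk is healthy
Firmware state: Unconfigured(good), Spun Up | Disk is installed but not in use
Firmware state: Unconfigured(bad) | Faulty; corresponds to Non-Critical in hwcheck
Firmware state: Failed | Faulty; corresponds to Critical in hwcheck
Firmware state: Rebuild | Rebuilding; usually shown while a disk is being replaced
Enclosure Device ID: 32 | Enclosure (device) ID
Slot Number: 1 | Slot the disk occupies in the server
Adapter #0 | Adapter number; corresponds to the -a option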
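The state table above can be applied to real output with a small filter. This is a sketch under stated assumptions: `describe_state` and `list_states` are hypothetical helper names, the meanings are copied from the table, and the input is assumed to have one `Slot Number:` line before each `Firmware state:` line, as in `-PDList` output.

```shell
# describe_state (hypothetical helper): map a "Firmware state" value to the
# meaning given in the table above.
describe_state() {
  case "$1" in
    "Online, Spun Up")             echo "healthy" ;;
    "Unconfigured(good), Spun Up") echo "installed but not in use" ;;
    "Unconfigured(bad)")           echo "faulty; hwcheck Non-Critical" ;;
    "Failed")                      echo "faulty; hwcheck Critical" ;;
    "Rebuild")                     echo "rebuilding" ;;
    *)                             echo "not covered by the table" ;;
  esac
}

# list_states (hypothetical helper): read `MegaCli -PDList` output on stdin and
# print one "slot N: state (meaning)" line per disk.
# The body runs in a subshell, so its variables do not leak into the caller.
list_states() (
  slot="?"
  while IFS= read -r line; do
    # Strip trailing whitespace, which MegaCli output often carries.
    line=${line%"${line##*[![:space:]]}"}
    case "$line" in
      "Slot Number: "*)    slot=${line#"Slot Number: "} ;;
      "Firmware state: "*) state=${line#"Firmware state: "}
                           echo "slot $slot: $state ($(describe_state "$state"))" ;;
    esac
  done
)

# Intended use on a host with MegaCli installed (not executed here):
#   MegaCli -PDList -aALL -NoLog | list_states
```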

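Hands-on: replacing a disk in a RAID 10 array

Replacing a disk in a RAID 10 array is fairly simple: hot-swap is supported, so the failed disk can be pulled and replaced directly. The steps are below.

Environment

Server: R720

OS: CentOS 7

RAID level: RAID 10

Viewing disk information

To show the procedure as clearly as possible, the output below has not been trimmed.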

    $ MegaCli -PDList -aAll -NoLog

    Adapter #0

    Enclosure Device ID: 32
    Slot Number: 0
    Drive's postion: DiskGroup: 0, Span: 0, Arm: 0
    Enclosure position: 0
    Device Id: 0
    WWN: 5000C50076CD09B4
    Sequence Number: 1
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 28
    Last Predictive Failure Event Seq Number: 4378
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Unconfigured(good), Spun Up
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on :  N/A
    SAS Address(0): 0x5000c50076cd09b5
    SAS Address(1): 0x0
    Connected Port Number: 5(path0)
    Inquiry Data: SEAGATE ST3600057SS     ES666SL8SASQ
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: Foreign
    Foreign Secure: Drive is not secured by a foreign lock key
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :40C (104.00 F)
    PI Eligibility:  No
    Drive is formatted for PI information:  No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : Yes

    Enclosure Device ID: 32
    Slot Number: 2
    Enclosure position: 0
    Device Id: 2
    WWN: 5000C50076CD05BC
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Raw Size: 0 KB [0x0 Sectors]
    Non Coerced Size: 0 KB [0x0 Sectors]
    Coerced Size: 0 KB [0x0 Sectors]
    Firmware state: Unconfigured(bad)
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on :  N/A
    SAS Address(0): 0x5000c50076cd05bd
    SAS Address(1): 0x0
    Connected Port Number: 1(path0)
    Inquiry Data: SEAGATE ST3600057SS     ES666SL8SAVC
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: Unknown
    Link Speed: Unknown
    Media Type: Hard Disk Device
    Drive:  Not Supported
    Drive Temperature :0C (32.00 F)
    PI Eligibility:  No
    Drive is formatted for PI information:  No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: Unknown
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : No

    Enclosure Device ID: 32
    Slot Number: 1
    Drive's postion: DiskGroup: 0, Span: 0, Arm: 1
    Enclosure position: 0
    Device Id: 1
    WWN: 5000C500983873BC
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Online, Spun Up
    Device Firmware Level: VT31
    Shield Counter: 0
    Successful diagnostics completion on :  N/A
    SAS Address(0): 0x5000c500983873bd
    SAS Address(1): 0x0
    Connected Port Number: 3(path0)
    Inquiry Data: SEAGATE ST600MP0005     VT31S7M1CSLT
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: Unknown
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :41C (105.80 F)
    PI Eligibility:  No
    Drive is formatted for PI information:  No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : No

    Enclosure Device ID: 32
    Slot Number: 3
    Drive's postion: DiskGroup: 0, Span: 1, Arm: 1
    Enclosure position: 0
    Device Id: 3
    WWN: 5000C50076CE2F30
    Sequence Number: 2
    Media Error Count: 5
    Other Error Count: 71
    Predictive Failure Count: 15
    Last Predictive Failure Event Seq Number: 4379
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Online, Spun Up
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on :  N/A
    SAS Address(0): 0x5000c50076ce2f31
    SAS Address(1): 0x0
    Connected Port Number: 2(path0)
    Inquiry Data: SEAGATE ST3600057SS     ES666SL8SAKA
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :48C (118.40 F)
    PI Eligibility:  No
    Drive is formatted for PI information:  No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : Yes

    Enclosure Device ID: 32
    Slot Number: 4
    Drive's postion: DiskGroup: 1, Span: 0, Arm: 0
    Enclosure position: 0
    Device Id: 4
    WWN: 5000C5007E70F0F8
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Online, Spun Up
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on :  N/A
    SAS Address(0): 0x5000c5007e70f0f9
    SAS Address(1): 0x0
    Connected Port Number: 0(path0)
    Inquiry Data: SEAGATE ST3600057SS     ES666SL9F1JB
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :46C (114.80 F)
    PI Eligibility:  No
    Drive is formatted for PI information:  No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : No

    Enclosure Device ID: 32
    Slot Number: 5
    Drive's postion: DiskGroup: 1, Span: 0, Arm: 1
    Enclosure position: 0
    Device Id: 5
    WWN: 5000C5007E708E3C
    Sequence Number: 2
    Media Error Count: 0
    Other Error Count: 0
    Predictive Failure Count: 0
    Last Predictive Failure Event Seq Number: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
    Coerced Size: 558.375 GB [0x45cc0000 Sectors]
    Firmware state: Online, Spun Up
    Device Firmware Level: ES66
    Shield Counter: 0
    Successful diagnostics completion on :  N/A
    SAS Address(0): 0x5000c5007e708e3d
    SAS Address(1): 0x0
    Connected Port Number: 4(path0)
    Inquiry Data: SEAGATE ST3600057SS     ES666SL9F2RB
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive Temperature :45C (113.00 F)
    PI Eligibility:  No
    Drive is formatted for PI information:  No
    PI: No PI
    Drive's write cache : Disabled
    Port-0 :
    Port status: Active
    Port's Linkspeed: 6.0Gb/s
    Port-1 :
    Port status: Active
    Port's Linkspeed: Unknown
    Drive has flagged a S.M.A.R.T alert : No

    Exit Code: 0x00

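The output above shows that the server has 6 disks (one Device Id per disk). The disk in slot 2 reports Unconfigured(bad), and the disk in slot 0 reports Unconfigured(good) with a S.M.A.R.T alert and a Predictive Failure Count of 28; these are the two disks taken offline in the next step.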
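The full listing is long, so condensing it to one line per disk makes the suspect disks easier to spot. The following is a rough sketch, assuming the field names shown in the output above; `pd_summary` is a hypothetical helper name, not a MegaCLI feature.

```shell
# pd_summary (hypothetical helper): read `MegaCli -PDList` output on stdin and
# print one summary line per disk. It relies on the "S.M.A.R.T alert" line
# being the last field of each disk's entry, as in the listing above.
pd_summary() {
  awk -F': *' '
    { sub(/[ \t\r]+$/, "") }                 # drop trailing whitespace
    /^Slot Number:/              { slot = $2 }
    /^Predictive Failure Count:/ { pfc = $2 }
    /^Firmware state:/           { state = $2 }
    /^Drive has flagged a S.M.A.R.T alert/ {
      printf "slot=%s state=\"%s\" predictive_failures=%s smart_alert=%s\n", slot, state, pfc, $2
    }
  '
}

# Intended use on a host with MegaCli installed (not executed here):
#   MegaCli -PDList -aAll -NoLog | pd_summary
#   MegaCli -PDList -aAll -NoLog | pd_summary | wc -l    # number of disks
```

Counting the summary lines gives the disk count, and any line with a state other than Online or with smart_alert=Yes deserves a closer look.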

Taking the failed disks offline

    $ MegaCli -PDOffline -PhysDrv[32:2] -a0
    $ MegaCli -PDOffline -PhysDrv[32:0] -a0

How 32:2 and -a0 in the first command map to the -PDList output:

    Adapter #0
    Enclosure Device ID: 32
    Slot Number: 2
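The mapping above can be written down as a small helper that assembles the command from the three values. This is only a sketch: `physdrv_arg` and `offline_cmd` are hypothetical names, and the helper prints the command for review instead of running it, since taking the wrong disk offline is hard to undo.

```shell
# physdrv_arg ENCLOSURE SLOT (hypothetical helper)
# Prints the -PhysDrv[E:S] argument for the given Enclosure Device ID and Slot Number.
physdrv_arg() {
  printf '%s[%s:%s]\n' -PhysDrv "$1" "$2"
}

# offline_cmd ADAPTER ENCLOSURE SLOT (hypothetical helper)
# Prints the -PDOffline command line; it does not execute anything.
offline_cmd() {
  printf 'MegaCli -PDOffline %s -a%s\n' "$(physdrv_arg "$2" "$3")" "$1"
}

# Example: adapter 0, enclosure 32, slot 2
offline_cmd 0 32 2
```

Run as written, the example prints `MegaCli -PDOffline -PhysDrv[32:2] -a0`, matching the first command above.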

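Replacing the failed disk

The failed disk is now offline. On site, the failed disk's LED blinks amber while the healthy disks show green. Pull the failed disk and insert a good one. The LED blinking green while the disk spins rapidly indicates that the disk is rebuilding. Check the state as follows:

    $ MegaCli -PDList -aAll -NoLog
    ...
    Enclosure Device ID: 32
    Slot Number: 3
    ...
    Firmware state: Rebuild
    ...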

Checking rebuild progress

    $ MegaCli -PDRbld -ShowProg -PhysDrv[32:2] -aAll

    Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 16% in 94 Minutes.
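The progress line carries enough information for a rough time estimate. The sketch below assumes the rebuild proceeds at a constant rate, which is a simplification because the rate depends on controller load and the configured rebuild rate; `rebuild_eta` is a hypothetical helper name.

```shell
# rebuild_eta (hypothetical helper): read "Rebuild Progress ..." lines on stdin
# and print percent done, minutes elapsed, and a linear estimate of the minutes
# remaining: elapsed * (100 - percent) / percent, using integer arithmetic.
rebuild_eta() {
  sed -n 's/.*Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) Minutes.*/\1 \2/p' |
  while read -r pct mins; do
    if [ "$pct" -gt 0 ]; then
      echo "done=${pct}% elapsed=${mins}min remaining~$(( mins * (100 - pct) / pct ))min"
    else
      echo "done=0% elapsed=${mins}min remaining~unknown"
    fi
  done
}

# Example with the progress line shown above:
echo 'Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 16% in 94 Minutes.' | rebuild_eta
```

For the sample line this prints `done=16% elapsed=94min remaining~493min`, that is, a little over eight more hours if the rate holds.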

Disk replacement complete

    $ MegaCli -PDList -aAll -NoLog | grep 'Firmware state'
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
    Firmware state: Online, Spun Up
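The same check can be turned into an exit status for a script or a monitoring job. This is a minimal sketch; `all_online` is a hypothetical helper name, and it treats any state other than "Online, Spun Up" as a failure, which may be too strict for servers that hold spare disks.

```shell
# all_online (hypothetical helper): read `MegaCli -PDList` output on stdin.
# Exit status 0 only if at least one "Firmware state" line is present and
# every such line reads "Online, Spun Up"; exit status 1 otherwise.
all_online() {
  awk '
    /^Firmware state:/ {
      total++
      if ($0 !~ /^Firmware state: Online, Spun Up[ \t\r]*$/) bad++
    }
    END { if (total > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Intended use on a host with MegaCli installed (not executed here):
#   if MegaCli -PDList -aAll -NoLog | all_online; then
#     echo "all disks online"
#   else
#     echo "at least one disk is not online"
#   fi
```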
