Section One : Introduction
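storcli is the successor to megacli. On Dell servers the equivalent tool is perccli, and its usage is identical.
smartctl reads the SMART information reported by a disk's controller chip.
lsscsi lists the system's SCSI devices; its data comes from /proc/scsi/scsi and related sources.
All of these are common tools for inspecting disks, and they help when troubleshooting disk state and RAID controller problems.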
Section Two : Install package
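Install storcli or perccli, and symlink the command into /usr/bin/ so that it is easy to run.
perccli and storcli are in fact the same tool: the syntax is identical, and only the command name and the supported vendors differ. perccli is for Dell machines; storcli works on Huawei and Inspur servers (other brands have not been tested, so they are unconfirmed). The examples below use perccli. Download links: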
https://downloads.dell.com/FOLDER03559396M/1/perccli-1.17.10-1.noarch.rpm
https://downloadmirror.intel.com/26820/eng/StorCli.zip
To install the perccli RPM, run rpm -ivh on the package file; to upgrade it, run rpm -Uvh on the package file.
perccli64 show displays the RAID controller information:
[root@KuaiCDN perccli]#
[root@KuaiCDN perccli]# /opt/MegaRAID/perccli/perccli64 show
Status Code = 0
Status = Success
Description = None
Number of Controllers = 1
Host Name = KuaiCDN
Operating System = Linux5.4.196-1.el7.elrepo.x86_64
System Overview :
===============
------------------------------------------------------------------------
Ctl Model Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth
------------------------------------------------------------------------
0 PERCH730Mini 8 16 4 0 4 0 Opt On 3 N 0 Opt
------------------------------------------------------------------------
Ctl=Controller Index|DGs=Drive groups|VDs=Virtual drives|Fld=Failed
PDs=Physical drives|DNOpt=DG NotOptimal|VNOpt=VD NotOptimal|Opt=Optimal
Msng=Missing|Dgd=Degraded|NdAtn=Need Attention|Unkwn=Unknown
sPR=Scheduled Patrol Read|DS=DimmerSwitch|EHS=Emergency Hot Spare
Y=Yes|N=No|ASOs=Advanced Software Options|BBU=Battery backup unit
Hlth=Health|Safe=Safe-mode boot
[root@KuaiCDN perccli]#
[root@KuaiCDN perccli]#
The output shows a single RAID controller; Ctl 0 is addressed as /c0.
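Scripts usually need the controller path before they can run any further command. The following is a minimal sketch of pulling it out of the System Overview table; it assumes the controller rows keep the 14-column layout shown above and that the model name contains no spaces (as in this output). controller_paths is a hypothetical helper name, not part of perccli.

```python
# Sample text copied from the `perccli64 show` output above.
SAMPLE = """\
Number of Controllers = 1
------------------------------------------------------------------------
Ctl Model Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth
------------------------------------------------------------------------
0 PERCH730Mini 8 16 4 0 4 0 Opt On 3 N 0 Opt
------------------------------------------------------------------------
"""


def controller_paths(text):
    """Return a path such as '/c0' for every controller row in the table."""
    paths = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a controller row has 14 columns and starts with its index.
        if len(fields) == 14 and fields[0].isdigit():
            paths.append("/c" + fields[0])
    return paths


print(controller_paths(SAMPLE))
```

On a host with several controllers, each returned path could be substituted for /c0 in the commands that follow.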
List all drives on a controller
/opt/MegaRAID/perccli/perccli64 /c0/eall/sall show lists the drives attached to this controller:
[root@KuaiCDN perccli]#
[root@KuaiCDN perccli]# /opt/MegaRAID/perccli/perccli64 /c0/eall/sall show
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.
Drive Information :
=================
---------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
---------------------------------------------------------------------------------
32:0 0 Onln 0 465.25 GB SATA SSD Y N 512B Samsung SSD 860 EVO 500GB U
32:1 1 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:2 2 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:3 3 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:4 4 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:5 5 Onln 2 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:6 6 Onln 2 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:7 7 Onln 2 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:8 8 Onln 2 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:9 9 Onln 3 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:10 10 Onln 3 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:11 11 Onln 3 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:12 12 Onln 3 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:13 13 GHS - 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:14 14 GHS - 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:15 15 GHS - 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
---------------------------------------------------------------------------------
EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded
[root@KuaiCDN perccli]#
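DG stands for drive group; the number reflects the order in which the groups were created when the RAID was configured.
The device name is taken from the RAID group.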
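To see which physical drives belong to which drive group, the drive table can be parsed into records and grouped by the DG column. This is only a sketch under the assumption that the column order matches the output above; parse_drives and group_by_dg are hypothetical helper names, and the sample rows are copied from the /c0/eall/sall show output.

```python
import re
from collections import defaultdict

SAMPLE = """\
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
32:0 0 Onln 0 465.25 GB SATA SSD Y N 512B Samsung SSD 860 EVO 500GB U
32:1 1 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:2 2 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:13 13 GHS - 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
"""


def parse_drives(text):
    """Turn each drive row into a dict; header and legend lines are skipped."""
    drives = []
    for line in text.splitlines():
        tok = line.split()
        # A drive row starts with EID:Slt (for example 32:0) and has at least 13 tokens.
        if len(tok) < 13 or not re.fullmatch(r"\d+:\d+", tok[0]):
            continue
        drives.append({
            "eid_slot": tok[0],
            "did": int(tok[1]),
            "state": tok[2],
            "dg": tok[3],                    # "-" for hot spares
            "size": " ".join(tok[4:6]),
            "media": tok[7],
            "model": " ".join(tok[11:-1]),   # the model name may contain spaces
        })
    return drives


def group_by_dg(drives):
    """Map each drive group to the EID:Slt positions of its member drives."""
    groups = defaultdict(list)
    for drive in drives:
        groups[drive["dg"]].append(drive["eid_slot"])
    return dict(groups)


print(group_by_dg(parse_drives(SAMPLE)))
```

Drives listed under "-" are not members of any drive group; in the output above these are the global hot spares (GHS).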
/opt/MegaRAID/perccli/perccli64 /c0/vall show displays how the DGs map to VDs:
[root@KuaiCDN perccli]#
[root@KuaiCDN perccli]# /opt/MegaRAID/perccli/perccli64 /c0/vall show
Controller = 0
Status = Success
Description = None
Virtual Drives :
==============
---------------------------------------------------------------
DG/VD TYPE State Access Consist Cache Cac sCC Size Name
---------------------------------------------------------------
0/0 RAID0 Optl RW Yes RWBD - OFF 465.25 GB
1/1 RAID10 Optl RW Yes RWBD - OFF 1.818 TB
2/2 RAID10 Optl RW No RWBD - OFF 1.818 TB
3/3 RAID10 Optl RW No RWBD - OFF 1.818 TB
---------------------------------------------------------------
Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency
[root@KuaiCDN perccli]#
[root@KuaiCDN perccli]#
[root@KuaiCDN perccli]# lsscsi
[0:2:0:0] disk DELL PERC H730 Mini 4.26 /dev/sdb
[0:2:1:0] disk DELL PERC H730 Mini 4.26 /dev/sda
[0:2:2:0] disk DELL PERC H730 Mini 4.26 /dev/sdd
[0:2:3:0] disk DELL PERC H730 Mini 4.26 /dev/sdc
[root@KuaiCDN perccli]#
[root@KuaiCDN perccli]#
[root@KuaiCDN perccli]# /opt/MegaRAID/perccli/perccli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.
Controller = 0
Status = Success
Description = None
Product Name = PERC H730 Mini
Serial Number = 61R053M
SAS Address = 514187706c645c00
PCI Address = 00:03:00:00
System Time = 08/18/2022 14:02:17
Mfg. Date = 02/05/16
Controller Time = 08/18/2022 14:04:48
FW Package Build = 25.4.0.0015
BIOS Version = 6.29.00.0_4.16.07.00_0x06120100
FW Version = 4.260.00-5092
Driver Name = megaraid_sas
Driver Version = 07.710.50.00-rc1
Current Personality = RAID-Mode
Vendor Id = 0x1000
Device Id = 0x5D
SubVendor Id = 0x1028
SubDevice Id = 0x1F49
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 3
Device Number = 0
Function Number = 0
Drive Groups = 4
TOPOLOGY :
========
--------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type State BT Size PDC PI SED DS3 FSpace
--------------------------------------------------------------------------
0 - - - - RAID0 Optl N 465.25 GB dflt N N dflt N
0 0 - - - RAID0 Optl N 465.25 GB dflt N N dflt N
0 0 0 32:0 0 DRIVE Onln N 465.25 GB dflt N N dflt -
1 - - - - RAID10 Optl N 1.818 TB dflt N N dflt N
1 0 - - - RAID1 Optl N 1.818 TB dflt N N dflt N
1 0 0 32:1 1 DRIVE Onln N 931.0 GB dflt N N dflt -
1 0 1 32:2 2 DRIVE Onln N 931.0 GB dflt N N dflt -
1 0 2 32:3 3 DRIVE Onln N 931.0 GB dflt N N dflt -
1 0 3 32:4 4 DRIVE Onln N 931.0 GB dflt N N dflt -
2 - - - - RAID10 Optl Y 1.818 TB dflt N N dflt N
2 0 - - - RAID1 Optl Y 1.818 TB dflt N N dflt N
2 0 0 32:5 5 DRIVE Onln N 931.0 GB dflt N N dflt -
2 0 1 32:6 6 DRIVE Onln N 931.0 GB dflt N N dflt -
2 0 2 32:7 7 DRIVE Onln N 931.0 GB dflt N N dflt -
2 0 3 32:8 8 DRIVE Onln N 931.0 GB dflt N N dflt -
3 - - - - RAID10 Optl Y 1.818 TB dflt N N dflt N
3 0 - - - RAID1 Optl Y 1.818 TB dflt N N dflt N
3 0 0 32:9 9 DRIVE Onln N 931.0 GB dflt N N dflt -
3 0 1 32:10 10 DRIVE Onln N 931.0 GB dflt N N dflt -
3 0 2 32:11 11 DRIVE Onln N 931.0 GB dflt N N dflt -
3 0 3 32:12 12 DRIVE Onln N 931.0 GB dflt N N dflt -
--------------------------------------------------------------------------
DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
Virtual Drives = 4
VD LIST :
=======
---------------------------------------------------------------
DG/VD TYPE State Access Consist Cache Cac sCC Size Name
---------------------------------------------------------------
0/0 RAID0 Optl RW Yes RWBD - OFF 465.25 GB
1/1 RAID10 Optl RW Yes RWBD - OFF 1.818 TB
2/2 RAID10 Optl RW No RWBD - OFF 1.818 TB
3/3 RAID10 Optl RW No RWBD - OFF 1.818 TB
---------------------------------------------------------------
Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|B=Blocked|Consist=Consistent|
R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency
Physical Drives = 16
PD LIST :
=======
---------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
---------------------------------------------------------------------------------
32:0 0 Onln 0 465.25 GB SATA SSD Y N 512B Samsung SSD 860 EVO 500GB U
32:1 1 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:2 2 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:3 3 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:4 4 Onln 1 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:5 5 Onln 2 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:6 6 Onln 2 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:7 7 Onln 2 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:8 8 Onln 2 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:9 9 Onln 3 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:10 10 Onln 3 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:11 11 Onln 3 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:12 12 Onln 3 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:13 13 GHS - 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:14 14 GHS - 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
32:15 15 GHS - 931.0 GB SATA SSD Y N 512B Samsung SSD 860 EVO 1TB U
---------------------------------------------------------------------------------
EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded
BBU_Info :
========
-------------------------------------------------------------------
Model State RetentionTime Temp Mode MfgDate Next Learn
-------------------------------------------------------------------
BBU Optimal 0 hour(s) 29C - 0/00/00 2022/11/01 01:14:45
-------------------------------------------------------------------
[root@KuaiCDN perccli]#
To look up a drive's Device ID (DID), Slot No. and DriveGroup, run /opt/MegaRAID/perccli/perccli64 /c0/eall/sall show; the output is the same drive list already shown above, with one row per drive.
Show the details of a specific drive
[root@node-15 ~]# perccli64 /c0/e32/s6 show all
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.
Drive /c0/e32/s6 :
================
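---------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
---------------------------------------------------------------------------------
32:6 6 Onln 1 931.0 GB SATA HDD N N 512B ST91000640NS U
---------------------------------------------------------------------------------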
EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded
Drive /c0/e32/s6 - Detailed Information :
=======================================
Drive /c0/e32/s6 State :
======================
Shield Counter = 0
Media Error Count = 46431 (clearly a problem: 46431 media errors have occurred)
Other Error Count = 0
Drive Temperature = 31C (87.80 F)
Predictive Failure Count = 126 (126 predicted failures)
S.M.A.R.T alert flagged by drive = Yes
Drive /c0/e32/s6 Device attributes :
==================================
SN = 9XGA228L
Manufacturer Id = ATA
Model Number = ST91000640NS
NAND Vendor = NA
WWN = 5000c500918f2f8a
Firmware Revision = AA63
Raw size = 931.512 GB [0x74706db0 Sectors]
Coerced size = 931.0 GB [0x74600000 Sectors]
Non Coerced size = 931.012 GB [0x74606db0 Sectors]
Device Speed = 6.0Gb/s
Link Speed = 12.0Gb/s
NCQ setting = N/A
Write Cache = Enabled
Logical Sector Size = 512B
Physical Sector Size = 512B
Connector Name = 00
Drive /c0/e32/s6 Policies/Settings :
==================================
Drive position = DriveGroup:1, Span:0, Row:0
Enclosure position = 0
Connected Port Number = 0(path0)
Sequence Number = 2
Commissioned Spare = No
Emergency Spare = No
Last Predictive Failure Event Sequence Number = 95183 (sequence number of the most recent predictive failure event)
Successful diagnostics completion on = N/A
SED Capable = No
SED Enabled = No
Secured = No
Cryptographic Erase Capable = No
Locked = No
Needs EKM Attention = No
PI Eligible = No
Certified = Yes
Wide Port Capable = No
Port Information :
================
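-----------------------------------------------------------
Port Status Linkspeed SAS address
-----------------------------------------------------------
0 Active 12.0Gb/s 0x500056b33fefe586
-----------------------------------------------------------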
Inquiry Data =
5a 0c ff 3f 37 c8 10 00 00 00 00 00 3f 00 00 00
00 00 00 00 20 20 20 20 20 20 20 20 20 20 20 20
58 39 41 47 32 32 4c 38 00 00 00 00 04 00 20 20
20 20 41 41 33 36 54 53 31 39 30 30 36 30 30 34
53 4e 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 10 80
00 40 00 2f 00 40 00 02 00 02 07 00 ff 3f 10 00
3f 00 10 fc fb 00 10 00 ff ff ff 0f 00 00 07 00
Note:
Looking at the details of this single drive reveals media errors, which indicates that the disk is faulty.
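The same check can be scripted. The sketch below reads the "Key = value" lines of the drive State section and reports the counters that look suspicious. Treating any non-zero count as a problem is an assumption made for illustration, not a vendor rule, and drive_state and problems are hypothetical helper names. The sample is copied from the output above with the inline annotations left out.

```python
SAMPLE = """\
Shield Counter = 0
Media Error Count = 46431
Other Error Count = 0
Drive Temperature = 31C (87.80 F)
Predictive Failure Count = 126
S.M.A.R.T alert flagged by drive = Yes
"""


def drive_state(text):
    """Collect 'Key = value' pairs from a perccli drive State section."""
    state = {}
    for line in text.splitlines():
        key, found, value = line.partition(" = ")
        if found:
            state[key.strip()] = value.strip()
    return state


def problems(state):
    """Return the indicators that suggest the drive is failing."""
    found = []
    # Assumption: any non-zero counter is worth reporting.
    for counter in ("Media Error Count", "Predictive Failure Count"):
        if int(state.get(counter, "0")) > 0:
            found.append(f"{counter} = {state[counter]}")
    if state.get("S.M.A.R.T alert flagged by drive") == "Yes":
        found.append("S.M.A.R.T alert flagged by drive")
    return found


for item in problems(drive_state(SAMPLE)):
    print(item)
```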
Map the RAID virtual drives to the system's block devices
[root@node-15 ~]# perccli64 /c0/vall show
Controller = 0
Status = Success
Description = None
Virtual Drives :
==============
---------------------------------------------------------------
DG/VD TYPE State Access Consist Cache Cac sCC Size Name
---------------------------------------------------------------
0/0 RAID1 Optl RW Yes RWBD - OFF 931.0 GB
1/1 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
2/2 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
3/3 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
4/4 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
5/5 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
6/6 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
7/7 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
8/8 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
9/9 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
10/10 RAID0 Optl RW Yes RWBD - OFF 931.0 GB
---------------------------------------------------------------
Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
FWB=Force WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency
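Note:
VD: usually reflects the order in which the drive appears in the system. If there are only RAID volumes, VD 0 is normally /dev/sda, VD 1 is /dev/sdb, and so on. If JBOD disks are present, they are enumerated first: for example, if the JBOD disks run up to /dev/sdc, then VD 0 is /dev/sdd, and so on.
DG: the order in which the drive groups were configured on the RAID controller.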
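Device letters do not always follow VD order: in the lsscsi output earlier, target 0 is /dev/sdb and target 1 is /dev/sda. A sketch of reading the mapping from lsscsi rather than guessing it is shown below. It assumes that the third number of the [host:channel:target:lun] tuple equals the VD index, which is a common convention for megaraid_sas controllers but is not stated in this article, so verify it on your own hardware. vd_to_device is a hypothetical helper name.

```python
import re

# Sample copied from the lsscsi output earlier in this article.
SAMPLE = """\
[0:2:0:0] disk DELL PERC H730 Mini 4.26 /dev/sdb
[0:2:1:0] disk DELL PERC H730 Mini 4.26 /dev/sda
[0:2:2:0] disk DELL PERC H730 Mini 4.26 /dev/sdd
[0:2:3:0] disk DELL PERC H730 Mini 4.26 /dev/sdc
"""

LINE = re.compile(r"^\[(\d+):(\d+):(\d+):(\d+)\]\s+(\S+)\s+.*\s(/dev/\S+)\s*$")


def vd_to_device(text):
    """Map SCSI target ID (assumed to be the VD index) to its block device."""
    mapping = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(5) == "disk":
            target = int(match.group(3))
            mapping[target] = match.group(6)
    return mapping


print(vd_to_device(SAMPLE))
```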
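Commands for collecting RAID controller logs
storcli64 /c0 show time: shows the RAID controller's time
storcli64 /c0 show alilog logfile=node-x.alilog: collects the alilog, which includes all of the logs
storcli64 /c0 show all logfile=node-x.all.log: collects the RAID controller information
storcli64 /c0 show badblocks: shows bad-block information for the disks
perccli64 /c0 show events filter=fatal: shows events at the fatal level, which helps to find disk or RAID controller failures
perccli64 /c0 show cc: shows the consistency check. RAID levels that keep data on several disks (RAID 1 and above) need consistency checks, whereas a single-disk RAID 0 probably does not; whether the check affects performance is not confirmed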
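When logs have to be collected from many nodes, the command lines above can be generated rather than typed. The sketch below only builds the argument lists and prints them; log_commands is a hypothetical helper, and using a single CLI name for every command is an assumption based on the article's statement that storcli and perccli share the same syntax.

```python
import shlex


def log_commands(cli="storcli64", controller=0, node="node-x"):
    """Build the log-collection commands listed above as argument lists."""
    ctl = f"/c{controller}"
    return [
        [cli, ctl, "show", "time"],
        [cli, ctl, "show", "alilog", f"logfile={node}.alilog"],
        [cli, ctl, "show", "all", f"logfile={node}.all.log"],
        [cli, ctl, "show", "badblocks"],
        [cli, ctl, "show", "events", "filter=fatal"],
        [cli, ctl, "show", "cc"],
    ]


for argv in log_commands(node="node-15"):
    # To actually run a command, pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Keeping each command as a list avoids shell quoting problems if a node name ever contains unusual characters.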
Section Five : Smartctl Get Error info of Disks
Common Commands Usage Description
--scan    Scan for devices
--scan-open    Scan for devices and try to open each device
-x, --xall    Show all information for device
-a, --all    Show all SMART information for device
-i, --info    Show identity information for device
-d TYPE, --device=TYPE    Specify device type to one of: ata, scsi, nvme[,NSID], sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbprolific, usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test
-s VALUE, --smart=VALUE    Enable/disable SMART on device (on/off)
-o VALUE, --offlineauto=VALUE (ATA)    Enable/disable automatic offline testing on device (on/off)
-S VALUE, --saveauto=VALUE (ATA)    Enable/disable Attribute autosave on device (on/off)
-H, --health    Show device SMART health status
-c, --capabilities (ATA, NVMe)    Show device SMART capabilities
-A, --attributes    Show device SMART vendor-specific Attributes and values
-l TYPE, --log=TYPE    Show device log. TYPE: error, selftest, selective, directory[,g|s],
    xerror[,N][,error], xselftest[,N][,selftest],
    background, sasphy[,reset], sataphy[,reset],
    scttemp[sts,hist], scttempint,N[,p],
    scterc[,N,M], devstat[,N], ssd,
    gplog,N[,RANGE], smartlog,N[,RANGE],
    nvmelog,N,SIZE
-t TEST, --test=TEST    Run test. TEST: offline, short, long, conveyance, force, vendor,N,
    select,M-N, pending,N, afterselect,[on|off]
-X, --abort    Abort any non-captive test on device
Get info for /dev/sdf
List all devices
[root@node-15 ~]# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device
/dev/sde -d scsi # /dev/sde, SCSI device
/dev/sdf -d scsi # /dev/sdf, SCSI device
/dev/sdg -d scsi # /dev/sdg, SCSI device
/dev/sdh -d scsi # /dev/sdh, SCSI device
/dev/sdi -d scsi # /dev/sdi, SCSI device
/dev/sdj -d scsi # /dev/sdj, SCSI device
/dev/sdk -d scsi # /dev/sdk, SCSI device
/dev/sdl -d scsi # /dev/sdl, SCSI device
/dev/sdm -d scsi # /dev/sdm, SCSI device
/dev/sdn -d scsi # /dev/sdn, SCSI device
/dev/sdo -d scsi # /dev/sdo, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
/dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device
/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
/dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
/dev/bus/0 -d megaraid,10 # /dev/bus/0 [megaraid_disk_10], SCSI device
/dev/bus/0 -d megaraid,11 # /dev/bus/0 [megaraid_disk_11], SCSI device
/dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device
/dev/bus/0 -d megaraid,13 # /dev/bus/0 [megaraid_disk_13], SCSI device
/dev/bus/0 -d megaraid,14 # /dev/bus/0 [megaraid_disk_14], SCSI device
/dev/bus/0 -d megaraid,15 # /dev/bus/0 [megaraid_disk_15], SCSI device
Note:
From the previous sections we know that the disk /dev/sdf has DID (device_id) 6 in perccli, which corresponds to the entry /dev/bus/0 -d megaraid,6.
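The lookup from DID to smartctl arguments can be automated by parsing the scan output. A minimal sketch, assuming the scan lines keep the "device -d type # comment" layout shown above; megaraid_entries and smartctl_args are hypothetical helper names.

```python
# A few lines copied from the smartctl scan output above.
SAMPLE = """\
/dev/sdf -d scsi # /dev/sdf, SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
"""


def megaraid_entries(text):
    """Map each megaraid DID to the device node reported by the scan."""
    entries = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop the trailing comment
        if len(fields) == 3 and fields[1] == "-d" and fields[2].startswith("megaraid,"):
            did = int(fields[2].split(",", 1)[1])
            entries[did] = fields[0]
    return entries


def smartctl_args(did, entries, option="-i"):
    """Build the smartctl command line for the drive with the given DID."""
    return ["smartctl", option, "-d", f"megaraid,{did}", entries[did]]


print(" ".join(smartctl_args(6, megaraid_entries(SAMPLE))))
```

The commands in this article pass /dev/sdf as the device node together with -d megaraid,6; the sketch uses the node printed by the scan instead.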
Show the disk's identity information
[root@node-15 ~]# smartctl -i -d megaraid,6 /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Constellation.2 (SATA)
Device Model: ST91000640NS
Serial Number: 9XGA228L
LU WWN Device Id: 5 000c50 0918f2f8a
Add. Product Id: DELL(tm)
Firmware Version: AA63
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jan 11 11:28:46 2019 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
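Before trusting the SMART data, it is worth confirming that megaraid,6 really is the drive that perccli reported in slot 6. The sketch below compares the serial number, model, WWN and capacity from both tools, using values copied from the two outputs in this article. key_values and same_drive are hypothetical helper names, and the capacity check assumes that the raw size counts sectors of the logical sector size.

```python
import re

PERCCLI = """\
SN = 9XGA228L
Model Number = ST91000640NS
WWN = 5000c500918f2f8a
Raw size = 931.512 GB [0x74706db0 Sectors]
Logical Sector Size = 512B
"""

SMARTCTL = """\
Device Model: ST91000640NS
Serial Number: 9XGA228L
LU WWN Device Id: 5 000c50 0918f2f8a
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
"""


def key_values(text, sep):
    """Split each line at the first separator into a key and a value."""
    pairs = {}
    for line in text.splitlines():
        key, found, value = line.partition(sep)
        if found:
            pairs[key.strip()] = value.strip()
    return pairs


def same_drive(perccli_text, smartctl_text):
    """Compare identity fields reported by perccli and smartctl."""
    p = key_values(perccli_text, " = ")
    s = key_values(smartctl_text, ":")
    sectors = int(re.search(r"\[0x([0-9a-fA-F]+) Sectors\]", p["Raw size"]).group(1), 16)
    sector_bytes = int(p["Logical Sector Size"].rstrip("B"))
    capacity = int(s["User Capacity"].split()[0].replace(",", ""))
    return {
        "serial": p["SN"] == s["Serial Number"],
        "model": p["Model Number"] == s["Device Model"],
        "wwn": p["WWN"] == s["LU WWN Device Id"].replace(" ", ""),
        "capacity": sectors * sector_bytes == capacity,
    }


print(same_drive(PERCCLI, SMARTCTL))
```

Here 0x74706db0 sectors of 512 bytes come to 1,000,204,886,016 bytes, which is exactly the User Capacity that smartctl prints.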
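View the disk's attribute information
This output is generally used to check the indicators of the disk's overall health.
The fields in the output below have the following meaning:
ID: the attribute ID, usually a decimal or hexadecimal number between 1 and 255.
ATTRIBUTE_NAME: the attribute name defined by the drive manufacturer.
FLAG: the attribute handling flags (can be ignored).
VALUE: one of the most important columns in the table. It is the normalized value of the attribute, between 1 and 253, where 253 is the best case and 1 the worst. Depending on the attribute and the manufacturer, the initial VALUE may be set to 100 or 200.
WORST: the lowest VALUE ever recorded.
THRESH: the lowest value allowed before the attribute is reported as FAILED; the attribute counts as failed once its normalized value drops to THRESH or below.
TYPE: the attribute type (Pre-fail or Old_age). A Pre-fail attribute is a critical attribute that takes part in the disk's overall SMART health assessment (PASSED/FAILED); if any Pre-fail attribute fails, the disk should be regarded as about to fail. An Old_age attribute is non-critical (normal wear, for example) and does not by itself mean that the disk is failing.
UPDATED: how often the attribute is updated. Offline means the attribute is updated only when offline tests run on the disk.
WHEN_FAILED: set to "FAILING_NOW" if VALUE is less than or equal to THRESH, to "In_the_past" if WORST is less than or equal to THRESH, and to "-" otherwise. With "FAILING_NOW", back up important files as soon as possible, especially when the attribute is of type Pre-fail. "In_the_past" means the attribute failed at some point but was fine when the test ran. "-" means the attribute has never failed.
RAW_VALUE: the manufacturer-defined raw value, from which VALUE is derived.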
[root@node-15 ~]# smartctl -A -d megaraid,6 /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x010f 081 038 044 Pre-fail Always In_the_past 151546765
3 Spin_Up_Time 0x0103 094 094 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 21
5 Reallocated_Sector_Ct 0x0133 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 085 060 030 Pre-fail Always - 338813105
9 Power_On_Hours 0x0032 079 079 000 Old_age Always - 18784
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 21
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 1710
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 069 053 045 Old_age Always - 31 (Min/Max 24/40)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 852
194 Temperature_Celsius 0x0022 031 047 000 Old_age Always - 31 (0 14 0 0 0)
195 Hardware_ECC_Recovered 0x001a 117 099 000 Old_age Always - 151546765
197 Current_Pending_Sector 0x0012 084 084 000 Old_age Always - 688
198 Offline_Uncorrectable 0x0010 084 084 000 Old_age Offline - 688
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 8093 (164 214 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 1870535293
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 1530387871
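The WHEN_FAILED rule described before this output can be checked against the table itself. The sketch below parses a few attribute rows copied from the output above and recomputes the column from VALUE, WORST and THRESH. It implements the rule as this article states it, which may not cover every special case in smartctl; parse_attributes and when_failed are hypothetical helper names.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x010f 081 038 044 Pre-fail Always In_the_past 151546765
5 Reallocated_Sector_Ct 0x0133 100 100 036 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 1710
194 Temperature_Celsius 0x0022 031 047 000 Old_age Always - 31 (0 14 0 0 0)
"""


def parse_attributes(text):
    """Parse attribute rows; the header line is skipped."""
    rows = []
    for line in text.splitlines():
        tok = line.split()
        if len(tok) < 10 or not tok[0].isdigit() or not tok[2].startswith("0x"):
            continue
        rows.append({
            "id": int(tok[0]),
            "name": tok[1],
            "value": int(tok[3]),
            "worst": int(tok[4]),
            "thresh": int(tok[5]),
            "type": tok[6],
            "when_failed": tok[8],
            "raw": " ".join(tok[9:]),   # the raw value may contain spaces
        })
    return rows


def when_failed(row):
    """Recompute WHEN_FAILED from VALUE, WORST and THRESH."""
    if row["value"] <= row["thresh"]:
        return "FAILING_NOW"
    if row["worst"] <= row["thresh"]:
        return "In_the_past"
    return "-"


for row in parse_attributes(SAMPLE):
    print(row["id"], row["name"], when_failed(row))
```

For Raw_Read_Error_Rate, VALUE 81 is above THRESH 44 but WORST 38 is not, so the recomputed state is In_the_past, matching what smartctl reports.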
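Check the disk's health status
Note:
In the result below the overall assessment is PASSED, so the disk is still usable. However, one marginal attribute is listed with WORST < THRESH, TYPE Pre-fail and WHEN_FAILED In_the_past, which suggests that this disk is likely to fail soon.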
[root@node-15 ~]# smartctl -H -d megaraid,6 /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x010f 081 038 044 Pre-fail Always In_the_past 151546765
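A monitoring script should therefore not stop at the PASSED line. The sketch below extracts both the overall result and any marginal attributes from the health output. The wording of the verdict strings is an illustrative choice, and health_summary and verdict are hypothetical helper names.

```python
SAMPLE = """\
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x010f 081 038 044 Pre-fail Always In_the_past 151546765
"""

RESULT_PREFIX = "SMART overall-health self-assessment test result:"


def health_summary(text):
    """Return the overall result and (name, type, when_failed) per marginal attribute."""
    result = None
    marginal = []
    for line in text.splitlines():
        if line.startswith(RESULT_PREFIX):
            result = line[len(RESULT_PREFIX):].strip()
            continue
        tok = line.split()
        if len(tok) >= 10 and tok[0].isdigit() and tok[2].startswith("0x"):
            marginal.append((tok[1], tok[6], tok[8]))
    return result, marginal


def verdict(result, marginal):
    """Combine the overall result with the marginal attributes."""
    if result != "PASSED":
        return "failed"
    if any(kind == "Pre-fail" for _name, kind, _when in marginal):
        return "passed, but plan a replacement"
    return "healthy"


print(verdict(*health_summary(SAMPLE)))
```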
View the disk's error log
[root@node-15 ~]# smartctl -l error -d megaraid,6 /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 46431 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It “wraps” after 49.710 days.
Error 46431 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
42 00 00 ff ff ff 4f 00 46d+15:15:32.968 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:29.901 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:26.825 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:23.965 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT
Error 46430 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
42 00 00 ff ff ff 4f 00 46d+15:15:29.901 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:26.825 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:23.965 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:18.093 READ VERIFY SECTOR(S) EXT
Error 46429 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
42 00 00 ff ff ff 4f 00 46d+15:15:26.825 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:23.965 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:18.093 READ VERIFY SECTOR(S) EXT
b0 da 00 00 4f c2 00 00 46d+15:15:17.838 SMART RETURN STATUS
Error 46428 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
42 00 00 ff ff ff 4f 00 46d+15:15:23.965 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:18.093 READ VERIFY SECTOR(S) EXT
b0 da 00 00 4f c2 00 00 46d+15:15:17.838 SMART RETURN STATUS
2f 00 01 e0 00 00 40 00 46d+15:15:17.703 READ LOG EXT
Error 46427 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT
42 00 00 ff ff ff 4f 00 46d+15:15:18.093 READ VERIFY SECTOR(S) EXT
b0 da 00 00 4f c2 00 00 46d+15:15:17.838 SMART RETURN STATUS
2f 00 01 e0 00 00 40 00 46d+15:15:17.703 READ LOG EXT
42 00 00 ff ff ff 4f 00 46d+15:15:15.276 READ VERIFY SECTOR(S) EXT
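Because the log keeps only the five most recent entries, a summary is usually more useful than the full dump. The sketch below extracts the total error count and, for each logged error, its number, power-on hours, error type and LBA. The sample lines are copied from the log above, parse_error_log is a hypothetical helper name, and pairing each "Error N occurred" line with the next "Error: ... at LBA" line assumes they always appear in that order.

```python
import re

SAMPLE = """\
ATA Error Count: 46431 (device log contains only the most recent five errors)
Error 46431 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Error 46430 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
"""


def parse_error_log(text):
    """Summarize a smartctl ATA error log."""
    total = re.search(r"^ATA Error Count: (\d+)", text, re.M)
    events = re.findall(
        r"^Error (\d+) occurred at disk power-on lifetime: (\d+) hours", text, re.M
    )
    details = re.findall(r"Error: (\w+) at LBA = (0x[0-9a-fA-F]+) = \d+", text)
    entries = []
    for (number, hours), (kind, lba_hex) in zip(events, details):
        entries.append({
            "number": int(number),
            "hours": int(hours),
            "kind": kind,
            "lba": int(lba_hex, 16),
        })
    return {"total": int(total.group(1)) if total else 0, "entries": entries}


log = parse_error_log(SAMPLE)
print(log["total"], [entry["number"] for entry in log["entries"]])
```

The total of 46431 is the same number that perccli reported as Media Error Count for this drive.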
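Additional notes
If SMART is not enabled on a disk, it can be turned on with -s on device.
If smartctl -i prints little information even though SMART support is available and enabled, a self-test (-t short or -t long) may be needed before the data can be read. The test does not destroy data on the disk, but offline tests are generally not suitable for storage systems.
When collecting data, the -x and -a options give more complete disk information.
The accompanying smartd service can be configured in /etc/smartmontools/smartd.conf.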
————————————————
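Copyright notice: this is an original article by CSDN blogger "xmayu", licensed under CC 4.0 BY-SA. When reposting, include the link to the original source and this notice.
Original link: https://blog.csdn.net/u011775882/article/details/119681284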