Exadata storage study notes

Some of the command output below comes from a real Exadata, and some comes from an Exadata simulated in a virtual machine.

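/ is the root file system
/opt/oracle holds the installed Exadata storage software
/var/log/oracle holds the storage cell's operating system logs and crash logs
/dev/md5 and /dev/md6 are the system partitions: the active one and its mirror copy
/dev/md7 and /dev/md8 are the Exadata software installation partitions: the active one and its mirror copy
/dev/md11 is mounted at /var/log/oracle
At any given time, only four multidevice (MD) mount points are mounted on a storage cell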

Check the partition layout. -- output below is from the real environment

[root@exaceladm01 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md5        9.8G  3.1G  6.2G  34% /
tmpfs            32G  4.0K   32G   1% /dev/shm
/dev/md7        2.0G  1.2G  702M  63% /opt/oracle
/dev/md4        110M   24M   79M  24% /boot
/dev/md11       2.3G   26M  2.1G   2% /var/log/oracle
[root@exaceladm01 ~]# mdadm -Q -D /dev/md5
/dev/md5:
        Version : 0.90
  Creation Time : Mon Nov 25 14:58:05 2019
     Raid Level : raid1
     Array Size : 10482304 (10.00 GiB 10.73 GB)
  Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 5
    Persistence : Superblock is persistent

    Update Time : Tue Nov 26 11:16:30 2019
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : f2ee267c:e2e0f67e:04894333:532a878b
         Events : 0.24

    Number   Major   Minor   RaidDevice State
       0      65        5        0      active sync   /dev/sdq5
       1      65       21        1      active sync   /dev/sdr5
[root@exaceladm01 ~]# 
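
The mirror check above can also be scripted. The following is a minimal sketch, assuming the `mdadm -Q -D` output has already been captured as text; the function names `parse_mdadm_detail` and `mirror_is_healthy` are made up for illustration, and the sample is a shortened copy of the output shown above.

```python
import re

# Shortened copy of the `mdadm -Q -D /dev/md5` output shown above.
SAMPLE = """\
/dev/md5:
        Version : 0.90
     Raid Level : raid1
   Raid Devices : 2
          State : clean
 Active Devices : 2
 Failed Devices : 0
"""


def parse_mdadm_detail(text):
    """Collect the 'Key : Value' lines of mdadm detail output into a dict."""
    info = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            info[match.group(1)] = match.group(2)
    return info


def mirror_is_healthy(info):
    """True when the RAID1 array is clean and every member is active."""
    return (
        info.get("Raid Level") == "raid1"
        and info.get("State") == "clean"
        and info.get("Active Devices") == info.get("Raid Devices")
        and info.get("Failed Devices") == "0"
    )


print(mirror_is_healthy(parse_mdadm_detail(SAMPLE)))
```

The device table at the bottom of the real output has no `Key : Value` shape, so the parser simply skips it.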

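To check whether a LUN holds the system (file system) partitions, look at whether isSystemLun is TRUE. -- output below is from the virtual environment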

CellCLI> list lun '/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02' detail
	 name:              	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02
	 cellDisk:          	 CD_disk02_cell1
	 deviceName:        	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02
	 diskType:          	 HardDisk
	 id:                	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02
	 isSystemLun:       	 FALSE
	 lunAutoCreate:     	 FALSE
	 lunSize:           	 1G
	 physicalDrives:    	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02
	 raidLevel:         	 "RAID 0"
	 status:            	 normal

CellCLI> 

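A few concepts
DISK -- LUN -- CELLDISK -- GRIDDISK -- ASM Disk  (cell disk to grid disk is a 1:n relationship)
A storage cell contains both conventional physical hard disks and flash modules.
A cell disk is built on a LUN and can be subdivided into smaller partitions called grid disks.
A cell disk built from flash modules can be subdivided into flash cache or grid disks. A cell disk built from a physical hard disk can only be subdivided into grid disks.
Only grid disks can be mapped to ASM disks.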

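The physical disk is the first layer of abstraction: each physical disk is mapped to and presented as one LUN. LUNs are created automatically during the initial deployment of the Exadata Database Machine, with no manual intervention. (That is why there is no create lun command, as help create lun shows.)
An existing LUN is then configured as a cell disk. Once a LUN has been mapped on the storage cell, a cell disk can be created on it. Once the cell disk exists, it can be subdivided into one or more grid disks, which an ASM instance can then pick up as candidate disks for an ASM disk group.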
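
The layering described above can be pictured as a small data model. This is only an illustrative sketch, not how the cell software stores anything: the class names and the `asm_candidates` helper are invented here, the LUN name is shortened from the full path in the listing above, and the object names are borrowed from the virtual environment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GridDisk:
    name: str
    asm_disk_name: Optional[str] = None  # filled in once ASM has adopted the disk


@dataclass
class CellDisk:
    name: str
    disk_type: str  # "HardDisk" or "FlashDisk"
    grid_disks: List[GridDisk] = field(default_factory=list)  # 1:n


@dataclass
class Lun:
    name: str
    cell_disk: Optional[CellDisk] = None  # one cell disk per LUN


def asm_candidates(lun):
    """Grid disks under this LUN that no ASM disk group has claimed yet."""
    if lun.cell_disk is None:
        return []
    return [g for g in lun.cell_disk.grid_disks if g.asm_disk_name is None]


lun = Lun(
    name="disk02",
    cell_disk=CellDisk(
        name="CD_disk02_cell1",
        disk_type="HardDisk",
        grid_disks=[GridDisk("DATA_CD_disk02_cell1")],
    ),
)
print([g.name for g in asm_candidates(lun)])
```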

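The main difference between flash cache and flash-based grid disks (flash grid disks) is simple. Flash cache automatically caches the objects the database has accessed most recently, whereas a flash grid disk is presented to ASM like any other grid disk, so what lives on it is decided by where the database files are placed.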

System users for managing storage
Each Exadata storage server is configured with three default users: root, celladmin, and cellmonitor.
[root@exacell01 ~]# id cellmonitor
uid=1001(cellmonitor) gid=501(cellmonitor) groups=501(cellmonitor),502(cellusers)
[root@exacell01 ~]# id celladmin
uid=1000(celladmin) gid=500(celladmin) groups=500(celladmin),502(cellusers)
[root@exacell01 ~]#

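root: superuser privileges, used to start and stop the storage server
celladmin: used for storage cell administration tasks such as create, alter, and modify, using the cellcli and dcli tools
cellmonitor: used for storage cell monitoring tasks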

Use help to list all the commands available in the cellcli tool.

CellCLI> help

 HELP [topic]
   Available Topics:
        ALTER
        ALTER ALERTHISTORY
        ALTER CELL
        ALTER CELLDISK
        ALTER FLASHCACHE
        ALTER GRIDDISK
        ALTER IBPORT
        ALTER IORMPLAN
        ALTER LUN
        ALTER PHYSICALDISK
        ALTER QUARANTINE
        ALTER THRESHOLD
        ASSIGN KEY
        CALIBRATE
        CREATE
        CREATE CELL
        CREATE CELLDISK
        CREATE FLASHCACHE
        CREATE FLASHLOG
        CREATE GRIDDISK
        CREATE KEY
        CREATE QUARANTINE
        CREATE THRESHOLD
        DESCRIBE
        DROP
        DROP ALERTHISTORY
        DROP CELL
        DROP CELLDISK
        DROP FLASHCACHE
        DROP FLASHLOG
        DROP GRIDDISK
        DROP QUARANTINE
        DROP THRESHOLD
        EXPORT CELLDISK
        IMPORT CELLDISK
        LIST
        LIST ACTIVEREQUEST
        LIST ALERTDEFINITION
        LIST ALERTHISTORY
        LIST CELL
        LIST CELLDISK
        LIST FLASHCACHE
        LIST FLASHCACHECONTENT
        LIST FLASHLOG
        LIST GRIDDISK
        LIST IBPORT
        LIST IORMPLAN
        LIST KEY
        LIST LUN
        LIST METRICCURRENT
        LIST METRICDEFINITION
        LIST METRICHISTORY
        LIST PHYSICALDISK
        LIST QUARANTINE
        LIST THRESHOLD
        SET
        SPOOL
        START

CellCLI> 

List all flash disks on this storage cell. -- output below is from the virtual environment

CellCLI> list LUN where disktype='flashdisk'
	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01	 normal
	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02	 normal
	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH03	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH03	 normal
	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH04	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH04	 normal

CellCLI> 

View the details of a LUN. -- output below is from the virtual environment

CellCLI> list lun where celldisk='FD_00_cell1'
	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01	 normal

CellCLI> list lun where celldisk='FD_00_cell1' detail
	 name:              	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01
	 cellDisk:          	 FD_00_cell1
	 deviceName:        	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01
	 diskType:          	 FlashDisk
	 id:                	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01
	 isSystemLun:       	 FALSE
	 lunAutoCreate:     	 FALSE
	 lunSize:           	 1G
	 physicalDrives:    	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01
	 raidLevel:         	 "RAID 0"
	 status:            	 normal

CellCLI> 

View the details of a physical disk. -- output below is from the virtual environment

CellCLI> list physicaldisk '/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk01' detail
	 name:              	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk01
	 diskType:          	 HardDisk
	 luns:              	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk01
	 physicalInsertTime:	 2019-11-26T05:47:08+08:00
	 physicalSize:      	 1G
	 status:            	 normal

CellCLI> list physicaldisk '/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02' detail
	 name:              	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02
	 diskType:          	 FlashDisk
	 luns:              	 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02
	 physicalInsertTime:	 2019-11-26T05:47:08+08:00
	 physicalSize:      	 1G
	 status:            	 normal

CellCLI> 

View the details of a grid disk, such as the cell disk it maps to, its size, and its status. -- output below is from the virtual environment

CellCLI> list griddisk gd20 detail;
	 name:              	 gd20
	 asmDiskgroupName:  	 DATA_ADD
	 asmDiskName:       	 GD20
	 asmFailGroupName:  	 GD20
	 availableTo:       	 
	 cachingPolicy:     	 default
	 cellDisk:          	 cd20
	 comment:           	 
	 creationTime:      	 2019-11-26T10:07:35+08:00
	 diskType:          	 HardDisk
	 errorCount:        	 0
	 id:                	 315f8adf-c9db-4514-a144-89140a61dc84
	 offset:            	 48M
	 size:              	 1.953125G
	 status:            	 active

CellCLI> 

Creating cell disks
The following command creates 12 cell disks, one per LUN, following the default naming convention.
create celldisk all harddisk

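Creating grid disks
The following command creates a grid disk on each hard-disk cell disk, using the outermost tracks of the physical disk for higher performance.
create griddisk all harddisk prefix=data, size=500G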

The following command creates a grid disk on each cell disk from the remaining inner tracks, for applications whose I/O is less critical.
create griddisk all prefix=FRA
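
The effect of creation order can be sketched with simple arithmetic: the first grid disk gets the lowest offsets on the cell disk, and a grid disk created without a size takes whatever is left. The sketch below assumes that low offsets correspond to the outer tracks; the function `layout`, the 600G cell disk size, and the 48M starting offset are all hypothetical values chosen for illustration (48M merely matches the offset reported for gd20 in the virtual environment).

```python
def layout(celldisk_size_mb, start_offset_mb, requests):
    """Place grid disks one after another, in creation order.

    requests is a list of (prefix, size_mb) pairs; a size of None means
    "use all remaining space", like a create griddisk without size.
    Returns a list of (prefix, offset_mb, size_mb).
    """
    offset = start_offset_mb
    placed = []
    for prefix, size in requests:
        remaining = celldisk_size_mb - offset
        if size is None:
            size = remaining
        if size <= 0 or size > remaining:
            raise ValueError(f"not enough space left for {prefix}")
        placed.append((prefix, offset, size))
        offset += size
    return placed


# Hypothetical 600G cell disk: data first (outer), FRA takes the rest (inner).
for prefix, offset, size in layout(600 * 1024, 48, [("data", 500 * 1024), ("FRA", None)]):
    print(prefix, offset, size)
```

Swapping the two requests would put FRA on the low offsets instead, which is why the performance-critical grid disks are created first.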

Configuring flash grid disks
The steps below first drop the current flash cache configuration and then re-create it with a non-default size.
drop flashcache
create flashcache all size=200G
create griddisk all flashdisk

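Once the Exadata storage is configured, the next step is to configure the database nodes to use the grid disks. The cellinit.ora and cellip.ora files must be set up on each database node so that it can connect to the storage cells and use their grid disks.
cellinit.ora holds the IP address of the database server node (that is, the compute node's private IP).
cellip.ora holds the IP addresses of all the storage cells (that is, the storage cells' private IPs).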
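
As a rough illustration of what cellip.ora contains, the sketch below parses a file body made of `cell="IP"` lines, one per storage cell. The post does not show the file, so this layout is an assumption, as is the helper name `parse_cellip`. The first address is cell1's ipaddress1 from the virtual environment, and the second is hypothetical.

```python
import re

# Assumed cellip.ora body: one cell="..." entry per storage cell.
SAMPLE_CELLIP = '''\
cell="10.10.10.1"
cell="10.10.10.2"
'''


def parse_cellip(text):
    """Return one list of addresses per storage cell entry."""
    cells = []
    for line in text.splitlines():
        match = re.match(r'^\s*cell\s*=\s*"([^"]+)"', line)
        if match:
            # Several addresses for one cell, if present, are split on ';'.
            cells.append(match.group(1).split(";"))
    return cells


print(parse_cellip(SAMPLE_CELLIP))
```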

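Create an ASM disk group. Take a few grid disks from two storage cells, cell01 and cell02, to create a high-redundancy disk group for data. (Details omitted, since it works the same way as usual.) -- the SQL below is taken from the official documentation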
https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-administering-asm.html#GUID-555BD6CC-6668-4365-A20C-0C119C059EFA

SQL> CREATE DISKGROUP data HIGH REDUNDANCY 

-- These grid disks are on cell01
   DISK 
   'o/*/data_CD_00_cell01',
   'o/*/data_CD_01_cell01',
   'o/*/data_CD_02_cell01'

-- These grid disks are on cell02
   DISK
   'o/*/data_CD_00_cell02',
   'o/*/data_CD_01_cell02',
   'o/*/data_CD_02_cell02'

-- These disk group attributes must be set for cell access
-- Note that this disk group is set for cell only
   ATTRIBUTE 'compatible.rdbms' = '11.2.0.4', 
             'content.type' = 'data',
             'compatible.asm' = '19.0.0.0',
             'au_size' = '4M',
             'cell.smart_scan_capable' = 'TRUE';

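Managing the storage server
imageinfo: shows details of the currently installed storage software, such as the kernel version, OS version, active image version, and the cell's boot partition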

-- command output below is from the real environment

[root@exacel02 ~]# imageinfo

Kernel version: 2.6.39-400.264.1.el6uek.x86_64 #1 SMP Wed Aug 26 16:42:25 PDT 2015 x86_64
Cell version: OSS_12.1.2.2.0_LINUX.X64_150917
Cell rpm version: cell-12.1.2.2.0_LINUX.X64_150917-1.x86_64

Active image version: 12.1.2.2.0.150917
Active image activated: 2019-11-21 15:26:44 +0800
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7

Cell boot usb partition: /dev/sdac1
Cell boot usb version: 12.1.2.2.0.150917

Inactive image version: undefined
Rollback to the inactive partitions: Impossible
[root@exacel02 ~]# 

imagehistory: shows every software version that has been installed on this node
[root@exacel02 config]# imagehistory
Version                              : 12.1.2.2.0.150917
Image activation date                : 2019-11-21 15:26:44 +0800
Imaging mode                         : fresh
Imaging status                       : success

[root@exacel02 config]# 

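View and drop old alert history on the storage cell. Note that 3_1, 3_2, and 3_3 must be dropped together. Dropping only one of them raises an error. After the drops, only alert 2 is left. -- output below is from the virtual environment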

CellCLI> list alerthistory
	 1  	 2019-11-26T04:50:18+08:00	 critical	 "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
	 2  	 2019-11-26T05:39:50+08:00	 critical	 "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
	 3_1	 2019-11-26T05:41:37+08:00	 warning 	 "Hugepage allocation failure in service cellsrv.  Number of Hugepages allocated is 748, failed to allocate 152"
	 3_2	 2019-11-26T05:49:18+08:00	 warning 	 "Hugepage allocation failure in service cellsrv.  Number of Hugepages allocated is 840, failed to allocate 60"
	 3_3	 2019-11-26T08:23:06+08:00	 clear   	 "Hugepage allocation was successful in service cellsrv."

CellCLI> drop alerthistory 1
Alert 1 successfully dropped


CellCLI> drop alerthistory 3_1

CELL-02643: DROP ALERTHISTORY command did not include all members of the alert sequence for 3_1. All members of the sequence must be dropped together.

CellCLI> drop alerthistory 3_1,3_2,3_3
Alert 3_1 successfully dropped
Alert 3_2 successfully dropped
Alert 3_3 successfully dropped

CellCLI> 
CellCLI> list alerthistory
	 2	 2019-11-26T05:39:50+08:00	 critical	 "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"

CellCLI> 
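
Because every member of an alert sequence has to be dropped in one command, it can help to derive the full member list from the IDs that list alerthistory prints. The sketch below assumes that the part of the ID before the underscore identifies the sequence, which matches the 3_1, 3_2, 3_3 pattern above. The helper name `drop_command_for` is made up.

```python
def drop_command_for(alert_id, all_ids):
    """Build a drop alerthistory command covering the whole alert sequence."""
    base = alert_id.split("_")[0]
    members = [a for a in all_ids if a.split("_")[0] == base]
    return "drop alerthistory " + ",".join(members)


ids = ["1", "2", "3_1", "3_2", "3_3"]  # IDs from the listing above
print(drop_command_for("3_1", ids))
print(drop_command_for("1", ids))
```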

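Troubleshooting the storage server
Most of the diagnostic tools live under /opt/oracle.SupportTools. For example:
sundiag.sh  -- writes its diagnostic output under /tmp/sundiag_Filesystem as a timestamped tar file
ExaWatcher (which replaces the OSWatcher of earlier versions) starts automatically with the system. It collects information on the storage cell and stores it under /opt/oracle.ExaWatcher/archive. To generate or extract part of the logs ExaWatcher produces, use the GetExaWatcherResults.sh script.
exachk -- well known, so not covered here.
CheckHWnFWProfile -- verifies the details of the hardware and firmware components. It returns success if the current hardware and firmware versions are the ones it accepts as correct.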

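Stopping and starting a storage cell

1 Confirm that taking the grid disks on this cell offline will not affect the ASM instance. If every listed grid disk shows Yes, all the grid disks can safely be taken offline without any impact on ASM. -- output below is from the virtual environment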

list griddisk attributes name,asmdeactivationoutcome 
CellCLI> list griddisk attributes name,asmdeactivationoutcome 
	 DATA_CD_disk01_cell1	 Yes
	 DATA_CD_disk02_cell1	 Yes
	 DATA_CD_disk03_cell1	 Yes
	 DATA_CD_disk04_cell1	 Yes
	 DATA_CD_disk05_cell1	 Yes
	 DATA_CD_disk06_cell1	 Yes
	 DATA_CD_disk07_cell1	 Yes
	 DATA_CD_disk08_cell1	 Yes
	 DATA_CD_disk09_cell1	 Yes
	 DATA_CD_disk10_cell1	 Yes
	 DATA_CD_disk11_cell1	 Yes
	 DATA_CD_disk12_cell1	 Yes
	 gd13                	 Yes
	 gd14                	 Yes
	 gd15                	 Yes
	 gd16                	 Yes
	 gd17                	 Yes
	 gd18                	 Yes
	 gd19                	 Yes
	 gd20                	 Yes

CellCLI> 
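
The "all Yes" check in step 1 is easy to automate once the CellCLI output has been captured as text. The following is a minimal sketch under that assumption; `parse_two_columns` and `blocking_griddisks` are hypothetical helper names, and the sample is a shortened copy of the listing above with one made-up "No" row added to show the failing case.

```python
SAMPLE = """\
CellCLI> list griddisk attributes name,asmdeactivationoutcome
\t DATA_CD_disk01_cell1\t Yes
\t DATA_CD_disk02_cell1\t Yes
\t gd20                \t Yes
"""


def parse_two_columns(output):
    """Map grid disk name -> attribute value for two-column CellCLI output."""
    rows = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and not line.startswith("CellCLI>"):
            rows[parts[0]] = parts[1]
    return rows


def blocking_griddisks(rows):
    """Grid disks whose asmdeactivationoutcome is anything other than Yes."""
    return [name for name, outcome in rows.items() if outcome != "Yes"]


rows = parse_two_columns(SAMPLE)
print(blocking_griddisks(rows))  # an empty list means it is safe to continue

# Hypothetical failing case: one grid disk cannot be deactivated safely.
rows["gd13"] = "No"
print(blocking_griddisks(rows))
```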

2 Once every result in step 1 is Yes, run:

alter griddisk all inactive 
CellCLI> alter griddisk all inactive  
GridDisk DATA_CD_disk01_cell1 successfully altered
GridDisk DATA_CD_disk02_cell1 successfully altered
GridDisk DATA_CD_disk03_cell1 successfully altered
GridDisk DATA_CD_disk04_cell1 successfully altered
GridDisk DATA_CD_disk05_cell1 successfully altered
GridDisk DATA_CD_disk06_cell1 successfully altered
GridDisk DATA_CD_disk07_cell1 successfully altered
GridDisk DATA_CD_disk08_cell1 successfully altered
GridDisk DATA_CD_disk09_cell1 successfully altered
GridDisk DATA_CD_disk10_cell1 successfully altered
GridDisk DATA_CD_disk11_cell1 successfully altered
GridDisk DATA_CD_disk12_cell1 successfully altered
GridDisk gd13 successfully altered
GridDisk gd14 successfully altered
GridDisk gd15 successfully altered
GridDisk gd16 successfully altered
GridDisk gd17 successfully altered
GridDisk gd18 successfully altered
GridDisk gd19 successfully altered
GridDisk gd20 successfully altered

3 Once the grid disks on the cell have been deactivated, check the asmdeactivationoutcome attribute again and use list griddisk to confirm that all the grid disks are inactive.

CellCLI> list griddisk
	 DATA_CD_disk01_cell1	 inactive
	 DATA_CD_disk02_cell1	 inactive
	 DATA_CD_disk03_cell1	 inactive
	 DATA_CD_disk04_cell1	 inactive
	 DATA_CD_disk05_cell1	 inactive
	 DATA_CD_disk06_cell1	 inactive
	 DATA_CD_disk07_cell1	 inactive
	 DATA_CD_disk08_cell1	 inactive
	 DATA_CD_disk09_cell1	 inactive
	 DATA_CD_disk10_cell1	 inactive
	 DATA_CD_disk11_cell1	 inactive
	 DATA_CD_disk12_cell1	 inactive
	 gd13                	 inactive
	 gd14                	 inactive
	 gd15                	 inactive
	 gd16                	 inactive
	 gd17                	 inactive
	 gd18                	 inactive
	 gd19                	 inactive
	 gd20                	 inactive

CellCLI> 

4 The cell can now be safely shut down, rebooted, or taken offline, using operating system commands.

shutdown -h now

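Note: if the storage cell will stay down for a long time, adjust the ASM disk_repair_time attribute so that ASM does not drop the disks once they have been offline longer than the limit.
alter diskgroup DG_DATA set attribute 'disk_repair_time'='8H';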

Starting a storage cell

1 alter griddisk all active 
CellCLI> alter griddisk all active
GridDisk DATA_CD_disk01_cell1 successfully altered
GridDisk DATA_CD_disk02_cell1 successfully altered
GridDisk DATA_CD_disk03_cell1 successfully altered
GridDisk DATA_CD_disk04_cell1 successfully altered
GridDisk DATA_CD_disk05_cell1 successfully altered
GridDisk DATA_CD_disk06_cell1 successfully altered
GridDisk DATA_CD_disk07_cell1 successfully altered
GridDisk DATA_CD_disk08_cell1 successfully altered
GridDisk DATA_CD_disk09_cell1 successfully altered
GridDisk DATA_CD_disk10_cell1 successfully altered
GridDisk DATA_CD_disk11_cell1 successfully altered
GridDisk DATA_CD_disk12_cell1 successfully altered
GridDisk gd13 successfully altered
GridDisk gd14 successfully altered
GridDisk gd15 successfully altered
GridDisk gd16 successfully altered
GridDisk gd17 successfully altered
GridDisk gd18 successfully altered
GridDisk gd19 successfully altered
GridDisk gd20 successfully altered

CellCLI> 

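2 list griddisk attributes name,asmmodestatus   -- In my setup the compute nodes were never started. Only the storage cell was running for the simulation, so the output of this step may differ from that of a normal system.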

CellCLI> list griddisk attributes name,asmmodestatus 
	 DATA_CD_disk01_cell1	 UNKNOWN
	 DATA_CD_disk02_cell1	 UNKNOWN
	 DATA_CD_disk03_cell1	 UNKNOWN
	 DATA_CD_disk04_cell1	 UNKNOWN
	 DATA_CD_disk05_cell1	 UNKNOWN
	 DATA_CD_disk06_cell1	 UNUSED
	 DATA_CD_disk07_cell1	 UNUSED
	 DATA_CD_disk08_cell1	 UNUSED
	 DATA_CD_disk09_cell1	 UNUSED
	 DATA_CD_disk10_cell1	 UNUSED
	 DATA_CD_disk11_cell1	 UNUSED
	 DATA_CD_disk12_cell1	 UNUSED
	 gd13                	 UNKNOWN
	 gd14                	 UNKNOWN
	 gd15                	 UNKNOWN
	 gd16                	 UNKNOWN
	 gd17                	 UNKNOWN
	 gd18                	 UNKNOWN
	 gd19                	 UNKNOWN
	 gd20                	 UNKNOWN

CellCLI> 
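
After reactivation, the thing to watch is how many grid disks are still in a transitional or unexpected state. The sketch below summarises asmmodestatus values with a counter. It assumes that on a normal system the disks end up ONLINE, possibly passing through SYNCING first; neither value appears in the output above, so both are assumptions. The helper names are hypothetical, and the sample is a shortened copy of the listing above.

```python
from collections import Counter

SAMPLE = """\
\t DATA_CD_disk05_cell1\t UNKNOWN
\t DATA_CD_disk06_cell1\t UNUSED
\t DATA_CD_disk07_cell1\t UNUSED
\t gd20                \t UNKNOWN
"""

# Assumption: ONLINE means ASM is using the disk again; UNUSED means the
# grid disk does not belong to any disk group, so there is nothing to wait for.
SETTLED = {"ONLINE", "UNUSED"}


def status_summary(output):
    """Count grid disks per asmmodestatus value."""
    return Counter(line.split()[1] for line in output.splitlines() if len(line.split()) == 2)


def all_settled(summary):
    """True when no grid disk is in a state outside SETTLED."""
    return all(status in SETTLED for status in summary)


summary = status_summary(SAMPLE)
print(dict(summary), all_settled(summary))
```

In the virtual environment the result stays False because the compute nodes are down and ASM never reports back, which is the UNKNOWN state seen above.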

3 Run list cell, list griddisk, and similar commands. The book only runs list cell.

CellCLI> list cell detail
	 name:              	 cell1
	 bbuTempThreshold:  	 60
	 bbuChargeThreshold:	 800
	 bmcType:           	 absent
	 cellVersion:       	 OSS_11.2.3.2.0_LINUX.X64_120713
	 cpuCount:          	 1
	 diagHistoryDays:   	 7
	 fanCount:          	 1/1
	 fanStatus:         	 normal
	 flashCacheMode:    	 WriteThrough
	 id:                	 66c6e844-c66a-4661-8291-359459443084
	 interconnectCount: 	 2
	 interconnect1:     	 eth0
	 iormBoost:         	 0.0
	 ipaddress1:        	 10.10.10.1/24
	 kernelVersion:     	 2.6.39-400.215.10.el5uek
	 makeModel:         	 Fake hardware
	 metricHistoryDays: 	 7
	 offloadEfficiency: 	 1,000.0
	 powerCount:        	 1/1
	 powerStatus:       	 normal
	 releaseVersion:    	 11.2.3.2.0
	 releaseTrackingBug:	 14212264
	 status:            	 online
	 temperatureReading:	 0.0
	 temperatureStatus: 	 normal
	 upTime:            	 0 days, 2:50
	 cellsrvStatus:     	 running
	 msStatus:          	 running
	 rsStatus:          	 running

CellCLI> list cell
	 cell1	 online

CellCLI> 

CellCLI> list griddisk
	 DATA_CD_disk01_cell1	 active
	 DATA_CD_disk02_cell1	 active
	 DATA_CD_disk03_cell1	 active
	 DATA_CD_disk04_cell1	 active
	 DATA_CD_disk05_cell1	 active
	 DATA_CD_disk06_cell1	 active
	 DATA_CD_disk07_cell1	 active
	 DATA_CD_disk08_cell1	 active
	 DATA_CD_disk09_cell1	 active
	 DATA_CD_disk10_cell1	 active
	 DATA_CD_disk11_cell1	 active
	 DATA_CD_disk12_cell1	 active
	 gd13                	 active
	 gd14                	 active
	 gd15                	 active
	 gd16                	 active
	 gd17                	 active
	 gd18                	 active
	 gd19                	 active
	 gd20                	 active

CellCLI> list griddisk attributes name,asmdeactivationoutcome 
	 DATA_CD_disk01_cell1	 Yes
	 DATA_CD_disk02_cell1	 Yes
	 DATA_CD_disk03_cell1	 Yes
	 DATA_CD_disk04_cell1	 Yes
	 DATA_CD_disk05_cell1	 Yes
	 DATA_CD_disk06_cell1	 Yes
	 DATA_CD_disk07_cell1	 Yes
	 DATA_CD_disk08_cell1	 Yes
	 DATA_CD_disk09_cell1	 Yes
	 DATA_CD_disk10_cell1	 Yes
	 DATA_CD_disk11_cell1	 Yes
	 DATA_CD_disk12_cell1	 Yes
	 gd13                	 Yes
	 gd14                	 Yes
	 gd15                	 Yes
	 gd16                	 Yes
	 gd17                	 Yes
	 gd18                	 Yes
	 gd19                	 Yes
	 gd20                	 Yes

CellCLI> 

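Handling disk problems

When a disk problem is detected on a storage server, the following usually happens:
1 When performance degradation is detected, the status of the cell disk and the physical disk changes.
2 All grid disks on the affected cell disk go offline.
3 The MS service notifies the cellsrv service that a problem was found, and cellsrv then tells the ASM instance to take the grid disks offline.
4 The MS service on the storage cell then runs a series of confinement checks to decide whether the disk needs to be dropped.
5 If the disk passes the performance checks, MS tells cellsrv to bring all the cell disks and grid disks back online.
6 If the disk fails the performance checks, the status of the cell disk and the physical disk is changed, and the disk is removed from the active configuration.
7 MS notifies cellsrv about the disk problem. Next, cellsrv tells the ASM instance to drop all the affected grid disks on the cell.
8 If ASR is configured, a service request (SR) for a disk replacement is filed with Oracle Support.
9 The failed disk can be replaced with a hot spare, or a replacement disk can be requested from Oracle.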


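When there are signs that a disk is in very bad shape, the first task is to use the cell's alert history or the cell logs to identify the failed disk's exact name, its location in the system, its physical location, and its slot number. Also check the ASM alert log to confirm that ASM has taken the failed disk offline (that is, dropped the disk) and that ASM has finished rebalancing the data before the disk is replaced.
Commands for viewing alert information:
list alerthistory
list physicaldisk where disktype=harddisk and status=critical detail
list physicaldisk where disktype=harddisk and status like ".*failure.*" detail
Confirm that the grid disks belonging to the cell disk have been dropped and that the ASM instance has finished the rebalance:
select name,state from v$asm_diskgroup;
select * from v$asm_operation;
About 3 minutes after the physical disk is replaced on the storage cell, all its grid disks and the cell disk are re-created automatically and then added back to their disk groups, after which the data is rebalanced.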

END

Reposted from blog.csdn.net/xxzhaobb/article/details/103313774