Author | JiekeXu
Source | Public account JiekeXu DBA road (ID: JiekeXu_IT)
For reprint authorization, please contact me (personal WeChat ID: JiekeXu_DBA)
Hello everyone, I am JiekeXu, and I am very happy to meet you again. Today we will look at a problem where a shutdown and restart caused an ASM disk to go missing and the database could not start. Click the blue text "JiekeXu DBA Road" above to follow my public account, and star or pin it so that useful articles reach you as soon as possible!
Problem description
Here is what happened. Over the Mid-Autumn Festival and National Day holidays, the test server room had to be powered off for line maintenance, so all test machines were shut down. After the work was completed, one node of a RAC database in the test environment would not start. Inspection showed that the ARCH disk group on node 2 was not mounted.
Logging in to the ASM instance and checking the ASM disks and paths shows that the archive disk group ARCH is not mounted, and its disk path is /dev/sde.
su - grid
sqlplus / as sysasm
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0
SQL> set lin 1000 pagesize 999
col PATH for a30
col NAME for a15
col FAILGROUP for a15
select GROUP_NUMBER,DISK_NUMBER,OS_MB/1024,TOTAL_MB/1024,FREE_MB/1024,NAME,FAILGROUP,PATH,FAILGROUP_TYPE,header_status,state from v$asm_disk order by 1;
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB/1024,FREE_MB/1024,USABLE_FILE_MB/1024,REQUIRED_MIRROR_FREE_MB,HOT_USED_MB,COLD_USED_MB/1024 from v$asm_diskgroup;
SQL> SQL> SQL>
GROUP_NUMBER DISK_NUMBER OS_MB/1024 TOTAL_MB/1024 FREE_MB/1024 NAME FAILGROUP PATH FAILGRO HEADER_STATU STATE
------------ ----------- ---------- ------------- ------------ --------------- --------------- ------------------------------ ------- ------------ --------
0 0 100 0 0 /dev/sde REGULAR MEMBER NORMAL
2 2 200 200 5.91796875 DATA_0002 DATA_0002 /dev/sdi REGULAR MEMBER NORMAL
2 0 200 200 5.8828125 DATA_0000 DATA_0000 /dev/sdc REGULAR MEMBER NORMAL
2 1 200 200 5.81640625 DATA_0001 DATA_0001 /dev/sdd REGULAR MEMBER NORMAL
3 0 3 3 2.6640625 OCR_0000 OCR_0000 /dev/sdf REGULAR MEMBER NORMAL
3 1 3 3 2.66015625 OCR_0001 OCR_0001 /dev/sdg REGULAR MEMBER NORMAL
3 2 3 3 2.66015625 OCR_0002 OCR_0002 /dev/sdh REGULAR MEMBER NORMAL
7 rows selected.
SQL>
GROUP_NUMBER NAME STATE TYPE TOTAL_MB/1024 FREE_MB/1024 USABLE_FILE_MB/1024 REQUIRED_MIRROR_FREE_MB HOT_USED_MB COLD_USED_MB/1024
------------ --------------- ----------- ------ ------------- ------------ ------------------- ----------------------- ----------- -----------------
0 ARCH DISMOUNTED 0 0 0 0 0 0
2 DATA MOUNTED EXTERN 600 17.6171875 17.6171875 0 0 582.382813
3 OCR MOUNTED NORMAL 9 7.984375 2.4921875 3072 0 1.015625
Then use the lsblk command at the operating system level to confirm that /dev/sde is 100G, and try to mount the ARCH disk group with asmcmd. The mount fails with an error saying a disk is missing.
[root@jiekexu2 dev]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 200G 0 disk
|-sda1 8:1 0 1G 0 part /boot
`-sda2 8:2 0 199G 0 part
|-rootvg-lvroot 253:0 0 191.1G 0 lvm /
`-rootvg-lvswap 253:1 0 15.9G 0 lvm [SWAP]
sdb 8:16 0 8G 0 disk
`-rootvg-lvswap 253:1 0 15.9G 0 lvm [SWAP]
sdc 8:32 0 200G 0 disk
sdd 8:48 0 200G 0 disk
sde 8:64 0 100G 0 disk
sdf 8:80 0 3G 0 disk
sdg 8:96 0 3G 0 disk
sdh 8:112 0 3G 0 disk
sdi 8:128 0 200G 0 disk
sdj 8:144 0 100G 0 disk
sdk 8:160 0 200G 0 disk
sr0 11:0 1 1024M 0 rom
jiekexu2:/home/grid(+ASM2)$ asmcmd
ASMCMD> mount ARCH
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1" (DBD ERROR: OCIStmtExecute)
ASMCMD> exit
Next, the alert log of the ASM instance confirms that the ARCH disk group should contain two disks: one is (/dev/sde), while the other shows an empty path () and cannot be seen. So let's compare against the ARCH disks and their permissions on node 1.
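As a quick way to confirm what the alert log is saying, you can grep it for the missing-disk symptoms. A minimal sketch, assuming the default 19c diag layout and the +ASM2 instance name (both are assumptions; adjust ORACLE_BASE and the instance name for your environment):

```shell
#!/bin/sh
# Sketch: scan an ASM alert log for missing-disk symptoms.
# The log path below is an assumption (default 19c layout, instance +ASM2).

scan_asm_alert() {
    # Flag ORA-15040/ORA-15042 and disks registered with an empty path "()",
    # which is how a disk the instance cannot see shows up in the log.
    grep -nE 'ORA-1504[02]|to disk \(\)' "$1"
}

log=/u01/app/grid/diag/asm/+asm/+ASM2/trace/alert_+ASM2.log
[ -f "$log" ] && scan_asm_alert "$log"
```

A hit on `to disk ()` is the telltale sign that ASM assigned a disk number but could not resolve its OS path, which matches the empty-path entry seen in the log here.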
Node 1/2 permission comparison
On node 1, the ARCH disk group has two disks, (/dev/sde) and (/dev/sdj), both owned by grid:asmadmin. On node 2, however, only (/dev/sde) has the correct ownership, while (/dev/sdj) is owned by root:disk. This is clearly the problem: because the ownership is wrong, the first check above showed only one disk (/dev/sde) in the ARCH disk group.
Solving the problem
First, check whether there is a problem with the udev configuration file (here it was opened with `more` in SecureCRT, and the line break after "OWNER=" in the last rule was not spotted, so we moved on to the next step). Then change the ownership of (/dev/sdj) to grid:asmadmin, trigger the udev rules again, and try to mount the disk group.
jiekexu2:/home/grid(+ASM2)$ more /etc/udev/rules.d/99-oracle-asmdisks.rules
KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c29bd4635c80a3011779bc1a7f99", SYMLINK+="asmdisks/asmdiskb", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c2947f3b25a26a2bd31496cd2f59", SYMLINK+="asmdisks/asmdiskc", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c29ad3307c8874aaeb6b5c83f2f8", SYMLINK+="asmdisks/asmdiskd", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c29756eabbaea24471e77a46d1bf", SYMLINK+="asmdisks/asmdiske", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c296f7ef4f660d24ecdc32fd5216", SYMLINK+="asmdisks/asmdiskf", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c29ef6de29e246eec5d21ae13f74", SYMLINK+="asmdisks/asmdiskg", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c2995925bb888b965036ad9d9807", SYMLINK+="asmdisks/asmdiski", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c294342f0d7dd282c4a632d82c47", SYMLINK+="asmdisks/asmdiskj", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
jiekexu2:/home/grid(+ASM2)$ ll /dev/sde
brw-rw---- 1 grid asmadmin 8, 64 Oct 1 09:46 /dev/sde
jiekexu2:/home/grid(+ASM2)$ ll /dev/sdj
brw-rw---- 1 root disk 8, 144 Oct 1 09:19 /dev/sdj
jiekexu2:/home/grid(+ASM2)$ exit
logout
[root@jiekexu2 ~]# udevadm control --reload-rules
[root@jiekexu2 ~]# ll /dev/sdj
brw-rw---- 1 root disk 8, 144 Oct 1 09:19 /dev/sdj
[root@jiekexu2 ~]# ll /dev/sde
brw-rw---- 1 grid asmadmin 8, 64 Oct 1 09:46 /dev/sde
[root@jiekexu2 ~]# udevadm trigger
[root@jiekexu2 ~]# ll /dev/sdj
brw-rw---- 1 root disk 8, 144 Oct 1 09:50 /dev/sdj
[root@jiekexu2 ~]# ll /dev/sde
brw-rw---- 1 grid asmadmin 8, 64 Oct 1 09:50 /dev/sde
[root@jiekexu2 ~]# chown grid:asmadmin /dev/sdj
[root@jiekexu2 ~]# ll /dev/sdj
brw-rw---- 1 grid asmadmin 8, 144 Oct 1 09:50 /dev/sdj
Next, log in to the ASM instance and check the disk paths: both sdj and sde now show up normally, and ARCH can be mounted.
su - grid
sqlplus / as sysasm
alter diskgroup ARCH mount;
Logging in to the database shows that the database instance has started automatically.
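To double-check that both instances came back, the cluster's own status command is enough. A minimal sketch; `orcl` is a placeholder database name (an assumption), so substitute your own:

```shell
#!/bin/sh
# Sketch: verify the RAC database is running on all nodes after the mount.
# `orcl` below is a hypothetical database name.

all_instances_running() {
    # $1: output of `srvctl status database -d <db>`.
    # Succeeds only if at least one instance reports running
    # and no instance reports "is not running".
    printf '%s\n' "$1" | grep -q 'is running on node' &&
        ! printf '%s\n' "$1" | grep -q 'is not running'
}

status=$(srvctl status database -d orcl 2>/dev/null)
if all_instances_running "$status"; then
    echo "all instances up"
else
    printf 'check instances:\n%s\n' "$status"
fi
```

`srvctl status database` prints one "Instance ... is running on node ..." line per instance, so a simple grep is enough to spot a node that did not come back.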
On another RAC cluster, the database on node 2 also could not start, and its DATA disk group would not mount. The ASM instance log showed a disk missing from the DATA disk group, and the missing disk's ownership had likewise changed to root:disk. Following the same method, fixing the ownership and remounting restored it to normal.
2023-10-01T09:18:17.259212+08:00
ERROR: /* ASMCMD cguid:cae29900570ecf7ebfd17fbf776d2840 cname:jieke-rac-scan nodename:jieke-rac-87 */ALTER DISKGROUP data MOUNT
2023-10-01T09:18:17.650518+08:00
ASM Health Checker found 1 new failures
2023-10-01T10:21:30.690039+08:00
SQL> /* ASMCMD cguid:cae29900570ecf7ebfd17fbf776d2840 cname:jieke-rac-scan nodename:jieke-rac-87 */ALTER DISKGROUP DATA MOUNT
2023-10-01T10:21:30.795687+08:00
NOTE: cache registered group DATA 1/0x36302669
NOTE: cache began mount (not first) of group DATA 1/0x36302669
NOTE: Assigning number (1,0) to disk (/dev/asmdisks/asm-data)
2023-10-01T10:21:31.238030+08:00
GMON querying group 1 at 38 for pid 37, osid 59371
2023-10-01T10:21:31.269748+08:00
NOTE: Assigning number (1,1) to disk ()
2023-10-01T10:21:31.285726+08:00
GMON querying group 1 at 39 for pid 37, osid 59371
2023-10-01T10:21:31.286854+08:00
NOTE: cache dismounting (clean) group 1/0x36302669 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 59371, image: oracle@jieke-rac-87 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 1/0x36302669 (DATA)
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x36302669
NOTE: cache deleting context for group DATA 1/0x36302669
2023-10-01T10:21:31.386823+08:00
GMON dismounting group 1 at 40 for pid 37, osid 59371
2023-10-01T10:21:31.387863+08:00
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"
SQL> set lin 1000 pagesize 999
col PATH for a30
col NAME for a15
col FAILGROUP for a15
select GROUP_NUMBER,DISK_NUMBER,OS_MB/1024,TOTAL_MB/1024,FREE_MB/1024,NAME,FAILGROUP,PATH,FAILGROUP_TYPE,header_status,state from v$asm_disk order by 1;
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB/1024,FREE_MB/1024,USABLE_FILE_MB/1024,REQUIRED_MIRROR_FREE_MB,HOT_USED_MB,COLD_USED_MB/1024 from v$asm_diskgroup;
SQL> SQL> SQL>
GROUP_NUMBER DISK_NUMBER OS_MB/1024 TOTAL_MB/1024 FREE_MB/1024 NAME FAILGROUP PATH FAILGRO HEADER_STATU STATE
------------ ----------- ---------- ------------- ------------ --------------- --------------- ------------------------------ ------- ------------ --------
1 0 300 300 100.4375 DATA_0000 DATA_0000 /dev/asmdisks/asm-data REGULAR MEMBER NORMAL
1 1 300 300 100.449219 DATA_0001 DATA_0001 /dev/asmdisks/asm-data01 REGULAR MEMBER NORMAL
2 0 100 100 89.7578125 FRA_0000 FRA_0000 /dev/asmdisks/asm-fra REGULAR MEMBER NORMAL
3 0 50 50 49.8671875 TEST_0000 TEST_0000 /dev/asmdisks/asm-mgmt REGULAR MEMBER NORMAL
5 3 3 3 2.6484375 OCR_0003 OCR_0003 /dev/asmdisks/asm-ocr3 REGULAR MEMBER NORMAL
5 2 3 3 2.63671875 OCR_0002 OCR_0002 /dev/asmdisks/asm-ocr1 REGULAR MEMBER NORMAL
5 1 3 3 2.64453125 OCR_0001 OCR_0001 /dev/asmdisks/asm-ocr2 REGULAR MEMBER NORMAL
7 rows selected.
SQL>
GROUP_NUMBER NAME STATE TYPE TOTAL_MB/1024 FREE_MB/1024 USABLE_FILE_MB/1024 REQUIRED_MIRROR_FREE_MB HOT_USED_MB COLD_USED_MB/1024
------------ --------------- ----------- ------ ------------- ------------ ------------------- ----------------------- ----------- -----------------
1 DATA MOUNTED EXTERN 600 200.886719 200.886719 0 0 399.113281
2 FRA MOUNTED EXTERN 100 89.7578125 89.7578125 0 0 10.2421875
3 JIEKEXU MOUNTED EXTERN 50 49.8671875 49.8671875 0 0 .1328125
5 OCR MOUNTED NORMAL 9 7.9296875 2.46484375 3072 0 1.0703125
After returning from the National Day holiday, I restarted node 2 again, and after the restart the disk ownership had changed back to root:disk. Shutting down and restarting both RAC clusters changed the ownership of a shared disk, and the common factor is that the affected disk is the last one in its disk group (ARCH_0001 and DATA_0001), added later than the others. So the problem likely lies in the udev configuration file. The earlier check with `more` in SecureCRT showed nothing, but opening the file with vi this time reveals a line break in the last rule, right after "OWNER=". The other cluster's file has the same break in its last line. Opened in Xshell it looks like this:
Simply delete the line break to rejoin the rule, then reload the udev rules.
# vi /etc/udev/rules.d/99-oracle-asmdisks.rules
# udevadm control --reload-rules
# udevadm trigger
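To catch this kind of invisible line break earlier, a quick sanity check of the rules file helps. A minimal sketch: it assumes every complete rule in this particular file ends with MODE="0660", as all of the ASM rules shown above do.

```shell
#!/bin/sh
# Sketch: flag udev rule lines that look split by an accidental line break.
# Assumption: every complete rule in this file ends with MODE="0660".

check_asm_udev_rules() {
    awk '
        # A rule line that does not end with the terminator is truncated.
        /^KERNEL==/ && !/MODE="0660"$/ { print "suspect line " NR ": " $0; bad = 1 }
        # A non-empty line that does not start a rule is a stray fragment.
        !/^KERNEL==/ && NF > 0         { print "suspect line " NR ": " $0; bad = 1 }
        END { exit bad }
    ' "$1"
}

rules=/etc/udev/rules.d/99-oracle-asmdisks.rules
[ -f "$rules" ] && check_asm_udev_rules "$rules"
```

Running this after any edit to the rules file would have flagged both the truncated `OWNER=` line and the `"grid", GROUP=...` fragment on the next line.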
Disks in DROPPED status
One day a few months ago, an OCR disk in this RAC was suddenly found to have failed: its status changed to FORCING, and the disk was renamed with an underscore prefix, _DROPPED_0000_OCR. No abnormalities appeared in the ASM or cluster logs; the CRS cluster and database were in normal status, and the cluster could still be started and shut down normally, but the status of this one disk was abnormal. The query output is shown in the figure below:
Solution
Checking with lsblk shows no problem with the three 3G disks sdg, sdh, and sdi, nor with their corresponding permissions.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 100G 0 disk
|-sda1 8:1 0 1G 0 part /boot
`-sda2 8:2 0 99G 0 part
|-rootvg-lvroot 253:0 0 91.1G 0 lvm /
`-rootvg-lvswap 253:1 0 7.9G 0 lvm [SWAP]
sdb 8:16 0 100G 0 disk
`-u01vg-lvu01 253:2 0 100G 0 lvm /u01
sdc 8:32 0 50G 0 disk
sdd 8:48 0 300G 0 disk
sde 8:64 0 100G 0 disk
sdf 8:80 0 50G 0 disk
sdg 8:96 0 3G 0 disk
sdh 8:112 0 3G 0 disk
sdi 8:128 0 3G 0 disk
sdj 8:144 0 300G 0 disk
sr0 11:0 1 1024M 0 rom
[root@jiekexu-rac1 /]# ll /dev/asmdisks/asm-ocr3
lrwxrwxrwx 1 root root 6 Jul 1 16:51 /dev/asmdisks/asm-ocr3 -> ../sdi
[root@jiekexu-rac1 /]# ll /dev/sdi
brw-rw---- 1 grid asmadmin 8, 128 Jul 1 17:12 /dev/sdi
Simply force-add the third disk back into the OCR disk group.
SQL> alter diskgroup OCR add disk '/dev/asmdisks/asm-ocr3' force;
Diskgroup altered.
Checking the disk group status again shows it is normal; the disk was simply added back as a new disk. You can tell from the added disk's name, OCR_0003, since the original name was OCR_0000.
That's all for this article. I hope it helps you. If you found it useful, feel free to share it with your friends, colleagues, and anyone you care about, and let's learn and make progress together~~~
Welcome to follow my public account [JiekeXu DBA Road] to get new articles as soon as possible! You can find me at the three addresses below. Articles at other addresses are pirated copies crawled from mine, with broken code formatting and images that make them hard to read. Follow me on my official account or on Modb (Mo Tianlun) to get the latest posts first.
———————————————————————————
Official account: JiekeXu DBA Road
Modb (墨天轮): https://www.modb.pro/u/4347
CSDN: https://blog.csdn.net/JiekeXu
———————————————————————————
Share several database backup scripts
Oracle table fragmentation check and defragmentation solution
OGG | Oracle GoldenGate Basics
2022 Public Account Article Collection
Several problems encountered by Oracle 19c RAC
OGG | Consistency Comparison After Oracle Data Migration
OGG|Oracle GoldenGate microservice architecture
Oracle query table space usage is extremely slow
Domestic database | TiDB 5.4 stand-alone quick installation first experience
Oracle ADG standby database shutdown, maintenance process and incremental recovery
Linux environment to build MySQL8.0.28 master-slave synchronization environment