ASM disk lost after a shutdown and restart, leaving the database unable to start

Author | JiekeXu

Source | Public account JiekeXu DBA Road (ID: JiekeXu_IT)

If you need to reprint, please contact us for authorization | (Personal WeChat ID: JiekeXu_DBA)

Hello everyone, I am JiekeXu, and I am glad to meet you again. Today we look at a problem where an ASM disk went missing after a shutdown and restart, so the database could not be started.

Symptoms

Here is what happened. Over the Mid-Autumn Festival and National Day holidays, the test server room had to be powered off for cabling changes, so all the test machines were shut down. After the changes were completed, the database on one node of a RAC in the test environment could not be started. Investigation showed that the ARCH disk group on node 2 had not been mounted.

86deea2a0a43940ef483ebf8bfee6400.png

Log in to the ASM instance and check the ASM disks and their paths. The archive disk group ARCH is not mounted, and the only disk listed for it has the path '/dev/sde'.

su - grid 
sqlplus / as sysasm


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0


SQL> set lin 1000 pagesize 999 
col PATH for a30 
col NAME for a15 
col FAILGROUP for a15 
select GROUP_NUMBER,DISK_NUMBER,OS_MB/1024,TOTAL_MB/1024,FREE_MB/1024,NAME,FAILGROUP,PATH,FAILGROUP_TYPE,header_status,state from v$asm_disk order by 1; 
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB/1024,FREE_MB/1024,USABLE_FILE_MB/1024,REQUIRED_MIRROR_FREE_MB,HOT_USED_MB,COLD_USED_MB/1024 from v$asm_diskgroup; 
SQL> SQL> SQL> 
GROUP_NUMBER DISK_NUMBER OS_MB/1024 TOTAL_MB/1024 FREE_MB/1024 NAME            FAILGROUP       PATH                           FAILGRO HEADER_STATU STATE
------------ ----------- ---------- ------------- ------------ --------------- --------------- ------------------------------ ------- ------------ --------
           0           0        100             0            0                                 /dev/sde                       REGULAR MEMBER       NORMAL
           2           2        200           200   5.91796875 DATA_0002       DATA_0002       /dev/sdi                       REGULAR MEMBER       NORMAL
           2           0        200           200    5.8828125 DATA_0000       DATA_0000       /dev/sdc                       REGULAR MEMBER       NORMAL
           2           1        200           200   5.81640625 DATA_0001       DATA_0001       /dev/sdd                       REGULAR MEMBER       NORMAL
           3           0          3             3    2.6640625 OCR_0000        OCR_0000        /dev/sdf                       REGULAR MEMBER       NORMAL
           3           1          3             3   2.66015625 OCR_0001        OCR_0001        /dev/sdg                       REGULAR MEMBER       NORMAL
           3           2          3             3   2.66015625 OCR_0002        OCR_0002        /dev/sdh                       REGULAR MEMBER       NORMAL


7 rows selected.


SQL> 
GROUP_NUMBER NAME            STATE       TYPE   TOTAL_MB/1024 FREE_MB/1024 USABLE_FILE_MB/1024 REQUIRED_MIRROR_FREE_MB HOT_USED_MB COLD_USED_MB/1024
------------ --------------- ----------- ------ ------------- ------------ ------------------- ----------------------- ----------- -----------------
           0 ARCH            DISMOUNTED                     0            0                   0                       0           0                 0
           2 DATA            MOUNTED     EXTERN           600   17.6171875          17.6171875                       0           0        582.382813
           3 OCR             MOUNTED     NORMAL             9     7.984375           2.4921875                    3072           0          1.015625

cc675a536463e7d3b01ae061b450baf8.png

Next, lsblk at the operating system level confirms that /dev/sde is 100G. Mounting the ARCH disk group with asmcmd then fails with an error saying that a disk is missing.

[root@jiekexu2 dev]# lsblk
NAME              MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                 8:0    0   200G  0 disk 
|-sda1              8:1    0     1G  0 part /boot
`-sda2              8:2    0   199G  0 part 
  |-rootvg-lvroot 253:0    0 191.1G  0 lvm  /
  `-rootvg-lvswap 253:1    0  15.9G  0 lvm  [SWAP]
sdb                 8:16   0     8G  0 disk 
`-rootvg-lvswap   253:1    0  15.9G  0 lvm  [SWAP]
sdc                 8:32   0   200G  0 disk 
sdd                 8:48   0   200G  0 disk 
sde                 8:64   0   100G  0 disk 
sdf                 8:80   0     3G  0 disk 
sdg                 8:96   0     3G  0 disk 
sdh                 8:112  0     3G  0 disk 
sdi                 8:128  0   200G  0 disk 
sdj                 8:144  0   100G  0 disk 
sdk                 8:160  0   200G  0 disk 
sr0                11:0    1  1024M  0 rom  


jiekexu2:/home/grid(+ASM2)$ asmcmd 
ASMCMD> mount ARCH
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"  (DBD ERROR: OCIStmtExecute)
ASMCMD> exit

Next, the alert log of the ASM instance shows that the ARCH disk group has two disks: one is (/dev/sde), and the other has an empty path () and cannot be seen. So let's look at the ARCH disks and their permissions on node 1.

78b81e6e201009a94ee370a4cb343f13.png

Node 1/2 permission comparison
On node 1, the ARCH disk group has two disks, (/dev/sde) and (/dev/sdj), and both are owned by grid:asmadmin. On node 2, (/dev/sde) has the correct ownership, but (/dev/sdj) is owned by root:disk. That is clearly the problem. Because the ownership of /dev/sdj is wrong, the grid user cannot access it, which is why the first disk query above showed only one ARCH disk (/dev/sde).
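The ownership comparison can also be scripted. The following is a minimal sketch rather than the procedure used in this article: the function name check_asm_owner is hypothetical, the expected owner grid:asmadmin and the device paths come from this environment, and it assumes GNU stat (the -c option) is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every path whose owner:group differs from the
# expected value. Assumes GNU coreutils stat (the -c option).
check_asm_owner() {
    local expected="$1"; shift
    local path actual rc=0
    for path in "$@"; do
        if ! actual=$(stat -c '%U:%G' "$path" 2>/dev/null); then
            echo "MISSING  $path"
            rc=1
        elif [ "$actual" != "$expected" ]; then
            echo "MISMATCH $path is $actual, expected $expected"
            rc=1
        fi
    done
    return "$rc"
}

# Example call for the ARCH disks in this article (run it on each node):
# check_asm_owner grid:asmadmin /dev/sde /dev/sdj
```

The function returns a non-zero status if any path is missing or has the wrong ownership, so it can be used in a post-reboot check. On node 2 in this article, such a check would be expected to flag /dev/sdj, which was owned by root:disk.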

f521d2f97285d5543f5ecbc28af7da13.png

8ebf4b4f7c075af4d21787646822768a.png

Solving the problem

First, check whether there is a problem with the udev rules file. (I viewed it with more in a CRT session, and at that point I did not notice that a line break follows "OWNER=" in the last line, so I carried on to the next step.) Reloading and triggering the udev rules leaves the ownership of (/dev/sdj) unchanged, so change it to grid:asmadmin by hand with chown, and then try to mount the disk group.

jiekexu2:/home/grid(+ASM2)$ more /etc/udev/rules.d/99-oracle-asmdisks.rules
 KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c29bd4635c80a3011779bc1a7f99", SYMLINK+="asmdisks/asmdiskb", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
 KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c2947f3b25a26a2bd31496cd2f59", SYMLINK+="asmdisks/asmdiskc", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
 KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c29ad3307c8874aaeb6b5c83f2f8", SYMLINK+="asmdisks/asmdiskd", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
 KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c29756eabbaea24471e77a46d1bf", SYMLINK+="asmdisks/asmdiske", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
 KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c296f7ef4f660d24ecdc32fd5216", SYMLINK+="asmdisks/asmdiskf", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
 KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c29ef6de29e246eec5d21ae13f74", SYMLINK+="asmdisks/asmdiskg", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
 KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c2995925bb888b965036ad9d9807", SYMLINK+="asmdisks/asmdiski", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
 KERNEL=="sd*", ACTION=="add|change", SUBSYSTEM=="block", PROGRAM=="/lib/udev/scsi_id -g -u -d /dev/$name", RESULT=="36000c294342f0d7dd282c4a632d82c47", SYMLINK+="asmdisks/asmdiskj", OWNER=
"grid", GROUP="asmadmin", MODE="0660"
jiekexu2:/home/grid(+ASM2)$ ll /dev/sde
brw-rw---- 1 grid asmadmin 8, 64 Oct  1 09:46 /dev/sde
jiekexu2:/home/grid(+ASM2)$ ll /dev/sdj
brw-rw---- 1 root disk 8, 144 Oct  1 09:19 /dev/sdj
jiekexu2:/home/grid(+ASM2)$ exit
logout
[root@jiekexu2 ~]# udevadm control --reload-rules
[root@jiekexu2 ~]# ll /dev/sdj
brw-rw---- 1 root disk 8, 144 Oct  1 09:19 /dev/sdj
[root@jiekexu2 ~]# ll /dev/sde
brw-rw---- 1 grid asmadmin 8, 64 Oct  1 09:46 /dev/sde
[root@jiekexu2 ~]# udevadm trigger
[root@jiekexu2 ~]# ll /dev/sdj
brw-rw---- 1 root disk 8, 144 Oct  1 09:50 /dev/sdj
[root@jiekexu2 ~]# ll /dev/sde
brw-rw---- 1 grid asmadmin 8, 64 Oct  1 09:50 /dev/sde
[root@jiekexu2 ~]# chown grid:asmadmin /dev/sdj
[root@jiekexu2 ~]# ll /dev/sdj
brw-rw---- 1 grid asmadmin 8, 144 Oct  1 09:50 /dev/sdj

Next, log in to the ASM instance and check the disk paths again. Both sdj and sde are now displayed, and the ARCH disk group can be mounted normally.

su - grid
sqlplus / as sysasm
alter diskgroup ARCH mount;

e09519b10a83e84cd2e8f9d3669560b0.png

Log in to the database and find that the database instance has started automatically.

6ff28237d82d08e02139fd9c9a0e89bb.png

In another RAC cluster, the database on node 2 also could not be started because the DATA disk group could not be mounted. The ASM instance log shows that one disk is missing from the DATA disk group, and the ownership of the missing disk had likewise changed to root:disk. The same fix applies: correct the ownership and remount the disk group, and everything returns to normal.

58a6b7a2d0d5bea9db6819a167400841.png

2023-10-01T09:18:17.259212+08:00
ERROR: /* ASMCMD cguid:cae29900570ecf7ebfd17fbf776d2840 cname:jieke-rac-scan nodename:jieke-rac-87 */ALTER DISKGROUP data MOUNT 
2023-10-01T09:18:17.650518+08:00
ASM Health Checker found 1 new failures
2023-10-01T10:21:30.690039+08:00
SQL> /* ASMCMD cguid:cae29900570ecf7ebfd17fbf776d2840 cname:jieke-rac-scan nodename:jieke-rac-87 */ALTER DISKGROUP DATA MOUNT  
2023-10-01T10:21:30.795687+08:00
NOTE: cache registered group DATA 1/0x36302669
NOTE: cache began mount (not first) of group DATA 1/0x36302669
NOTE: Assigning number (1,0) to disk (/dev/asmdisks/asm-data)
2023-10-01T10:21:31.238030+08:00
GMON querying group 1 at 38 for pid 37, osid 59371
2023-10-01T10:21:31.269748+08:00
NOTE: Assigning number (1,1) to disk ()
2023-10-01T10:21:31.285726+08:00
GMON querying group 1 at 39 for pid 37, osid 59371
2023-10-01T10:21:31.286854+08:00
NOTE: cache dismounting (clean) group 1/0x36302669 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 59371, image: oracle@jieke-rac-87 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 1/0x36302669 (DATA)
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x36302669
NOTE: cache deleting context for group DATA 1/0x36302669
2023-10-01T10:21:31.386823+08:00
GMON dismounting group 1 at 40 for pid 37, osid 59371
2023-10-01T10:21:31.387863+08:00
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"
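
In the alert log above, the line that gives the missing disk away is "Assigning number (1,1) to disk ()": the path between the last pair of parentheses is empty. As a rough sketch, such lines can be pulled out of an alert log as follows. The function name find_missing_asm_disks is hypothetical, and the pattern is based only on the log lines quoted in this article, so other versions may word the message differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list (group, disk) pairs that the ASM alert log
# assigned to an empty path, i.e. disks ASM could not find during a mount.
find_missing_asm_disks() {
    local logfile="$1"
    sed -nE 's/.*Assigning number \(([0-9]+),([0-9]+)\) to disk \(\).*/group=\1 disk=\2/p' "$logfile" | sort -u
}

# Example (the alert log path is an assumption; adjust it to your environment):
# find_missing_asm_disks /path/to/alert_+ASM2.log
```

Each reported pair can then be matched against v$asm_disk on a healthy node to find the device path whose ownership needs checking.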

7c00e5caf67c305bb2347c51878f57b8.png

SQL> set lin 1000 pagesize 999 
col PATH for a30 
col NAME for a15 
col FAILGROUP for a15 
select GROUP_NUMBER,DISK_NUMBER,OS_MB/1024,TOTAL_MB/1024,FREE_MB/1024,NAME,FAILGROUP,PATH,FAILGROUP_TYPE,header_status,state from v$asm_disk order by 1; 
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB/1024,FREE_MB/1024,USABLE_FILE_MB/1024,REQUIRED_MIRROR_FREE_MB,HOT_USED_MB,COLD_USED_MB/1024 from v$asm_diskgroup; 
SQL> SQL> SQL> 
GROUP_NUMBER DISK_NUMBER OS_MB/1024 TOTAL_MB/1024 FREE_MB/1024 NAME            FAILGROUP       PATH                           FAILGRO HEADER_STATU STATE
------------ ----------- ---------- ------------- ------------ --------------- --------------- ------------------------------ ------- ------------ --------
           1           0        300           300     100.4375 DATA_0000       DATA_0000       /dev/asmdisks/asm-data         REGULAR MEMBER       NORMAL
           1           1        300           300   100.449219 DATA_0001       DATA_0001       /dev/asmdisks/asm-data01       REGULAR MEMBER       NORMAL
           2           0        100           100   89.7578125 FRA_0000        FRA_0000        /dev/asmdisks/asm-fra          REGULAR MEMBER       NORMAL
           3           0         50            50   49.8671875 TEST_0000       TEST_0000       /dev/asmdisks/asm-mgmt         REGULAR MEMBER       NORMAL
           5           3          3             3    2.6484375 OCR_0003        OCR_0003        /dev/asmdisks/asm-ocr3         REGULAR MEMBER       NORMAL
           5           2          3             3   2.63671875 OCR_0002        OCR_0002        /dev/asmdisks/asm-ocr1         REGULAR MEMBER       NORMAL
           5           1          3             3   2.64453125 OCR_0001        OCR_0001        /dev/asmdisks/asm-ocr2         REGULAR MEMBER       NORMAL


7 rows selected.


SQL> 
GROUP_NUMBER NAME            STATE       TYPE   TOTAL_MB/1024 FREE_MB/1024 USABLE_FILE_MB/1024 REQUIRED_MIRROR_FREE_MB HOT_USED_MB COLD_USED_MB/1024
------------ --------------- ----------- ------ ------------- ------------ ------------------- ----------------------- ----------- -----------------
           1 DATA            MOUNTED     EXTERN           600   200.886719          200.886719                       0           0        399.113281
           2 FRA             MOUNTED     EXTERN           100   89.7578125          89.7578125                       0           0        10.2421875
           3 JIEKEXU         MOUNTED     EXTERN            50   49.8671875          49.8671875                       0           0          .1328125
           5 OCR             MOUNTED     NORMAL             9    7.9296875          2.46484375                    3072           0         1.0703125

After returning from the National Day holiday, I restarted node 2, and the disk ownership changed back to root:disk. So shutting down and restarting both RAC clusters changed the ownership of a shared disk. The two cases have one thing in common: the affected disk is the last disk in its disk group (ARCH_0001 and DATA_0001), and in both cases that disk was added later. That points to the udev rules file, even though it looked fine when I checked it with more in CRT. This time I opened it with vi and found a line break in the last rule, right after "OWNER=". The rules file on the other cluster has the same line break in its last line. Opened in Xshell, it looks like this:

0bc7bc06170e278c04d0a90fbfb9b491.png

Simply delete the line break so that the rule is on one line again, then reload and trigger the udev rules.

# vi /etc/udev/rules.d/99-oracle-asmdisks.rules
# udevadm control --reload-rules
# udevadm trigger
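
A rule that has been split across two lines is easy to miss in a terminal that wraps long lines, as happened here with more. udev expects each rule on a single line, so a rough check is to look for lines that end in a dangling "=" or that begin with a quoted value. The following is only a sketch under that assumption. The function name find_split_udev_rules is hypothetical, and a rule that is deliberately continued with a trailing backslash may be reported as a false positive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag lines in a udev rules file that look like one
# rule broken in two, e.g. a line ending in OWNER= followed by "grid", ...
find_split_udev_rules() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        /^[ \t]*$/ { next }    # skip blank lines
        /=[ \t]*$/ { print NR ": ends with a dangling =" }
        /^[ \t]*"/ { print NR ": starts with a quoted value" }
    ' "$1"
}

# Example, using the rules file from this article:
# find_split_udev_rules /etc/udev/rules.d/99-oracle-asmdisks.rules
```

Because the check looks at the file's real line breaks rather than at what the terminal displays, it is not fooled by line wrapping. Any line it reports should be joined with its neighbour before the rules are reloaded and triggered.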

Disks in DROPPED status

One day a few months ago, I suddenly noticed that one OCR disk in this RAC had failed: its status had changed to FORCING, and its name had changed to _DROPPED_0000_OCR, with a leading underscore. Nothing abnormal was found in the ASM or cluster logs. The CRS cluster and the database were in a normal state, and the cluster could be started and shut down normally; only the status of this disk was abnormal. The query output is shown in the figure below:

a3b220667b29ced523c8503c01c5b06f.png

Solution

lsblk shows no problem with the three 3G disks sdg, sdh and sdi, and their ownership is also correct.

# lsblk
NAME              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                 8:0    0  100G  0 disk 
|-sda1              8:1    0    1G  0 part /boot
`-sda2              8:2    0   99G  0 part 
  |-rootvg-lvroot 253:0    0 91.1G  0 lvm  /
  `-rootvg-lvswap 253:1    0  7.9G  0 lvm  [SWAP]
sdb                 8:16   0  100G  0 disk 
`-u01vg-lvu01     253:2    0  100G  0 lvm  /u01
sdc                 8:32   0   50G  0 disk 
sdd                 8:48   0  300G  0 disk 
sde                 8:64   0  100G  0 disk 
sdf                 8:80   0   50G  0 disk 
sdg                 8:96   0    3G  0 disk 
sdh                 8:112  0    3G  0 disk 
sdi                 8:128  0    3G  0 disk 
sdj                 8:144  0  300G  0 disk 
sr0                11:0    1 1024M  0 rom  


[root@jiekexu-rac1 /]# ll /dev/asmdisks/asm-ocr3
lrwxrwxrwx 1 root root 6 Jul  1 16:51 /dev/asmdisks/asm-ocr3 -> ../sdi
[root@jiekexu-rac1 /]# ll /dev/sdi
brw-rw---- 1 grid asmadmin 8, 128 Jul  1 17:12 /dev/sdi

Simply add the third disk back to the OCR disk group with the force option.

SQL> alter diskgroup OCR add disk '/dev/asmdisks/asm-ocr3' force;


Diskgroup altered.

Checking the disk group again shows that its status is normal. The disk was simply added as a new disk, as its new name OCR_0003 shows; its original name was OCR_0000.

8b2a2fb2cefa12f0799ecb16c5cc3fad.png

That is the end of the article. I hope it helps you. If you found it useful, please share it with your friends and colleagues so that we can learn and make progress together.

You are welcome to follow my public account [JiekeXu DBA Road] and learn new things together. You can find me at the following three addresses. Copies at other addresses are all pirated articles crawled from mine, and their code formatting, pictures and so on are messed up, which makes them hard to read. Follow my public account or my Mo Tianlun page to get the latest updates first.

———————————————————————————
Public account: JiekeXu DBA Road
Mo Tianlun: https://www.modb.pro/u/4347
CSDN: https://blog.csdn.net/JiekeXu



Origin blog.csdn.net/JiekeXu/article/details/133723896