Resolving ORA-27300, ORA-27301, and ORA-27302 errors on Solaris

Today a database at a customer site kept restarting and could not provide service.

The system is SunOS:

root@solarisldg4:~# uname -a
SunOS solarisldg4 5.11 11.3 sun4v sparc sun4v

root@solarisldg4:~# cat /etc/release 
                            Oracle Solaris 11.3 SPARC
  Copyright (c) 1983, 2015, Oracle and/or its affiliates.  All rights reserved.
                            Assembled 06 October 2015

root@solarisldg4:~# isainfo -v
64-bit sparcv9 applications
        crc32c cbcond pause mont mpmul sha512 sha256 sha1 md5 camellia kasumi 
        des aes ima hpc vis3 fmaf asi_blk_init vis2 vis popc 
32-bit sparc applications
        crc32c cbcond pause mont mpmul sha512 sha256 sha1 md5 camellia kasumi 
        des aes ima hpc vis3 fmaf asi_blk_init vis2 vis popc v8plus div32 mul32 

A check shows the host has 32 cores.

prtconf | grep 'Memory' 
Memory size: 98304 Megabytes

root@solarisldg4:~# swap -l
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 303,1        16 33554416 33554416
/dev/zvol/dsk/rpool/swap 303,1  33554448 234881008 234881008
root@solarisldg4:~# swap -s
total: 7944320k bytes allocated + 3596824k reserved = 11541144k used, 157478440k available

Originally there was only 16 GB of swap, but the on-site staff have since expanded it to roughly 150 GB of swap.
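As a quick sanity check on those numbers, here is a minimal sketch that converts the `blocks` column of the `swap -l` output above into GiB. It assumes the 512-byte block size that `swap -l` uses by default; the function name `swap_areas_gib` is only illustrative.

```python
# Convert the `blocks` column of `swap -l` into GiB.
# Assumption: sizes are reported in 512-byte blocks (the swap -l default).
SWAP_L_OUTPUT = """\
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 303,1        16 33554416 33554416
/dev/zvol/dsk/rpool/swap 303,1  33554448 234881008 234881008
"""

BLOCK_SIZE = 512  # bytes per block


def swap_areas_gib(text):
    """Return the size of each swap area in GiB, in listing order."""
    sizes = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        blocks = int(fields[3])  # fourth column is `blocks`
        sizes.append(blocks * BLOCK_SIZE / 2**30)
    return sizes


sizes = swap_areas_gib(SWAP_L_OUTPUT)
for size in sizes:
    print(f"{size:.1f} GiB")
print(f"total: {sum(sizes):.1f} GiB")
```

The two areas come out at about 16 GiB (the original) and about 112 GiB (the extension). The larger "available" figure from `swap -s` is most likely because that command reports virtual swap, which also counts part of physical memory.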

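The reason is that the root filesystem on this database host had filled up earlier and brought the server down, and the on-site staff suspected that the small swap could bring the database down as well.

A later check of the operating system log showed that between midnight and 8 a.m. today there were indeed a large number of swap-exhausted warnings: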

Aug 15 00:38:51 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 24041 (oracle)
Aug 15 00:39:12 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1639 (oracle)
Aug 15 00:39:56 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 19988 (oracle)
Aug 15 00:40:19 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1719 (oracle)
Aug 15 00:40:21 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1702 (oracle)
Aug 15 00:40:21 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1714 (oracle)
Aug 15 00:40:21 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1704 (oracle)
Aug 15 00:40:21 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1719 (oracle)
Aug 15 00:40:21 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1695 (oracle)
Aug 15 00:40:21 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1700 (oracle)
Aug 15 00:40:25 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1695 (oracle)
Aug 15 00:40:25 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1719 (oracle)
Aug 15 00:40:25 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1702 (oracle)
Aug 15 00:40:25 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1700 (oracle)
Aug 15 00:40:25 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1704 (oracle)
Aug 15 00:40:25 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1714 (oracle)
Aug 15 00:40:31 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1740 (tnslsnr)
Aug 15 00:40:31 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1723 (oracle)
Aug 15 00:40:34 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1740 (tnslsnr)
Aug 15 00:40:34 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1723 (oracle)
Aug 15 00:40:34 solarisldg4 tmpfs: [ID 518458 kern.warning] WARNING: /system/volatile: File system full, swap space limit exceeded
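With this many lines it helps to summarize which processes were hit. The following is a rough sketch that counts the "no swap space to grow stack" warnings per process name. The sample is a few lines copied from the log above; on a real host you would read the syslog file instead (commonly /var/adm/messages on Solaris, which is an assumption here since the post does not name the file).

```python
import re
from collections import Counter

# A few lines copied from the log excerpt above.
SAMPLE = """\
Aug 15 00:40:25 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1714 (oracle)
Aug 15 00:40:31 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1740 (tnslsnr)
Aug 15 00:40:31 solarisldg4 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 1723 (oracle)
Aug 15 00:40:34 solarisldg4 tmpfs: [ID 518458 kern.warning] WARNING: /system/volatile: File system full, swap space limit exceeded
"""

# Captures the pid and the process name in parentheses.
PATTERN = re.compile(r"no swap space to grow stack for pid (\d+) \((\w+)\)")


def count_swap_warnings(text):
    """Count stack-growth swap warnings per process name."""
    counts = Counter()
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


counts = count_swap_warnings(SAMPLE)
for name, count in counts.most_common():
    print(name, count)
```

Run against the full excerpt, this would show that both the database processes (oracle) and the listener (tnslsnr) were affected.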

Now at the database level:

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show parameter dump

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /oracle/product/12c/rdbms/log
core_dump_dest                       string      /oracle/diag/rdbms/cc/cc/cdump
max_dump_file_size                   string      unlimited
shadow_core_dump                     string      partial
user_dump_dest                       string      /oracle/product/12c/rdbms/log
SQL> 

Check the alert log:

/oracle/diag/rdbms/cc/cc/trace# view alert_cc.log

Looking at the time window when the database had problems, there are many errors:

Wed Aug 15 01:00:02 2018
Process startup failed, error stack:
Wed Aug 15 01:00:02 2018
Errors in file /oracle/diag/rdbms/cc/cc/trace/cc_psp0_1166.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Process J016 died, see its trace file
Wed Aug 15 01:00:02 2018
kkjcre1p: unable to spawn jobq slave process
Wed Aug 15 01:00:02 2018
Errors in file /oracle/diag/rdbms/cc/cc/trace/cc_cjq0_1253.trc:
Wed Aug 15 01:00:06 2018
Process startup failed, error stack:
Wed Aug 15 01:00:06 2018
Errors in file /oracle/diag/rdbms/cc/cc/trace/cc_psp0_1166.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Process J016 died, see its trace file
Wed Aug 15 01:00:06 2018
kkjcre1p: unable to spawn jobq slave process
Wed Aug 15 01:00:06 2018
Errors in file /oracle/diag/rdbms/cc/cc/trace/cc_cjq0_1253.trc:
Wed Aug 15 01:00:11 2018
Process startup failed, error stack:
Wed Aug 15 01:00:11 2018
Errors in file /oracle/diag/rdbms/cc/cc/trace/cc_psp0_1166.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Process J016 died, see its trace file

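This is an operating system error with errno 12 (ENOMEM). On Solaris its message is "Not enough space": fork could not get enough memory or swap to create the new process.

We can look at the trace file itself, /oracle/diag/rdbms/cc/cc/trace/cc_psp0_1166.trc: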

*** 2018-08-15 00:39:17.213
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

*** 2018-08-15 00:39:31.231
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

*** 2018-08-15 00:39:37.308
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5

*** 2018-08-15 00:39:51.411
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

*** 2018-08-15 00:39:56.411
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

*** 2018-08-15 00:40:02.421
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

*** 2018-08-15 00:40:09.421
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

Besides errno 12, errno 11 also shows up.
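To see how the two status codes are distributed, one can tally the ORA-27300 lines in the trace file. This is only a sketch: the sample text is two entries copied from the trace above, and the small errno table lists just the two Solaris values seen here (11 = EAGAIN, 12 = ENOMEM).

```python
import re
from collections import Counter

# Only the two Solaris errno values that appear in this trace.
SOLARIS_ERRNO = {
    11: "EAGAIN (Resource temporarily unavailable)",
    12: "ENOMEM (Not enough space)",
}

# Matches the status code at the end of an ORA-27300 line.
STATUS_RE = re.compile(r"ORA-27300: .*failed with status: (\d+)")

SAMPLE_TRACE = """\
*** 2018-08-15 00:39:31.231
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

*** 2018-08-15 00:39:37.308
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5
"""


def tally_status(text):
    """Count ORA-27300 occurrences per OS status code."""
    return Counter(int(m.group(1)) for m in STATUS_RE.finditer(text))


tally = tally_status(SAMPLE_TRACE)
for status, count in sorted(tally.items()):
    print(status, SOLARIS_ERRNO.get(status, "unknown"), count)
```

In the excerpt above, status 12 dominates and status 11 appears once, which fits memory or swap exhaustion as the main symptom.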

For reference, the meaning of some errno values:

errno.00 is: Success
errno.01 is: Operation not permitted
errno.02 is: No such file or directory
errno.03 is: No such process
errno.04 is: Interrupted system call
errno.05 is: Input/output error
errno.06 is: No such device or address
errno.07 is: Argument list too long
errno.08 is: Exec format error
errno.09 is: Bad file descriptor
errno.10 is: No child processes
errno.11 is: Resource temporarily unavailable (can also occur when sending data continuously; adding a delay helps)
errno.12 is: Cannot allocate memory (reported on Solaris as "Not enough space")
errno.13 is: Permission denied
errno.14 is: Bad address
errno.15 is: Block device required
errno.16 is: Device or resource busy

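From this we can roughly guess what happened: swap was exhausted, so memory could no longer be paged out, memory usage kept growing, and eventually new processes could not be created.

As for why it happened, one possibility is that swap really was too small: the host has 98304 MB of memory (with the SGA configured at 45 GB and the PGA at 13 GB) but had only 16 GB of swap.

Another possibility is an Oracle bug. A search on MetaLink (My Oracle Support) did turn up a matching note: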

Doc ID 1333824.1
Database Crashes With ORA-27300, ORA-27301, ORA-27302 On Solaris SPARC 
Applies to: 
 Oracle Database - Enterprise Edition - Version 10.2.0.1 to 12.2.0.1 [Release 10.2 to 12.2]
Oracle Solaris on SPARC (64-bit)
 ***Checked for relevance on 01-Jul-2016***

Symptoms

Below errors are displayed in the alert log file when Oracle fails to create a new process on Solaris 10

...
 Process startup failed, error stack:
 Errors in file /ora/log/diag/rdbms/prod/prod/trace/prod_psp0_29654.trc:
 ORA-27300: OS system dependent operation:fork failed with status: 11
 ORA-27301: OS failure message: Resource temporarily unavailable
 ORA-27302: failure occurred at: skgpspawn3
 Tue May 31 06:53:04 2011
 Process m000 died, see its trace file
...

Near the time of the issue, also the OS error log file reports "Resource temporarily unavailable", e.g.
...
May 31 07:16:14 paper inetd[510]: [ID 702911 daemon.error] Unable to fork inetd_start method of instance svc:/network/shell:default: Resource temporarily unavailable
...

Cause

The kernel parameters maxuprc, pidmax and max_nprocs were not set correctly, hence Oracle could not create a new process.
maxuprc =  Maximum number of processes that can be created by any user (old value 29995)
pidmax =   Maximum largest process ID
max_nprocs =  maximum number of processes that can be created on a system (user + system)


Solution

In /etc/system, set the following parameters:

set maxuprc=60000 # Maximum number of processes that can be created by any user (old value 29995)
 set pidmax=70000 # Maximum largest process ID
 set max_nprocs=65000 # maximum number of processes that can be created on a system (user + system)
 set maxusers=4096

NOTE:
/etc/system changes require a reboot to take effect

In short, edit /etc/system and add the following four settings:

set maxuprc=60000 # Maximum number of processes that can be created by any user (old value 29995)
 set pidmax=70000 # Maximum largest process ID
 set max_nprocs=65000 # maximum number of processes that can be created on a system (user + system)
 set maxusers=4096

The server then needs to be rebooted.

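Checking /etc/system on this host confirmed that none of these four settings were present, so we added them and will watch the effect going forward.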
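The check for missing settings can be scripted. Below is a minimal sketch that parses `set name=value` lines from /etc/system text and reports which of the four parameters from the note are absent. The sample content is hypothetical, and the parser assumes that lines beginning with `*` or `#` are comments.

```python
import re

# The four parameters named in Doc ID 1333824.1.
REQUIRED = ("maxuprc", "pidmax", "max_nprocs", "maxusers")

# Matches "set name=value" and "set module:name=value".
SET_RE = re.compile(r"^\s*set\s+(?:\w+:)?(\w+)\s*=\s*(\S+)")


def parse_system_settings(text):
    """Return {parameter: value} for every `set` line in the text."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith(("*", "#")):
            continue  # assumed comment line
        match = SET_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def missing_params(text, required=REQUIRED):
    """List the required parameters that have no `set` line."""
    found = parse_system_settings(text)
    return [name for name in required if name not in found]


# Hypothetical file content, for illustration only.
SAMPLE = """\
* hypothetical /etc/system contents
set maxuprc=60000
set pidmax=70000
"""

missing = missing_params(SAMPLE)
print(missing)
```

On the real host one would pass the contents of /etc/system, for example `missing_params(open("/etc/system").read())`. An empty list means all four settings are present in the file, though they still only take effect after a reboot.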


Reposted from blog.csdn.net/kadwf123/article/details/81713056