Today, on a development database, an import of a large table (about 5 million rows) through a client tool failed. The error returned to the client was:
ORA-03113: end-of-file on communication channel
Process ID: 31159
Session ID: 978 Serial number: 1779
I then checked alert.log and found the following errors:
Thread 1 advanced to log sequence 7558 (LGWR switch)
Current log# 1 seq# 7558 mem# 0: /oradata/cc/redo01.log
Wed Oct 17 15:59:03 2018
Errors in file /oracle11/diag/rdbms/cc/cc/trace/cc_ora_29812.trc (incident=153180):
ORA-03137: TTC protocol internal error : [12333] [7] [10] [65] [] [] [] []
Incident details in: /oracle11/diag/rdbms/cc/cc/incident/incdir_153180/cc_ora_29812_i153180.trc
Wed Oct 17 15:59:04 2018
Dumping diagnostic data in directory=[cdmp_20181017155904], requested by (instance=1, osid=29812), summary=[incident=153180].
Wed Oct 17 15:59:05 2018
Sweep [inc][153180]: completed
Sweep [inc2][153180]: completed
Wed Oct 17 15:59:46 2018
Thread 1 cannot allocate new log, sequence 7559
Checkpoint not complete
Current log# 1 seq# 7558 mem# 0: /oradata/cc/redo01.log
Thread 1 advanced to log sequence 7559 (LGWR switch)
Current log# 2 seq# 7559 mem# 0: /oradata/cc/redo02.log
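There are two problems here: ORA-03137, and Checkpoint not complete.
The alert log is full of entries like these. My first impression was that Checkpoint not complete means the redo logs are too small or too few: under a heavy write load the checkpoint cannot finish in time, yet the redo log needs to switch, and the group about to be overwritten still contains changes that have not been checkpointed. I suspect this mostly shows up on databases that are not in archivelog mode (this still needs verifying), and this database is indeed not archiving.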
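To put a number on "the alert log is full of these", here is a minimal Python sketch that counts Checkpoint not complete messages per hour. It assumes the timestamp layout seen in the excerpt above (for example "Wed Oct 17 15:59:46 2018") and an English locale; the function name and the sample lines are only for illustration.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout used in the alert.log excerpt, e.g. "Wed Oct 17 15:59:46 2018".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def checkpoint_waits_per_hour(lines):
    """Count 'Checkpoint not complete' messages, bucketed by the hour of the
    most recent timestamp line seen before each message."""
    counts = Counter()
    current = None
    for raw in lines:
        line = raw.strip()
        try:
            current = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, fall through to the message check
        if line == "Checkpoint not complete" and current is not None:
            counts[current.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# A few lines copied from the excerpt; in practice, pass an open alert.log file.
sample = """\
Wed Oct 17 15:59:05 2018
Sweep [inc][153180]: completed
Wed Oct 17 15:59:46 2018
Thread 1 cannot allocate new log, sequence 7559
Checkpoint not complete
""".splitlines()

print(dict(checkpoint_waits_per_hour(sample)))
```

Running the same function over the whole file should show whether the messages cluster around the import window or occur all day.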
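Whether this has any bearing on the ORA-03137 error, I don't know. There is, however, a community thread on MetaLink describing a situation similar to mine.
One reply there deals with it by changing a hidden parameter called _optim_peek_user_binds.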
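So I tackled both sides. The first is to add redo log groups and increase the redo log file size. At the moment there are only 3 groups, each with 1 member, and each file is 50M.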
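The arithmetic behind the resize can be sketched in a few lines of Python. The group counts and sizes come from this post (3 x 50M now, 6 x 1G after the change); the redo rate of 5 MB/s is purely an assumed figure for illustration and should be replaced with a measured one.

```python
def total_redo_mb(groups, size_mb):
    """Total online redo capacity. Extra members in a group are mirrors,
    so they do not add capacity."""
    return groups * size_mb


def seconds_per_switch(size_mb, redo_rate_mb_per_s):
    """Rough time needed to fill one log group at a steady redo rate."""
    return size_mb / redo_rate_mb_per_s


before = total_redo_mb(groups=3, size_mb=50)    # current layout
after = total_redo_mb(groups=6, size_mb=1024)   # layout after the resize

# ASSUMPTION: 5 MB/s of redo during the import; not a measured value.
rate = 5.0

print(before, after)
print(seconds_per_switch(50, rate), seconds_per_switch(1024, rate))
```

With only 50M per group, even a modest redo rate forces a log switch every few seconds, which leaves the checkpoint very little time to catch up before the group is needed again.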
SQL> select GROUP# , SEQUENCE# ,BYTES/1024/1024,MEMBERS,STATUS from v$log ;
GROUP# SEQUENCE# BYTES/1024/1024 MEMBERS
---------- ---------- --------------- ----------
1 7819 50 1
2 7820 50 1
3 7818 50 1
SQL> select member from v$logfile;
MEMBER
--------------------------------------------------------------------------------
/oradata/cc/redo01.log
/oradata/cc/redo02.log
/oradata/cc/redo03.log
SQL>
alter database add logfile group 4 ('/oradata/cc/redo04a.log','/oradata/cc/redo04b.log') size 1G;
alter database add logfile group 5 ('/oradata/cc/redo05a.log','/oradata/cc/redo05b.log') size 1G;
alter database add logfile group 6 ('/oradata/cc/redo06a.log','/oradata/cc/redo06b.log') size 1G;
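-- Syntax for adding a member to an existing group, for reference only (the MEMBERS column below shows group 3 still has 1 member):
alter database add logfile member '/u01/app/oracle/oradata/XXX/redo03_b.log' to group 3;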
SQL> select GROUP# , SEQUENCE# ,BYTES/1024/1024,MEMBERS,STATUS from v$log ;
GROUP# SEQUENCE# BYTES/1024/1024 MEMBERS STATUS
---------- ---------- --------------- ---------- ----------------
1 7819 50 1 INACTIVE
2 7820 50 1 ACTIVE
3 7818 50 1 INACTIVE
4 7821 1024 2 ACTIVE
5 7822 1024 2 ACTIVE
6 7823 1024 2 CURRENT
6 rows selected.
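Before dropping the old groups it helps to check the STATUS column: a group that is CURRENT or ACTIVE cannot be dropped, while INACTIVE and UNUSED groups can. A small Python sketch of that rule, applied to the rows just shown (the function name is made up for illustration):

```python
def droppable_groups(rows):
    """Return the group numbers whose status allows DROP LOGFILE GROUP.

    rows: iterable of (group_no, status) pairs as read from v$log.
    """
    safe = {"INACTIVE", "UNUSED"}
    return [group_no for group_no, status in rows if status in safe]


# GROUP# and STATUS copied from the v$log output above.
v_log = [
    (1, "INACTIVE"),
    (2, "ACTIVE"),
    (3, "INACTIVE"),
    (4, "ACTIVE"),
    (5, "ACTIVE"),
    (6, "CURRENT"),
]

print(droppable_groups(v_log))
```

At this point group 2 is still ACTIVE, so it has to wait. Forcing a checkpoint with alter system checkpoint normally moves ACTIVE groups to INACTIVE; I am assuming that is acceptable on a development database like this one.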
alter database drop logfile group 1;
alter database add logfile group 1 ('/oradata/cc/redo01c.log','/oradata/cc/redo01d.log') size 1G;
SQL> select GROUP# , SEQUENCE# ,BYTES/1024/1024,MEMBERS,STATUS from v$log ;
GROUP# SEQUENCE# BYTES/1024/1024 MEMBERS STATUS
---------- ---------- --------------- ---------- ----------------
1 0 1024 2 UNUSED
2 7820 50 1 INACTIVE
3 7818 50 1 INACTIVE
4 7821 1024 2 INACTIVE
5 7822 1024 2 INACTIVE
6 7823 1024 2 CURRENT
6 rows selected.
alter database drop logfile group 2;
alter database drop logfile group 3;
alter database add logfile group 2 ('/oradata/cc/redo02a.log','/oradata/cc/redo02b.log') size 1G;
alter database add logfile group 3 ('/oradata/cc/redo03c.log','/oradata/cc/redo03d.log') size 1G;
SQL> select GROUP# , SEQUENCE# ,BYTES/1024/1024,MEMBERS,STATUS from v$log ;
GROUP# SEQUENCE# BYTES/1024/1024 MEMBERS STATUS
---------- ---------- --------------- ---------- ----------------
1 0 1024 2 UNUSED
2 0 1024 2 UNUSED
3 0 1024 2 UNUSED
4 7821 1024 2 INACTIVE
5 7822 1024 2 INACTIVE
6 7823 1024 2 CURRENT
6 rows selected.
SQL> select member from v$logfile;
MEMBER
--------------------------------------------------------------------------------
/oradata/cc/redo01c.log
/oradata/cc/redo02a.log
/oradata/cc/redo02b.log
/oradata/cc/redo01a.log
/oradata/cc/redo01b.log
/oradata/cc/redo05a.log
/oradata/cc/redo05b.log
/oradata/cc/redo06a.log
/oradata/cc/redo06b.log
/oradata/cc/redo01d.log
/oradata/cc/redo03c.log
MEMBER
--------------------------------------------------------------------------------
/oradata/cc/redo03d.log
12 rows selected.
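The other is to handle ORA-03137 as suggested on MetaLink, by setting the hidden parameter _optim_peek_user_binds=false.
Then restart the database.
While I was at it, I also adjusted the kernel parameters to match the current memory configuration.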
##########Shared Memory###############
kernel.shmmax=64424509440
kernel.shmmni=4096
kernel.shmall=15728640
##########Semaphore Arrays############
kernel.sem=6144 50331648 4096 8192
##########open file###################
fs.file-max=6815744
##########aio#########################
fs.aio-max-nr=3145728
##########network#####################
net.ipv4.icmp_echo_ignore_broadcasts=1
net.ipv4.icmp_ignore_bogus_error_responses=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.default.rp_filter=1
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1500
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_window_scaling=1
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=4194304
net.core.wmem_max=4194304
net.ipv4.tcp_rmem=8192 262144 4194304
net.ipv4.tcp_wmem=8192 262144 4194304
net.ipv4.ip_local_port_range=9000 65500
##########CORE######################
kernel.core_uses_pid=1
##########Message Queues##############
kernel.msgmax=655360
kernel.msgmni=4096
kernel.msgmnb=1024000
##########vm config###################
vm.min_free_kbytes=2097152
vm.vfs_cache_pressure=200
vm.swappiness=10
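The shared memory and semaphore values above are related to each other, and a short Python sketch can check that they are consistent. It assumes a 4 KiB page size (verify with getconf PAGE_SIZE); the parser is a hypothetical helper, not part of any tool.

```python
PAGE_SIZE = 4096  # ASSUMPTION: 4 KiB pages; check with `getconf PAGE_SIZE`


def parse_sysctl(text):
    """Parse key=value lines, skipping blank lines and '#' header lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


conf = parse_sysctl("""
##########Shared Memory###############
kernel.shmmax=64424509440
kernel.shmall=15728640
##########Semaphore Arrays############
kernel.sem=6144 50331648 4096 8192
""")

shmmax = int(conf["kernel.shmmax"])   # largest single segment, in bytes
shmall = int(conf["kernel.shmall"])   # total shared memory, in pages
# kernel.sem field order: SEMMSL SEMMNS SEMOPM SEMMNI
semmsl, semmns, semopm, semmni = map(int, conf["kernel.sem"].split())

print(shmmax / 2**30)                 # shmmax expressed in GiB
print(shmall * PAGE_SIZE == shmmax)   # shmall covers exactly shmmax
print(semmsl * semmni == semmns)      # SEMMNS equals SEMMSL * SEMMNI
```

With these values shmmax works out to 60 GiB, shmall covers exactly the same amount in pages, and SEMMNS matches SEMMSL times SEMMNI.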