Deadlock analysis and resolution (reposted)

1. CPU usage on the system was found to be around 50%
topas:


Topas Monitor for host: DBSERVER EVENTS/QUEUES FILE/TTY
Fri Jul 2 21:33:13 2010 Interval: 2 Cswitch 1473 Readch 0.0G
Syscall 1452 Writech 0.0G
Kernel 3.6 |## | Reads 174 Rawin 0
User 49.1 |############## | Writes 175 Ttyout 216
Wait 0.0 | | Forks 0 Igets 0
Idle 47.3 |############## | Execs 0 Namei 0
Runqueue 0.5 Dirblk 0
Network KBPS I-Pack O-Pack KB-In KB-Out Waitqueue 0.0
en0 11.1K 5219.0 349.5 234.5 10.9K
en1 0.0 0.0 0.0 0.0 0.0 PAGING MEMORY
lo0 0.0 0.0 0.0 0.0 0.0 Faults 22 Real,MB 16016
Steals 2815 % Comp 80.9
Disk Busy% KBPS TPS KB-Read KB-Writ PgspIn 0 % Noncomp 19.4
hdisk1 0.0 11.0K 44.0 11.0K 0.0 PgspOut 0 % Client 19.4
dac0 0.0 0.0 0.0 0.0 0.0 PageIn 2816
dac0utm 0.0 0.0 0.0 0.0 0.0 PageOut 0 PAGING SPACE
dac1 0.0 0.0 0.0 0.0 0.0 Sios 2816 Size,MB 16384
dac1utm 0.0 0.0 0.0 0.0 0.0 % Used 59.2
hdisk0 0.0 0.0 0.0 0.0 0.0 NFS (calls/sec) % Free 40.7
hdisk2 0.0 0.0 0.0 0.0 0.0 ServerV2 0
ClientV2 0 Press:
Name PID CPU% PgSp Owner ServerV3 0 "h" for help
oracle 655514 26.7 10.2 oracle ClientV3 0 "q" to quit
ftp 1011860 0.5 0.6 root
lrud 16392 0.4 0.5 root
topas 782448 0.1 1.8 oracle
oracle 254012 0.0 17.8 oracle
oracle 385218 0.0 10.1 oracle
dtgreet 123020 0.0 1.3 root
oracle 389312 0.0 8.0 oracle
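Step 1 comes down to picking the busiest oracle process out of the topas process list. Below is a minimal Python sketch of that selection. It assumes the "Name PID CPU% PgSp Owner" rows above have been copied into a string. top_cpu_process is a hypothetical helper, not a topas feature.

```python
def top_cpu_process(topas_rows: str, name: str = "oracle") -> tuple[int, float]:
    """Return (pid, cpu_percent) of the busiest process called `name`.

    Assumes each process row starts with: Name PID CPU% PgSp Owner.
    Anything after those five fields (topas prints other panels on the
    same line) is ignored, as are header and non-matching rows.
    """
    best = None
    for line in topas_rows.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != name:
            continue
        pid, cpu = int(fields[1]), float(fields[2])
        if best is None or cpu > best[1]:
            best = (pid, cpu)
    if best is None:
        raise ValueError(f"no process named {name!r} found")
    return best


rows = """\
Name PID CPU% PgSp Owner ServerV3 0 "h" for help
oracle 655514 26.7 10.2 oracle ClientV3 0 "q" to quit
ftp 1011860 0.5 0.6 root
oracle 254012 0.0 17.8 oracle
"""
print(top_cpu_process(rows))  # (655514, 26.7)
```

The PID it returns is the value passed to V$PROCESS.SPID in step 2.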

2. Log in to SQL*Plus


Find the SQL behind the OS process in the topas line "oracle 655514 26.7 10.2 oracle" (PID 655514):

SYS@ora10g>select sid,serial#,machine,username,program,sql_hash_value,sql_id, to_char(logon_time,'yyyy/mm/dd hh24:mi:ss') as login_time from v$session
where paddr in (select addr from v$process where spid in ('655514'));

SID SERIAL# MACHINE USERNAME PROGRAM SQL_HASH_VALUE SQL_ID LOGIN_TIME
---------- ---------- --------------- ------------- -------------- ------------------- -------------- --------------
572 28114 APPSERVER NC5X 2201849944 5uhv05f1mv42s 2010/07/02 08:35:58

3. View the SQL text
SYS@ora10g>select sql_text from v$sqltext_with_newlines where hash_value = 2201849944 order by piece;

SQL_TEXT
----------------------------------------------------------------
update so_sale set ts='2010-07-02 14:17:09',capproveid = :1, dap
provedate = :2, fstatus = 2, daudittime = '2010-07-02 14:17:09'
where csaleid = :3
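V$SQLTEXT_WITH_NEWLINES returns a statement as fixed-size SQL_TEXT slices of 64 bytes each. This is why "dap" and "provedate" land on separate lines above. Below is a minimal sketch that assumes the (PIECE, SQL_TEXT) rows have already been fetched. reassemble_sql is a hypothetical name, and the sample rows are shortened for readability.

```python
def reassemble_sql(pieces: list[tuple[int, str]]) -> str:
    """Rebuild a statement from (PIECE, SQL_TEXT) rows.

    Each SQL_TEXT value is a fixed-size slice of the statement, so the
    slices are concatenated in PIECE order with no separator added.
    """
    return "".join(text for _, text in sorted(pieces))


# Rows may arrive in any order; sorting on PIECE restores the statement.
rows = [
    (1, "provedate = :2, fstatus = 2"),
    (0, "update so_sale set capproveid = :1, dap"),
]
print(reassemble_sql(rows))
# update so_sale set capproveid = :1, dapprovedate = :2, fstatus = 2
```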


4. Because the SQL uses bind variables, check their values
SYS@ora10g>select t.HASH_VALUE, t.SQL_ID, t.NAME, t.LAST_CAPTURED, t.WAS_CAPTURED, t.VALUE_STRING, t.VALUE_ANYDATA
from v$sql_bind_capture t where sql_id = '5uhv05f1mv42s';

no rows selected

5. Check whether the system has a deadlock
SYS@ora10g>select sess.sid,
sess.serial#,
lo.locked_mode
from v$locked_object lo,
dba_objects ao,
v$session sess
where ao.object_id = lo.object_id and lo.session_id = sess.sid;

SID SERIAL# LOCKED_MODE
---------- ---------- -----------
572 28114 3
572 28114 3
572 28114 3
572 28114 3
572 28114 3
The SID is the same as the one found in step 2.
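LOCKED_MODE is a numeric code. The mapping below uses the standard Oracle table-lock mode numbers. Mode 3, reported for SID 572, is the row exclusive (SX) lock that DML such as the UPDATE in step 3 takes on the table. describe_lock_mode is a hypothetical helper.

```python
# Oracle table-lock mode numbers as reported in LOCKED_MODE / LMODE.
LOCK_MODES = {
    0: "none",
    1: "null (N)",
    2: "row share (SS)",
    3: "row exclusive (SX)",
    4: "share (S)",
    5: "share row exclusive (SSX)",
    6: "exclusive (X)",
}


def describe_lock_mode(mode: int) -> str:
    """Translate a numeric lock mode into its name."""
    return LOCK_MODES.get(mode, f"unknown mode {mode}")


print(describe_lock_mode(3))  # row exclusive (SX)
```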

Check the other locks (select * from V$lock; only the SID column is listed below):
SYS@ora10g>select sid from V$lock;

SID
----------
660
660
660
661
662
662
662
662
662
662
662
662
662
662
662
662
662
662
662
662
662
662
662
659
572
572
572
572
572
572

30 rows selected.
No SQL text could be found for any of these other SIDs:
SYS@ora10g>select sql_fulltext from v$sqlarea where address in (select sql_address from v$session where sid in (659,660,661,662));

no rows selected

6. Look up the deadlocked SQL by SID


select sql_fulltext from v$sqlarea where address = (select sql_address from v$session where sid = 572);

Result: no rows selected.

select sql_address from v$session where sid = 572;

SQL_ADDRESS
----------------
0700000373C4EA88

7. Query the V$OPEN_CURSOR view
SYS@ora10g>select * from v$open_cursor where ADDRESS ='0700000373C4EA88' ;

no rows selected

8. Kill the deadlocked session
SYS@ora10g>ALTER SYSTEM KILL SESSION '572,28114';

ALTER SYSTEM KILL SESSION '572,28114'
*
ERROR at line 1:
ORA-00031: session marked for kill


9. Find the corresponding OS process
SYS@ora10g>select spid, osuser, s.program
from v$session s,v$process p
where s.paddr=p.addr and s.sid=572;

SPID OSUSER
------------ ------------------------------
PROGRAM
------------------------------------------------
655514

10. Switch to the operating system shell and kill the process:
#kill -9 655514
The system returned to normal.


11. Back in SQL*Plus, check the locks again
SYS@ora10g>select p.spid,a.serial#, c.object_name,b.session_id,b.oracle_username,b.os_user_name from v$process p,v$session a, v$locked_object b,all_objects c where p.addr=a.paddr and a.process=b.process and c.object_id=b.object_id;

no rows selected
The deadlock has been resolved.


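Two questions remain:
1. In step 4, why could the bind values of the parameterized SQL not be found?
Every SQL statement starts by creating a cursor, and a statement with bind variables has its values bound to that cursor. The most likely explanation is that the cursor used by this SQL had already been closed.
References: "Oracle SQL statement execution process", http://xiarilian12.javaeye.com/blog/574715, and http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/dynviews_2114.htm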

2. In step 5, select sid from V$lock returned several other SIDs. Why can't those sessions be killed?
SYS@ora10g>select sid,serial#,username from v$session;

SID SERIAL# USERNAME
---------- ---------- ------------------------------
634 70
635 10 NC5X
636 34 NC5X
637 1
638 2 NC5X
639 1
644 27 IUFO
645 13
650 1
651 1
653 79 NC5X
654 3 SYS
655 1
656 1
657 1
658 1
659 1
660 1
661 1
662 1
663 1
664 1
665 1

23 rows selected.

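Sessions with an empty USERNAME belong to Oracle background processes. Doing anything to these sessions may bring the instance down or cause other damage.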
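The rule of thumb above can be written as a filter over the (SID, SERIAL#, USERNAME) rows from the query just shown. Below is a minimal sketch. user_sessions is a hypothetical helper. Filtering on V$SESSION.TYPE = 'BACKGROUND' in the query itself is a more direct test than relying on an empty USERNAME.

```python
from typing import Optional

Row = tuple[int, int, Optional[str]]  # (SID, SERIAL#, USERNAME)


def user_sessions(rows: list[Row]) -> list[Row]:
    """Keep only rows with a non-empty USERNAME.

    Rows whose USERNAME is NULL (None) or empty are treated as Oracle
    background processes and are never candidates for KILL SESSION.
    """
    return [row for row in rows if row[2]]


rows = [(572, 28114, "NC5X"), (660, 1, None), (654, 3, "SYS")]
print(user_sessions(rows))  # [(572, 28114, 'NC5X'), (654, 3, 'SYS')]
```

The filter only removes background sessions. Sessions such as your own SYS login still need to be excluded by hand.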

Other collected notes on deadlocks

Killing a deadlocked process in Oracle

First check which tables are locked:

select b.owner,b.object_name,a.session_id,a.locked_mode
from v$locked_object a,dba_objects b
where b.object_id = a.object_id;

OWNER          OBJECT_NAME               SESSION_ID LOCKED_MODE
-------------- ------------------------- ---------- -----------
WSSB           SBDA_PSHPFTDT                     22           3
WSSB_RTREPOS   WB_RT_SERVICE_QUEUE_TAB           24           2
WSSB_RTREPOS   WB_RT_NOTIFY_QUEUE_TAB            29           2
WSSB_RTREPOS   WB_RT_NOTIFY_QUEUE_TAB            39           2
WSSB           SBDA_PSDBDT                       47           3
WSSB_RTREPOS   WB_RT_AUDIT_DETAIL                47           3

select b.username,b.sid,b.serial#,logon_time 
from v$locked_object a,v$session b
where a.session_id = b.sid order by b.logon_time;

USERNAME               SID    SERIAL# LOGON_TIME
--------------- ---------- ---------- ----------
WSSB_RTACCESS           39       1178 2006-5-22 1
WSSB_RTACCESS           29       5497 2006-5-22 1

Kill the session:

alter system kill session 'sid,serial#';
e.g.
alter system kill session '29,5497';

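If you get ORA-00031 (the session has only been marked for kill and will end once its current operation finishes), append IMMEDIATE: alter system kill session '29,5497' immediate;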
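The SID and SERIAL# are pasted into a string literal, so it is safer to build the statement from validated integers than to type it by hand. Below is a minimal Python sketch. kill_session_sql is a hypothetical helper that only formats the statement and does not run it.

```python
def kill_session_sql(sid: int, serial: int, immediate: bool = False) -> str:
    """Format an ALTER SYSTEM KILL SESSION statement (not executed here).

    Only positive integers are accepted, so nothing other than digits can
    end up inside the quoted 'sid,serial#' literal.
    """
    for value in (sid, serial):
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            raise ValueError(f"expected a positive integer, got {value!r}")
    stmt = f"ALTER SYSTEM KILL SESSION '{sid},{serial}'"
    if immediate:
        # Suggested above when the plain form only returns ORA-00031.
        stmt += " IMMEDIATE"
    return stmt + ";"


print(kill_session_sql(29, 5497))
print(kill_session_sql(29, 5497, immediate=True))
```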

How to kill a deadlocked Oracle process

1. Find which procedure is locked:

Query the V$DB_OBJECT_CACHE view:

SELECT * FROM V$DB_OBJECT_CACHE WHERE OWNER='<procedure owner>' AND LOCKS != 0;

2. Find which SID holds it. The SID identifies the session:

Query the V$ACCESS view:

SELECT * FROM V$ACCESS WHERE OWNER='<procedure owner>' AND OBJECT='<procedure name found above>';

3. Get the SID and SERIAL#:

Query the V$SESSION view:

SELECT SID,SERIAL#,PADDR FROM V$SESSION WHERE SID=<SID found above>;

Query the V$PROCESS view:

SELECT SPID FROM V$PROCESS WHERE ADDR='<PADDR found above>';

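4. Kill the processes:

(1) First kill the Oracle session:

ALTER SYSTEM KILL SESSION '<SID found above>,<SERIAL# found above>';

(2) Then kill the operating-system process:

kill -9 <SPID found above> on Unix, or ORAKILL <instance ORACLE_SID> <SPID found above> on Windows.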
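The two OS-level options take different arguments. On Unix the SPID is an ordinary process ID for kill -9. orakill on Windows takes the instance's ORACLE_SID plus the thread ID that V$PROCESS reports as SPID. The minimal sketch below only builds the argument list. It could be passed to subprocess.run, but nothing is executed here. os_kill_command and the ORACLE_SID value ORA10G are hypothetical.

```python
def os_kill_command(spid: int, platform: str, oracle_sid: str = "") -> list[str]:
    """Build (but do not run) the OS command that ends a server process.

    platform: "unix" or "windows". On Windows, orakill needs the instance's
    ORACLE_SID followed by the thread ID that V$PROCESS reports as SPID.
    """
    if platform == "unix":
        return ["kill", "-9", str(spid)]
    if platform == "windows":
        if not oracle_sid:
            raise ValueError("orakill needs the instance ORACLE_SID")
        return ["orakill", oracle_sid, str(spid)]
    raise ValueError(f"unsupported platform {platform!r}")


print(os_kill_command(655514, "unix"))             # ['kill', '-9', '655514']
print(os_kill_command(1234, "windows", "ORA10G"))  # ['orakill', 'ORA10G', '1234']
```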

Deadlocks in Oracle

Query the database for deadlocks:

select t2.username||'   '||t2.sid||'   '||t2.serial#||'   '||t2.logon_time||'   '||t3.sql_text
from v$locked_object t1,v$session t2,v$sqltext t3
where t1.session_id=t2.sid
and t2.sql_address=t3.address
order by t2.logon_time, t3.piece;

The rows returned are the sessions involved in the deadlock. To kill one, take its SID and SERIAL# from the query above and put them into this statement:

alter system kill session 'sid,serial#';

This usually clears the deadlock. Alternatively, look up the OS process that belongs to the session ID and kill that process at the Unix level:

SELECT a.username, c.spid AS os_process_id, c.pid AS oracle_process_id
FROM v$session a, v$process c
WHERE c.addr = a.paddr AND a.sid = &sid AND a.serial# = &serial;

Then use kill (Unix) or orakill (Windows).

On Unix:

ps -ef|grep os_process_id
kill -9 os_process_id
ps -ef|grep os_process_id
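The queries in this section list sessions that hold locks. Strictly speaking, a deadlock is a cycle of sessions in which each one waits for a lock held by the next. Oracle detects such cycles itself and raises ORA-00060 in one of the sessions. A session that just holds its locks for a long time blocks others without forming a cycle.

Below is a minimal sketch of cycle detection. It assumes waiter-to-blocker pairs have already been fetched, for example from the BLOCKING_SESSION column of V$SESSION (available from 10g). find_deadlock is a hypothetical helper.

```python
def find_deadlock(waits_for: dict[int, int]) -> list[int]:
    """Return the SIDs forming one wait cycle, or [] if there is none.

    waits_for maps a waiting SID to the SID that blocks it. Each session
    waits on at most one blocker, so following the chain from any start
    either runs off the end (plain blocking) or revisits a SID (a cycle).
    """
    for start in waits_for:
        path: list[int] = []
        seen: set[int] = set()
        sid = start
        while sid in waits_for and sid not in seen:
            seen.add(sid)
            path.append(sid)
            sid = waits_for[sid]
        if sid in seen:
            return path[path.index(sid):]
    return []


print(find_deadlock({10: 20, 20: 10, 30: 10}))  # [10, 20]
print(find_deadlock({30: 572, 40: 30}))         # []
```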

This problem comes up often when working with Oracle, so here is a short summary of how to deal with it.

1) Find the deadlocked process:

sqlplus "/as sysdba"   (sys/change_on_install)
SELECT s.username,l.OBJECT_ID,l.SESSION_ID,s.SERIAL#,
l.ORACLE_USERNAME,l.OS_USER_NAME,l.PROCESS 
FROM V$LOCKED_OBJECT l,V$SESSION S WHERE l.SESSION_ID=S.SID;
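V$LOCKED_OBJECT returns one row per locked object, so a session that locks several objects appears several times. SID 572 appears five times in step 5 of the case above. The minimal sketch below collapses the rows into the distinct (SID, SERIAL#) pairs to pass to ALTER SYSTEM KILL SESSION. distinct_sessions is a hypothetical helper.

```python
def distinct_sessions(rows: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Collapse (SID, SERIAL#) rows to unique pairs, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(rows))


rows = [(572, 28114)] * 5 + [(29, 5497)]
print(distinct_sessions(rows))  # [(572, 28114), (29, 5497)]
```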

2) Kill the deadlocked session:

alter system kill session 'sid,serial#'; (where sid = l.session_id)

3) If that still does not solve it:

select pro.spid from v$session ses,
v$process pro where ses.sid=XX and 
ses.paddr=pro.addr;

Replace XX with the SID of the deadlocked session. Then, from the shell:

exit
ps -ef|grep spid

Here spid is the OS process ID of that session. Kill this Oracle process with kill -9 spid.


Reposted from mljavalife.iteye.com/blog/1547354