oracle如何处理阻塞会话,library cache lock

问题现象

##### 统一监控告警平台-DB: [触发]
`告警集群:` jq-prod-db-prometheus
`告警描述:` 数据库Oracle堵塞会话超过5分钟,会话ID:1057
`告警环境:` prod
`告警类型:` oracle_prod
`告警实例:` nba_stb
`告警IP:`   11.11.11.11
`实例别名:`  nba_stb
`当前数值:` 2
`告警级别:` 严重
`触发时间:` 2021-03-02 13:33:59
`持续时间:` 10s
`告警次数:` 1
`告警应用:`nba_oracle
`应用负责人:` unknown

`告警集群:` jq-prod-db-prometheus
`告警描述:` 数据库Oracle堵塞会话超过5分钟,会话ID:4137
`告警环境:` prod
`告警类型:` oracle_prod
`告警实例:` nba_stb
`告警IP:`   11.11.11.11
`实例别名:`  nba_stb
`当前数值:` 1
`告警级别:` 严重
`触发时间:` 2021-03-02 13:33:59
`持续时间:` 10s
`告警次数:` 1
`告警应用:`nba_oracle
`应用负责人:` unknown

处理过程

  • 登录11.11.11.11进入oracle,并设置列宽(如果使用plsql则不用设置)
[bzops@nba-db-01 ~]$ sudo -i
[root@nba-db-01 ~]# su - oracle
[oracle@nba-db-01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Mar 2 15:05:36 2021

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> col sample_time for a30
col event for a30
col program for a30
col machine for a40
set lines 200 pages 999
  • 查看当前的阻塞会话是空,说明当前阻塞已经结束,blocking_session代表阻塞者,此字段不为空则说明当前有阻塞
SQL> select program,sql_id,event,blocking_session,sid,last_call_et from v$session where blocking_session is not null;

no rows selected

SQL>
  • 虽然当前阻塞结束了,但是oracle有活动会话历史视图v$active_session_history可以查看最近每个活动会话超过1秒的状态,
    从结果中可以看到,在下午2点16分,不管是后台进程LGWR还是用户进程JDBC的等待事件都是library cache lock,且阻塞者是43231057
    library cache lock是oracle内存结构中shared pool的内存锁,
    library cache是oracle内存中解析各种对象的地方,此等待事件一般说明有对象被大量解析导致争用
    接下来分析阻塞者43231057 在干什么
SQL> select sample_time,sql_id,event,blocking_session,program from v$active_session_history where blocking_session is not null and rownum <20;

SAMPLE_TIME            SQL_ID        EVENT              BLOCKING_SESSION PROGRAM
------------------------------ ------------- ------------------------------ ---------------- ------------------------------------------------
02-MAR-21 02.16.09.315 PM            library cache lock             4323 [email protected] (LGWR)
02-MAR-21 02.16.09.315 PM            library cache lock             1057 JDBC Thin Client
02-MAR-21 02.16.09.315 PM            library cache lock             1057 [email protected] (TNS V1-V3)
02-MAR-21 02.16.08.315 PM            library cache lock             4323 [email protected] (LGWR)
02-MAR-21 02.16.08.315 PM            library cache lock             1057 JDBC Thin Client
02-MAR-21 02.16.08.315 PM            library cache lock             1057 [email protected] (TNS V1-V3)
02-MAR-21 02.16.07.315 PM            library cache lock             4323 [email protected] (LGWR)
02-MAR-21 02.16.07.315 PM            library cache lock             1057 JDBC Thin Client
02-MAR-21 02.16.07.315 PM            library cache lock             1057 [email protected] (TNS V1-V3)
02-MAR-21 02.16.06.315 PM            library cache lock             4323 [email protected] (LGWR)
02-MAR-21 02.16.06.315 PM            library cache lock             1057 JDBC Thin Client
02-MAR-21 02.16.06.315 PM            library cache lock             1057 [email protected] (TNS V1-V3)
02-MAR-21 02.16.05.305 PM            library cache lock             4323 [email protected] (LGWR)
02-MAR-21 02.16.05.305 PM            library cache lock             1057 JDBC Thin Client
02-MAR-21 02.16.05.305 PM            library cache lock             1057 [email protected] (TNS V1-V3)
02-MAR-21 02.16.04.305 PM            library cache lock             4323 [email protected] (LGWR)
02-MAR-21 02.16.04.305 PM            library cache lock             1057 JDBC Thin Client
02-MAR-21 02.16.04.305 PM            library cache lock             1057 [email protected] (TNS V1-V3)
02-MAR-21 02.16.03.305 PM            library cache lock             4323 [email protected] (LGWR)
  • 查看会话4323的历史发现blocking_session为空,说明它不是被别的会话阻塞的,也就说明它是阻塞的源头
    通过其他信息可以看到会话4323是从主机名WORKGROUP\WIN-1DNF2S54TTS的设备上使用plsqldev.exe连上来的,执行的sql是1t7jfygm1b944
SQL> select sample_time,sql_id,event,blocking_session,program,machine from v$active_session_history where session_id=4323  and rownum <20;

SAMPLE_TIME            SQL_ID        EVENT              BLOCKING_SESSION PROGRAM                MACHINE
------------------------------ ------------- ------------------------------ ---------------- ------------------------------ ----------------------------------------
02-MAR-21 02.16.09.315 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.08.315 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.07.315 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.06.315 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.05.305 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.04.305 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.03.305 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.02.305 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.01.305 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.16.00.305 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.59.295 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.58.295 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.57.285 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.56.285 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.55.285 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.54.285 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.53.285 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.52.275 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
02-MAR-21 02.15.51.275 PM      1t7jfygm1b944                             plsqldev.exe           WORKGROUP\WIN-1DNF2S54TTS
  • 查看1t7jfygm1b944的具体sql内容是什么,发现查不出来,oracle一般会把sql语句hash出sql_id便于用来解析,
    sql_id但是没有语句的情况一般是由于解析不成功,所以不在内存里,
    但是如果真的是解析不成功,那么在开启10035事件的情况下,是可以在后台日志里看到解析失败的sql的
    巧了,这个库是开启的
SQL> select sql_fulltext from v$sql where sql_id='1t7jfygm1b944';

no rows selected
  • 顺道看一下会话4323是谁连上来的,等一下去通知他
SQL> select sid,osuser,machine from v$session where machine='WORKGROUP\WIN-1DNF2S54TTS';

       SID OSUSER             MACHINE
---------- ------------------------------ ----------------------------------------
      1972 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      2074 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      2405 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      2458 zsh4305            WORKGROUP\WIN-1DNF2S54TTS
      3226 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      3315 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      3460 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      3508 csh11556           WORKGROUP\WIN-1DNF2S54TTS
      3555 csh11556           WORKGROUP\WIN-1DNF2S54TTS
      3609 CSH8245            WORKGROUP\WIN-1DNF2S54TTS
      3700 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      3748 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      3895 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      3987 csh11556           WORKGROUP\WIN-1DNF2S54TTS
      4137 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
      4227 zsh4305            WORKGROUP\WIN-1DNF2S54TTS
      4281 CSH8245            WORKGROUP\WIN-1DNF2S54TTS
      4323 zsh11571           WORKGROUP\WIN-1DNF2S54TTS
  • 查看后台日志,果然定位到了2点16分的失败解析sql,是一个包含大量in的语句,
    in包含的对象太多,对于oracle的解析工作是个非常大的负担,这种情况应该改写成join
[oracle@nbastb ~]$ less /s01/app/oracle/diag/rdbms/nba_stb/nba_stb/trace/alert_nba_stb.log
Tue Mar 02 14:16:09 2021
PARSE ERROR: ospid=178903, error=1013 for statement:
select
  s.slip_code1,
  s.slip_code2,
  s.outer_order_code outer_order#,
  s.code,
  (s.slip_code2||'_'||sk.ext_code2) vlookup_01,
  s.shop_id,
  s.status,
  e.name,
       nvl(decode(e.name,
            '<CD><EA><B3><C9>',
            'POD',
            '<B4><FD>ϵ<U+0373>ˢ<CD><EA><B3><C9>',
            'Shipment confirmation',
            '<BF><U+2DFF><U+05FC><B1><B8><D6><D0>',
            'Waiting for shipment',
            'ȡ<CF><FB>',
            'Cancel'),
         0) STATUS_T,
  sk.code SKU_CODE,
  sk.supplier_code,
  sk.ext_code1,
  sk.ext_code2,
  sl.requested_qty,
  sl.unit_price,
  sl.total_actual,
  m.province,m.city,
  wh.name WH_NAME,
  to_char(s.create_time + 0, 'yyyy-MM-dd hh24:mi:ss') create_time,
  to_char(s.tb_payment_time + 0, 'yyyy-MM-dd hh24:mi:ss') tb_payment_time,
  to_char(s.cancel_time + 0, 'yyyy-MM-dd hh24:mi:ss') cancel_time,
  to_char(s.delivery_time + 0, 'yyyy-MM-dd hh24:mi:ss') delivery_time,
  s.sys_memo,
  s.to_wh_remark,
  s.is_cod_order,
  m.trans_time_type,
  s.trans_code
from
  nba.t_so_sales_order s
  left join nba.t1 m on m.id = s.id
  left join nba.t2 wt on wt.ref_slip_code = s.code
  left join nba.t3 sl on s.id = sl.so_id
  left join nba.t4 sk on sk.ext_code1 = sl.extension_code
  LEFT JOIN nba.t5 wh ON s.wh_id = wh.id,
  nba.t6 e
where
  1 = 1
  and s.shop_id in (4422, 4423)
  and wt.workflow_Id = 3
  and e.node_no = wt.current_node_no
  And to_char(S.Create_Time, 'yyyy mmdd hh24mi') >= '2021 0223 0000'
  -- And to_char(S.Create_Time, 'yyyy mmdd hh24mi') < '2020 0901 0000'
  -- and s.code like 'B%'
  and s.slip_code2 in ('NBA1185360744',
'NBA1186058875',
'NBA1186058891',
'NBA1186058880',
'NBA1186058874',
'NBA1186070646',
'NBA1186068022',
'NBA1186128429',
'NBA1186142312',
'NBA1186142062',
'NBA1186141961',
'NBA1186141905',
'NBA1186143256',
'NBA1186142919',
'NBA1186143035',
'NBA1186142819',
'NBA1186142067',
'NBA1186143571',
'NBA1186143575',
'NBA1186144320',
'NBA1186144239',
'NBA1186144248',
'NBA1186142180',
'NBA1186142155',
'NBA1186142080',
'NBA1186146271',
'NBA1186145407',
'NBA1186145779',
'NBA1186146081',
'NBA1186145458',
'NBA1186145997',
'NBA1186145481',
'NBA1186144368',
'NBA1186146097',
'NBA1186146061',
'NBA1186146085',
'NBA1186146019',
'NBA1186145971',
'NBA1186144654',
'NBA1186145984',
'NBA1186145148',
'NBA1186144718',
'NBA1186145708',
'NBA1186145818',
'NBA1186144845',
'NBA1186144664',
'NBA1186146049',
'NBA1186145585',
'NBA1186144448',
'NBA1186145518',
'NBA1186145835',
'NBA1186145544',
'NBA1186145232',
'NBA1186146089',
'NBA1186146277',
'NBA1186145785',
'NBA1186145773',
'NBA1186144753',
'NBA1186146032',
'NBA1186144655',
'NBA1186146052',
'NBA1186144499',
'NBA1186144703',
'NBA1186144775',
'NBA1186146038',
'NBA1186144536',
'NBA1186144890',
'NBA1186146099',
'NBA1186144493',
'NBA1186146116',
'NBA1186145258',
'NBA1186144729',
'NBA1186145513',
'NBA1186145522',
'NBA1186146253',
'NBA1186146100',
'NBA1186145482',
'NBA1186144352',
'NBA1186144699',
'NBA1186145537',
'NBA1186145219',
'NBA1186145995',
'NBA1186144656',
'NBA1186145480',
'NBA1186145803',
'NBA1186145977',
'NBA1186145400',
'NBA1186146110',
'NBA1186144765',
'NBA1186145998',
'NBA1186146259',
'NBA1186145970',
'NBA1186146034',
'NBA1186145991',
'NBA1186145515',
'NBA1186145332',
'NBA1186144832',
'NBA1186145778',
'NBA1186145987',
'NBA1186145357',
'NBA1186146008',
......

解决办法

  • 通过刚才找到的osuser也就是工号确定是谁,然后让他把in改成join

猜你喜欢

转载自blog.51cto.com/15122892/2648142
今日推荐