Resolving a performance problem caused by row cache lock

Reprinted from: http://www.oracleonlinux.cn/2012/06/row-cache-lock-performance-tuning/

The original article is well written and explains in detail a performance problem caused by contention on the dictionary (row) cache.

I received an urgent email reporting that, at around 9:30 this morning, a customer's production system had become very slow across the board. Querying a ticket list could take half a minute, and the login page sometimes hung for a long time before responding.

The database is a two-node RAC running 10.2.0.5.0. The system normally runs fine, but it had suddenly become slow. As an optimization rookie, I started with the AWR report. The analysis steps and the solution follow:

1 The report header shows that during the hour in which the problem occurred, DB Time reached 240 minutes, so (DB Time)/Elapsed = 3.93. On average almost four sessions were active in the database at any moment, which suggests the database itself is where the time is going.

 

             Snap Id  Snap Time            Sessions  Cursors/Session
Begin Snap:  10638    05-June-12 09:00:00        54              5.8
End Snap:    10639    05-June-12 10:01:03       102              5.8
Elapsed:              61.04 (mins)
DB Time:              240.11 (mins)
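The ratio quoted in step 1 can be rechecked with a few lines of Python. This is only a sanity-check sketch: the variable names are illustrative, and the inputs are the Elapsed value from the header and the DB Time value (14,406.5 s) from the Time Model Statistics section of this report.

```python
# Figures copied from the AWR report in this article.
elapsed_minutes = 61.04      # "Elapsed" in the report header
db_time_seconds = 14406.5    # "DB Time" in the Time Model Statistics section

# Convert DB Time to minutes so both values use the same unit.
db_time_minutes = db_time_seconds / 60

# DB Time / Elapsed is the average number of sessions active in the database.
average_active_sessions = db_time_minutes / elapsed_minutes

print(f"DB Time: {db_time_minutes:.2f} min")
print(f"DB Time / Elapsed: {average_active_sessions:.2f}")
```

A ratio near 1 would mean roughly one session was busy in the database at any moment; here it is close to 4.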

 

 

 

 

 

2 The Load Profile section shows about 680 user calls per second against only 2.12 transactions per second, which I took as another sign that something was wrong.

Load Profile

 

                  Per Second  Per Transaction
Redo size:         12,939.46         6,115.60
Logical reads:     67,323.18        31,819.06
Block changes:         53.22            25.15
Physical reads:         1.02             0.48
Physical writes:        4.72             2.23
User calls:           679.70           321.25
Parses:                90.49            42.77
Hard parses:            0.35             0.16
Sorts:                  1.94             0.92
Logons:                 0.08             0.04
Executes:             316.75           149.71
Transactions:           2.12

3 Continuing with the report, the Top 5 Timed Events show that the top wait event is row cache lock, with an average wait of 2,128 ms.

The row cache lock wait event is related to the shared pool and arises from access to the data dictionary cache (row cache). The usual first remedy is to enlarge the shared pool, but that does not help in every scenario.

Top 5 Timed Events

 

Event                          Waits  Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
row cache lock                 2,736    5,822         2,128               40.4  Concurrency
CPU time                                4,305                             29.9
gc cr block busy               2,293    2,633         1,148               18.3  Cluster
gc buffer busy                 1,569    1,096           698                7.6  Cluster
enq: TX - row lock contention  2,029      998           492                6.9  Application
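The Avg Wait and % Total Call Time columns follow from the other figures in the table. The sketch below recomputes them for the wait events, assuming the DB Time of 14,406.5 s from the Time Model Statistics section; the helper name `summarize` is illustrative. Because the table's Time(s) values are already rounded, a recomputed figure can differ from the report by about 1 ms.

```python
# (event, waits, total wait time in seconds) copied from the Top 5 Timed Events table.
events = [
    ("row cache lock", 2736, 5822),
    ("gc cr block busy", 2293, 2633),
    ("gc buffer busy", 1569, 1096),
    ("enq: TX - row lock contention", 2029, 998),
]
db_time_seconds = 14406.5  # DB Time from the Time Model Statistics section


def summarize(waits, time_s, db_time_s):
    """Return (average wait in ms, share of DB time in percent)."""
    avg_wait_ms = time_s / waits * 1000
    pct_of_db_time = time_s / db_time_s * 100
    return avg_wait_ms, pct_of_db_time


for name, waits, time_s in events:
    avg_ms, pct = summarize(waits, time_s, db_time_seconds)
    print(f"{name:32s} avg wait {avg_ms:7.0f} ms   {pct:5.1f}% of DB time")
```

An average wait of more than two seconds per row cache lock request is what makes this event stand out, even though it has only a few thousand waits.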

4 Continuing the analysis, the Time Model Statistics show sequence load elapsed time in second place, at 47.90% of DB time.

Time Model Statistics

  • Total time in database user-calls (DB Time): 14406.5s
  • Statistics including the word "background" measure background process time, and so do not contribute to the DB time statistic
  • Ordered by % or DB time desc, Statistic name

 

Statistic Name                                Time (s)  % of DB Time
sql execute elapsed time                     14,188.01         98.48
sequence load elapsed time                    6,900.83         47.90
DB CPU                                        4,304.59         29.88
PL/SQL execution elapsed time                    20.64          0.14
parse time elapsed                               10.25          0.07
hard parse elapsed time                           6.00          0.04
PL/SQL compilation elapsed time                   1.17          0.01
hard parse (sharing criteria) elapsed time        1.07          0.01
repeated bind elapsed time                        0.80          0.01
hard parse (bind mismatch) elapsed time           0.64          0.00
connection management call elapsed time           0.31          0.00
failed parse elapsed time                         0.00          0.00
DB time                                      14,406.50
background elapsed time                       2,115.75
background cpu time                              20.52

5 The most time-consuming statement turns out to be about the simplest SQL imaginable: SELECT SEQ_NEWID.NEXTVAL FROM DUAL. How can fetching a sequence value take so long?

SQL ordered by Elapsed Time

  • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
  • % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
  • Total DB Time (s): 14,406
  • Captured SQL account for 223.8% of Total

 

Elapsed Time (s)  CPU Time (s)  Executions  Elap per Exec (s)  % Total DB Time  SQL Id         SQL Module                SQL Text
6,910             0             281         24.59              47.96            1gd7ancd2px8m  FC.EdiService.Import.exe  SELECT SEQ_NEWID.NEXTVAL FROM ...

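6 Now look at the dictionary cache statistics: there were 287 get requests against dc_sequences, and 43.9% of them missed the cache. Fetching sequence values is indeed where the problem lies, which also explains why the SQL statement identified in the previous step is so slow: in nearly half of the cases the sequence entry was not found in the cache.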

Dictionary Cache Stats

  • "Pct Misses" should be very low (< 2% in most cases)
  • "Final Usage" is the number of cache entries being used

 

Cache                 Get Requests  Pct Miss  Scan Reqs  Pct Miss  Mod Reqs  Final Usage
dc_awr_control                  65      1.54          0                   1            1
dc_database_links              304      0.00          0                   0            1
dc_global_oids                 155      0.00          0                   0           24
dc_histogram_data           74,704      0.25          0                   0        5,612
dc_histogram_defs           71,400      0.26          0                   0        4,945
dc_object_ids               29,398      0.01          0                   0        1,136
dc_objects                   3,912      0.23          0                   0          860
dc_profiles                    150      0.00          0                   0            1
dc_rollback_segments        17,789      0.10          0                   0           56
dc_segments                  8,927      0.06          0                   4          896
dc_sequences                   287     43.90          0                 279            3
dc_tablespace_quotas             2     50.00          0                   0            2
dc_tablespaces               8,954      0.00          0                   0           25
dc_usernames                 1,082      0.00          0                   0            8
dc_users                    13,991      0.00          0                   0           31
outstanding_alerts             326     77.91          0                  23           54
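To make the "Pct Miss should be below 2%" rule concrete, the sketch below filters a few rows of the table and converts the miss percentage back into an approximate miss count. The 2% threshold comes from the report's own note; the minimum of 100 get requests is a hypothetical cutoff added here so that caches with only a handful of requests (such as dc_tablespace_quotas) are ignored.

```python
# (cache, get requests, pct miss) copied from the Dictionary Cache Stats table.
rows = [
    ("dc_histogram_data", 74704, 0.25),
    ("dc_sequences", 287, 43.90),
    ("dc_tablespace_quotas", 2, 50.00),
    ("dc_users", 13991, 0.00),
    ("outstanding_alerts", 326, 77.91),
]

THRESHOLD_PCT = 2.0  # from the report note: "Pct Misses" should be < 2% in most cases
MIN_GETS = 100       # hypothetical cutoff, not part of the AWR report


def flag_caches(rows, threshold_pct=THRESHOLD_PCT, min_gets=MIN_GETS):
    """Return (cache, gets, approximate misses) for caches above the threshold."""
    return [
        (name, gets, round(gets * pct / 100))
        for name, gets, pct in rows
        if pct > threshold_pct and gets >= min_gets
    ]


flagged = flag_caches(rows)
for name, gets, misses in flagged:
    print(f"{name}: about {misses} misses out of {gets} gets")
```

With these inputs the filter reports dc_sequences (about 126 misses out of 287 gets) and outstanding_alerts. The analysis in this article follows dc_sequences because it lines up with the slow NEXTVAL statement found in step 5.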

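7 At this point the basic approach to the problem is clear: caching the sequence in memory should largely resolve it. Checking the sequence on the production system shows that it was created with the following statement: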


-- Create sequence
create sequence SEQ_NEWID
minvalue 1000
maxvalue 9999
start with 1000
increment by 1
nocache;

8 Alter the sequence so that it is cached in memory: alter sequence SEQ_NEWID cache 3000;

/*

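About the sequence cache

The main purpose of CACHE is to pre-generate a batch of sequence values and keep them in memory when the sequence is first used.

For example:
CREATE SEQUENCE SEQ_UNIQUEKEY AS INT
START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 99999999
CYCLE
CACHE 300
ORDER;

(The AS INT clause in this example is DB2-style syntax; Oracle's CREATE SEQUENCE does not accept it.)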

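Once the sequence has been created, values 1 to 300 are already in the cache, and each NEXTVAL call takes its value from the cache. When the sequence reaches 300, the next NEXTVAL call generates values 301 to 600 and places them in the cache.

If the current connection is broken and the client reconnects to the database, NEXTVAL returns the smallest value in the newly loaded cache.
For example:
NEXTVAL ... (executed 100 times, so the next value would be 101)
Run commit and disconnect to close the connection (or an exception occurs and a commit is forced)
The next time you connect to the database, NEXTVAL returns 301; values 101 to 300 are discarded.

In Oracle specifically, the cache is held in each instance's shared memory and is shared by all sessions, so unused values are typically lost when the instance crashes or the sequence is aged out of the shared pool, rather than when a single session disconnects.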

 

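However, because this database is a RAC environment, switching the sequence to CACHE causes problems if the application requires sequence values to be handed out in order.

For example, with cache 40:

instance 1 uses 1-40

instance 2 uses 41-80

After asking the developers about the application's ordering requirements, we confirmed that the application does not depend much on the order of sequence values, so the cache setting of the sequence can be changed.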

*/
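The ordering concern on RAC can be illustrated with a toy simulation. This is not how Oracle implements sequences; it is a minimal model, under the assumption that each instance reserves its own block of `cache_size` values, and the class name `CachedSequence` is made up for the illustration.

```python
class CachedSequence:
    """Toy model of a cached sequence used by several RAC instances."""

    def __init__(self, start, cache_size):
        self.next_uncached = start    # next value not yet reserved by any instance
        self.cache_size = cache_size
        self.ranges = {}              # instance id -> [next value, last cached value]

    def nextval(self, instance):
        rng = self.ranges.get(instance)
        if rng is None or rng[0] > rng[1]:
            # This instance's cache is empty: reserve the next block of values.
            first = self.next_uncached
            self.next_uncached += self.cache_size
            rng = [first, first + self.cache_size - 1]
            self.ranges[instance] = rng
        value = rng[0]
        rng[0] += 1
        return value


seq = CachedSequence(start=1, cache_size=40)
calls = [1, 2, 1, 2, 1]                      # which instance issues each NEXTVAL
values = [seq.nextval(i) for i in calls]
print(values)  # [1, 41, 2, 42, 3]
```

The values stay unique, but they are no longer handed out in increasing order across the cluster. That is acceptable here only because the developers confirmed the application does not rely on sequence order.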


Origin blog.csdn.net/yabignshi/article/details/113891722