解决一则row cache lock引起的性能故障

转载的文章地址：http://www.oracleonlinux.cn/2012/06/row-cache-lock-performance-tuning/

此文章写的很好，非常详细的解说了cache引起的性能故障。

收到紧急邮件：说某客户生产系统在今天早上9:30左右，生产环境整个系统操作响应很慢，查询一票单子可能要耗时半分钟，有时候登录界面也卡住很久才响应。

这是一套10.2.0.5.0的双节点RAC数据库，平时系统运行较为正常，现在突然变慢。好了，对于我这个优化菜鸟来讲，还是从AWR报告入手。下面给出分析步骤和解决办法：

1 从报告头中看到，在数据库出现性能问题的一个小时内，DB Time达到240分钟，(DB Time)/Elapsed=3.93，说明数据库应该是存在问题的。

Snap Id	Snap Time	Sessions	Cursors/Session
Begin Snap:	10638	05-6月 -12 09:00:00	54	5.8
End Snap:	10639	05-6月 -12 10:01:03	102	5.8
Elapsed:	61.04 (mins)61.04 (mins)61.04 (mins)
DB Time:

2 从报告的Load Profile节，看到用户的每秒调用高达680次，说明肯定存在问题了。

Load Profile

Per Second	Per Transaction
Redo size:	12,939.46	6,115.60
Logical reads:	67,323.18	31,819.06
Block changes:	53.22	25.15
Physical reads:	1.02	0.48
Physical writes:	4.72	2.23
User calls:	679.70	321.25
Parses:	90.49	42.77
Hard parses:	0.35	0.16
Sorts:	1.94	0.92
Logons:	0.08	0.04
Executes:	316.75	149.71
Transactions:	2.12

3 继续分析报告，从Top 5中看到排在第一的等待事件是row cache lock,并且该等待事件的平均等待达到2128ms。

ROW CACHE LOCK等待事件是一个共享池相关的等待事件，是由于对于字典缓冲的访问造成的。通常直接的解决办可以通过调大共享池来解决，但是，并非在所有场景下都凑效。

Top 5 Timed Events

Event	Waits	Time(s)	Avg Wait(ms)	% Total Call Time	Wait Class
row cache lock	2,736	5,822	2,128	40.4	Concurrency
CPU time	4,305	29.9
gc cr block busy	2,293	2,633	1,148	18.3	Cluster
gc buffer busy	1,569	1,096	698	7.6	Cluster
enq: TX - row lock contention	2,029	998	492	6.9	Application

4 继续分析，发现基于时间的统计信息中，加载序列sequence的耗时排在了第二位。

Time Model Statistics

Total time in database user-calls (DB Time): 14406.5s
Statistics including the word "background" measure background process time, and so do not contribute to the DB time statistic
Ordered by % or DB time desc, Statistic name

Statistic Name	Time (s)	% of DB Time
sql execute elapsed time	14,188.01	98.48
sequence load elapsed time	6,900.83	47.90
DB CPU	4,304.59	29.88
PL/SQL execution elapsed time	20.64	0.14
parse time elapsed	10.25	0.07
hard parse elapsed time	6.00	0.04
PL/SQL compilation elapsed time	1.17	0.01
hard parse (sharing criteria) elapsed time	1.07	0.01
repeated bind elapsed time	0.80	0.01
hard parse (bind mismatch) elapsed time	0.64	0.00
connection management call elapsed time	0.31	0.00
failed parse elapsed time	0.00	0.00
DB time	14,406.50
background elapsed time	2,115.75
background cpu time	20.52

5 看到最耗时的竟然是一条再简单不过的SQL语句，SELECT SEQ_NEWID.NEXTVAL FROM DUAL，取序列的值，竟然会如此的耗时？

SQL ordered by Elapsed Time

Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
% Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
Total DB Time (s): 14,406
Captured SQL account for 223.8% of Total

Elapsed Time (s)	CPU Time (s)	Executions	Elap per Exec (s)	% Total DB Time	SQL Id	SQL Module	SQL Text
6,910	0	281	24.59	47.96	1gd7ancd2px8m	FC.EdiService.Import.exe	SELECT SEQ_NEWID.NEXTVAL FROM ...

6 再看字典缓冲区的统计信息：取序列值一共287次，就失败了43.9%，看来的确是取序列值的地方出现问题，也就解释了为什么上一步骤中的分许出的那条SQL会如此耗时，因为差不多有一半的情况下都没有取到序列的值。

Dictionary Cache Stats

"Pct Misses" should be very low (< 2% in most cases)
"Final Usage" is the number of cache entries being used

Cache	Get Requests	Pct Miss	Final Usage	Mod Reqs
dc_awr_control	65	1.54	1	1
dc_database_links	304	0.00	0	1
dc_global_oids	155	0.00	0	24
dc_histogram_data	74,704	0.25	0	5,612
dc_histogram_defs	71,400	0.26	0	4,945
dc_object_ids	29,398	0.01	0	1,136
dc_objects	3,912	0.23	0	860
dc_profiles	150	0.00	0	1
dc_rollback_segments	17,789	0.10	0	56
dc_segments	8,927	0.06	4	896
dc_sequences	287	43.90	279	3
dc_tablespace_quotas	2	50.00	0	2
dc_tablespaces	8,954	0.00	0	25
dc_usernames	1,082	0.00	0	8
dc_users	13,991	0.00	0	31
outstanding_alerts	326	77.91	23	54

7 到此，解决问题的基本思路已经出来了，通过将序列缓存到内存中，基本可以解决问题。通过查看生产系统上的该序列的信息，发现创建序列的语句如下：

-- Create sequence

create sequence SEQ_NEWID

minvalue 1000

maxvalue 9999

start with 1000

increment by 1

nocache;

8 调整序列，使之cache到内存中，alter sequence SEQ_NEWID cache 3000;

sequence cache介绍

CACHE的主要用途是
在获取第一次的sequence时，预先创建出一部分sequence并存放在缓存中

例如：
CREATE SEQUENCE
SEQ_UNIQUEKEY AS INT
START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 99999999
CYCLE
CACHE 300
ORDER;

当创建成功时，在缓存中已经存在了1~300号sequence,每次执行NEXTVAL时，都会使用缓存中的seq。当seq增加到300时，下一次的NEXTVAL命令就会重新产生301~600号seq并存放于缓存。

若，当前连接中断，并重新连接数据库时，NEXTVAL命令取得的值就是在缓存中的最小值。
例如：
NEXTVAL.......（执行100次，下次seq为101）
执行，commit，disconnect命令断开连接（或者发生异常，强制commit）
则，下次连接到数据库后，执行NEXTVAL命令取得的seq就为301，101~300号自动作废。

但是因为数据库是rac环境，如果调整为cache，如果应用对序列有顺序要求的这种，调整就会有问题。

比如cache 40，

instance 1 使用1-40

instance 2使用41-80

和开发问过应用对序列顺序要求后，确认应用对sequence的顺序没有太大依赖，所以可以调整sequence的cache。

解决一则row cache lock引起的性能故障

猜你喜欢