reliable message
1 Phenomenon
A colleague reported that a SQL*Loader data load was running particularly slowly: a job that normally finishes in a few minutes had already been running for three and a half hours without completing. Querying the database sessions showed the following:
 SID USER_NAME  EVENT
---- ---------- ----------------
1962 STG        reliable message
This is a good opportunity to take a closer look at this wait event.
2 Reliable Message Analysis
2.1 Event Description
MOS describes the reliable message event as follows:
When a process sends a message using the 'KSR' intra-instance broadcast service, the message publisher waits on this wait-event until all subscribers have consumed the 'reliable message' just sent. The publisher waits on this wait-event for up to one second and then re-tests if all subscribers have consumed the message, or until posted. If the message is not fully consumed the wait recurs, repeating until either the message is consumed or until the waiter is interrupted.
In plain terms: this event is raised on the side of the message publisher. When not all subscribers have consumed the message in the queue, the publisher waits on this event. According to the documents, the event can occur for a variety of channels; different channels correspond to different situations and call for different solutions. Most cases are caused by bugs that require a patch or an upgrade to a newer version; the workaround is usually to restart the instance or to disable the related feature.
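Before drilling down into channels, a quick first check (a minimal sketch using standard V$SESSION columns) is to list the sessions currently stuck on the event; for 'reliable message', P1 carries the channel context address that is used again in section 2.2:

select sid, event, p1, seconds_in_wait
  from v$session
 where event = 'reliable message';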
2.2 Identifying the Channel
The channel with the worst problem can be found by querying the GV$CHANNEL_WAITS view. With Method 1 you can immediately tell which channel (or channels) has a problem. Method 2 also works, but it is a bit more cumbersome.
- Method 1
SELECT CHANNEL, SUM(wait_count) sum_wait_count
  FROM GV$CHANNEL_WAITS
 GROUP BY CHANNEL
 ORDER BY SUM(wait_count) DESC;
Sample output:
CHANNEL                                                          SUM_WAIT_COUNT
---------------------------------------------------------------- --------------
Result Cache: Channel                                                  15436686
RBR channel                                                                9393
kxfp control signal channel                                                7357
MMON remote action broadcast channel                                       3070
obj broadcast channel                                                      1731
service operations - broadcast channel                                        2
kill job broadcast - broadcast channel                                        2
parameters to cluster db instances - broadcast channel                        2
quiesce channel                                                               2
From the output above, "Result Cache: Channel" is clearly the most problematic channel.
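In a RAC environment it can also be useful to break the waits down per instance and include the time waited; a minimal sketch, assuming the WAIT_TIME_USEC column is present in your version of GV$CHANNEL_WAITS:

select inst_id, channel, wait_count, wait_time_usec
  from gv$channel_waits
 order by wait_time_usec desc;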
- Method 2
select to_char(p1, 'XXXXXXXXXXXXXXXX') event_param,
       count(*),
       sum(time_waited/1000000) time_waited
  from gv$active_session_history
 where event = 'reliable message'
 group by to_char(p1, 'XXXXXXXXXXXXXXXX')
 order by time_waited*count(*) desc;

-- map the most impactful memory addresses to channel names
select name_ksrcdes
  from x$ksrcdes
 where indx in (select name_ksrcctx from x$ksrcctx where addr in (&1));

Enter value for 1: '7ACD8AA60','7ACD8FA88'
old   3:  where indx in (select name_ksrcctx from x$ksrcctx where addr in (&1))
new   3:  where indx in (select name_ksrcctx from x$ksrcctx where addr in ('7ACD8AA60','7ACD8FA88'))

NAME_KSRCDES
----------------------------------------------------------------
Result Cache: Channel
RBR channel
These results again point clearly at "Result Cache: Channel". The example above looks up several channels at once; for this case, passing just the first address, '7ACD8AA60', would have been enough.
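The two steps can also be combined into a single query. The sketch below joins the ASH P1 value straight to the channel descriptions; it assumes SYS access to the X$ tables, and the hex-string normalization used to match ADDR against P1 is a convenience of this sketch, so verify it on your own system:

select d.name_ksrcdes channel_name, count(*) samples
  from gv$active_session_history a, x$ksrcctx c, x$ksrcdes d
 where a.event = 'reliable message'
   and ltrim(rawtohex(c.addr), '0') = trim(to_char(a.p1, 'XXXXXXXXXXXXXXXX'))
   and d.indx = c.name_ksrcctx
 group by d.name_ksrcdes
 order by samples desc;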
3 Solutions
3.1 Result Cache: Channel
There are three options:
- Upgrade the database to 12.2, or apply the 12.1.0.2.0 patch set
- Apply patch 18416368
- Workaround:
SQL> alter system set result_cache_max_size=0 scope=both sid='*';
After changing the parameter, the instance needs to be restarted.
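After the restart, you can confirm that the result cache is off; a minimal check using the documented DBMS_RESULT_CACHE package (it should report DISABLED once result_cache_max_size is 0):

SQL> select dbms_result_cache.status() from dual;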
3.2 RBR channel
Affected version: 11.2.0.3
Bug 15826962: High "reliable message" waits due to "RBR channel".
The safest way to confirm this is to take a process trace or a system-level trace and compare it with the MOS documents, or to open an SR and let Oracle Support help with the diagnosis.
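For the process trace, an oradebug sequence along the following lines can be used (a sketch; <ospid> is a placeholder for the OS process id of the waiting server process, taken from V$PROCESS):

SQL> oradebug setospid <ospid>
SQL> oradebug dump errorstack 3
SQL> oradebug tracefile_name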
The fix is included in the following patches and versions:
- 11.2.0.4 (Server Patch Set)
- 11.2.0.3.12 (Oct 2014) Database Patch Set Update (DB PSU)
- 11.2.0.3 Bundle Patch 19 for Exadata Database
- 11.2.0.3 Patch 34 on Windows Platforms
- 11.2.0.3 Patch 23 on Windows Platforms
So the solution is to upgrade or apply the patch.
3.3 kxfp control signal channel
- Affected version: 12.1.0.2
In fact, in the case analyzed here, this channel is not the only one with serious waits. For example:
SQL> select CHANNEL, sum(wait_count) sum_wait_count
  2    from GV$CHANNEL_WAITS
  3   group by CHANNEL
  4   order by sum(wait_count)
  5  /

CHANNEL                                                          SUM_WAIT_COUNT
---------------------------------------------------------------- --------------
Flashback RVWR init channel                                                   2
quiesce channel                                                               3
PMON actions channel                                                          6
Broker IQ Result Channel                                                     24
kill job broadcast - broadcast channel                                       54
parameters to cluster db instances - broadcast channel                      137
GEN0 ksbxic channel                                                        1035
Flashback Marker channel                                                   1546
LCK0 ksbxic channel                                                        2669
service operations - broadcast channel                                     7033
MMON remote action broadcast channel                                      78046
kxfp remote slave spawn channel                                          157850
Result Cache: Channel                                                    242303
RBR channel                                                             1595647
obj broadcast channel                                                   4105387
kxfp control signal channel                                             5582125
You can also see that, besides kxfp control signal channel, obj broadcast channel stands out as well; these two have several times, even dozens of times, more waits than the rest.
At the same time, it is recommended to perform a hanganalyze and check whether the resulting trace file contains the following:
Service name: SYS$BACKGROUND
Current Wait Stack:
 1: waiting for 'CSS group membership query'
If it does, it means the CSS group membership query is being blocked, while under normal circumstances it should complete very quickly.
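The hanganalyze trace itself can be generated with oradebug; a minimal sketch (level 3 is a common starting point):

SQL> oradebug setmypid
SQL> oradebug hanganalyze 3
SQL> oradebug tracefile_name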
Together, these two symptoms identify Oracle Bug 20470877.
- Solution: apply the patch
Patch 20470877: LONG WAITS FOR "RELIABLE MESSAGE" AFTER A FEW DAYS OF UPTIME
- Workaround: restart the instance
Created: 2019-12-26 Thu 13:39