reliable message

1 Symptom

A colleague reported that a SQL*Loader job was loading data extremely slowly. A load that normally finishes in a few minutes had been running for three and a half hours without completing. Querying the database session showed the following:

SID   USER_NAME  EVENT
---   ---------- -------
1962  STG        reliable message

I took the opportunity to look into this wait event.

2 Reliable Message Analysis

2.1 Event Description

MOS (My Oracle Support) describes "reliable message" as follows:

When a process sends a message using the 'KSR' intra-instance broadcast service,
the message publisher waits on this wait-event until all subscribers have
consumed the 'reliable message' just sent. The publisher waits on this wait-event
for up to one second and then re-tests if all subscribers have consumed the
message, or until posted. If the message is not fully consumed the wait recurs,
repeating until either the message is consumed or until the waiter is interrupted.

In other words, this is a wait by a message publisher for its subscribers. As long as the message in the queue has not been read by every subscriber, the publisher waits on this event. After going through the MOS documents, I found that this wait can occur on many different channels. Each channel corresponds to a different situation and has its own solution. Most cases are bugs that require a patch or an upgrade to a later version. The workaround is usually to restart the instance or to disable the related feature.
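To make the MOS description concrete, here is a toy model of the publish/consume handshake in Python. It is a sketch under assumptions, not Oracle's KSR code. The class ReliableChannel and its methods are hypothetical names. The one-second re-test is modeled with threading.Condition.wait(timeout=...), "posted" is modeled with notify_all(), and "the waiter is interrupted" is modeled with a hypothetical max_waits limit.

```python
import threading


class ReliableChannel:
    """Toy model of the MOS description (an assumption, not Oracle's KSR code).

    A publisher posts a message and then waits, up to `timeout` seconds at a
    time, until every subscriber has consumed it.
    """

    def __init__(self, subscribers):
        self._cond = threading.Condition()
        self._subscribers = frozenset(subscribers)
        self._pending = set()  # subscribers that have not consumed the message

    def publish(self):
        """Post a new message: every subscriber must now consume it."""
        with self._cond:
            self._pending = set(self._subscribers)

    def consume(self, subscriber):
        """A subscriber reports that it has read the current message."""
        with self._cond:
            self._pending.discard(subscriber)
            self._cond.notify_all()  # "posted": wake the publisher early

    def wait_until_consumed(self, timeout=1.0, max_waits=None):
        """The 'reliable message' wait: loop until all subscribers consumed.

        Returns how many waits timed out and recurred. `max_waits` is a
        hypothetical stand-in for the waiter being interrupted.
        """
        timed_out = 0
        with self._cond:
            while self._pending:
                if max_waits is not None and timed_out >= max_waits:
                    raise InterruptedError("publisher interrupted while waiting")
                # Wait until posted or until the timeout expires, then re-test.
                if not self._cond.wait(timeout=timeout):
                    timed_out += 1
        return timed_out
```

In this model, one slow or stuck subscriber is enough to keep the publisher looping. That matches the symptom in section 1: the session sits on "reliable message" even though the publisher itself is doing nothing wrong.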

2.2 Identifying the Channel

There are two ways to find the most problematic channel. Method 1 queries the GV$CHANNEL_WAITS view and immediately shows which channel or channels are in trouble. Method 2 also works, but it is a bit more cumbersome.

  • Method 1

    SELECT CHANNEL,
      SUM(wait_count) sum_wait_count
    FROM GV$CHANNEL_WAITS
    GROUP BY CHANNEL
    ORDER BY SUM(wait_count) DESC;
    

    Example output:

    CHANNEL                                                          SUM_WAIT_COUNT
    ---------------------------------------------------------------- --------------
    Result Cache: Channel                                                  15436686
    RBR channel                                                                9393
    kxfp control signal channel                                                7357
    MMON remote action broadcast channel                                       3070
    obj broadcast channel                                                      1731
    service operations - broadcast channel                                        2
    kill job broadcast - broadcast channel                                        2
    parameters to cluster db instances - broadcast channel                        2
    quiesce channel                                                               2
    

    The results above show that "Result Cache: Channel" is the most problematic channel.

  • Method 2

    select to_char(p1, 'XXXXXXXXXXXXXXXX') event_param,
     count(*), sum(time_waited/1000000) time_waited
    from gv$active_session_history
    where event = 'reliable message'
    group by to_char(p1, 'XXXXXXXXXXXXXXXX')
    order by sum(time_waited)*count(*) desc;
    -- Look up the channel names for the highest-impact addresses (p1)
    select name_ksrcdes
     from x$ksrcdes
     where indx in (select name_ksrcctx from x$ksrcctx where addr in (&1));
    Enter value for 1: '7ACD8AA60','7ACD8FA88'
    old   3:  where indx in (select name_ksrcctx from x$ksrcctx where addr in (&1))
    new   3:  where indx in (select name_ksrcctx from x$ksrcctx where addr in ('7ACD8AA60','7ACD8FA88'))
    
    NAME_KSRCDES
    ----------------------------------------------------------------
    Result Cache: Channel
    RBR channel
    

    The results pinpoint "Result Cache: Channel" as the problem. The query above resolves two addresses only to show that several channels can be looked up at once. In this example, querying the first address, '7ACD8AA60', alone would have been enough.
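The aggregation both methods perform can be sketched in Python. This helps when the rows have been exported from the database and need to be analyzed offline. It is a minimal sketch under assumptions. The functions rank_channels and rank_p1 are hypothetical. The input tuples are assumed to hold (inst_id, channel, wait_count) rows from GV$CHANNEL_WAITS and (p1, time_waited in microseconds) rows from GV$ACTIVE_SESSION_HISTORY.

```python
from collections import defaultdict


def rank_channels(rows):
    """Method 1: rows are (inst_id, channel, wait_count) tuples as returned by
    GV$CHANNEL_WAITS. Sum per channel (GROUP BY CHANNEL) and sort descending."""
    totals = defaultdict(int)
    for _inst_id, channel, wait_count in rows:
        totals[channel] += wait_count
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def rank_p1(samples):
    """Method 2: samples are (p1, time_waited_usec) tuples for ASH rows whose
    event is 'reliable message'. Group by p1 and order by
    sum(time_waited) * count(*) descending, like the SQL above."""
    stats = defaultdict(lambda: [0, 0])  # p1 -> [sample count, total usec]
    for p1, time_waited in samples:
        stats[p1][0] += 1
        stats[p1][1] += time_waited
    ranked = sorted(stats.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True)
    # Format p1 as upper-case hex, as to_char(p1, 'XXXX...') does (minus padding).
    return [(format(p1, "X"), count, usec / 1_000_000) for p1, (count, usec) in ranked]
```

For example, the Result Cache total from the Method 1 output (15436686) can be split across two instances as 15000000 and 436686. That split is invented for illustration. rank_channels then still puts "Result Cache: Channel" first with 15436686. The hex strings returned by rank_p1 are the values you would pass to the X$KSRCCTX lookup.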

3 Solutions

3.1 Result Cache: Channel

There are three options:

  • Upgrade the database to 12.2 or apply the 12.1.0.2.0 patch set
  • Apply patch 18416368
  • Workaround:

    SQL> alter system set result_cache_max_size=0 scope=both sid='*';
    

    After changing the parameter, the instances must be restarted.

3.2 RBR channel

Affects Version: 11.2.0.3

Bug 15826962: High "reliable message" wait due to "RBR channel".

The safest approach is to collect a process trace or a system-level trace and compare it with the MOS document. Alternatively, open an SR and let Oracle Support confirm the diagnosis.

The bug is fixed in the following versions and patches:

  • 11.2.0.4 (Server Patch Set)
  • 11.2.0.3.12 (Oct 2014) Database Patch Set Update (DB PSU)
  • 11.2.0.3 Bundle Patch 19 for Exadata Database
  • 11.2.0.3 Patch 34 on Windows Platforms
  • 11.2.0.3 Patch 23 on Windows Platforms

The solution, therefore, is to upgrade or to apply a patch.
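A quick way to check a version string against the two generic fix levels in the list above can be sketched as follows. This is a hypothetical helper, not an Oracle tool. It only models the 11.2.0.4 patch set and the 11.2.0.3.12 PSU, and it assumes that every later release also contains the fix. The Exadata bundle patch and the Windows patches are not modeled. A False result means "not confirmed fixed", not "affected".

```python
def parse_version(version):
    """Turn '11.2.0.3.12' into (11, 2, 0, 3, 12) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))


def rbr_fix_included(version):
    """Hypothetical check for bug 15826962 (RBR channel).

    Models only two fix levels from the list: the 11.2.0.4 patch set (and,
    by assumption, every later release) and 11.2.0.3 PSU 12 (11.2.0.3.12).
    Exadata bundle patches and Windows patches are not modeled.
    """
    v = parse_version(version)
    base = v[:4]
    if base >= (11, 2, 0, 4):
        return True
    if base == (11, 2, 0, 3):
        psu = v[4] if len(v) > 4 else 0  # 5th field = PSU level, 0 if absent
        return psu >= 12
    return False
```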

3.3 kxfp control signal channel

Affects Version: 12.1.0.2

In practice, when analyzing this case, you will find that this channel is not the only one with heavy waits. For example:

SQL> select CHANNEL, sum(wait_count) sum_wait_count
from GV$CHANNEL_WAITS
group by CHANNEL
order by sum(wait_count);

CHANNEL                                                          SUM_WAIT_COUNT
---------------------------------------------------------------- --------------
Flashback RVWR init channel                                                   2
quiesce channel                                                               3
PMON actions channel                                                          6
Broker IQ Result Channel                                                     24
kill job broadcast - broadcast channel                                       54
parameters to cluster db instances - broadcast channel                      137
GEN0 ksbxic channel                                                        1035
Flashback Marker channel                                                   1546
LCK0 ksbxic channel                                                        2669
service operations - broadcast channel                                     7033
MMON remote action broadcast channel                                      78046
kxfp remote slave spawn channel                                          157850
Result Cache: Channel                                                    242303
RBR channel                                                             1595647
obj broadcast channel                                                   4105387
kxfp control signal channel                                             5582125

Besides kxfp control signal channel, obj broadcast channel also has a very high wait count. These two channels have several times, and in some cases dozens of times, as many waits as the other channels.
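The "several times" claim can be quantified directly from the output above. The sketch below uses the wait counts exactly as shown. The function dominance is a hypothetical name. It divides each of the top N channels by the largest count among the remaining channels, which gives a lower bound on how far each one stands out.

```python
# Wait counts copied from the GV$CHANNEL_WAITS output above.
WAITS = {
    "Flashback RVWR init channel": 2,
    "quiesce channel": 3,
    "PMON actions channel": 6,
    "Broker IQ Result Channel": 24,
    "kill job broadcast - broadcast channel": 54,
    "parameters to cluster db instances - broadcast channel": 137,
    "GEN0 ksbxic channel": 1035,
    "Flashback Marker channel": 1546,
    "LCK0 ksbxic channel": 2669,
    "service operations - broadcast channel": 7033,
    "MMON remote action broadcast channel": 78046,
    "kxfp remote slave spawn channel": 157850,
    "Result Cache: Channel": 242303,
    "RBR channel": 1595647,
    "obj broadcast channel": 4105387,
    "kxfp control signal channel": 5582125,
}


def dominance(waits, top_n=2):
    """For each of the top_n channels, return its wait count divided by the
    largest count among all remaining channels (a lower bound on the gap)."""
    ranked = sorted(waits.items(), key=lambda kv: kv[1], reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    largest_other = rest[0][1]
    return {name: count / largest_other for name, count in top}
```

With this data, kxfp control signal channel has about 3.5 times and obj broadcast channel about 2.6 times as many waits as the next channel, RBR channel. Compared with Result Cache: Channel, the gap is roughly 23x and 17x.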

It is also recommended to run a hang analysis and check whether the trace file contains the following:

Service name: SYS$BACKGROUND
     Current Wait Stack:
      1: waiting for 'CSS group membership query'

If it does, the CSS group membership query is blocked. Under normal circumstances, this query should complete very quickly.

Together, these two symptoms tell you whether you have hit Oracle Bug 20470877.
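The two checks can be combined into a small triage helper. This is a hypothetical heuristic, not an Oracle diagnostic, and a match is only a reason to compare against the MOS note. It assumes the hang analysis trace has been read into a string. It also assumes the channel wait counts are available as a dict such as {channel: wait_count}.

```python
import re

# The wait line quoted from the hang analysis trace above.
CSS_WAIT = re.compile(r"waiting for 'CSS group membership query'")
SUSPECT_CHANNELS = {"kxfp control signal channel", "obj broadcast channel"}


def css_membership_wait_seen(trace_text):
    """True if the trace text shows a wait on 'CSS group membership query'."""
    return CSS_WAIT.search(trace_text) is not None


def suspect_bug_20470877(trace_text, channel_waits, top_n=2):
    """Hypothetical heuristic combining the two symptoms described above:
    1. the top channels by wait count are kxfp control signal channel and
       obj broadcast channel;
    2. the hang analysis trace shows a wait on 'CSS group membership query'.
    """
    top = sorted(channel_waits, key=channel_waits.get, reverse=True)[:top_n]
    return set(top) == SUSPECT_CHANNELS and css_membership_wait_seen(trace_text)
```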

Solution: the only fix is to apply the patch.

Patch 20470877: LONG WAITS FOR "RELIABLE MESSAGE" AFTER A FEW DAYS OF UPTIME

Workaround: restart the instance.

Author: halberd.lee

Created: 2019-12-26 Thu 13:39

Source: www.cnblogs.com/halberd-lee/p/12101572.html