Comprehensive analysis of the classification, discovery and optimization of Oracle wait events


http://click.aliyun.com/m/21917/
About the Author
Han Feng is a database architect at the CreditEase Technology R&D Center. He is proficient in a variety of relational databases, has worked at Dangdang, TOM Online and other companies as chief DBA and database architect, and has many years of hands-on experience in database architecture, design and development. He is the author of the book "SQL Optimization Best Practices".



  1. The origin of wait events


You may find it a little odd that we start with wait events. Earlier we talked about the indicator system, and it is precisely the evolution of that indicator system that led to the introduction of wait events. Broadly, Oracle's indicator system has gone through the following three stages:



Taking hit ratios as the main reference indicator



Optimization started from various hit ratios, such as the "library cache hit ratio". This method has serious drawbacks: a system with a 99% hit ratio is not necessarily better than one with 95%. It was commonly used in older Oracle versions such as 8i and 9i.



Taking waiting events as the main reference indicator



Optimization starts from the wait events themselves; a common example is "db file sequential read". They show directly which waits the database mainly experienced over a period of time, and these bottlenecks are often the starting point for optimization. This approach is widely used in 10g and 11g.



Taking the time model as the main reference indicator



Optimization starts from the overall consumption of various resources, which gives a holistic view of where database time goes. It is more general than wait events; a common example is "DB Time". Oracle keeps strengthening this area.



These three stages show that wait events were introduced precisely to overcome the drawbacks of using hit ratios as indicators. Compared with the later time model, wait events show Oracle's behavior in a more direct and fine-grained way and are often an important entry point for optimization. The time model, in contrast, aims at an overall, systematic understanding of how the database is running. The two have different emphases.



  2. Classification of wait events


Let us begin with how wait events are classified. At the top level, wait events fall into two groups: idle and non-idle. Non-idle wait events are further divided into finer classes.



The following query shows the number of wait events and their broad classification (the statements were run in an 11g environment).

[Figures 20161101103051868.jpg and 20161101103102757.jpg: the query and its output]



Wait events whose WAIT_CLASS is "Idle" are idle wait events; all others are non-idle wait events.
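A count of this kind can be sketched as follows. This is a minimal illustration in Python, assuming the rows have already been fetched from V$EVENT_NAME (its NAME and WAIT_CLASS columns); the sample rows and the helper name `summarize_wait_classes` are made up for the example and are not the author's original statements.

```python
from collections import Counter

# Roughly equivalent SQL, to be run against a real instance:
#   SELECT wait_class, COUNT(*) FROM v$event_name GROUP BY wait_class;
# Hypothetical sample of (NAME, WAIT_CLASS) rows standing in for V$EVENT_NAME.
SAMPLE_EVENTS = [
    ("db file sequential read", "User I/O"),
    ("db file scattered read", "User I/O"),
    ("log file sync", "Commit"),
    ("SQL*Net message from client", "Idle"),
    ("SQL*Net message to client", "Network"),
    ("library cache lock", "Concurrency"),
]


def summarize_wait_classes(rows):
    """Return (events per wait class, idle count, non-idle count)."""
    per_class = Counter(wait_class for _name, wait_class in rows)
    idle = per_class.get("Idle", 0)
    non_idle = sum(per_class.values()) - idle
    return per_class, idle, non_idle


per_class, idle, non_idle = summarize_wait_classes(SAMPLE_EVENTS)
for wait_class, count in sorted(per_class.items()):
    print(f"{wait_class:<12} {count}")
print(f"idle={idle} non-idle={non_idle}")
```

On a real instance the totals will be far larger and will differ between versions.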

1) Idle and non-idle wait events
An idle wait event means that Oracle is waiting for work. For example, after logging in with sqlplus without issuing any further command, the session waits on "SQL*Net message from client" until the user issues a command. These events generally need little attention when diagnosing and optimizing a database.



Non-idle wait events are specific to Oracle activity: they are waits that occur while database tasks or applications are running. These are the events to watch and study when tuning a database.



2) Wait event classes
Administrative class - Administrative

Waits caused by DBA administrative commands that force users to wait (for example, rebuilding an index).

Application class - Application

Waits caused by user application code (for example, lock waits).

Cluster class - Cluster

Waits related to Real Application Clusters (RAC) resources (for example, gc cr block busy).

Commit class - Commit

This class contains a single wait event: after a commit is issued, waiting for confirmation of the redo log write (log file sync).

Concurrency class - Concurrency

Waits for internal database resources (for example, latches).

Configuration class - Configuration

Waits caused by improper configuration of the database or instance (for example, redo log files that are too small, or the shared pool size).

Idle class - Idle

Waits that mean the session is inactive and waiting for work (for example, SQL*Net message from client).

Network class - Network

Waits related to the network environment (for example, SQL*Net more data to dblink).

Other class - Other

Waits that are usually rare (for example, wait for EMON to spawn).

Scheduler class - Scheduler

Waits related to resource management (for example, resmgr:become active).

System I/O class - System I/O

Waits caused by I/O of background processes (for example, the DBWR wait db file parallel write).

User I/O class - User I/O

Waits usually caused by user I/O (for example, db file sequential read).





  3. Understanding a wait event


Each wait event reflects one kind of activity in the database. As the query above shows, many wait events are built into the system, and you can look up each of them in the data dictionary view V$EVENT_NAME. One of the most common wait events is described below as an example.



[Figures 20161101103128488.jpg and 20161101103142928.jpg: details of the "db file sequential read" wait event]



The wait event "db file sequential read" belongs to the "User I/O" class. It is usually a read of a single data block; in most cases it is recorded when an index block is read or when a data block is read through an index. A high value indicates many waits on single-block reads and is usually caused by a poor join order between tables or by the use of non-selective indexes. Correlate this wait with other known issues in the Statspack report (such as inefficient SQL), check that the index scans are necessary, and check that the join order of multi-table joins is appropriate. DB_CACHE_SIZE is also a factor in how often this event occurs.



The wait event has three parameters:

file#: the absolute file number of the file Oracle is reading.

block#: the block number in this file at which the read starts.

blocks: the number of blocks read. Usually 1, which means a single-block read.



With these parameters and the data dictionary you can identify the object the wait is on (that is, find the hot object) and then apply a solution that fits the situation.
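The lookup from file# and block# to an object can be sketched as follows. This is a minimal illustration in Python, assuming extent rows shaped like DBA_EXTENTS (OWNER, SEGMENT_NAME, FILE_ID, BLOCK_ID, BLOCKS) have already been fetched; the sample rows and the helper name `find_segment` are hypothetical.

```python
# Roughly equivalent SQL, to be run against a real instance:
#   SELECT owner, segment_name FROM dba_extents
#    WHERE file_id = :file_no
#      AND :block_no BETWEEN block_id AND block_id + blocks - 1;
# Hypothetical rows: (OWNER, SEGMENT_NAME, FILE_ID, BLOCK_ID, BLOCKS)
SAMPLE_EXTENTS = [
    ("APP", "ORDERS", 5, 128, 8),
    ("APP", "ORDERS_PK", 5, 136, 8),
    ("APP", "CUSTOMERS", 6, 128, 16),
]


def find_segment(extents, file_no, block_no):
    """Return (owner, segment_name) of the extent containing the block, or None."""
    for owner, segment, file_id, first_block, n_blocks in extents:
        if file_id == file_no and first_block <= block_no < first_block + n_blocks:
            return owner, segment
    return None


# file# and block# as reported for a "db file sequential read" wait.
print(find_segment(SAMPLE_EXTENTS, 5, 140))
```

On a large database the DBA_EXTENTS query itself can be slow, so it is usually run for a handful of sampled waits rather than for every wait.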



The more you know about wait events, the better you can understand how the database operates, which in turn improves your overall optimization capabilities. Later, I will introduce common wait events.



  4. Observing wait events


The system has built-in views that show wait events at the overall (system) level and the local (session) level, together with statistics per class of event. The main views are described below.



1. v$event_name



Lists the wait events supported by the system. You can look up information such as the class of each wait event and the meaning of its parameters.



2. v$system_wait_class



Displays the instance-wide time totals for each registered wait class.



These are wait statistics per class at the system level. From this view you get a global picture of which kinds of operations the system waits on most.



3. v$system_event



Wait event statistics at the system level. It provides a summary of every wait event since the instance was started and is often used to obtain a historical picture of system waits. By taking two snapshots and computing the increments, you can determine what the system waited on during that period.



The main fields include:

TOTAL_WAITS

The total number of waits for this event since the database was started.

TIME_WAITED

The total time waited for this event (unit: hundredths of a second). It is the sum over all sessions, both ended and still connected, since the database was started.

AVERAGE_WAIT

The average wait time for this event (unit: hundredths of a second), that is, time_waited/total_waits.

TOTAL_TIMEOUTS

The total number of timeouts for this event.

SQL - view the top events by time waited

[Figures 20161101103159169.jpg and 20161101103219318.jpg: the query and its output]
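The two-snapshot approach described for v$system_event can be sketched as follows. This is a minimal illustration in Python, assuming each snapshot has already been read into a dictionary of EVENT -> (TOTAL_WAITS, TIME_WAITED); the numbers and the helper name `top_events` are invented for the example.

```python
# Each snapshot maps EVENT -> (TOTAL_WAITS, TIME_WAITED in hundredths of a second),
# as it could be read from V$SYSTEM_EVENT at two points in time. Values are invented.
SNAP_BEGIN = {
    "db file sequential read": (1000, 500),
    "log file sync": (200, 400),
    "db file scattered read": (50, 100),
}
SNAP_END = {
    "db file sequential read": (1600, 1100),
    "log file sync": (260, 430),
    "db file scattered read": (50, 100),
}


def top_events(begin, end):
    """Return (event, waits, time_cs, avg_cs) for events with activity, sorted by time waited."""
    result = []
    for event, (waits_end, time_end) in end.items():
        waits_begin, time_begin = begin.get(event, (0, 0))
        waits = waits_end - waits_begin
        time_cs = time_end - time_begin
        if waits > 0:  # skip events that did not occur during the interval
            result.append((event, waits, time_cs, time_cs / waits))
    return sorted(result, key=lambda row: row[2], reverse=True)


for event, waits, time_cs, avg_cs in top_events(SNAP_BEGIN, SNAP_END):
    print(f"{event:<26} waits={waits:<5} time_cs={time_cs:<5} avg_cs={avg_cs:.2f}")
```

Because the counters are cumulative since instance startup, the deltas are only meaningful if the instance was not restarted between the two snapshots.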



4. v$session_event



Similar to v$system_event, it records the cumulative value of each wait event over the life of a session. Compared with the former it adds the session ID (SID). The same information is accumulated in v$system_event at the same time. Note that when a session is re-established, its statistics start again from 0.



5. v$session_wait, v$session



Shows the resources or events that active sessions are waiting for. In 10g its information was merged into the v$session view. This is a key view for finding performance bottlenecks: it shows what each session is waiting for right now. When the system has performance problems, it is a good starting point that indicates where to look next.



Note that once a wait ends, its details disappear from this view, which makes postmortem diagnosis very difficult. V$SESSION_EVENT provides cumulative but not very detailed data. Recent history can be obtained from the history view v$session_wait_history.



The main fields include:

EVENT

The event the session is currently waiting for, or the last event it waited for.

WAIT_TIME

The time the session waited for the event (unit: hundredths of a second).

Value>0: the duration of the last wait (unit: 10 ms); the session is not currently waiting.

Value=0: the session is currently waiting for the event.

Value=-1: the last wait was shorter than one unit of measurement; the session is not currently waiting.

Value=-2: timed statistics are not enabled; the session is not currently waiting.



STATE

The wait state (it tells you how to interpret the wait_time and seconds_in_wait fields).

1) WAITING:

The session is currently waiting for this event.

2) WAITED UNKNOWN TIME:

A wait occurred, but because timed_statistics is set to false its duration could not be measured.

3) WAITED SHORT TIME:

A wait occurred, but it was so short that it did not exceed one time unit, so no duration was recorded.

4) WAITED KNOWN TIME:

The session waited and then obtained the resource it needed; it moves from WAITING to this state.



WAIT_TIME/SECONDS_IN_WAIT

How to read wait_time and seconds_in_wait depends on state.

1) state=WAITING

wait_time is meaningless; seconds_in_wait is the time waited so far (unit: seconds).

2) state=WAITED UNKNOWN TIME

Neither wait_time nor seconds_in_wait gives the wait duration.

3) state=WAITED SHORT TIME

Neither wait_time nor seconds_in_wait gives the wait duration.

4) state=WAITED KNOWN TIME

wait_time is the actual duration of the last wait (unit: hundredths of a second); seconds_in_wait is not the wait duration.
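The rules above can be condensed into a small helper. This is a minimal sketch in Python, assuming WAIT_TIME and SECONDS_IN_WAIT have been read from v$session_wait for one session; the function name `describe_wait` and the wording of the messages are hypothetical.

```python
def describe_wait(wait_time, seconds_in_wait):
    """Map WAIT_TIME to the matching STATE and a readable duration.

    wait_time is in hundredths of a second, seconds_in_wait in seconds.
    """
    if wait_time == 0:
        return "WAITING", f"waiting for {seconds_in_wait} s so far"
    if wait_time == -2:
        return "WAITED UNKNOWN TIME", "duration unknown (timed_statistics is false)"
    if wait_time == -1:
        return "WAITED SHORT TIME", "last wait shorter than 1/100 s"
    if wait_time > 0:
        return "WAITED KNOWN TIME", f"last wait took {wait_time / 100:.2f} s"
    raise ValueError(f"unexpected WAIT_TIME: {wait_time}")


print(describe_wait(0, 7))
print(describe_wait(250, 0))
```

Only sessions in the WAITING state are blocked at the moment of the query, so a filter on that state is the usual first step when looking for a current bottleneck.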



6. v$session_wait_history



Records the last n wait events of each session, that is, the history of v$session_wait. By default the last 10 waits are kept; this number can be changed.



7. v$event_histogram



Records the histogram distribution of wait events, so that you can see how the individual waits for an event are distributed. The v$session_event and v$system_event views record only cumulative totals and averages, so the time taken by individual waits cannot be seen there.
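Why a histogram says more than an average can be sketched as follows. This is a minimal illustration in Python with invented wait times; it assumes buckets whose upper bounds double (1 ms, 2 ms, 4 ms, ...), which mirrors how the WAIT_TIME_MILLI column of v$event_histogram is understood here, and the helper name `histogram` is hypothetical.

```python
def histogram(wait_ms_list):
    """Count waits in power-of-two millisecond buckets, keyed by the bucket's upper bound."""
    buckets = {}
    for wait_ms in wait_ms_list:
        upper = 1
        while wait_ms >= upper:  # find the first bound strictly above the wait
            upper *= 2
        buckets[upper] = buckets.get(upper, 0) + 1
    return dict(sorted(buckets.items()))


# 98 fast waits plus two very slow ones (values in milliseconds, invented).
waits_ms = [0.5] * 98 + [500, 600]
average = sum(waits_ms) / len(waits_ms)
print(f"average = {average:.2f} ms")
print(histogram(waits_ms))
```

The average suggests that every wait is moderately slow, while the buckets show that almost all waits are fast and two outliers dominate the total.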



The relationship between session wait events and the views can be summarized as follows:

A session is in only one wait event at any moment. If you see other wait events, it simply means those waits occurred in a later time slice.

In v$session_wait, seconds_in_wait is in seconds and wait_time is in hundredths of a second; the time_waited and average_wait fields in v$session_event are also in hundredths of a second.

When a wait shown in v$session_wait ends, the statistics in v$session_event change.

The figures in v$session_wait are a real-time snapshot that changes constantly, so they have little value as statistics.

When a wait in v$session_wait ends, its duration is added to the time_waited field in v$session_event, and the average_wait field in v$session_event is updated accordingly.



  5. Common wait events


Oracle has a large number of wait events, and they differ somewhat between versions. Some common ones are described below; I hope they help you in your daily work.



1. buffer busy waits

Cause:

A session waits on this event when the block it needs is already in the buffer cache but cannot be used yet because another session holds the buffer in an incompatible mode, for example while it is being modified. A related case is a shortage of free buffers: when a session reads data blocks from disk into memory, or has to build the before-image of a block for a consistent read, it needs a free buffer to hold the block. If no free buffer can be found, the session also has to wait; that wait is recorded as free buffer waits.



Parameter meaning:

file#

The file ID of the data block the session is waiting to access.

block#

The block number of the data block the session is waiting to access.

id

Before 10g, this value indicates the reason for the wait; from 10g on, it indicates the class of block being waited for.



Optimization direction: it depends on the class of block that produces the wait.

Data block

Tune the SQL to reduce logical and physical reads, or reduce the amount of data stored in a single block.

Data segment header

Increase FREELISTS and FREELIST GROUPS. Make sure the gap between PCTFREE and PCTUSED is not too small, so that blocks move on and off the freelist as rarely as possible.

Undo block

Change the application so that sessions use the same data objects at different times.

Undo segment header

If the database manages the undo segments automatically, no intervention is generally required. If they are managed manually, reduce the number of transactions per rollback segment.



2. buffer latch

Cause:

The location of each data block in memory is recorded in a hash list. When a session needs a data block, it first searches the hash list for the block's address and then accesses the block through that address. Oracle protects the integrity of this list with a latch: a session must obtain the latch before reading the list, which guarantees that the list does not change while the session walks it. If the list is too long, a session takes too long to search it and other sessions are left waiting. Another cause is the same data block being accessed very frequently, which is what we usually call the hot block problem.



Parameter meaning:

latch addr

The virtual address in the SGA of the latch the session is requesting.

chain#

The index into the buffer chains hash list. When this parameter equals 0xffffff, the session is waiting for an LRU latch.



Optimization direction:

The optimization directions that can be considered include using multiple buffer pools to create more buffer chains or using the parameter db_block_lru_latches to increase the number of latches so that more sessions can obtain latches. These two methods can be used at the same time.



3. db file sequential read

Cause:

It is usually a read of a single data block. In most cases this wait is recorded when an index block is read or when a data block is read through an index. It may indicate that tables are joined in a poor order or that non-selective indexes are used. For a well-tuned system with a high transaction rate a large value is mostly normal, but in some cases it points to a problem. Correlate this wait statistic with known issues in the performance report (such as inefficient SQL). Check the index scans to make sure each one is necessary, and check the join order of multi-table joins.



DB_CACHE_SIZE also determines how often these waits occur. Problematic hash joins should be done in PGA memory, but they can also consume a lot of memory and cause many sequential-read waits. They may also show up as direct path read/write waits.



Parameter meaning:

file#

The absolute file number of the file Oracle is reading.

block#

The block number in this file at which the read starts.

blocks

The number of blocks read. Usually 1, which means a single-block read.



Optimization direction:

This waiting event does not necessarily mean that there must be a problem. If you can determine that there is a problem, you can follow the following optimization ideas.



Modify the application to avoid sql with a lot of IO, or reduce its frequency.

Increase the data buffer to improve the hit rate.

Adopt a better disk subsystem to reduce the response time of a single IO and prevent the appearance of physical bottlenecks.



4. db file scattered read

Cause:

This wait event is caused by user operations. It is generated when a user issues SQL that reads multiple data blocks per I/O. The two most common cases are full table scans (FTS) and index fast full scans (IFFS). The word "scattered" in the name leads many people to think the blocks are read in a scattered way. In fact the opposite is true: when this wait occurs, the SQL is reading data blocks sequentially, as in an FTS or IFFS. "Scattered" refers to how the blocks are stored in memory: after being read they are placed in non-contiguous buffers.



Parameter meaning:

file#

The absolute file number of the file Oracle is reading.

block#

The block number in this file at which the read starts.

blocks

The number of blocks read.



Optimization direction:

This event usually indicates waits related to full table scans. When a full table scan reads blocks into memory, they rarely fit into contiguous buffers and are scattered throughout the buffer cache. A large value suggests that the table has no suitable index, or only a limited set of indexes. Although a full table scan can be more efficient than an index scan under certain conditions, when these waits appear it is a good idea to check whether the full table scans are really necessary.



5. direct path read

Cause:

This wait event occurs when a session reads data blocks directly into the PGA instead of the SGA. The data read is usually private to the session, so there is no point in placing it in the SGA as shared data. It typically comes from temporary segments, for example sort data of the session's SQL, intermediate data of parallel execution, and sorted data produced by hash joins and merge joins. Because this data is meaningful only to the SQL of the current session, it does not need to go into the SGA. When direct path read waits occur, it means that a large amount of temporary data is being produced on disk, for example by sorting or parallel execution, or that there is not enough free space in the PGA.



In 11g a full table scan may use direct path reads and bypass the buffer cache; such a full table scan consists of physical reads. In 10g full table scans go through the buffer cache, so they do not produce direct path read waits.



Parameter meaning:

file#

The file number.

first block#

The block number at which the read starts.

block count

The number of physical blocks read contiguously, starting from first block#.



Optimization direction:

Several situations need to be distinguished for this wait event. One direction is to enlarge the sort area and similar measures; the other is to reduce the amount of read I/O, or to determine whether reading through the buffer cache would be more efficient.



6. direct path write

Cause:

This wait occurs when Oracle writes data directly from the PGA to data files or temporary files, bypassing the SGA. It is most common with disk sorts. In this case, find the data files with the most frequent operations (for sorts these are most likely temporary files) and spread the load.









Parameter meaning:

file#

The file number.

first block#

The block number at which the write starts.

block count

The number of physical blocks written contiguously, starting from first block#.



Optimization direction: reduce the volume of write I/O.



7. library cache lock

Cause:

This wait event occurs when different users contend for resources in the shared pool because they operate on the same database object concurrently. For example, while one user is performing a DDL operation on a table, other users who want to access the table wait on library cache lock and cannot continue until the DDL operation completes.



Parameter meaning:

Handle address

The address of the object being loaded.

Lock address

The address of the lock.

Mode

The data pieces of the object that need to be loaded.

Namespace

The name of the object's namespace, as shown in the v$db_object_cache view.



Optimization direction: find the locked objects and reduce contention on them.



8. library cache pin

Cause:

Like library cache lock, this wait event is caused by concurrent operations in the shared pool. Generally, when Oracle needs to recompile objects such as PL/SQL units or views, it has to pin them in the shared pool. If the object is held by another session at that time, a library cache pin wait occurs.



Parameter meaning:

Handle address

The address of the object being loaded.

Pin address

The address of the pin.

Mode

The data pieces of the object that need to be loaded.

Namespace

The name of the object's namespace, as shown in the v$db_object_cache view.



Optimization direction: find the pinned objects and reduce contention on them.



9. log file sync

Cause:

This wait event is caused by user session behavior. When a session issues a commit, the LGWR process writes the redo generated by the transaction from the log buffer to disk, so that the committed information is safely recorded in the database. After issuing the commit, the session has to wait until LGWR has successfully written that redo to disk before it can continue. This wait is called log file sync. When the system shows a large number of log file sync waits, check whether users are committing too frequently. The event is typical of OLTP systems, which run many small transactions; if these transactions commit very often, they can cause a large number of log file sync waits.



Optimization direction:

Improve LGWR performance by using fast disks where possible.

Commit in batches.

Use options such as nologging/unrecoverable where appropriate.
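The effect of batch commits can be estimated with simple arithmetic. This is a rough sketch in Python, under the simplifying assumption that each commit causes one log file sync wait; the row count, the batch sizes and the helper name `commits_needed` are chosen only for illustration.

```python
import math


def commits_needed(row_count, rows_per_commit):
    """Number of COMMITs issued when loading row_count rows in batches of rows_per_commit."""
    if rows_per_commit < 1:
        raise ValueError("rows_per_commit must be at least 1")
    return math.ceil(row_count / rows_per_commit)


ROWS = 100_000
for batch in (1, 100, 1000):
    print(f"commit every {batch:>4} rows -> {commits_needed(ROWS, batch)} commits")
```

Larger batches mean fewer commits, but also longer transactions and more undo held at once, so the batch size is a trade-off rather than "the bigger the better".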



10. SQL*Net message from client

Cause:

The foreground server process is waiting for the client to respond. The wait is caused by waiting for the user process and does not indicate that the database is abnormal. It also appears frequently when there is a network failure.



11. SQL*Net message to client

Cause:

This wait event occurs when the server sends a message to the client. If the server has to wait while sending, the client may be too busy to receive the message in time, or a network problem may be preventing the message from reaching the client.



This article is from the Yunqi community partner "DBAplus", the original release time: 2016-11-01