Oracle internal: Understanding and Tuning Buffer Cache and DBWR

APPLIES TO:

Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.

PURPOSE

This article describes the issues that affect the performance of the buffer cache and DBWR (database writer) in Oracle releases 7.1 to 9.2 inclusive, and onward. The notes here are particularly important if your system shows any of the following:

Latch contention for the "cache buffers lru chain" or the "cache buffers chains" latch
Large "Average Write Queue" length
Lots of time spent waiting for "write complete waits"
Lots of time spent waiting for "free buffer waits" or "buffer busy waits"

DETAILS

What is the buffer cache?

Oracle keeps copies of database blocks in an area of the SGA known as the buffer cache. The cache may hold more than one copy of a block from different points in time, and may contain 'dirty' blocks, i.e. blocks which have been updated but not yet flushed back to disk. The database writers (DBWR or DBWn processes) are responsible for writing dirty blocks to disk, while any user session can read blocks into the cache.

All blocks in the buffer cache are on an LRU (least recently used) list. When a process needs a free buffer it scans from the LRU end of this list for a non-dirty buffer that it can use. The 'cache buffers lru chain' latches serialize operations on the LRU lists.
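The free-buffer search described above can be pictured with a small toy model. This is only an illustrative sketch, not Oracle's implementation: the `Buffer` class, the `find_free_buffer` function and the block numbers are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    block_no: int        # hypothetical block identifier
    dirty: bool = False  # True if updated but not yet written to disk


def find_free_buffer(lru):
    """Scan from the LRU end and return the first non-dirty buffer.

    Returns (buffer, dirty_skipped). buffer is None when every buffer
    is dirty, in which case the process would have to wait for DBWR.
    """
    dirty_skipped = 0
    for buf in lru:  # index 0 is the LRU end in this toy model
        if not buf.dirty:
            return buf, dirty_skipped
        dirty_skipped += 1
    return None, dirty_skipped


# LRU end on the left, MRU end on the right (made-up contents).
lru_list = deque([
    Buffer(101, dirty=True),
    Buffer(102, dirty=True),
    Buffer(103),
    Buffer(104, dirty=True),
])

victim, skipped = find_free_buffer(lru_list)
print(victim.block_no, skipped)
```

In this model, the more dirty buffers that sit near the LRU end, the longer the scan becomes, which is why the rate at which DBWR clears dirty buffers matters.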

Evaluating Buffer cache Activity

The buffer cache hit ratio measures how many times a required block was found in memory rather than having to execute an expensive read operation on disk to get the block.

You can query the V$SYSSTAT view to obtain the statistics needed for tuning the buffer cache. To calculate this ratio you must take into account the Oracle version you are running. It is recommended to have a hit ratio higher than 80% before increasing the buffer cache size.

See Note:33883.1 for a detailed explanation on how to calculate this ratio on each Oracle version.
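As a rough illustration of the arithmetic, the sketch below uses the classic form of the ratio, 1 - physical reads / (db block gets + consistent gets), with values as they might be read from V$SYSSTAT. It assumes this classic form applies; the exact statistics to use depend on the Oracle version, as the note referenced above explains. The function name and the sample numbers are hypothetical.

```python
def buffer_cache_hit_ratio(db_block_gets, consistent_gets, physical_reads):
    """Classic hit ratio: the fraction of logical reads served from the cache.

    Returns None when there have been no logical reads yet.
    """
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return None
    return 1.0 - physical_reads / logical_reads


# Made-up sample values keyed by V$SYSSTAT statistic name.
sample = {
    "db block gets": 20_000,
    "consistent gets": 180_000,
    "physical reads": 30_000,
}

ratio = buffer_cache_hit_ratio(
    sample["db block gets"],
    sample["consistent gets"],
    sample["physical reads"],
)
print(f"{ratio:.1%}")
```

Because V$SYSSTAT values are cumulative since instance startup, it is usually more informative to apply the same calculation to the difference between two snapshots taken during the workload of interest.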

Buffer Cache Advisory

In Oracle 9i it is possible to use the "Dynamic Buffer Cache Advisory" feature, which enables and disables statistics gathering for predicting the behavior of the buffer cache with different cache sizes. Keep in mind that enabling this feature causes CPU and memory overhead in the system, hence it is recommended to use it only while the system is being monitored rather than keeping it enabled permanently.

The V$DB_CACHE_ADVICE view is populated when the DB_CACHE_ADVICE parameter is enabled. This view shows the simulated miss rates for a range of potential buffer cache sizes. Each simulated cache size has its own row in this view, with the predicted physical I/O activity that would take place for that size. The DB_CACHE_ADVICE parameter is dynamic, so the advisory can be enabled and disabled as needed to collect advisory data for specific workloads.

See Note:148511.1 - Oracle9i NF: Dynamic Buffer Cache Advisory, to learn how to configure and use this feature.
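One possible way to read the advisory output is to look for the point where a larger cache stops paying off. The sketch below assumes rows of (SIZE_FOR_ESTIMATE in MB, ESTD_PHYSICAL_READS) taken from V$DB_CACHE_ADVICE. The 5% threshold, the function name and the sample rows are hypothetical, and this heuristic is not an Oracle recommendation.

```python
def smallest_adequate_size(rows, min_gain=0.05):
    """Return the smallest cache size after which growing is not worthwhile.

    rows: (size_mb, estimated_physical_reads) pairs in any order.
    A step to the next larger size counts as worthwhile only if it cuts
    the estimated physical reads by at least min_gain (a fraction).
    """
    ordered = sorted(rows)
    for (size, reads), (_, next_reads) in zip(ordered, ordered[1:]):
        if reads == 0 or (reads - next_reads) / reads < min_gain:
            return size
    return ordered[-1][0]  # every step still helped; largest size simulated


# Made-up advisory rows: (SIZE_FOR_ESTIMATE, ESTD_PHYSICAL_READS).
advice = [
    (64, 900_000),
    (128, 500_000),
    (192, 300_000),
    (256, 290_000),
    (320, 288_000),
]

print(smallest_adequate_size(advice))
```

With these sample rows, going from 192 MB to 256 MB saves only about 3% of the estimated reads, so 192 MB is reported as the point of diminishing returns.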

Tuning the Buffer cache:

Tuning the buffer cache is not limited to determining the hit ratio and increasing the DB_BLOCK_BUFFERS parameter.
When tuning the buffer pool, avoid the use of additional buffers that contribute little or nothing to the cache hit ratio.
A common mistake is to continue increasing the value of DB_BLOCK_BUFFERS. Such increases have no effect if you are doing full table scans or other operations that do not use the buffer cache.

Before increasing the cache size, it is necessary to determine which latch or wait event is involved, and then tune the database or the application accordingly. (See the section "Most common Buffer Cache Waits and Latches" in this document for further details.)

From the point of view of tuning the buffer cache there are 3 main things you can influence:

How often blocks are placed in the cache. Each block placed into the cache moves other blocks down the LRU.
How quickly DBWR can clear dirty buffers from the cache, especially if the dirty buffers are causing waits.
The size of the buffer cache, including the use of multiple buffer pools in Oracle8.

In this article we list the things that can influence performance in these three areas.

Input to the cache

The following sections describe things that affect the rate at which buffers in the cache are used.

Execution Plans

By far the biggest factor influencing the rate at which buffers are used is the execution plans of the SQL statements issued by the application. If a statement performs 10000 block gets rather than 100, it will not only be slower itself but can also affect other users. For example, consider a statement like:
        select * from employee where empid=1023 and gender='MALE';
If EMPLOYEE is a large table and this statement always uses the GENDER index rather than the EMPID index then you scan LOTS of blocks (from the GENDER index) causing them to be read into the cache (or moved up the LRU if already there). This pushes other blocks, including dirty blocks, further down the LRU list.

Sorting and Sort Parameters

Sorting within Oracle is described in detail in other articles. Here we focus on the impact on the buffer cache and DBWR. Sort operations use up to SORT_AREA_SIZE bytes of memory. If a sort can be performed within this amount of memory and the result set fits within SORT_AREA_RETAINED_SIZE then there is no need to start writing blocks to disk. If more sort space is needed, a TEMPORARY segment on disk must be used to accommodate the intermediate sort runs and/or the sort results. There are 2 ways sort blocks can be sent to disk:

Via the buffer cache (and then DBWR)

Using sort direct writes

The first of these is often the default, and in Oracle 7.1 it is the only option available. Sort blocks are placed into the buffer cache, thus aging all other blocks. When the sort blocks reach the LRU end of the list, DBWR will flush them to disk. This can impact performance for everyone else, as private sort blocks (sort blocks are of no use to anyone else) can flood DBWR and age blocks in the cache more quickly.
In Oracle 7.2 the parameter SORT_DIRECT_WRITES can be set to cause processes to write sort blocks directly to disk, avoiding the buffer cache. It is generally desirable to set SORT_DIRECT_WRITES to TRUE to ensure sort blocks do not impact the buffer cache.

Note 1: Setting SORT_DIRECT_WRITES=TRUE causes additional memory to be allocated for the session.

Note 2: In a lightly loaded environment SORT_DIRECT_WRITES may cause an individual job to take slightly longer. When blocks are placed in the buffer cache, the cache acts almost like a memory extension to that process, as "GETs" of sort blocks may be satisfied from the cache rather than from disk.

As of Oracle 8.1 the parameter SORT_DIRECT_WRITES is obsolete and direct writes are always used for sort operations that do not fit into the sort area size. See Note:135223.1.

CACHED Tables and Full Table Scans

It is possible to mark tables within Oracle to be cached using the CACHE option of the CREATE or ALTER TABLE commands. Provided the table is within CACHE_SIZE_THRESHOLD, this causes the blocks read by full scans of that table to be placed at the MRU (most recently used) end of the LRU, aging other blocks in the cache. Inappropriate caching of tables can thus contribute to problems with the buffer cache, so take care with this option.

The parameter CACHE_SIZE_THRESHOLD is obsoleted in 8.1. In Oracle8i the use of multiple buffer pools is intended to allow objects to be cached without using this parameter.

In Oracle9i onwards Oracle internally decides whether to cache blocks from table scans in the buffer cache (or not) based on the cache size and the number of blocks expected to be visited. More details on this can be read in Note:787373.1.

Data Clustering

Data clustering can affect the number of blocks that queries have to visit to find the result set. This can be as simple as the PCTFREE and PCTUSED attributes causing fewer rows per block than is optimal for the application.
A less obvious issue which can affect the IO rates is how well data is clustered physically. For example, assume that you frequently fetch rows from a table where a column is between two values via an index scan. If there are 100 rows in each index block then the two extremes are:

Each of the table rows is in a different physical block (100 blocks need to be read for each index block)

The table rows are all located in a few adjacent blocks (a handful of blocks need to be read for each index block)

Pre-sorting or reorganizing data can help to tackle this in severe situations. Adding an extra column to an index may eliminate data block access altogether if the queries use this extra column frequently in their WHERE clause.
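The two extremes above can be put into numbers with a small sketch. It counts how many distinct table blocks are touched for the 100 rows referenced by one index block. The block numbers and the figure of 25 rows per table block are hypothetical.

```python
def table_blocks_visited(row_block_numbers):
    """Count the distinct table blocks holding the rows an index block points to."""
    return len(set(row_block_numbers))


ROWS_PER_INDEX_BLOCK = 100

# Extreme 1: every row lives in a different table block.
scattered = [1000 + i for i in range(ROWS_PER_INDEX_BLOCK)]

# Extreme 2: rows are packed into adjacent blocks,
# assuming 25 rows fit in one table block (made-up figure).
ROWS_PER_TABLE_BLOCK = 25
clustered = [2000 + i // ROWS_PER_TABLE_BLOCK for i in range(ROWS_PER_INDEX_BLOCK)]

print(table_blocks_visited(scattered))  # 100
print(table_blocks_visited(clustered))  # 4
```

Under these assumptions the same range scan brings 100 table blocks into the cache in the scattered case and only 4 in the clustered case, so poor clustering ages the rest of the cache far more quickly.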

Parallel Query

Some operations can be performed much faster using the parallel query features of Oracle7/8/9. These may benefit the buffer cache, as parallel queries can perform direct reads that avoid the buffer cache, i.e. they can read directly from disk into private process memory. Note that this only occurs for certain access paths in a query.

Increase DBWR throughput

DBWR throughput is very platform and version specific so only general observations can be made here. The following items may influence the rate at which DBWR can clear blocks from the cache:

Physical disk attributes (stripe size, speed, layout etc..)

Raw devices versus File System Files

Spreading written data across more disks/files

Using Asynchronous writes where available

Using multiple database writers where asynchronous IO is not available (DB_WRITERS in Oracle7, DBWR_IO_SLAVES in Oracle8/9)

Using multiple DB Writer gatherer processes in Oracle8 (DB_WRITER_PROCESSES)

Setting _DB_BLOCK_WRITE_BATCH to a large number. This parameter is obsoleted in 8.1.

Using the "Multiple buffer pools" feature in Oracle8 and higher. See Note:135223.1

Note that there are many port-specific issues which affect the optimal setup for DBWR on a given platform. These range from choosing a DB_BLOCK_SIZE which is a multiple of the page size used by the operating system for IO operations to configuring asynchronous IO correctly.

Buffer Cache Size and Configuration

Other things that affect the performance of the buffer cache include:

- DB_BLOCK_BUFFERS is the actual size of the buffer cache itself. Be careful when changing the size of the buffer cache as it affects memory requirements and may also affect whether a table is classed as a small table or a large table for caching during full table scans (small tables are placed at the MRU end of the LRU).
- In 9i, the parameter DB_CACHE_SIZE specifies the size of the DEFAULT buffer pool for buffers with the primary block size. 9i also allows buffer caches of different block sizes to be configured: DB_nK_CACHE_SIZE specifies the size of the cache for the nK buffers. The value of nK must be different from DB_BLOCK_SIZE.
- DB_BLOCK_LRU_LATCHES allows multiple LRU chains in the buffer cache from Oracle 7.3 onwards. This generally defaults to a sensible value but can be set explicitly in the init.ora file. On Oracle7 DB_BLOCK_LRU_LATCHES should be set to 2 * number of CPUs. On Oracle8 set it to 2 * number of CPUs * number of buffer pools configured.
- In Oracle8 it is possible to have up to 3 separate buffer caches. This is described in the standard documentation and is configured at object level using the BUFFER_POOL storage attribute.
- DB_BLOCK_LRU_STATISTICS should not be set to TRUE.
- In Oracle9 the number of LRU latches cannot be manually set anymore (not even using _DB_BLOCK_LRU_STATISTICS). It is now hard coded to be 1/2 CPU_COUNT for each buffer cache (DEFAULT, KEEP, RECYCLE, and nK caches).
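The Oracle7 and Oracle8 sizing rules for DB_BLOCK_LRU_LATCHES given in the list above amount to simple arithmetic, sketched below. The function name and the sample CPU and pool counts are hypothetical. The Oracle9 behavior is not modeled because the value is fixed internally.

```python
def suggested_lru_latches(cpu_count, buffer_pools=1):
    """Apply the rule of thumb: 2 * CPUs * number of buffer pools.

    With buffer_pools=1 this is the Oracle7 rule (2 * CPUs);
    with several pools it is the Oracle8 rule.
    """
    if cpu_count < 1 or buffer_pools < 1:
        raise ValueError("cpu_count and buffer_pools must be at least 1")
    return 2 * cpu_count * buffer_pools


# Made-up machine: 8 CPUs.
print(suggested_lru_latches(8))                  # Oracle7: 16
print(suggested_lru_latches(8, buffer_pools=3))  # Oracle8, 3 pools: 48
```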

Most common Buffer Cache Waits and Latches

Latches:

Please refer to Note:22908.1 for a complete discussion on detecting and resolving latch contention.

Cache buffers chains latch:

- This latch is acquired when searching for data blocks cached in the SGA. The buffer cache is organized as chains of blocks, and each chain is protected by a child of this latch while it is being scanned. Contention for this latch can be caused by very heavy access to a single block, which would require the application to be reviewed. As of Oracle8i there are many hash buckets to each latch and so there will be lots of buffers under each latch.

Cache buffers LRU chain latch:

- Processes need to get this latch when they need to move buffers based on the LRU block replacement policy in the buffer cache. Contention for this latch can be avoided by implementing multiple buffer pools or by increasing the number of LRU latches with the parameter DB_BLOCK_LRU_LATCHES (the default value is generally sufficient for most systems). SQL tuning can help as well, by reducing the number of data blocks visited by a query.

The behavior of this latch can be affected when extended statistics are enabled using the parameters DB_BLOCK_LRU_EXTENDED_STATISTICS and DB_BLOCK_LRU_STATISTICS. (These parameters were removed in Oracle8i.)

Wait events:

Buffer busy wait:

- This event is commonly caused by multiple sessions trying to read the same block, or by multiple sessions waiting for a change to complete in the same block. The corrective action for block contention depends on the type of block involved. Query V$WAITSTAT and X$KCBFWAIT to find where the waits occur, broken down by block type. To reduce buffer busy waits on:

data blocks:

    - Reduce the number of rows per block, either by changing PCTFREE/PCTUSED or by reducing the DB_BLOCK_SIZE.
    - Check for 'right-hand indexes' (indexes that get inserted into at the same point by many processes). You can use reverse key indexes to spread the inserts across different blocks.

See Note:155971.1 for a detailed case-study on how to diagnose and resolve intensive random access performance problems.

segment header:

    - Use freelists or increase the number of freelists.
    - An extent size that is too small can cause contention on the header when the table grows regularly. Consider increasing the extent size for the table.

undo header:

    - Add more rollback segments to reduce the number of transactions per rollback segment.
    - Reduce the value of the parameter TRANSACTIONS_PER_ROLLBACK_SEGMENT.

undo block:

    - Consider making rollback segments larger in exclusive mode.
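The class-by-class advice above can be summarized as a lookup keyed by the CLASS column of V$WAITSTAT. The sketch below picks the class with the most waits and lists the matching actions from this section. The `advise` function and the sample counts are hypothetical.

```python
# Corrective actions from this section, keyed by V$WAITSTAT class name.
ACTIONS = {
    "data block": [
        "Reduce rows per block (PCTFREE/PCTUSED or a smaller DB_BLOCK_SIZE)",
        "Check for right-hand indexes; consider reverse key indexes",
    ],
    "segment header": [
        "Use freelists or increase the number of freelists",
        "Increase the extent size if the table grows regularly",
    ],
    "undo header": [
        "Add more rollback segments",
        "Reduce TRANSACTIONS_PER_ROLLBACK_SEGMENT",
    ],
    "undo block": [
        "Consider making rollback segments larger",
    ],
}


def advise(waitstat_rows):
    """Return (class, actions) for the covered class with the most waits.

    waitstat_rows: (class_name, wait_count) pairs, as read from V$WAITSTAT.
    Classes not covered in this section are ignored.
    """
    candidates = [(count, cls) for cls, count in waitstat_rows
                  if cls in ACTIONS and count > 0]
    if not candidates:
        return None, []
    _, worst = max(candidates)
    return worst, ACTIONS[worst]


# Made-up wait counts.
rows = [("data block", 1200), ("segment header", 5400),
        ("undo header", 300), ("undo block", 40), ("sort block", 0)]

worst_class, actions = advise(rows)
print(worst_class)
for action in actions:
    print(" -", action)
```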

Free buffer wait:

- This will mostly occur because DBWR is not writing out buffers fast enough. Please refer to the section "Increase DBWR throughput" to improve the speed of this process.

Oracle Bugs

The first diagnostic step to resolve this behavior is to apply the latest patchset available for your platform. Most of the buffer cache issues related to bugs can be avoided by applying these patchsets. The following table summarizes the most common bugs related to buffer cache problems, possible workarounds and the patchset that fixes the problem.

BUG	Description	Workaround	Fixed in
Bug:2079526	Free buffer waits / LRU latch contention possible on write intensive systems.	Not available	8.1.7.4, 9.0.1.3, 9.2.0.1
Bug:1967363	Increased index block gets / "cache buffers chains" contention in 8i/9i.	Not available	8.1.7.3, 9.0.1.3, 9.2.0.1
Bug:2268098	If you shrink a buffer pool by a certain amount and later try to grow the shared pool by the same amount, the grow may fail with an "insufficient memory" error.	Not available	9.0.1.4, 9.2.0.1