1. Description of the problem:
During the pre-go-live stress test of a business-critical system at a new customer, the application could not reach the concurrency and response-time targets required for going live. The TPS curve measured during the stress test was very unstable, as shown below:
2. Analysis:
From the above, we know the following:
Oracle has only one LGWR process. Before a commit can complete, every process must ask LGWR to write the redo records (change vectors) it previously generated in the log buffer to disk.
When a large number of processes ask LGWR to write at the same time, they queue up.
In a highly concurrent OLTP system, the single LGWR process can become a major bottleneck, especially when IO to the online redo logs performs poorly.
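The queuing effect can be illustrated with a minimal sketch that models the log writer as a single FIFO server. Only the 21-second stall comes from this case; the arrival rate (100 commits per second), the normal write cost (5 ms), and all function names are assumptions chosen for illustration. The model also ignores group commit, so it is a simplification rather than a description of how LGWR actually batches writes.

```python
def commit_completion_times(arrivals_ms, write_costs_ms):
    """Single-writer FIFO queue: a commit finishes only after every
    earlier write has finished and its own write has completed."""
    done = []
    writer_free_at = 0
    for arrive, cost in zip(arrivals_ms, write_costs_ms):
        start = max(arrive, writer_free_at)  # wait if the writer is busy
        writer_free_at = start + cost
        done.append(writer_free_at)
    return done


def tps_per_second(done_ms, horizon_s):
    """Count completed commits in each one-second bucket."""
    buckets = [0] * horizon_s
    for t in done_ms:
        sec = t // 1000
        if sec < horizon_s:
            buckets[sec] += 1
    return buckets


# Assumed workload: 100 commits/s for 30 s, each write normally takes 5 ms.
arrivals = [i * 10 for i in range(3000)]
costs = [5] * 3000
costs[500] = 21_000  # one write stalls for 21 s, as observed on node 1

tps = tps_per_second(commit_completion_times(arrivals, costs), horizon_s=30)
print(tps)
```

In this model, throughput holds at 100 TPS for the first five seconds, drops to zero for as long as the stalled write is outstanding, and then spikes while the backlog drains. That is the same kind of sawtooth pattern as an unstable TPS curve.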
Therefore, we need to check the state of the LGWR process.
We queried gv$session for the LGWR process on both RAC nodes while it was writing the log. The results are shown below:
From this we can see:
- Of the two nodes in the RAC (database cluster), only one shows the log file parallel write wait, which means that LGWR on that node is waiting for a write to the online log file on disk to complete.
- With state WAITING, the log file parallel write wait on node 1 has seq# 35693, yet seconds_in_wait has reached 21 seconds. Put simply, a single LGWR write IO has taken 21 seconds!
This means that during the stress test every concurrent process has to wait for this LGWR IO to complete. Only then can the processes notify LGWR to continue flushing change vectors from the log buffer. That is why the measured TPS curve is unstable and shows sharp drops.
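The LGWR check described above can be sketched as a query plus a small filter. The column names are standard gv$session columns, but the exact query used in this case is not shown, so the SQL text is an assumption. The one-second threshold, the function name, and the node 2 sample row are also hypothetical; only the node 1 values (seq# 35693, 21 seconds) come from the case.

```python
# Assumed query: list the LGWR session on every RAC instance.
LGWR_WAIT_QUERY = """
SELECT inst_id, sid, program, event, state, seq#, seconds_in_wait
  FROM gv$session
 WHERE program LIKE '%(LGWR)%'
"""


def stalled_log_writers(rows, threshold_s=1):
    """Return rows where LGWR is still waiting on a redo write that has
    been outstanding for at least threshold_s seconds (assumed threshold)."""
    return [
        r for r in rows
        if r["event"] == "log file parallel write"
        and r["state"] == "WAITING"
        and r["seconds_in_wait"] >= threshold_s
    ]


# Sample rows shaped like the query result. Node 1 uses the values from
# this case; the node 2 row is made up for illustration.
sample_rows = [
    {"inst_id": 1, "event": "log file parallel write",
     "state": "WAITING", "seq#": 35693, "seconds_in_wait": 21},
    {"inst_id": 2, "event": "rdbms ipc message",
     "state": "WAITING", "seq#": 1, "seconds_in_wait": 0},
]

stalled = stalled_log_writers(sample_rows)
for r in stalled:
    print(f"node {r['inst_id']}: LGWR write outstanding for {r['seconds_in_wait']} s")
```

A healthy redo write normally completes in milliseconds, so any LGWR row that survives this filter is worth escalating to the storage side.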
At this point, we can be sure that the IO subsystem is at fault.
The investigation should focus on the fiber cabling along the IO path, the SAN switches, and the performance of the storage itself.
The customer's storage team might not accept database-side evidence of slow IO. To push them to step up their investigation, we remotely asked the customer to run the following command to check the IO multipathing software. The results are shown below:
Node 1 shows obvious IO errors, and the count keeps growing!
We then checked node 2: no IO errors were found there!
This is entirely consistent with gv$session, where LGWR was waiting on log file parallel write on only one node.
3. Root cause:
Faced with hard evidence, the customer's storage team stopped pushing back and began investigating in earnest, component by component. The problem was finally resolved by replacing the fiber-optic cable. The wait events measured in a repeat stress test after the cable was replaced are shown below:
4. Resolution:
The measured TPS curve changed from its original wave-like shape:
into a smooth, stable profile, as follows: