A record of handling a fault in Oracle shared server (session sharing) mode

Brief description of the fault
The HIS database of the XXX Eighth People's Hospital was migrated from the old server to a new server at around 11:00 on July 13. The database version and the host operating system version were unchanged by the migration. After about 20 days of operation, starting on August 4, front-end operators reported delays in the business program's responses, and the delays gradually grew. On August 5, a technical engineer diagnosed a large number of disk reads and increased the database SGA memory to reduce them. On the morning of August 6, after that adjustment, the application was almost paralyzed and unusable, yet the back-end database was very idle. Restarting the business program made the freeze disappear, but it reappeared ten to twenty minutes later. After network and business-program failures were ruled out, and given the urgency of the incident, the database was restarted as a temporary fix. On the afternoon of August 6, the engineer arrived on site, checked carefully, and confirmed that the fault was caused by the connection mode. After some parameters were adjusted, the system returned to normal. With the larger SGA, the business program also responded noticeably faster, and some report queries ran more than ten times faster.

Related knowledge points
There are two ways for front-end applications to connect to the database: exclusive mode and shared mode.
Exclusive mode, which Oracle calls dedicated server mode, maps one client connection to one server process, one-to-one. In shared mode (Oracle's shared server mode), multiple client processes share a server process, and a scheduling process on the server side, the dispatcher, manages them.
The following is a brief analysis of how shared mode works. It is mainly divided into 6 steps (as shown in Figure 1-1):
1. The user initiates a connection request.
2. The listener receives the request and returns the address of an idle scheduling process (dispatcher). The user process then connects directly to that dispatcher.
3. The user initiates an operation request (such as DML), and the dispatcher puts the request into the request queue.
4. An idle shared service process picks up the request from the request queue and processes it in the background.
5. When the shared service process finishes the request, it puts the result into the corresponding response queue.
6. The corresponding dispatcher takes the result from its response queue and returns it to the front-end user.

Figure: 1-1
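
The six steps above can be sketched as a toy model. This is only an illustration in Python, not how Oracle implements shared mode: the function name run_shared_mode, the thread-based workers, the single dispatcher with one response queue, and the sample SQL strings are all assumptions made for the example.

```python
import queue
import threading


def run_shared_mode(requests, num_shared_servers=2):
    """Pass (client_id, sql) requests through one request queue and a small worker pool."""
    request_queue = queue.Queue()   # common request queue filled by the dispatcher (step 3)
    response_queue = queue.Queue()  # the dispatcher's response queue (step 5)

    def shared_server():
        # Step 4: an idle shared server picks up the next queued request.
        while True:
            item = request_queue.get()
            if item is None:        # sentinel: no more work for this worker
                break
            client_id, sql = item
            # Step 5: put the result on the response queue.
            response_queue.put((client_id, f"done: {sql}"))

    workers = [threading.Thread(target=shared_server) for _ in range(num_shared_servers)]
    for worker in workers:
        worker.start()

    # Step 3: the dispatcher enqueues each client request.
    for client_id, sql in requests:
        request_queue.put((client_id, sql))
    for _ in workers:
        request_queue.put(None)
    for worker in workers:
        worker.join()

    # Step 6: the dispatcher hands each result back to its client.
    results = {}
    while not response_queue.empty():
        client_id, result = response_queue.get()
        results[client_id] = result
    return results
```

The point of the model is that many clients share a small pool: if every worker is busy, new requests simply sit in the request queue until one becomes free.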

Detailed diagnosis and analysis process of the problem
There are two strange phenomena in this fault:
First, the foreground business system is very sluggish or even paralyzed while the background server and database are very idle.
Second, every time the database is restarted, the system is normal for the first ten-odd minutes and becomes very sluggish after that.
From this we can judge that, while the business program is connected normally, its requests are probably not being processed in time by the background service processes. The front-end business program of the XXX Eighth People's Hospital uses the shared connection mode (multiple client processes correspond to one server process), so we further checked the shared-mode parameter configuration of the HIS database:
The HIS database is configured with 5 initial shared service processes (maximum 15) and 15 scheduling processes (dispatchers). Because the database has been restarted, how busy each process was can no longer be confirmed. We can only look for clues in DBA_HIST_RESOURCE_LIMIT.
The query results show that the number of shared service processes has reached the max_shared_servers threshold.

From the query results and the AWR report, we can roughly determine that the shared service processes cannot process the user requests in the request queue in time. This also explains why, right after each database restart, when there are few requests, the shared service processes still handle user requests promptly. As the number of requests grows and large transactions (such as reports) are processed, the number of shared service processes reaches the threshold, requests can no longer be answered in time, and the front-end business eventually freezes.
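
The saturation described above can be illustrated with a small deterministic queue model. It is a sketch under stated assumptions, not a measurement of the HIS system: the function queue_waits, the arrival pattern (one request per second), and the 20-second service time for a report-like request are all hypothetical.

```python
import heapq


def queue_waits(arrivals, service_times, servers):
    """Return how long each request waits in a FIFO queue served by `servers` workers."""
    free_at = [0] * servers          # time at which each worker next becomes free
    heapq.heapify(free_at)
    waits = []
    for arrive, service in zip(arrivals, service_times):
        earliest = heapq.heappop(free_at)   # the worker that frees up first
        start = max(arrive, earliest)       # wait only if every worker is still busy
        waits.append(start - arrive)
        heapq.heappush(free_at, start + service)
    return waits


# Hypothetical workload: 40 requests, one per second, each taking 20 seconds.
arrivals = list(range(40))
service_times = [20] * 40
waits_15 = queue_waits(arrivals, service_times, servers=15)
waits_25 = queue_waits(arrivals, service_times, servers=25)
```

With this workload the first 15 requests start immediately in both cases. With 15 workers, every later request has to wait for a worker to free up, and the wait grows as arrivals continue, while with 25 workers no request waits at all. That matches the pattern of a system that is fine just after a restart and freezes once long-running requests occupy every shared service process.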

Troubleshooting:
Method 1
Adjust the values of max_shared_servers and shared_servers to increase the number of shared service processes.
Method 2
Change the client connection mode to exclusive mode.
Solution:
Because changing the client connection mode in a C/S architecture involves a huge workload, Method 1 was chosen: max_shared_servers and shared_servers were increased from the original 15 and 5 to 64 and 25, respectively.
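
A parameter change of this kind is typically made with ALTER SYSTEM. The helper below is a minimal sketch that only renders the statements as text; the function name shared_server_statements is hypothetical, SCOPE = BOTH assumes the instance uses an spfile, and raising the ceiling before the initial count is a conservative ordering chosen for the example rather than something the original write-up specifies.

```python
def shared_server_statements(shared_servers, max_shared_servers, scope="BOTH"):
    """Render ALTER SYSTEM statements for the two shared server parameters."""
    if shared_servers > max_shared_servers:
        raise ValueError("shared_servers should not exceed max_shared_servers")
    if scope not in ("MEMORY", "SPFILE", "BOTH"):
        raise ValueError("scope must be MEMORY, SPFILE, or BOTH")
    return [
        # Raise the ceiling first (assumed ordering), then the initial count.
        f"ALTER SYSTEM SET max_shared_servers = {max_shared_servers} SCOPE = {scope};",
        f"ALTER SYSTEM SET shared_servers = {shared_servers} SCOPE = {scope};",
    ]


# The values used in this incident: 25 initial processes, at most 64.
statements = shared_server_statements(shared_servers=25, max_shared_servers=64)
```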
Follow-up observation:
During the morning peak on August 7, the response speed of the front-end business programs was within the normal range, and some reports even ran several times faster. Some database indicators from the subsequent peak period are as follows:

Judging from the results returned during the peak period, the service processes are relatively idle and spend their time waiting for requests. The number of processes has stayed at the initial 25, with no upward trend.
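
One common way to read such indicators is the busy ratio of each shared service process, from the BUSY and IDLE columns of V$SHARED_SERVER, together with the average request-queue wait, from the WAIT and TOTALQ columns of V$QUEUE. The sketch below only does the arithmetic; the function names and the sample numbers are hypothetical and are not the hospital's measurements.

```python
def busy_percent(busy, idle):
    """Share of time a shared server was busy, given BUSY and IDLE in the same unit."""
    total = busy + idle
    return 0.0 if total == 0 else 100.0 * busy / total


def average_wait(wait, totalq):
    """Average time a request spent queued: total WAIT divided by TOTALQ requests."""
    return 0.0 if totalq == 0 else wait / totalq


# Hypothetical sample: a mostly idle process and a short average queue wait.
sample_busy = busy_percent(busy=25, idle=75)
sample_wait = average_wait(wait=500, totalq=100)
```

A low busy percentage across all processes, with a process count that stays at the initial value, is consistent with the observation that the pool is no longer the bottleneck.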

Fault summary
This fault analysis shows that, as business volume grows and reports are executed during peak periods, the default number of service processes in shared mode can no longer meet current needs. Shared mode mainly targets OLTP systems with high concurrency and small transactions. Now that memory is usually sufficient, the exclusive connection mode is recommended instead. The main suggestions are as follows:
1. Change the connection mode of the report client to exclusive mode
2. Increase the number of service processes in shared mode
3. Optimize some SQL to reduce the processing time of a single request
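
For the first suggestion, a client can ask for exclusive (dedicated) mode through the SERVER setting in the CONNECT_DATA section of its Oracle Net connect descriptor. The following is a minimal sketch that builds such a descriptor as a string; the function name connect_descriptor and the host, port, and service name are hypothetical placeholders, not the hospital's configuration.

```python
def connect_descriptor(host, port, service_name, server="DEDICATED"):
    """Build an Oracle Net connect descriptor that requests a given server mode."""
    if server not in ("DEDICATED", "SHARED"):
        raise ValueError("server must be DEDICATED or SHARED")
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVER={server})(SERVICE_NAME={service_name})))"
    )


# Hypothetical report client entry that bypasses the shared service processes.
report_descriptor = connect_descriptor("his-db.example", 1521, "his")
```

Pointing only the report client at a descriptor like this keeps long-running reports from occupying shared service processes, while the rest of the C/S clients can stay in shared mode.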
