Notes on analyzing a performance problem in an online product

A service in our production environment is called frequently and performs poorly: on average, 5 or 6 out of every 500 calls are so slow that the client times out, a failure rate of about one percent. The problem is especially noticeable under high concurrency.

This problem landed on my desk. Let's call this service A for now.

What else is there to do but start with the logs?

First of all, there are many ways to filter the calls to A out of the massive logs. I used regular-expression matching in vim. I won't go into the details here, but I may write about regular expressions another time. The filtered log records the important information for each call to A, such as the start time, end time, call status, and SessionID.
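
The article does not show the log format or the vim pattern that was used. As a rough illustration only, the sketch below assumes a hypothetical one-line-per-call format (`ServiceA SessionID=... start=... end=... status=...`) and extracts the same fields with Python's `re` module.

```python
import re

# Hypothetical log format (the real one is not shown in the article):
# "... ServiceA SessionID=<id> start=HH:MM:SS.fff end=HH:MM:SS.fff status=<word>"
CALL_RE = re.compile(
    r"ServiceA\s+SessionID=(?P<session>\w+)\s+"
    r"start=(?P<start>\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+"
    r"end=(?P<end>\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+"
    r"status=(?P<status>\w+)"
)


def filter_calls(lines):
    """Yield a dict of named fields for every log line that records a call to service A."""
    for line in lines:
        match = CALL_RE.search(line)
        if match:
            yield match.groupdict()


sample = [
    "22:15:37.100 INFO ServiceA SessionID=s1 start=22:15:37.100 end=22:15:37.420 status=OK",
    "22:15:37.200 INFO ServiceB SessionID=s2 start=22:15:37.200 end=22:15:37.300 status=OK",
    "22:15:40.000 ERROR ServiceA SessionID=s3 start=22:15:10.000 end=22:15:40.000 status=Failed",
]
calls = list(filter_calls(sample))
for call in calls:
    print(call)
```

Inside vim, a similar first pass is `:v/ServiceA SessionID=/d`, which deletes every line that does not match the pattern (again assuming the hypothetical format).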

Second, the analysis. The filtered log covers the period from 22:15:37 to 22:20:01 in the evening, less than five minutes (4 minutes 24 seconds) in total:

Total calls: 491

Average call time: 8.185484 s

Standard deviation: 17.575592 s

Calls taking less than 5 seconds: 421

Calls taking more than 5 seconds: 70

Failed calls: 6

Minimum call time: 0.32004 s

Maximum call time: 64.9905 s
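
The figures above are ordinary descriptive statistics over the per-call durations. A minimal sketch with Python's `statistics` module follows; the article does not say whether it used the population or the sample standard deviation (population is assumed here), nor how a call of exactly 5 seconds was counted, and the `"OK"` status string is a hypothetical convention.

```python
import statistics


def summarize(durations, statuses, threshold=5.0):
    """Summary figures for a batch of calls.

    durations: call times in seconds; statuses: one status string per call,
    where "OK" means success (a hypothetical convention).
    """
    return {
        "total": len(durations),
        "mean": statistics.mean(durations),
        # Population standard deviation; the article does not say which one it used.
        "stdev": statistics.pstdev(durations),
        "under_threshold": sum(1 for d in durations if d < threshold),
        # Calls of exactly `threshold` seconds are counted here (an assumption).
        "over_threshold": sum(1 for d in durations if d >= threshold),
        "failed": sum(1 for s in statuses if s != "OK"),
        "min": min(durations),
        "max": max(durations),
    }


durations = [0.5, 1.0, 2.0, 30.0, 60.0]
statuses = ["OK", "OK", "OK", "Failed", "OK"]
summary = summarize(durations, statuses)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The article's numbers pass the obvious consistency check (421 + 70 = 491). A standard deviation of 17.58 s, more than twice the 8.19 s mean while most calls finish within 2 seconds, points to a long tail of very slow calls rather than a uniformly slow service.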

In less than five minutes there were nearly 500 calls. 421 of them returned within 5 seconds, most within 0 to 2 seconds. All 6 failures were database failures, with the error message: failed: [GetCustomer] failed: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

Further analysis of the log shows that once the database errors occur, service calls take around 30 seconds and 60 seconds to return, far above the normal level.
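
The error message means callers waited for a free pooled connection until their wait expired. The toy model below is a sketch, not ADO.NET's actual pooling: it stands in for the pool with a `threading.BoundedSemaphore` of fixed size, every simulated request holds one connection for `hold_seconds`, and requests that cannot get a connection within `wait_timeout` count as timeouts. All names and numbers are illustrative assumptions.

```python
import threading
import time


def simulate_pool(pool_size, workers, hold_seconds, wait_timeout):
    """Toy model of a bounded connection pool; returns (succeeded, timed_out)."""
    pool = threading.BoundedSemaphore(pool_size)  # stands in for "max pool size"
    start = threading.Barrier(workers)            # release all requests at the same moment
    lock = threading.Lock()
    results = {"ok": 0, "timeout": 0}

    def request():
        start.wait()
        if pool.acquire(timeout=wait_timeout):
            try:
                time.sleep(hold_seconds)          # time spent holding the connection
            finally:
                pool.release()
            key = "ok"
        else:
            key = "timeout"                       # like "Timeout expired ... max pool size was reached"
        with lock:
            results[key] += 1

    threads = [threading.Thread(target=request) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["ok"], results["timeout"]


ok, timed_out = simulate_pool(pool_size=2, workers=6, hold_seconds=0.5, wait_timeout=0.1)
print(f"succeeded={ok} timed_out={timed_out}")
```

In this model, failures appear as soon as the number of simultaneous requests exceeds the pool size while each request holds its connection longer than the others are willing to wait. The two levers are therefore the pool size and the time each call holds a connection, the same two points the article's conclusions address.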

It was also found that the response time of service A is strongly related to the level of concurrency.

The horizontal axis of the graph is time, and the vertical axis shows two metrics: blue is the execution time of the service, and red is the concurrency. The graph shows that the two periods of elevated service time are caused by sudden spikes in concurrency.
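
The article does not say how the concurrency curve was computed. One plausible way to derive it from the filtered log is to count, for each call, how many calls were still in flight when it started. The sketch below does this with sorted start and end times and `bisect`; it assumes times have already been converted to plain seconds (for example offsets from 22:15:37).

```python
from bisect import bisect_right


def concurrency_at_starts(calls):
    """For each call (start, end), count the calls in flight at its start time.

    Times are plain seconds. A call is in flight at time t when start <= t < end.
    Returns (start, in_flight, duration) tuples sorted by start time.
    """
    starts = sorted(s for s, _ in calls)
    ends = sorted(e for _, e in calls)
    rows = []
    for s, e in sorted(calls):
        # calls started at or before s, minus calls already finished by s
        in_flight = bisect_right(starts, s) - bisect_right(ends, s)
        rows.append((s, in_flight, e - s))
    return rows


calls = [(0.0, 1.0), (0.5, 2.0), (0.6, 10.0), (0.7, 12.0), (5.0, 5.5)]
for start, in_flight, duration in concurrency_at_starts(calls):
    print(f"t={start:>4.1f}s  concurrency={in_flight}  duration={duration:.1f}s")
```

Plotting `in_flight` and `duration` against `start` gives the two curves described for the graph. In the toy data, the calls started while concurrency was climbing (at 0.6 s and 0.7 s) are the slow ones.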

Third, analyze the code

Why does the service call time rise so sharply when concurrency suddenly spikes? For that, we have to go back to the code.

This code first closes the current Customer and then reloads it with the existing parameters. A C# lock keyed on UserId and CustomerId is used here to limit the threads entering this code. However, because different users do not necessarily open the same customer, the lock does not actually limit how many threads run the code at once, so a large number of database requests flood into the database connection pool. Both CloseCustomer and LoadCustomer are very time-consuming operations, so many threads cannot obtain a database connection: they sit waiting, and some even time out and fail to return.
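
The original C# code is not shown, so the Python sketch below only models the situation as described, assuming one lock object per (UserId, CustomerId) key. It measures how many workers are inside the "database" section at once: with distinct keys nothing is capped, with a repeated key everything is serialized, and an optional global `BoundedSemaphore` (a hypothetical refactoring direction, not something the article proposes) caps the total.

```python
import threading
import time
from collections import defaultdict


class KeyedLocks:
    """Hands out one lock per key, e.g. per (user_id, customer_id) pair."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the dict itself
        self._locks = defaultdict(threading.Lock)

    def get(self, key):
        with self._guard:
            return self._locks[key]


class PeakCounter:
    """Context manager that records the peak number of threads inside it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.current -= 1
        return False


def peak_db_concurrency(keys, db_limit=None, work_seconds=0.1):
    """Start one worker per key; return how many were in the 'database' section at once."""
    keyed = KeyedLocks()
    gate = threading.BoundedSemaphore(db_limit) if db_limit else None
    counter = PeakCounter()

    def worker(key):
        with keyed.get(key):                      # only same-key callers wait here
            if gate:
                gate.acquire()                    # optional global cap on database work
            try:
                with counter:
                    time.sleep(work_seconds)      # stands in for CloseCustomer + LoadCustomer
            finally:
                if gate:
                    gate.release()

    threads = [threading.Thread(target=worker, args=(k,)) for k in keys]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.peak


distinct = [(f"user{i}", f"cust{i}") for i in range(8)]
same = [("user0", "cust0")] * 8
peak_distinct = peak_db_concurrency(distinct)              # not capped by the keyed lock
peak_same = peak_db_concurrency(same)                      # 1: same key is fully serialized
peak_limited = peak_db_concurrency(distinct, db_limit=3)   # at most 3
print(peak_distinct, peak_same, peak_limited)
```

In .NET, `SemaphoreSlim` would be the analogous primitive for a global cap. Note that the per-key lock dictionary in this sketch grows without bound, which a real implementation would need to handle.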

In conclusion

Based on the analysis above, I drew two conclusions:

1. Increase the size of the database connection pool

2. LoadCustomer is used not only by this service but also by other services; nevertheless, this code is a real performance bottleneck and needs to be refactored.
